feat: collapse NSC semesters into enrollment periods via gaps-and-isl… by anthonygwalters · Pull Request #3415 · TEAMSchools/teamster

anthonygwalters · 2026-03-11T20:48:44Z

…ands

Adds int_nsc__enrollments, which collapses semester-level NSC rows into continuous enrollment periods. Consecutive semesters are merged into one enrollment; a gap > 200 days (summer ≈ 90 days, missing semester ≈ 270 days) signals a new enrollment period.

Updates rpt_gsheets__nsc_enrollment_new and rpt_gsheets__nsc_enrollment_updates to reference the intermediate model and match against Salesforce enrollments using ±180-day date proximity rather than calendar year equality, fixing the case where a Fall+Spring enrollment was incorrectly split into two records.

https://claude.ai/code/session_01FcUXjXQDFFHKo4E6d3oG6S

Pull Request

Summary & Motivation

Self-review

General

If this is a same-day request, please flag that in the #data-team Slack
Update due date and assignee on the
TEAMster Asana Project
Run trunk fmt on all modified files

Dagster

Run uv run dagster definitions validate for any modified code location
New assets follow the [code_location, integration, table_name] key
pattern and use the appropriate IO manager (pickle, avro, or file)
New integrations include both a libraries/<integration>/assets.py
factory function and a config/assets-*.yaml in the code location

dbt

Include a corresponding [model name].yml properties file for all models.
These can be generated by running
dbt run-operation generate_model_yaml --args '{"model_names": ["MODEL NAME"]}'
and saving the console output to a YAML file.

models:
  - name: [model name]
    config:
      contract:
        enforced: false  # required while creating a contracted model, delete before merging
    columns:  # required for contracted models
      - name: ...
        data_type: ...
        data_tests:  # column tests, optional
          - ...
    data_tests:  # model tests, optional
      - ...
      - dbt_utils.unique_combination_of_columns:  # recommended for intermediate models
          arguments:
            combination_of_columns:
              - ...
          config:
            store_failures: true

Include (or update) an
exposure for all
models that will be consumed by a dashboard, analysis, or application:

exposures:
  - name: [exposure name, snake_case]
    label: [exposure name, Title Case]
    type: dashboard | notebook | analysis | ml | application
    owner:
      name: Data Team
    depends_on:
      - ref("[model name]")
      - ...
    url: ...  # optional
    config:
      meta:
        dagster:
          kinds:
            - tableau | googlesheets | ...
            - ...
          asset:
            metadata:
              id: [lsid]  # required for Tableau Server workbooks
              cron_schedule:  # required for Dagster automation
                - * * * * *
                - ...

Dagster "kinds" Reference

SQL

Use the union_dataset_join_clause() macro for queries that employ models
that use regional datasets
Do not use group by without any aggregations when you mean to use
distinct
All distinct usage must be accompanied by a comment explaining its
necessity
Do not use order by for select statements. That should be done in the
reporting layer.

If you are adding a new external source, before building, run:

dbt run-operation stage_external_sources --vars "{'ext_full_refresh': 'true'}" --args select: [model name(s)]

Troubleshooting

…ands Adds int_nsc__enrollments, which collapses semester-level NSC rows into continuous enrollment periods. Consecutive semesters are merged into one enrollment; a gap > 200 days (summer ≈ 90 days, missing semester ≈ 270 days) signals a new enrollment period. Updates rpt_gsheets__nsc_enrollment_new and rpt_gsheets__nsc_enrollment_updates to reference the intermediate model and match against Salesforce enrollments using ±180-day date proximity rather than calendar year equality, fixing the case where a Fall+Spring enrollment was incorrectly split into two records. https://claude.ai/code/session_01FcUXjXQDFFHKo4E6d3oG6S

…laude/salesforce-alumni-tracking-vtKNu

anthonygwalters · 2026-03-13T19:17:45Z

@claude please review this PR

github-actions · 2026-03-13T19:18:00Z

Claude finished @anthonygwalters's task in 2m 36s —— View job

PR Review

Overall: The gaps-and-islands approach is correct and well-structured. The 200-day threshold is well-documented and the deduplication logic is sound. A few issues to address below.

`int_nsc__enrollments.sql`

Logic is solid. The three-step gaps-and-islands pattern is clean:

nsc_with_account — deduplicates overlapping NSC rows for the same (contact_id, account_id, enrollment_begin) by keeping the most recent search_date. Good.
semesters_with_lag — computes the gap to the previous semester's end. The lag() is correctly ordered by enrollment_begin within the right partition.
enrollment_groups — uses a running sum() to assign group IDs, which is the standard gaps-and-islands idiom.

The final array_agg(... order by enrollment_begin desc limit 1)[offset(0)] to pick the most recent semester's fields is a valid BigQuery pattern, and is deterministic here because enrollment_begin is unique within each (contact_id, account_id) pair after the deduplication step.

No issues.

`rpt_gsheets__nsc_enrollment_updates.sql` — potential duplicate `id` bug

The uniqueness test on this model asserts id is unique. However, it's possible for a single Salesforce enrollment (e.id) to match two NSC enrollment periods via the ±180-day window, producing duplicate rows:

Example scenario:

NSC enrollment period 1: enrollment_begin = 2020-01-01, enrollment_end = 2020-05-15
NSC enrollment period 2: enrollment_begin = 2020-12-01 (gap = ~200 days → separate group)
Salesforce enrollment: start_date = 2020-07-01

Both NSC enrollment_begin values fall within ±180 days of 2020-07-01, so both join to the same e.id. The uniqueness test would fail at run time.

This is a theoretical edge case but the ±180-day window is wide enough to make it realistic. A safer join might add a ROW_NUMBER() in nsc_enrollment_derived to pick the closest NSC period per Salesforce enrollment, or narrow the window. Worth an explicit decision on the right behavior here.

Fix this →

`rpt_gsheets__nsc_enrollment_new.sql` — double-quoted string literals

Lines 23–24 use double quotes for string literals:

then "Bachelor's (4-year)"
...
then "Associate's (2 year)"

The .sqlfluff config enforces preferred_quoted_literal_style = single_quotes. In BigQuery, double quotes are for identifiers, not literals. These should be escaped single-quote strings:

then 'Bachelor''s (4-year)'
...
then 'Associate''s (2 year)'

Fix this →

`rpt_gsheets__nsc_enrollment_new.yml` — stale column description

The start_date__c description still says:

"Date the enrollment began (first day of the academic year)."

With the new intermediate model, enrollment_begin is the first day of the earliest semester in the enrollment period — not necessarily the first day of an academic year. The description should be updated to match (e.g., "First day of the earliest NSC semester in this enrollment period.").

`int_nsc__enrollments.yml`

Well-formed. The dbt_utils.unique_combination_of_columns test on (contact_id, account_id, enrollment_group) is correct and required for intermediate models per project conventions. All columns are documented with accurate descriptions.

Minor: enrollment_group is described as "monotonically increasing" but technically it increments per group boundary detection — not guaranteed to be strictly sequential across all rows. The description is accurate enough, no change needed.

Summary

Severity	File	Issue
🔴 Medium	`rpt_gsheets__nsc_enrollment_updates.sql`	Potential duplicate `e.id` rows when two NSC periods match the same SF enrollment via ±180-day window
🟡 Low	`rpt_gsheets__nsc_enrollment_new.sql`	Double-quoted string literals violate sqlfluff single-quote rule
🟢 Trivial	`rpt_gsheets__nsc_enrollment_new.yml`	`start_date__c` description is stale

The core gaps-and-islands logic in int_nsc__enrollments is correct and the refactor meaningfully simplifies the extract models. The duplicate-ID risk in the updates model is the main thing to resolve before merging.
|

- Add QUALIFY ROW_NUMBER() to rpt_gsheets__nsc_enrollment_updates to prevent duplicate e.id rows when multiple NSC enrollment periods fall within ±180 days of the same Salesforce enrollment start date; picks the period with enrollment_begin closest to start_date - Fix double-quoted string literals in rpt_gsheets__nsc_enrollment_new to use escaped single quotes per sqlfluff config - Correct stale start_date__c description in properties YAML https://claude.ai/code/session_01FcUXjXQDFFHKo4E6d3oG6S

…laude/salesforce-alumni-tracking-vtKNu

anthonygwalters requested a review from a team as a code owner March 11, 2026 20:48

anthonygwalters requested a review from a team March 11, 2026 20:48

anthonygwalters force-pushed the claude/salesforce-alumni-tracking-vtKNu branch from 6a80366 to 29871a3 Compare March 11, 2026 20:54

Anthony Walters added 10 commits March 11, 2026 21:00

trunk etc

8b09b26

n

0f5e00a

trunk

f8359bb

trunk

b4a8ba1

throwing stuff against the wall

683f551

revert

b1761c5

Merge branch 'main' of https://github.com/TEAMSchools/teamster into c…

61b2483

…laude/salesforce-alumni-tracking-vtKNu

Merge branch 'main' of https://github.com/TEAMSchools/teamster into c…

b385704

…laude/salesforce-alumni-tracking-vtKNu

yml

460447e

Merge branch 'main' of https://github.com/TEAMSchools/teamster into c…

653ab42

…laude/salesforce-alumni-tracking-vtKNu

claude and others added 2 commits March 13, 2026 19:23

Merge branch 'main' of https://github.com/TEAMSchools/teamster into c…

5b0d928

…laude/salesforce-alumni-tracking-vtKNu

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: collapse NSC semesters into enrollment periods via gaps-and-isl…#3415

feat: collapse NSC semesters into enrollment periods via gaps-and-isl…#3415
anthonygwalters wants to merge 13 commits intomainfrom
claude/salesforce-alumni-tracking-vtKNu

anthonygwalters commented Mar 11, 2026

Uh oh!

anthonygwalters commented Mar 13, 2026

Uh oh!

github-actions bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

anthonygwalters commented Mar 11, 2026

Pull Request

Summary & Motivation

Self-review

General

Dagster

dbt

SQL

Troubleshooting

Uh oh!

anthonygwalters commented Mar 13, 2026

Uh oh!

github-actions bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

int_nsc__enrollments.sql

rpt_gsheets__nsc_enrollment_updates.sql — potential duplicate id bug

rpt_gsheets__nsc_enrollment_new.sql — double-quoted string literals

rpt_gsheets__nsc_enrollment_new.yml — stale column description

int_nsc__enrollments.yml

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Mar 13, 2026 •

edited

Loading

`int_nsc__enrollments.sql`

`rpt_gsheets__nsc_enrollment_updates.sql` — potential duplicate `id` bug

`rpt_gsheets__nsc_enrollment_new.sql` — double-quoted string literals

`rpt_gsheets__nsc_enrollment_new.yml` — stale column description

`int_nsc__enrollments.yml`