feat: collapse NSC semesters into enrollment periods via gaps-and-isl…#3415
anthonygwalters wants to merge 13 commits into main from
…ands
Adds int_nsc__enrollments, which collapses semester-level NSC rows into continuous enrollment periods. Consecutive semesters are merged into one enrollment; a gap > 200 days (summer ≈ 90 days, missing semester ≈ 270 days) signals a new enrollment period.
Updates rpt_gsheets__nsc_enrollment_new and rpt_gsheets__nsc_enrollment_updates to reference the intermediate model and match against Salesforce enrollments using ±180-day date proximity rather than calendar year equality, fixing the case where a Fall+Spring enrollment was incorrectly split into two records.
https://claude.ai/code/session_01FcUXjXQDFFHKo4E6d3oG6S
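The gaps-and-islands collapse described in this commit can be sketched in plain Python. This is only an illustration of the technique, not the model's actual SQL: the function name, the tuple layout, and the choice to measure the gap from the running period end to the next semester's begin date are all assumptions. The real model presumably also partitions by student and institution, which is omitted here.

```python
from datetime import date

GAP_THRESHOLD_DAYS = 200  # threshold stated in the PR description


def collapse_semesters(semesters):
    """Merge (begin, end) semester date pairs into enrollment periods.

    Hypothetical helper: a new period starts when the gap between the
    current period's end and the next semester's begin exceeds
    GAP_THRESHOLD_DAYS.
    """
    periods = []
    for begin, end in sorted(semesters):
        if periods and (begin - periods[-1][1]).days <= GAP_THRESHOLD_DAYS:
            # Same island: extend the current period's end date.
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            # First row, or gap too large: start a new island.
            periods.append((begin, end))
    return periods


# Illustrative dates only (not taken from NSC data).
semesters = [
    (date(2021, 8, 23), date(2021, 12, 17)),  # Fall 2021
    (date(2022, 1, 18), date(2022, 5, 13)),   # Spring 2022, 32-day gap
    (date(2022, 8, 22), date(2022, 12, 16)),  # Fall 2022, 101-day summer gap
    (date(2023, 8, 21), date(2023, 12, 15)),  # Fall 2023, 248-day gap
]
periods = collapse_semesters(semesters)
```

With these sample dates the first three semesters merge into one period (2021-08-23 to 2022-12-16), and the 248-day gap starts a second period for Fall 2023.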
6a80366 to 29871a3
…laude/salesforce-alumni-tracking-vtKNu
@claude please review this PR
Claude finished @anthonygwalters's task in 2m 36s
PR Review
Overall: The gaps-and-islands approach is correct and well-structured. The 200-day threshold is well-documented and the deduplication logic is sound. A few issues to address below.
| Severity | File | Issue |
|---|---|---|
| 🔴 Medium | rpt_gsheets__nsc_enrollment_updates.sql | Potential duplicate e.id rows when two NSC periods match the same SF enrollment via ±180-day window |
| 🟡 Low | rpt_gsheets__nsc_enrollment_new.sql | Double-quoted string literals violate sqlfluff single-quote rule |
| 🟢 Trivial | rpt_gsheets__nsc_enrollment_new.yml | start_date__c description is stale |
The core gaps-and-islands logic in int_nsc__enrollments is correct and the refactor meaningfully simplifies the extract models. The duplicate-ID risk in the updates model is the main thing to resolve before merging.
- Add QUALIFY ROW_NUMBER() to rpt_gsheets__nsc_enrollment_updates to prevent duplicate e.id rows when multiple NSC enrollment periods fall within ±180 days of the same Salesforce enrollment start date; picks the period with enrollment_begin closest to start_date
- Fix double-quoted string literals in rpt_gsheets__nsc_enrollment_new to use escaped single quotes per sqlfluff config
- Correct stale start_date__c description in properties YAML
https://claude.ai/code/session_01FcUXjXQDFFHKo4E6d3oG6S
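The deduplication described in this commit (keep only the NSC period whose enrollment_begin is closest to the Salesforce start_date) can be illustrated with a small Python sketch. It mirrors the effect of a ROW_NUMBER() = 1 filter ordered by date distance, but it is not the model's SQL: the function name, the input shapes, and the sample IDs are hypothetical, and the real join presumably also matches on student and institution.

```python
from datetime import date

MATCH_WINDOW_DAYS = 180  # ±180-day window stated in the PR description


def closest_period_per_enrollment(sf_enrollments, nsc_period_begins):
    """Return {sf_id: enrollment_begin}, at most one NSC period per id.

    sf_enrollments: list of (sf_id, start_date) tuples (hypothetical shape)
    nsc_period_begins: list of enrollment_begin dates
    """
    matches = {}
    for sf_id, start_date in sf_enrollments:
        # Candidate periods inside the proximity window.
        candidates = [
            begin
            for begin in nsc_period_begins
            if abs((begin - start_date).days) <= MATCH_WINDOW_DAYS
        ]
        if candidates:
            # Keep only the closest candidate, so each id appears once.
            matches[sf_id] = min(
                candidates, key=lambda b: abs((b - start_date).days)
            )
    return matches


# Illustrative data: two NSC periods fall within 180 days of "sf-1".
sf_enrollments = [("sf-1", date(2022, 1, 10)), ("sf-2", date(2019, 9, 1))]
nsc_period_begins = [date(2021, 8, 23), date(2022, 6, 1)]
matches = closest_period_per_enrollment(sf_enrollments, nsc_period_begins)
```

Here both NSC periods are within the window for "sf-1" (140 and 142 days away), so a plain proximity join would emit two rows for that id; the sketch keeps only the 2021-08-23 period. "sf-2" has no candidate and is left unmatched. Exact ties are resolved by list order in this sketch, so a SQL version would likely want an explicit tie-breaker in its ORDER BY.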
Pull Request
Summary & Motivation
Self-review
General
TEAMster Asana Project
Dagster
uv run dagster definitions validate for any modified code location
[code_location, integration, table_name] key pattern and use the appropriate IO manager (pickle, avro, or file)
libraries/<integration>/assets.py factory function and a config/assets-*.yaml in the code location
dbt
Include a corresponding [model name].yml properties file for all models. These can be generated by running dbt run-operation generate_model_yaml --args '{"model_names": ["MODEL NAME"]}' and saving the console output to a YAML file.
Include (or update) an exposure for all models that will be consumed by a dashboard, analysis, or application:
Dagster "kinds" Reference
SQL
Use the union_dataset_join_clause() macro for queries that employ models that use regional datasets
Do not use group by without any aggregations when you mean to use distinct
All distinct usage must be accompanied by a comment explaining its necessity
Do not use order by for select statements. That should be done in the reporting layer.
If you are adding a new external source, before building, run:
Troubleshooting