refactor(dbt): state assessments rebuild #3553

Draft
GabyRangelB wants to merge 10 commits into main from claude/feat/stat-rebuild

GabyRangelB (Contributor) commented Mar 30, 2026

Pull Request

Summary & Motivation

"When merged, this pull request will..."

Rebuild the state assessments reporting pipeline — pushing transformation logic upstream, optimizing the demographic comparisons model, and enabling demographic-level benchmarking.

Work completed so far:

  • Push proficiency banding, metadata columns, and subject mapping from reporting into assessment intermediate models (int_pearson__all_assessments, int_fldoe__all_assessments, int_iready__diagnostic_results)
  • Replace GROUP BY CUBE with explicit GROUPING SETS (~85x reduction in computed groups) and consolidate the demographic comps intermediate chain into a single model
  • Fix self-join bug in rpt_tableau__state_assessments_dashboard_comps that made region match/outperform flags dead columns
  • Update dim_state_assessment_benchmarks to use demographics source with expanded grain (includes demographic group/subgroup, season, school_level, etc.)
  • Stage new _v2 sheet range for state test comparison demographics
  • Refactor int_tableau__state_assessments_demographic_comps into three union branches (NJ official, NJ prelim, FL official); fix broken column references; replace inline school_level/grade_range_band/discipline CASE blocks with a test_code_metadata CTE sourced from stg_google_sheets__state_test_comparison_demographics; replace rolling 7-year window with fixed 2018 floor across all branches
  • Add prelim score gating: valid_prelim_assessments CTE automatically excludes NJ student list data for any (academic_year, assessment_name) already present in int_pearson__all_assessments Spring — no more manual comment/uncomment each cycle
  • Add new int_pearson__student_list_report intermediate model, fix a missing trailing comma in its SQL, and add full YML
  • Add full YML coverage: int_pearson__all_assessments, int_fldoe__all_assessments (kipptaf), stg_google_sheets__state_test_comparison_demographics, int_extracts__student_enrollments
  • Add aligned_gender to int_extracts__student_enrollments

Still open:

  • int_pearson__all_assessments contract enforcement is temporarily disabled — needs re-enabling or explicit documentation before merge
  • Crosswalk fix for int_pearson__student_list_report — blocked on adding a crosswalk-compatible field to the student list CSV source (in progress, pending source file update)
  • Testing and validation of refactored models against production data

AI Assistance

Claude Code co-authored this PR across multiple interactive sessions. Human-directed: architectural decisions, model design, column naming. AI-assisted: implementation, impact analysis, optimization.

Self-review

dbt (skip if no dbt changes)

  • Include a [model_name].yml properties file for all new models
  • Include (or update) an exposure — no new exposures needed
  • Breaking change? rpt_tableau__state_assessments_dashboard_comps match flag columns now produce actual values instead of all-false. No columns added/removed/renamed.
  • If adding a new external source, run stage_external_sources before building

CI checks

  • Trunk — passes
  • dbt Cloud — passes
  • Dagster Cloud — passes or not triggered

GabyRangelB and others added 10 commits March 26, 2026 19:05
…c upstream

- Add proficiency banding, subject mapping, and metadata columns to
  int_pearson__all_assessments, int_fldoe__all_assessments, and
  int_iready__diagnostic_results
- Add new int_pearson__student_list_report intermediate model
- Add stg_google_sheets__state_test_comparison_demographics; disable old
  stg_google_sheets__state_test_comparison
- Add standardized_discipline to base_powerschool__course_enrollments
- Add new int_extracts__student_enrollments_courses model
- Refactor int_extracts__student_enrollments_subjects to use upstream columns
- Simplify rpt_tableau__state_assessments_dashboard and _comps by replacing
  inline CASE blocks with upstream column references
- Update int_tableau__state_assessments_demographic_comps lineage to use
  int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g CTE

- Switch state_comps CTE to stg_google_sheets__state_test_comparison_demographics
- Move results_type, admin, season, subject, test_code upstream to int models
- Rename test_code to aligned_test_code in int_pearson__student_list_report
- Add admin and subject aliases to int_fldoe__all_assessments
- Swap stg_pearson__student_list_report ref to int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ch flags

Replace GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos)
for ~85x reduction in computed groups. Consolidate the demographic comps
intermediate chain from 2 models + macro into 1 model. Push demographic labels,
comparison_entity, and test_code-derived columns upstream into the intermediate
to simplify the reporting layer. Fix self-join bug that made region_matched/
region_outperformed flags dead columns. Add uniqueness tests to stg, int, and
rpt models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
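
The 1,024 figure in the commit above is consistent with a CUBE over ten grouping columns, since CUBE expands to every subset of its columns (2^10 = 1,024). The column count is inferred here, not stated in the commit, and the dimension names below are placeholders. A minimal Python sketch of the arithmetic:

```python
from itertools import combinations

# Placeholder names: the commit does not list the actual CUBE columns.
# Ten columns is an assumption inferred from the quoted 1,024 combos.
dimensions = [f"dim_{i}" for i in range(10)]


def cube_grouping_sets(cols):
    """Return every subset of cols, which is what GROUP BY CUBE expands to."""
    return [
        subset
        for size in range(len(cols) + 1)
        for subset in combinations(cols, size)
    ]


cube_count = len(cube_grouping_sets(dimensions))
explicit_count = 12  # number of GROUPING SETS quoted in the commit message
reduction = cube_count / explicit_count

print(cube_count, round(reduction, 1))
```

Under that assumption the ratio is 1,024 / 12, or roughly 85, which matches the "~85x" quoted in the commit.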
- dim_state_assessment_benchmarks.yml: keep expanded surrogate key description,
  use bare `- unique` test (inherits severity/store_failures from dbt_project.yml
  project-level defaults instead of repeating per-test)
- base_powerschool__course_enrollments: keep both courses_credittype normalization
  (from main) and standardized_discipline (from this branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add results_type, district_state, admin, subject, illuminate_subject, and
fast_aggregated_proficiency as computed columns in the kipptaf
int_fldoe__all_assessments model instead of depending on the kippmiami
upstream to provide them. This allows rpt_tableau__state_assessments_dashboard
to build without waiting for a kippmiami deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Refactor int_tableau__state_assessments_demographic_comps: replace
  self-contained assessment_scores CTE with three union branches
  (NJ official, NJ prelim, FL official); add test_code_metadata CTE
  from stg_google_sheets__state_test_comparison_demographics to replace
  inline school_level/grade_range_band/discipline CASE statements;
  fix aligned column references and unqualified ON clause columns

- Add YML for int_fldoe__all_assessments (kipptaf): uniqueness test,
  model description, and full column definitions including new metadata
  columns (results_type, district_state, aligned_level_test_code,
  illuminate_subject, fast_aggregated_proficiency, is_proficient_int,
  admin, subject)

- Add YML for int_pearson__all_assessments: uniqueness test, model
  description, full column definitions; fix stale columns
  (englishlearnerel, studentwithdisabilities removed; aligned_*
  demographic columns, is_proficient_int, season, admin added)

- Add YML for int_pearson__student_list_report: uniqueness test on
  (source_relation, academic_year, administration, state_id,
  aligned_test_code), model description, full column definitions;
  fix missing trailing comma in SQL

- Update stg_google_sheets__state_test_comparison_demographics YML:
  move data_tests before columns, remove redundant store_failures,
  add missing aligned_level_test_code column

- Add aligned_gender to int_extracts__student_enrollments YML

- Remove design spec (work complete)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add prelim score gating to int_tableau__state_assessments_demographic_comps:
  new prelim_assessments and valid_prelim_assessments CTEs automatically
  exclude NJ student list data for any (academic_year, assessment_name)
  already present in int_pearson__all_assessments Spring, eliminating
  the need to manually comment/uncomment the prelim branch each cycle
- Qualify all column references in final SELECT with scores alias (s.)
  to satisfy RF02 after test_code_metadata join was introduced
- Replace rolling 7-year window with fixed 2018 floor across all three
  score branches — 2018 is the earliest year with available comps data
- Fix rpt_tableau__state_assessments_dashboard: inline-alias
  administration_window and assessment_subject from int_fldoe__all_assessments
  instead of expecting pre-computed admin/subject columns

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
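
The prelim gating described in the commit above can be read as an anti-join: keep a prelim (academic_year, assessment_name) pair only when no official Spring row exists for it. The sketch below is not the model's SQL. It uses Python's sqlite3 as a stand-in for the warehouse, and the table names `official_scores` and `prelim_scores`, the `student_id` column, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-ins for int_pearson__all_assessments (official) and
# int_pearson__student_list_report (prelim); the sample data is invented.
conn.executescript(
    """
    create table official_scores (
        academic_year int, assessment_name text, season text
    );
    create table prelim_scores (
        academic_year int, assessment_name text, student_id text
    );
    insert into official_scores values (2024, 'NJSLA', 'Spring');
    insert into prelim_scores values
        (2024, 'NJSLA', 'a'),
        (2025, 'NJSLA', 'b');
    """
)

query = """
with prelim_assessments as (
    select distinct academic_year, assessment_name
    from prelim_scores
),

valid_prelim_assessments as (
    -- anti-join: keep pairs with no official Spring row
    select p.academic_year, p.assessment_name
    from prelim_assessments as p
    left join official_scores as o
        on p.academic_year = o.academic_year
        and p.assessment_name = o.assessment_name
        and o.season = 'Spring'
    where o.academic_year is null
)

select s.academic_year, s.assessment_name, s.student_id
from prelim_scores as s
inner join valid_prelim_assessments as v
    on s.academic_year = v.academic_year
    and s.assessment_name = v.assessment_name
order by s.academic_year
"""

rows = conn.execute(query).fetchall()
print(rows)
```

In this toy data the 2024 prelim row drops out because an official Spring row already exists for it, while the 2025 row stays. No comment/uncomment step is needed when official scores arrive.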
- Strip stg_google_sheets__state_test_comparison_demographics to SELECT *
  so all columns (school_level, grade_range_band, discipline,
  aligned_level_test_code, etc.) come directly from the _v2 sheet range
- Update sources-external.yml with _v2 sheet range

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…hics rename

- Rename test_name → assessment_name, test_code → aligned_test_code in
  dim_state_assessment_benchmarks (SQL + contract YML) and state_comps
  CTE in rpt_tableau__state_assessments_dashboard
- Rename comparison_demographic_group/subgroup_aligned →
  aligned_comparison_demographic_group/subgroup across all three downstream
  models
- Fix join conditions in rpt_tableau__state_assessments_dashboard against
  state_comps CTE (all three score sections)
- Add school_level to uniqueness test on
  stg_google_sheets__state_test_comparison_demographics — differentiates
  MS_HS source rows from synthetic HS ALG01 aggregate rows

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…raphics

- Add custom_rows CTE to union source data with synthetic HS ALG01
  totals (aggregated from HS_09 and HS_10 rows, which official comp
  sources do not provide as a combined HS total)
- Rename derived columns: comparison_demographic_group/subgroup_aligned
  → aligned_comparison_demographic_group/subgroup
- Update aligned_level_test_code derivation to use aligned_test_code
  instead of test_code

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
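
As a rough illustration of the custom_rows idea above, the sketch below unions source rows with one synthetic HS total and then checks that a key including school_level stays unique. Everything specific is an assumption: the commit does not say how the totals are aggregated or which column carries HS_09 and HS_10, so the field names, the count-based aggregation, and the sample numbers are hypothetical.

```python
# Hypothetical source rows; field names and counts are illustrative only.
source_rows = [
    {"aligned_test_code": "ALG01", "school_level": "HS_09",
     "n_tested": 200, "n_proficient": 50},
    {"aligned_test_code": "ALG01", "school_level": "HS_10",
     "n_tested": 100, "n_proficient": 40},
    {"aligned_test_code": "ALG01", "school_level": "MS_HS",
     "n_tested": 500, "n_proficient": 300},
]


def build_custom_rows(rows):
    """Build one synthetic HS ALG01 row by summing the HS_09 and HS_10 rows.

    Summing counts is an assumed aggregation, not the model's confirmed logic.
    """
    parts = [
        r for r in rows
        if r["aligned_test_code"] == "ALG01"
        and r["school_level"] in ("HS_09", "HS_10")
    ]
    if not parts:
        return []
    n_tested = sum(r["n_tested"] for r in parts)
    n_proficient = sum(r["n_proficient"] for r in parts)
    return [{
        "aligned_test_code": "ALG01",
        "school_level": "HS",
        "n_tested": n_tested,
        "n_proficient": n_proficient,
    }]


custom = build_custom_rows(source_rows)
all_rows = source_rows + custom  # equivalent of the union in custom_rows

# A key that includes school_level keeps source and synthetic rows distinct.
keys = [(r["aligned_test_code"], r["school_level"]) for r in all_rows]
is_unique = len(keys) == len(set(keys))
print(custom, is_unique)
```

Without school_level in the key, the synthetic row and the source rows for the same test code would collide. That is the situation the updated uniqueness test on stg_google_sheets__state_test_comparison_demographics is described as guarding against.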