refactor(dbt): state assessments rebuild #3553
Draft
GabyRangelB wants to merge 10 commits into main from
…c upstream
- Add proficiency banding, subject mapping, and metadata columns to int_pearson__all_assessments, int_fldoe__all_assessments, and int_iready__diagnostic_results
- Add new int_pearson__student_list_report intermediate model
- Add stg_google_sheets__state_test_comparison_demographics; disable old stg_google_sheets__state_test_comparison
- Add standardized_discipline to base_powerschool__course_enrollments
- Add new int_extracts__student_enrollments_courses model
- Refactor int_extracts__student_enrollments_subjects to use upstream columns
- Simplify rpt_tableau__state_assessments_dashboard and _comps by replacing inline CASE blocks with upstream column references
- Update int_tableau__state_assessments_demographic_comps lineage to use int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g CTE
- Switch state_comps CTE to stg_google_sheets__state_test_comparison_demographics
- Move results_type, admin, season, subject, test_code upstream to int models
- Rename test_code to aligned_test_code in int_pearson__student_list_report
- Add admin and subject aliases to int_fldoe__all_assessments
- Swap stg_pearson__student_list_report ref to int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ch flags

Replace GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos) for ~85x reduction in computed groups. Consolidate the demographic comps intermediate chain from 2 models + macro into 1 model. Push demographic labels, comparison_entity, and test_code-derived columns upstream into the intermediate to simplify the reporting layer. Fix self-join bug that made region_matched/region_outperformed flags dead columns. Add uniqueness tests to stg, int, and rpt models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
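The ~85x figure in this commit can be checked with a little arithmetic. `GROUP BY CUBE` over n columns expands to every subset of those columns, i.e. 2^n grouping sets, so 1,024 combos would correspond to 10 cubed columns. That column count is inferred from the number, not stated in the commit, and the `dim_*` names below are hypothetical placeholders. A minimal Python sketch of the comparison:

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of `columns`, which is what GROUP BY CUBE expands to."""
    return [
        subset
        for size in range(len(columns) + 1)
        for subset in combinations(columns, size)
    ]


# Hypothetical placeholder names: the commit does not list the cubed columns.
# Ten columns is inferred from the "1,024 combos" figure (2 ** 10 == 1024).
cube_columns = [f"dim_{i}" for i in range(10)]

cube_count = len(cube_grouping_sets(cube_columns))
explicit_count = 12  # number of explicit GROUPING SETS, per the commit message
reduction = cube_count / explicit_count

print(cube_count, explicit_count, round(reduction, 1))  # 1024 12 85.3
```

The ratio only counts grouping sets; the actual runtime saving depends on the warehouse and the data, which this sketch does not measure.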
- dim_state_assessment_benchmarks.yml: keep expanded surrogate key description, use bare `- unique` test (inherits severity/store_failures from dbt_project.yml project-level defaults instead of repeating per-test)
- base_powerschool__course_enrollments: keep both courses_credittype normalization (from main) and standardized_discipline (from this branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add results_type, district_state, admin, subject, illuminate_subject, and fast_aggregated_proficiency as computed columns in the kipptaf int_fldoe__all_assessments model instead of depending on the kippmiami upstream to provide them. This allows rpt_tableau__state_assessments_dashboard to build without waiting for a kippmiami deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Refactor int_tableau__state_assessments_demographic_comps: replace self-contained assessment_scores CTE with three union branches (NJ official, NJ prelim, FL official); add test_code_metadata CTE from stg_google_sheets__state_test_comparison_demographics to replace inline school_level/grade_range_band/discipline CASE statements; fix aligned column references and unqualified ON clause columns
- Add YML for int_fldoe__all_assessments (kipptaf): uniqueness test, model description, and full column definitions including new metadata columns (results_type, district_state, aligned_level_test_code, illuminate_subject, fast_aggregated_proficiency, is_proficient_int, admin, subject)
- Add YML for int_pearson__all_assessments: uniqueness test, model description, full column definitions; fix stale columns (englishlearnerel, studentwithdisabilities removed; aligned_* demographic columns, is_proficient_int, season, admin added)
- Add YML for int_pearson__student_list_report: uniqueness test on (source_relation, academic_year, administration, state_id, aligned_test_code), model description, full column definitions; fix missing trailing comma in SQL
- Update stg_google_sheets__state_test_comparison_demographics YML: move data_tests before columns, remove redundant store_failures, add missing aligned_level_test_code column
- Add aligned_gender to int_extracts__student_enrollments YML
- Remove design spec (work complete)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
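As a rough illustration of what a composite uniqueness test on `(source_relation, academic_year, administration, state_id, aligned_test_code)` checks, the sketch below groups by the key and returns any key that appears more than once. It runs against an in-memory SQLite table from Python; the table name `student_list_report` and all sample values are hypothetical, and this is not the test definition used in the PR.

```python
import sqlite3

# Key columns taken from the commit message; everything else here is illustrative.
KEY = [
    "source_relation",
    "academic_year",
    "administration",
    "state_id",
    "aligned_test_code",
]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE student_list_report ({', '.join(KEY)})")
conn.executemany(
    "INSERT INTO student_list_report VALUES (?, ?, ?, ?, ?)",
    [
        ("region_a", 2024, "Spring", "1001", "ELA05"),
        ("region_a", 2024, "Spring", "1001", "MAT05"),
        ("region_a", 2024, "Spring", "1001", "MAT05"),  # deliberate duplicate
    ],
)

# A uniqueness test passes when this query returns no rows.
key_list = ", ".join(KEY)
duplicates = conn.execute(
    f"SELECT {key_list}, COUNT(*) AS n_rows "
    f"FROM student_list_report "
    f"GROUP BY {key_list} "
    f"HAVING COUNT(*) > 1"
).fetchall()

print(duplicates)
```

With the sample rows above, the only key returned is the duplicated `MAT05` row, with a count of 2.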
- Add prelim score gating to int_tableau__state_assessments_demographic_comps: new prelim_assessments and valid_prelim_assessments CTEs automatically exclude NJ student list data for any (academic_year, assessment_name) already present in int_pearson__all_assessments Spring, eliminating the need to manually comment/uncomment the prelim branch each cycle
- Qualify all column references in final SELECT with scores alias (s.) to satisfy RF02 after test_code_metadata join was introduced
- Replace rolling 7-year window with fixed 2018 floor across all three score branches; 2018 is the earliest year with available comps data
- Fix rpt_tableau__state_assessments_dashboard: inline-alias administration_window and assessment_subject from int_fldoe__all_assessments instead of expecting pre-computed admin/subject columns

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
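The prelim gating described in this commit reads like an anti-join: keep a prelim row only when no official Spring row exists for the same `(academic_year, assessment_name)`. The sketch below shows that pattern on an in-memory SQLite database driven from Python. The table names `official_scores` and `prelim_scores`, the CTE name `official_spring`, and the sample rows are assumptions for illustration; the real CTEs in the model may be written differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the official and prelim (student list) sources.
    CREATE TABLE official_scores (academic_year INT, assessment_name TEXT, season TEXT);
    CREATE TABLE prelim_scores (academic_year INT, assessment_name TEXT, state_id TEXT);

    INSERT INTO official_scores VALUES (2024, 'NJSLA', 'Spring');
    INSERT INTO prelim_scores VALUES (2024, 'NJSLA', 'A1'), (2025, 'NJSLA', 'A2');
    """
)

valid_prelim = conn.execute(
    """
    WITH official_spring AS (
        SELECT DISTINCT academic_year, assessment_name
        FROM official_scores
        WHERE season = 'Spring'
    )

    SELECT p.academic_year, p.assessment_name, p.state_id
    FROM prelim_scores AS p
    LEFT JOIN official_spring AS o
        ON p.academic_year = o.academic_year
        AND p.assessment_name = o.assessment_name
    -- Anti-join: keep prelim rows that have no official Spring counterpart.
    WHERE o.academic_year IS NULL
    ORDER BY p.academic_year
    """
).fetchall()

print(valid_prelim)  # [(2025, 'NJSLA', 'A2')]
```

Once official 2025 results land in the official table, the 2025 prelim row drops out on the next run without any edit to the query, which is the behavior the commit describes.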
- Strip stg_google_sheets__state_test_comparison_demographics to SELECT * so all columns (school_level, grade_range_band, discipline, aligned_level_test_code, etc.) come directly from the _v2 sheet range
- Update sources-external.yml with _v2 sheet range

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…hics rename
- Rename test_name → assessment_name, test_code → aligned_test_code in dim_state_assessment_benchmarks (SQL + contract YML) and state_comps CTE in rpt_tableau__state_assessments_dashboard
- Rename comparison_demographic_group/subgroup_aligned → aligned_comparison_demographic_group/subgroup across all three downstream models
- Fix join conditions in rpt_tableau__state_assessments_dashboard against state_comps CTE (all three score sections)
- Add school_level to uniqueness test on stg_google_sheets__state_test_comparison_demographics, which differentiates MS_HS source rows from synthetic HS ALG01 aggregate rows

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…raphics
- Add custom_rows CTE to union source data with synthetic HS ALG01 totals (aggregated from HS_09 and HS_10 rows, which official comp sources do not provide as a combined HS total)
- Rename derived columns: comparison_demographic_group/subgroup_aligned → aligned_comparison_demographic_group/subgroup
- Update aligned_level_test_code derivation to use aligned_test_code instead of test_code

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
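One way to picture the `custom_rows` idea is a CTE that sums the HS_09 and HS_10 ALG01 rows into a single synthetic HS row and unions it back onto the source rows. The sketch below does that on an in-memory SQLite table from Python. Several things here are assumptions, not taken from the PR: that `HS_09`/`HS_10` are values of `school_level`, that the aggregated measures are counts (the hypothetical `n_tested` and `n_proficient` columns), and the table name `source_rows` and sample numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical shape of the comparison source; measure columns are assumed.
    CREATE TABLE source_rows (
        academic_year INT,
        aligned_test_code TEXT,
        school_level TEXT,
        n_tested INT,
        n_proficient INT
    );

    INSERT INTO source_rows VALUES
        (2024, 'ALG01', 'MS_HS', 300, 150),
        (2024, 'ALG01', 'HS_09', 100, 40),
        (2024, 'ALG01', 'HS_10', 50, 20);
    """
)

rows = conn.execute(
    """
    WITH custom_rows AS (
        -- Synthetic combined HS total, built from the two grade-level rows.
        SELECT
            academic_year,
            aligned_test_code,
            'HS' AS school_level,
            SUM(n_tested) AS n_tested,
            SUM(n_proficient) AS n_proficient
        FROM source_rows
        WHERE aligned_test_code = 'ALG01'
            AND school_level IN ('HS_09', 'HS_10')
        GROUP BY academic_year, aligned_test_code
    )

    SELECT * FROM source_rows
    UNION ALL
    SELECT * FROM custom_rows
    ORDER BY school_level
    """
).fetchall()

for row in rows:
    print(row)
```

Because the synthetic row shares its year and test code with the source rows, `school_level` is what keeps the combined grain unique, which matches the uniqueness-test change noted in the earlier commit.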
Pull Request
Summary & Motivation
Rebuild the state assessments reporting pipeline — pushing transformation logic upstream, optimizing the demographic comparisons model, and enabling demographic-level benchmarking.
Work completed so far:
- Push proficiency banding, subject mapping, and metadata columns upstream into the intermediate models (`int_pearson__all_assessments`, `int_fldoe__all_assessments`, `int_iready__diagnostic_results`)
- Replace `GROUP BY CUBE` with explicit `GROUPING SETS` (~85x reduction in computed groups) and consolidate the demographic comps intermediate chain into a single model
- Fix the self-join bug in `rpt_tableau__state_assessments_dashboard_comps` that made region match/outperform flags dead columns
- Update `dim_state_assessment_benchmarks` to use demographics source with expanded grain (includes demographic group/subgroup, season, school_level, etc.)
- Use the `_v2` sheet range for state test comparison demographics
- Refactor `int_tableau__state_assessments_demographic_comps` into three union branches (NJ official, NJ prelim, FL official); fix broken column references; replace inline `school_level`/`grade_range_band`/`discipline` CASE blocks with a `test_code_metadata` CTE sourced from `stg_google_sheets__state_test_comparison_demographics`; replace rolling 7-year window with fixed 2018 floor across all branches
- Prelim score gating: the `valid_prelim_assessments` CTE automatically excludes NJ student list data for any `(academic_year, assessment_name)` already present in `int_pearson__all_assessments` Spring, so the prelim branch no longer needs a manual comment/uncomment each cycle
- `int_pearson__student_list_report` intermediate model (new) with SQL fix and full YML
- YML for `int_pearson__all_assessments`, `int_fldoe__all_assessments` (kipptaf), `stg_google_sheets__state_test_comparison_demographics`, `int_extracts__student_enrollments`
- Add `aligned_gender` to `int_extracts__student_enrollments`

Still open:
- `int_pearson__all_assessments` contract enforcement is temporarily disabled; needs re-enabling or explicit documentation before merge
- `int_pearson__student_list_report`: blocked on adding a crosswalk-compatible field to the student list CSV source (in progress, pending source file update)

AI Assistance
Claude Code co-authored this PR across multiple interactive sessions. Human-directed: architectural decisions, model design, column naming. AI-assisted: implementation, impact analysis, optimization.
Self-review
dbt (skip if no dbt changes)
- `[model_name].yml` properties file for all new models
- `rpt_tableau__state_assessments_dashboard_comps` match flag columns now produce actual values instead of all-false. No columns added/removed/renamed.
- Run `stage_external_sources` before building

CI checks