fix: join to sam_account_name for google_email changers #3596
kevinverhoff merged 5 commits into main
Claude finished @cristinabaldor's task in 4m 47s —— View job Code Review: fix join to
|
… addresses

Replace two-hop LDAP → staff_roster join in int_surveys__survey_responses with a date-bounded join to int_people__staff_roster_history on sam_account_name for both the Google Forms and Alchemer union branches.

Resolves NULL respondent identity columns for staff whose email addresses were migrated from @apps.teamschools.org, and recovers 174 employees' worth of historical Alchemer responses (2019–2022) that were silently dropped by a case-sensitive join.

Downstream rpt_tableau__survey_responses and rpt_tableau__school_community_diagnostic updated to join on respondent_employee_number, removing the repeated email-based join pattern. rpt_tableau__operations_pm fixed independently with lower() + sam_account_name extraction since it bypasses the int model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
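The date-bounded join described in this commit message can be illustrated outside SQL. The following is a minimal Python sketch, not the dbt model itself: `ROSTER_HISTORY`, `local_part`, `resolve_respondent`, and the `start`, `end`, and `location` fields are hypothetical names chosen for illustration, and the `primary_indicator` filter is omitted.

```python
import re
from datetime import datetime

# Hypothetical roster-history rows: one employee, two effective-dated records.
ROSTER_HISTORY = [
    {"employee_number": 1001, "sam_account_name": "jdoe", "location": "School A",
     "start": datetime(2019, 7, 1), "end": datetime(2021, 6, 30, 23, 59, 59)},
    {"employee_number": 1001, "sam_account_name": "jdoe", "location": "School B",
     "start": datetime(2021, 7, 1), "end": datetime(9999, 12, 31)},
]


def local_part(email):
    """Return the lowercased part of an email before '@', or None."""
    if email is None:
        return None
    match = re.match(r"^([^@]+)@", email)
    return match.group(1).lower() if match else None


def resolve_respondent(email, date_submitted, history):
    """Find the roster row whose key matches and whose date range covers the submission."""
    key = local_part(email)
    if key is None:
        return None
    for row in history:
        same_account = row["sam_account_name"].lower() == key
        in_range = row["start"] <= date_submitted <= row["end"]
        if same_account and in_range:
            return row
    return None
```

Under these assumptions, a 2020 response from `JDoe@apps.teamschools.org` and a 2023 response from `jdoe@kippmiami.org` resolve to the same employee number, each with the roster attributes in effect on the submission date.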
|
@claude, made some changes, review again |
|
Claude finished @cristinabaldor's task in 1m 55s —— View job Code Review (updated): fix join to
|
| File | Line | Pattern |
|---|---|---|
| int_surveys__survey_responses.sql | 42 | `fr.respondent_email = ldap.google_email` |
| rpt_tableau__school_community_diagnostic.sql | 79 | `sr.respondent_email = srh.google_email` |
| rpt_tableau__operations_pm.sql | 109 | `responses_pivoted.respondent_email = full_roster.google_email` |
What looks good
- `regexp_extract` returns NULL on NULL input, so NULL emails safely fall out of the INNER JOIN, preserving previous behaviour.
- The temporal guard (`date_submitted BETWEEN effective_date_start_timestamp AND effective_date_end_timestamp`) and `primary_indicator` filter are intact.
- `sam_account_name` is network-unique, so join cardinality is equivalent to the old email join.
- SQL formatting follows project conventions.
Summary
The fix is correct and ready to merge as-is. Adding lower() around the regexp_extract call would be a small defensive improvement but is not required. If the "changes" you made weren't reflected here, check that the push went through (git push) — the branch tip is still 17686bef0.
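To make the `lower()` suggestion concrete, here is a small Python sketch of the same idea. It only mimics the behaviour described in this review (NULL in, NULL out; NULL keys drop out of an inner join); `extract_key` and `inner_join` are hypothetical helpers, not functions from the project.

```python
import re


def extract_key(email, lowercase=True):
    """Mimic a regexp_extract-style call: None in, None out."""
    if email is None:
        return None
    match = re.match(r"^([^@]+)@", email)
    if match is None:
        return None
    key = match.group(1)
    return key.lower() if lowercase else key


def inner_join(emails, sam_account_names, lowercase=True):
    """Keep only emails whose extracted key is present in the roster."""
    roster = set(sam_account_names)
    matched = []
    for email in emails:
        key = extract_key(email, lowercase)
        # A None key never matches, so it falls out of the join.
        if key is not None and key in roster:
            matched.append(email)
    return matched
```

With a roster containing `jdoe`, the mixed-case address `JDoe@apps.teamschools.org` only matches when `lowercase=True`, which is the defensive behaviour the suggestion is after.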
@claude pushed more changes, take a look |
|
Claude finished @cristinabaldor's task in 2m 35s —— View job Code Review (updated): all-files fix for google_email changers
This is a substantial improvement over the previous commit. All three files flagged in the earlier review are now fixed, and the approach is architecturally cleaner.
What changed
Google Forms branch (lines 39–50): The two-hop Alchemer branch (lines 100–105): Previously all identity columns (
All previous suggestions addressed
One minor note
Summary
All items from the previous review are resolved. The architecture is cleaner: identity resolution happens once in the intermediate model and flows cleanly into downstream extracts. Ready to merge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
src/dbt/kipptaf/models/surveys/intermediate/int_surveys__survey_responses.sql
11 staff have google_email local parts that don't match sam_account_name due to accents, hyphens, dots, or name changes. Add OR respondent_email = google_email fallback to both union branches in int_surveys__survey_responses to recover their 2,725 survey responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
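The fallback condition in this commit can be sketched as a two-part predicate. This is an illustrative Python version under assumptions: the `matches` helper and the example roster row are hypothetical, and lowercasing both sides of the email comparison is an assumption that the commit message does not state.

```python
def matches(respondent_email, roster_row):
    """True if the email's local part equals sam_account_name,
    or (fallback) the full email equals google_email."""
    if respondent_email is None:
        return False
    email = respondent_email.lower()  # assumption: comparisons are case-insensitive
    local = email.split("@", 1)[0]
    by_account = local == roster_row["sam_account_name"].lower()
    by_email = email == roster_row["google_email"].lower()
    return by_account or by_email


# Hypothetical staff member whose google_email local part differs
# from sam_account_name (dot and hyphen in the email).
ROW = {
    "sam_account_name": "mgarcia",
    "google_email": "maria.garcia-lopez@kippmiami.org",
}
```

Here `maria.garcia-lopez@kippmiami.org` matches only through the email fallback, while `mgarcia@apps.teamschools.org` still matches through `sam_account_name`.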
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull Request
Summary & Motivation
Edit the join to capture teammates whose google_email addresses have changed from @apps.teamschools.org to @kippmiami.org. Match on sam_account_name instead, since it is the same before and after the change.
Self-review
General
TEAMster Asana Project
Dagster
- `uv run dagster definitions validate` for any modified code location
- `[code_location, integration, table_name]` key pattern and use the appropriate IO manager (`pickle`, `avro`, or `file`)
- `libraries/<integration>/assets.py` factory function and a `config/assets-*.yaml` in the code location
dbt
- Include a corresponding `[model name].yml` properties file for all models. These can be generated by running `dbt run-operation generate_model_yaml --args '{"model_names": ["MODEL NAME"]}'` and saving the console output to a YAML file.
- Include (or update) an exposure for all models that will be consumed by a dashboard, analysis, or application: Dagster "kinds" Reference
SQL
- Use the `union_dataset_join_clause()` macro for queries that employ models that use regional datasets
- Do not use `group by` without any aggregations when you mean to use `distinct`
- All `distinct` usage must be accompanied by a comment explaining its necessity
- Do not use `order by` for `select` statements. That should be done in the reporting layer.
If you are adding a new external source, before building, run:
Troubleshooting