-
Notifications
You must be signed in to change notification settings - Fork 3
handle new zoning districts in GFT #2119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4f6a3f0 to
618de90
Compare
| CASE | ||
| WHEN zd IS null THEN 'NONE' | ||
| WHEN zd LIKE 'M%' OR zd LIKE 'C%' THEN LEFT(zd, 1) | ||
| -- match the first group of characters that end with a number | ||
| WHEN zd LIKE 'R%' THEN (REGEXP_MATCH(zd, '^(\w\d+)'))[1] | ||
| ELSE zd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this clause is why all relevant new lots end up being Commercial or Manufacturing. they all have zonedist1 values like M1-8A/R12, so the R11/R12 parts never contribute to the lot's GFT zoning category
| ELSE 'high_res' | ||
| WHEN has_high_res THEN 'high_res' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to not suppress actual NULLs or any other value we aren't handling
| GROUP BY bbl, zoning_district_type | ||
| ORDER BY bbl, zoning_district_type | ||
| ORDER BY bbl, zd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's really nice being able to see the results of parsing zonedist values to zoning_district_type values. these are few rows from this model after grouping by zd, zoning_district_type in DBeaver:
| zd | zoning_district_type | count |
|---|---|---|
| C8-3 | C | 321 |
| C8-4 | C | 111 |
| M1-1 | M | 12521 |
| M1-1/R5 | M | 65 |
| M1-1/R6A | M | 4 |
84218c1 to
dc00003
Compare
Arguments to generic tests should be nested under the `arguments` property.
This maps the new districts to Other but the relevant lots have incorrect values because we ignore district values after forward slashes.
dc00003 to
619d7fe
Compare
| # dict format doesn't work because it tries to do cast(null as USER-DEFINED) as "geom" | ||
| # rows: | ||
| # - {bbl: 123, zonedist1: M1, zonedist2: NULL, zonedist3: NULL, zonedist4: NULL} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sql format below doesn't feel great but I didn't wanna get hung up on getting the dict or csv format to work for this first pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the thoroughness of this. When does it get run? During a build? During PR tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during a build but since they're unit tests it'd be great to run in PR tests. but there's a weird requirement that the upstream models exist (values don't matter) so it might be a little convoluted
I'm guessing it's because the unit test gets columns types form the upstream models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh from the docs you can do dbt run --select "parent_model_name" --empty to "build an empty version of the models to save warehouse spend"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a nit on the sql - using VALUES will save a bunch of lines - don't need to do the unions, can just list the tuples
VALUES
('simple_m', 'M1', NULL, NULL, NULL),
('multiple_districts', 'M1', 'M2', NULL, NULL),
...
;Though then you lose labeling the individual fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely be in favor of getting them working during CI and skipping during a build
https://docs.getdbt.com/docs/build/unit-tests#when-to-run-unit-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a nit on the sql - using
VALUESwill save a bunch of lines - don't need to do the unions, can just list the tuplesThough then you lose labeling the individual fields
losing the field names would be a shame but this seems worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep them in the query at least like this
SELECT * FROM (
VALUES
(1, 'one'),
(2, 'two'),
(3, 'three')
) AS t (num, letter);| -- to preserve lots with no zoning since STRING_TO_ARRAY returns an empty (zero-element) array | ||
| -- when the result of UNNEST(ARRAY[ ... ] is a string of zero length. this UNION ALL approach | ||
| -- is simpler than using joins or complicated nesting of array functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost got away with just a one-line change to handle splitting and unesting forward slash districts. but I like the clarity of using UNION ALL to just combine two very different types of lots, instead of twisting array logic for the "all null" edge case
fvankrieken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question to resolve but looks great
e40dc35 to
bd6be1f
Compare
looks like maybe there's a simple way to only select models that are parents of models with unit tests? if not, let's punt on running these in CI |
I don't think upstream models need to be defined - checked out this branch and ran this aimed at my schema in db-cscl:
|
|
Oh weird though that the docs are so explicit that the upstream tables DO need to exist Maybe specifically because you've used the input with sql format, upstream models aren't needed? |
bd6be1f to
7427a14
Compare
worked! just dropped the |

resolves #2032
all builds on this branch
see linked issue for data details and motivations for the logic changes here. worth looking at the commit messages for clarity.
dbt unit test docs: Unit tests, unit tests properties
new tests failing before relevant fix
top 5 most frequent new districts in the outputs
all lots in
green_fast_track_bblswherezoning_district like '%R11%' or zoning_district like '%R12%'have azoning_categoryofOther