
@damonmcc (Member) commented Dec 17, 2025

resolves #2032

all builds on this branch

see linked issue for data details and motivations for the logic changes here. worth looking at the commit messages for clarity.

dbt unit test docs: Unit tests, unit tests properties

new tests failing before the relevant fix:

[two screenshots, 2025-12-29 at 10:21 PM and 10:24 PM]

top 5 most frequent new districts in the outputs

all lots in green_fast_track_bbls where zoning_district like '%R11%' or zoning_district like '%R12%' have a zoning_category of Other

| zoning_district | zoning_category | count |
|---|---|---|
| M1-9A/R12 | Other | 281 |
| M1-8A/R11 | Other | 189 |
| M1-8A/R12 | Other | 93 |
| C5-2, M1-8A/R12 | Other | 9 |
| M1-8A/R11, C6-4X | Other | 7 |

@codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.68%. Comparing base (ab686f5) to head (7427a14).
⚠️ Report is 10 commits behind head on main.


Comment on lines 26 to 45
CASE
    WHEN zd IS NULL THEN 'NONE'
    WHEN zd LIKE 'M%' OR zd LIKE 'C%' THEN LEFT(zd, 1)
    -- match the first group of characters that end with a number
    WHEN zd LIKE 'R%' THEN (REGEXP_MATCH(zd, '^(\w\d+)'))[1]
    ELSE zd
@damonmcc (Member Author):

this clause is why all the relevant new lots end up as Commercial or Manufacturing. they all have zonedist1 values like M1-8A/R12, which match the `LIKE 'M%'` branch and get cut down to `LEFT(zd, 1)`, so the R11/R12 parts never contribute to the lot's GFT zoning category
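To make the failure mode concrete, here is a minimal Python sketch that mirrors the CASE expression above. It assumes Postgres `LIKE 'M%'` and `(REGEXP_MATCH(zd, '^(\w\d+)'))[1]` behave like `str.startswith` and `re.match` for these inputs. The function name `zoning_district_type` is hypothetical, not the model's code. A whole value like M1-8A/R12 hits the M branch and keeps only its first character:

```python
import re
from typing import Optional


def zoning_district_type(zd: Optional[str]) -> Optional[str]:
    """Hypothetical mirror of the pre-fix CASE expression."""
    if zd is None:
        return "NONE"
    if zd.startswith("M") or zd.startswith("C"):
        # LEFT(zd, 1): only the first character survives, so '/R12' is dropped
        return zd[0]
    if zd.startswith("R"):
        # REGEXP_MATCH(zd, '^(\w\d+)'): a letter followed by digits, e.g. 'R6A' -> 'R6'
        match = re.match(r"^(\w\d+)", zd)
        return match.group(1) if match else None
    return zd


for value in ["M1-8A/R12", "M1-9A/R12", "R12", "R6A", None]:
    print(value, "->", zoning_district_type(value))
```

Splitting on `/` before this step is what gives the R12 half a chance to reach the R branch.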

Comment on lines -56 to +68
- ELSE 'high_res'
+ WHEN has_high_res THEN 'high_res'
@damonmcc (Member Author):

so we don't suppress actual NULLs or any other values we aren't handling
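A `CASE` with no matching `WHEN` and no `ELSE` evaluates to NULL, so dropping the catch-all lets unexpected rows surface instead of being silently labeled `high_res`. Below is a minimal Python sketch of the two shapes. `has_manufacturing` and the `"manufacturing"` label are hypothetical stand-ins for the other branches of the real expression, and `None` plays the role of SQL NULL:

```python
from typing import Optional


def category_with_else(has_manufacturing: Optional[bool], has_high_res: Optional[bool]) -> str:
    # Old shape: ... ELSE 'high_res'
    if has_manufacturing:
        return "manufacturing"
    return "high_res"  # reached even when has_high_res is False or NULL


def category_explicit(has_manufacturing: Optional[bool], has_high_res: Optional[bool]) -> Optional[str]:
    # New shape: ... WHEN has_high_res THEN 'high_res' (no ELSE)
    if has_manufacturing:
        return "manufacturing"
    if has_high_res:  # a NULL condition is not true, just like WHEN NULL in SQL
        return "high_res"
    return None  # CASE without ELSE yields NULL


# A row with no recognized flags: the old shape hides it, the new shape surfaces it
print(category_with_else(False, None))  # high_res
print(category_explicit(False, None))   # None
```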

Comment on lines -36 to +51
- GROUP BY bbl, zoning_district_type
- ORDER BY bbl, zoning_district_type
+ ORDER BY bbl, zd
@damonmcc (Member Author), Dec 17, 2025:

it's really nice being able to see the results of parsing zonedist values into zoning_district_type values. here are a few rows from this model after grouping by zd, zoning_district_type in DBeaver:

| zd | zoning_district_type | count |
|---|---|---|
| C8-3 | C | 321 |
| C8-4 | C | 111 |
| M1-1 | M | 12521 |
| M1-1/R5 | M | 65 |
| M1-1/R6A | M | 4 |

@damonmcc force-pushed the gft-new-districts branch 3 times, most recently from 84218c1 to dc00003, December 30, 2025 17:25
Arguments to generic tests should be nested under the `arguments` property.
This maps the new districts to Other but the relevant lots have incorrect values because we ignore district values after forward slashes.
@damonmcc marked this pull request as ready for review December 30, 2025 18:31
@damonmcc requested a review from a team December 30, 2025 18:35
Comment on lines +174 to +176
# dict format doesn't work because it tries to do cast(null as USER-DEFINED) as "geom"
# rows:
# - {bbl: 123, zonedist1: M1, zonedist2: NULL, zonedist3: NULL, zonedist4: NULL}
@damonmcc (Member Author):

the sql format below doesn't feel great, but I didn't want to get hung up on getting the dict or csv format to work for this first pass

@fvankrieken (Contributor):

Like the thoroughness of this. When does it get run? During a build? During PR tests?

@damonmcc (Member Author), Dec 30, 2025:

during a build. since they're unit tests it'd be great to run them in PR tests, but there's a weird requirement that the upstream models exist (their values don't matter), so it might be a little convoluted

I'm guessing it's because the unit test gets column types from the upstream models

@damonmcc (Member Author):

oh, from the docs you can do `dbt run --select "parent_model_name" --empty` to "build an empty version of the models to save warehouse spend"

@fvankrieken (Contributor), Dec 30, 2025:

Also a nit on the sql - using VALUES will save a bunch of lines - don't need to do the unions, can just list the tuples

VALUES 
    ('simple_m', 'M1', NULL, NULL, NULL),
    ('multiple_districts', 'M1', 'M2', NULL, NULL),
    ...
;

Though then you lose labeling the individual fields

@fvankrieken (Contributor):

I'd definitely be in favor of getting them working during CI and skipping during a build

https://docs.getdbt.com/docs/build/unit-tests#when-to-run-unit-tests

@damonmcc (Member Author):

> Also a nit on the sql - using VALUES will save a bunch of lines - don't need to do the unions, can just list the tuples
>
> Though then you lose labeling the individual fields

losing the field names would be a shame but this seems worth it

@fvankrieken (Contributor):

You can keep them in the query at least like this

SELECT * FROM (
    VALUES
        (1, 'one'),
        (2, 'two'),
        (3, 'three')
) AS t (num, letter);
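Another option, sketched here in Python on the assumption that the fixture SQL is generated rather than hand-written: keep each row labeled as a dict and emit the compact `VALUES ... AS t (...)` form shown above. The helpers `sql_literal` and `values_fixture` are hypothetical, and the quoting only covers None, booleans, numbers, and strings:

```python
from typing import Any, Mapping, Sequence


def sql_literal(value: Any) -> str:
    # Render one Python value as a SQL literal; None becomes an untyped NULL
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Standard SQL escapes a single quote by doubling it
    return "'" + str(value).replace("'", "''") + "'"


def values_fixture(columns: Sequence[str], rows: Sequence[Mapping[str, Any]]) -> str:
    # Rows stay labeled in Python; the labels reappear in the AS t (...) column list
    tuples = ",\n".join(
        "        (" + ", ".join(sql_literal(row.get(col)) for col in columns) + ")"
        for row in rows
    )
    return (
        "SELECT * FROM (\n"
        "    VALUES\n"
        f"{tuples}\n"
        f") AS t ({', '.join(columns)})"
    )


print(values_fixture(
    ["num", "letter"],
    [{"num": 1, "letter": "one"}, {"num": 2, "letter": "two"}, {"num": 3, "letter": "three"}],
))
```

A column that is NULL in every row carries no type information from the data itself, so a spatial column like `geom` may still need an explicit cast in the generated SQL.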

Comment on lines +8 to +10
-- to preserve lots with no zoning, since STRING_TO_ARRAY returns an empty (zero-element) array
-- when the result of UNNEST(ARRAY[ ... ]) is a string of zero length. this UNION ALL approach
-- is simpler than using joins or complicated nesting of array functions
@damonmcc (Member Author), Dec 30, 2025:

almost got away with just a one-line change to handle splitting and unnesting forward-slash districts. but I like the clarity of using UNION ALL to just combine two very different types of lots, instead of twisting array logic for the "all null" edge case
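A rough Python analogue of the approach described above: split every zonedist value on `/` (the STRING_TO_ARRAY + UNNEST step), then add back a single NULL row for lots whose values are all missing (the UNION ALL branch), since an empty array unnests to zero rows. The function names and the example BBLs are hypothetical. Only the column names zonedist1 to zonedist4 and the CASE shape come from this PR:

```python
import re
from typing import Dict, List, Optional, Tuple

ZONEDIST_COLUMNS = ("zonedist1", "zonedist2", "zonedist3", "zonedist4")


def district_type(zd: Optional[str]) -> Optional[str]:
    # Same branches as the model's CASE expression
    if zd is None:
        return "NONE"
    if zd.startswith(("M", "C")):
        return zd[0]
    if zd.startswith("R"):
        match = re.match(r"^(\w\d+)", zd)
        return match.group(1) if match else None
    return zd


def explode(lot: Dict[str, Optional[str]]) -> List[Tuple[Optional[str], Optional[str]]]:
    # Split each zonedist value on '/' (STRING_TO_ARRAY + UNNEST), skipping NULLs and empty parts
    parts = [
        part
        for column in ZONEDIST_COLUMNS
        if lot.get(column)
        for part in lot[column].split("/")
        if part
    ]
    # An empty array unnests to zero rows, so a lot with no zoning would vanish;
    # emit one NULL row for it instead, like the UNION ALL branch
    if not parts:
        return [(lot["bbl"], None)]
    return [(lot["bbl"], part) for part in parts]


# Hypothetical lots: one split district, one with no zoning at all
lots = [
    {"bbl": "1", "zonedist1": "M1-8A/R12"},
    {"bbl": "2"},
]
for lot in lots:
    for bbl, zd in explode(lot):
        print(bbl, zd, district_type(zd))
```

With the split in place, the R12 part of M1-8A/R12 becomes its own row and reaches the R branch instead of being absorbed into M.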

@fvankrieken (Contributor) left a comment:

One question to resolve but looks great

@damonmcc (Member Author) commented Dec 31, 2025

@fvankrieken

> I'd definitely be in favor of getting them working during CI and skipping during a build
>
> https://docs.getdbt.com/docs/build/unit-tests#when-to-run-unit-tests

looks like `dbt run --empty` needs sources to exist in the db, and it seems messy to do something like `dbt run --empty --select stg__pluto` if `${{ matrix.project }} = 'green_fast_track'`

maybe there's a simple way to only select models that are parents of models with unit tests? if not, let's punt on running these in CI

@fvankrieken (Contributor) commented Dec 31, 2025

> @fvankrieken
>
> > I'd definitely be in favor of getting them working during CI and skipping during a build
> > https://docs.getdbt.com/docs/build/unit-tests#when-to-run-unit-tests
>
> looks like `dbt run --empty` needs sources to exist in the db, and it seems messy to do something like `dbt run --empty --select stg__pluto` if `${{ matrix.project }} = 'green_fast_track'`
>
> maybe there's a simple way to only select models that are parents of models with unit tests? if not, let's punt on running these in CI

I don't think upstream models need to be defined. I checked out this branch and ran this, aimed at my schema in db-cscl: `dbt test --resource-type unit_test`, with no `dbt run --empty` beforehand

[screenshot]

@fvankrieken (Contributor) commented Dec 31, 2025

Oh weird though that the docs are so explicit that the upstream tables DO need to exist

Maybe specifically because you've used the input with sql format, upstream models aren't needed?

@damonmcc (Member Author) commented Dec 31, 2025

> Oh weird though that the docs are so explicit that the upstream tables DO need to exist
>
> Maybe specifically because you've used the input with sql format, upstream models aren't needed?

worked! just dropped the `dbt run --empty` part. shame the docs make it sound so necessary: "The direct parents of the model that you’re unit testing need to exist in the warehouse before you can execute the unit test."

@damonmcc merged commit ee1d76e into main Dec 31, 2025
23 checks passed
@damonmcc deleted the gft-new-districts branch January 1, 2026 15:43

Development

Successfully merging this pull request may close these issues.

GFT mapping R11 or R12 districts to "Other"

3 participants