CSCL SAF #2098
I originally was using this to create some QA tables. However, it seems like overkill to actually build QA tables for each output of these smaller files, so I've just been running this ad hoc in DBeaver.
Have done some QA of the tables with the query included in that macro:

    WITH combined AS (
        SELECT
            'dev' as source,
            *,
            md5(CAST(dev AS text)) AS row_hash
        FROM saf_{abcegnpx_generic}_by_field as dev
        UNION ALL
        SELECT
            'prod' as source,
            *,
            md5(CAST(prod AS text)) AS row_hash
        FROM production_outputs.saf_{abcegnpx_generic} as prod
    ),
    counts AS (
        SELECT
            *,
            COUNT(*) OVER (PARTITION BY row_hash) AS match_count,
            COUNT(CASE WHEN source = 'dev' THEN 1 END) OVER (PARTITION BY row_hash) AS dev_count,
            COUNT(CASE WHEN source = 'prod' THEN 1 END) OVER (PARTITION BY row_hash) AS prod_count
        FROM combined
    )
    select counts.*
    --, gnx.atomicid, gnx.geom, gnx.ap_geom
    FROM counts
    --left join int__saf_gnx gnx on counts.segmentid::int = gnx.segmentid and counts.source = 'dev'
    WHERE dev_count <> prod_count -- Different counts = unmatched rows
    order by boroughcode, face_code, segment_seqnum, source

Results by output:

- ABCEGNPX: 131 diffs (both generic and roadbed).
- D: No diffs between dev and prod
- I: No diffs between dev and prod
- OV: No diffs between dev and prod
- S: 32 diffs, some weirdness. Want to punt this to a new issue
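For anyone who wants to reproduce the comparison outside the database, here is a minimal Python sketch of the same row-hash idea: hash every row, count how often each hash occurs in dev and in prod, and keep the rows whose counts differ. The function names and the sample rows are made up for illustration, and hashing `repr(tuple(row))` is only an assumption standing in for `md5(CAST(row AS text))`, so the hash values will not match what Postgres produces.

```python
import hashlib
from collections import Counter


def row_hash(row):
    """md5 of a row's text form (stand-in for md5(CAST(row AS text)) in the query)."""
    return hashlib.md5(repr(tuple(row)).encode("utf-8")).hexdigest()


def unmatched_rows(dev_rows, prod_rows):
    """Return (source, row) pairs whose hash occurs a different number of times in dev and prod."""
    dev_counts = Counter(row_hash(r) for r in dev_rows)
    prod_counts = Counter(row_hash(r) for r in prod_rows)
    diffs = []
    for source, rows in (("dev", dev_rows), ("prod", prod_rows)):
        for r in rows:
            h = row_hash(r)
            # Same test as WHERE dev_count <> prod_count
            if dev_counts[h] != prod_counts[h]:
                diffs.append((source, r))
    return diffs


# Hypothetical sample data: one row differs in its last field.
dev = [("1", "A", 10), ("1", "B", 20), ("2", "C", 30)]
prod = [("1", "A", 10), ("1", "B", 21), ("2", "C", 30)]
diffs = unmatched_rows(dev, prod)
```

Comparing counts rather than simple membership means a row duplicated in dev but present once in prod is also reported, which is what the `dev_count <> prod_count` filter does.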
    models:
      # Models with column per field
      - name: saf_abcegnpx_generic_by_field
        columns: &safa_columns

Started to lose a little faith in this while writing it - I feel like it's a lot of effort to write/maintain tests like these when really, I'd rather just trust the formatting seed and the macro that generates all these columns (and also, these tests take a while to run - for LION, I think it's like a minute). And then, we test that the final text output (the "dat" table with row concatenated as text) has the right length, so that would catch most issues here.
So I stopped part way through and thought I'd open it up to thoughts.
yea seems like a pain to maintain the column length tests. always doing the final "dat" table testing sounds good to me
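A rough sketch of what the final "dat" length check amounts to, in Python rather than as a dbt test. The function name and the record length of 10 are hypothetical (the real expected length would come from the formatting seed); the point is only that one check on the concatenated row catches a field that is too short or too long anywhere in the record.

```python
def bad_length_lines(lines, expected_length):
    """Return (line_number, actual_length) for every record that is not expected_length long."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        # Strip only the line terminator; trailing spaces are part of a fixed-width record.
        record = line.rstrip("\r\n")
        if len(record) != expected_length:
            problems.append((line_number, len(record)))
    return problems


# Hypothetical records with an assumed fixed width of 10 characters.
sample = ["ABCDE12345\n", "ABCDE1234\n", "ABCDE123456\n", "          \n"]
problems = bad_length_lines(sample, expected_length=10)
```

This does not catch two per-field errors that cancel each other out (one field a character short, another a character long), which is the trade-off being accepted by dropping the per-column tests.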
    CASE
        WHEN street_names.snd_feature_type IN ('E', 'F') AND address_points.house_number_suffix IS NOT NULL
            THEN
                -- Last character is A, B, etc -> converted to 1, 2, etc
                (
                    10000 * COALESCE(ASCII(RIGHT(address_points.house_number_suffix, 1)) - 64, 0)
                    + address_points.house_number::INT
                )::TEXT
        WHEN address_points.hyphen_type = 'R'
            THEN TRIM(SPLIT_PART(address_points.house_number, '-', 1))
        ELSE address_points.house_number

this is a way to generalize the logic for Edgewater without having to explicitly write a WHEN for just that area?
Can you elaborate? WHEN street_names.snd_feature_type IN ('E', 'F') is explicitly for Edgewater
oh sorry I forgot that's how it's flagged in the source data, and that's in the doc!
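To make the branches of that CASE easier to follow, here is a small Python sketch of the same logic with a few worked values. The function name is made up, the sample inputs are illustrative rather than real address points, and it assumes a non-empty suffix ending in an uppercase letter whenever the suffix is not null.

```python
def converted_house_number(house_number, house_number_suffix, hyphen_type, snd_feature_type):
    """Mirror the CASE expression: Edgewater suffix encoding, then 'R' hyphen handling, else unchanged."""
    if snd_feature_type in ("E", "F") and house_number_suffix is not None:
        # Last character A, B, ... -> 1, 2, ... (ASCII 'A' is 65, so subtract 64)
        letter_index = ord(house_number_suffix[-1]) - 64
        return str(10000 * letter_index + int(house_number))
    if hyphen_type == "R":
        # Keep only the part before the first hyphen, trimmed of spaces
        return house_number.split("-", 1)[0].strip(" ")
    return house_number


# Illustrative inputs only.
edgewater_a = converted_house_number("12", "A", None, "E")
edgewater_b = converted_house_number("12", "B", None, "F")
hyphen_r = converted_house_number("10 -20", None, "R", None)
plain = converted_house_number("45", None, None, None)
```

Under these assumptions, house number 12 with suffix A becomes 10012 and with suffix B becomes 20012, so each suffix letter shifts the number into its own block of 10000.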
#1836
A lot of the lines added are docs!! I add them with relevant commits, but easiest spot to view the full file is here
latest build
Background
(most of this is just copied from design_doc.md)
The Special Address File (SAF) contains address information for some street segments, supplementary to the address information for the same segments contained in the LION file. There are 9 outputs across 5 "output formats" (my term): 4 of the formats have 2 outputs each, and the remaining format has 1.
SAF records come from
The different flags are:
As far as I can tell, outputs are grouped simply by which data/fields are expected to be associated with each output.
Review
Commits are generally split up by output file. Though for ABCEGNPX, since it's the largest and first output I tackled, I split it up into a few commits for clarity. The other outputs just have a single commit for essentially all their logic.
READ THE DESIGN DOC CHANGES IN EACH COMMIT FIRST (for commits that edit design_doc.md). This is the documentation for those transformations, so if that isn't giving you proper context for review, that's an issue that I need to address.
All feedback welcome on the documentation! Making it more readable, expanding on jargon, structuring it better.