CSCL SAF #2098

fvankrieken · 2025-12-04T19:31:07Z

A lot of the lines added are docs!! I add them with relevant commits, but easiest spot to view the full file is here

Background

(most of this is just copied from design_doc.md)

The Special Address File (SAF) contains address information for some street segments supplementary to the address information for the same segments contained in the LION file. There are 9 outputs - 5 different let's call them "output formats", 4 of which have 2 different outputs.

SAF records come from

altsegmentdata - these are relatively "fully-fledged" and are basically designed to generate these output saf records
certain CommonPlace records
certain AddressPoint records
NamedIntersection records

The different flags are:

Flag	Source	Corresponds to	Meaning	Output
A	altsegmentdata	segment	Alternative address ranges	ABCEGNPX
B	altsegmentdata	segment	Alternative street names	ABCEGNPX
C	altsegmentdata	segment	Special record for Ruby Street on Brooklyn-Queens border	ABCEGNPX
D	altsegmentdata	segment	Duplicate or overlapping address ranges on the same street	D
E	altsegmentdata	segment	Neighborhood name usable in place of street name in addresses	ABCEGNPX
G	commonplace	segment	Non-Addressable place name of complexes	ABCEGNPX
I	namedintersection	intersection	Contain intersection names	I
N	commonplace	segment	Non-Addressable placename	ABCEGNPX
O	altsegmentdata	segment	Contain out-of-sequence addresses and opposite-parity addresse	OV
P	altsegmentdata	segment	Addressable placename	ABCEGNPX
S	addresspoint	segment	Suffixed house number(s) occurring at an intersection	S
V	addresspoint	segment	Contain Vanity Addresses	OV
X	commonplace	segment	Non-Addressable Place name of constituent entities of complexes	ABCEGNPX

Outputs are really just grouped by what data/fields are expected to be associated with an output, as far as I can tell

Review

Commits are generally split up by output file. Though for ABCEGNPX, since it's the largest and first output I tackled, I split it up into a few commits for clarity. The other outputs just have a single commit for essentially all their logic.

READ THE DESIGN DOC CHANGES IN EACH COMMIT FIRST (for commits that edit design_doc.md). This is the documentation for those transformations, so if that isn't giving you proper context for review, that's an issue that I need to address.

All feedback welcome on the documentation! Making it more readable, expanding on jargon, structuring it better.

codecov · 2025-12-04T19:36:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.68%. Comparing base (fc5a230) to head (a770170).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fvankrieken · 2025-12-30T18:04:23Z

products/cscl/macros/qa_compare_by_row.sql

I originally was using this to create some QA tables. However, it seems overkill to start actually making qa tables for each file output for these smaller files, I've just been running this ad-hoc in dbeaver.

fvankrieken · 2025-12-30T19:40:45Z

Have done some QA of the tables with the query included in that macro

WITH combined AS (
    SELECT 
    	'dev' as source,
        *,
        md5(CAST(dev AS text)) AS row_hash
    FROM saf_{abcegnpx_generic}_by_field as dev
    UNION ALL
    SELECT 
    	'prod' as source,
        *,
        md5(CAST(prod AS text)) AS row_hash
    FROM production_outputs.saf_{abcegnpx_generic} as prod
),
counts AS (
    SELECT 
        *,
        COUNT(*) OVER (PARTITION BY row_hash) AS match_count,
        COUNT(CASE WHEN source = 'dev' THEN 1 END) OVER (PARTITION BY row_hash) AS dev_count,
        COUNT(CASE WHEN source = 'prod' THEN 1 END) OVER (PARTITION BY row_hash) AS prod_count
    FROM combined
)
select counts.*
--, gnx.atomicid, gnx.geom, gnx.ap_geom
FROM counts
--left join int__saf_gnx gnx on counts.segmentid::int = gnx.segmentid and counts.source = 'dev'
WHERE dev_count <> prod_count  -- Different counts = unmatched rows
order by boroughcode, face_code, segment_seqnum, source

ABCEGNPX

131 diffs (both generic and roadbed).

majority of cases are different APs being joined. I've done a little querying - all records are gnx address points that are essentially on top of segments. However, in all cases, they're slightly off center, and ours seems to be correct. Will send list to GR
some "keys" that are in dev but not prod and vice-versa. Going to punt this issue. Seems like some have different face codes but same other fields, which would imply some segments are not being assigned correctly.

D

No diffs between dev and prod

I

No diffs between dev and prod

OV

No diffs between dev and prod

S

32 diffs, some weirdness. Want to punt this to a new issue

hn - some seem to have the interesting Edgewater logic applied, but others don't
lgc - don't seem to line up. Prod code is a little weird - looks fine, but it's implemented differently from standard segment lgc lookup for some reason

fvankrieken · 2025-12-30T20:06:22Z

products/cscl/models/product/saf/_saf.yml

+models:
+# Models with column per field
+- name: saf_abcegnpx_generic_by_field
+  columns: &safa_columns


Started to lose a little faith in this while writing it - I feel like it's a lot of effort to write/maintain tests like these when really, I'd rather just trust the formatting seed and the macro that generates all these columns (and also, these tests take a while to run - for LION, I think it's like a minute). And then, we test that the final text output (the "dat" table with row concatenated as text) has the right length, so that would catch most issues here.

So I stopped part way through and thought I'd open it up to thoughts.

yea seems like a pain to maintain the column length tests. always doing the final "dat" table testing sounds good to me

products/cscl/design_doc.md

products/cscl/models/intermediate/saf/int__saf_segments.sql

damonmcc · 2026-01-06T16:25:50Z

products/cscl/models/intermediate/saf/int__address_points.sql

+    CASE
+        WHEN street_names.snd_feature_type IN ('E', 'F') AND address_points.house_number_suffix IS NOT NULL
+            THEN
+                -- Last character is A, B, etc -> converted to 1, 2, etc
+                (
+                    10000 * COALESCE(ASCII(RIGHT(address_points.house_number_suffix, 1)) - 64, 0)
+                    + address_points.house_number::INT
+                )::TEXT
+        WHEN address_points.hyphen_type = 'R'
+            THEN TRIM(SPLIT_PART(address_points.house_number, '-', 1))
+        ELSE address_points.house_number


this is a way to generalize the logic for Edgewater without having to explicitly write a WHEN for just that area?

Can you elaborate? WHEN street_names.snd_feature_type IN ('E', 'F') is explicitly for Edgewater

oh sorry I forgot that's how it's flagged in the source data, and that's in the doc!

fvankrieken force-pushed the fvk-cscl-saf branch 3 times, most recently from d69c9b5 to b763d54 Compare December 9, 2025 21:42

fvankrieken force-pushed the fvk-cscl-saf branch 26 times, most recently from 7517fd7 to 0a1981c Compare December 23, 2025 18:50

fvankrieken force-pushed the fvk-cscl-saf branch from 475b2fe to 43531f3 Compare December 30, 2025 17:28

fvankrieken commented Dec 30, 2025

View reviewed changes

fvankrieken changed the title ~~WIP: cscl saf~~ CSCL SAF Dec 30, 2025

fvankrieken marked this pull request as ready for review December 30, 2025 18:12

fvankrieken assigned damonmcc and alexrichey Dec 30, 2025

fvankrieken commented Dec 30, 2025

View reviewed changes

fvankrieken force-pushed the fvk-cscl-saf branch from 030a5e4 to e0b956e Compare December 31, 2025 17:05

damonmcc reviewed Jan 5, 2026

View reviewed changes

products/cscl/design_doc.md Outdated Show resolved Hide resolved

fvankrieken added 13 commits January 6, 2026 11:13

add rpl to ingest, recipe, sources

fd5ebfb

add foundational saf_segments model

53cd4ba

saf_abcegnpx setup

58f86e8

add saf abcep, gnx sources, abcegnpx outputs

e3b6781

saf_abcegnpx product/qa

0cdca83

macro to compare tables by row

f0d77eb

saf d

424cf8e

saf_i

f75585f

saf_ov

58d7897

incorrect docs

b8c5481

saf_s

bd3a777

fix sort order place names

34dd098

design_doc typos/clarity

8855898

fvankrieken force-pushed the fvk-cscl-saf branch from bd1d4e3 to 8855898 Compare January 6, 2026 16:13

damonmcc reviewed Jan 6, 2026

View reviewed changes

products/cscl/models/intermediate/saf/int__saf_segments.sql Outdated Show resolved Hide resolved

damonmcc reviewed Jan 6, 2026

View reviewed changes

POST-REVIEW: snip

a770170

damonmcc approved these changes Jan 6, 2026

View reviewed changes

fvankrieken merged commit a0072b6 into main Jan 6, 2026
23 checks passed

fvankrieken deleted the fvk-cscl-saf branch January 6, 2026 16:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CSCL SAF #2098

CSCL SAF #2098

Uh oh!

fvankrieken commented Dec 4, 2025 •

edited

Loading

Uh oh!

codecov bot commented Dec 4, 2025 •

edited

Loading

Uh oh!

fvankrieken Dec 30, 2025

Uh oh!

fvankrieken commented Dec 30, 2025 •

edited

Loading

Uh oh!

fvankrieken Dec 30, 2025 •

edited

Loading

Uh oh!

damonmcc Jan 6, 2026

Uh oh!

Uh oh!

Uh oh!

damonmcc Jan 6, 2026

Uh oh!

fvankrieken Jan 6, 2026

Uh oh!

damonmcc Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CSCL SAF #2098

CSCL SAF #2098

Uh oh!

Conversation

fvankrieken commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Review

Uh oh!

codecov bot commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fvankrieken Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

fvankrieken commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ABCEGNPX

D

I

OV

S

Uh oh!

fvankrieken Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

damonmcc Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

damonmcc Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

fvankrieken Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

damonmcc Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fvankrieken commented Dec 4, 2025 •

edited

Loading

codecov bot commented Dec 4, 2025 •

edited

Loading

fvankrieken commented Dec 30, 2025 •

edited

Loading

fvankrieken Dec 30, 2025 •

edited

Loading