refactor: Extract shared DataFrame normalization logic #6
Open
dshkol wants to merge 6 commits into refactor/resilient-session-integration from
This commit makes two code organization improvements:
1. Move all inline imports to the top of their respective files
   - Follows Python convention (PEP 8)
   - Improves static analysis and IDE support
   - Makes dependencies visible at a glance
2. Centralize API URLs in settings.py
   - CENSUSMAPPER_API_URL for api/v1 endpoints
   - CENSUSMAPPER_DATA_URL for data_sets endpoints
   - Single source of truth eliminates drift risk
Files modified:
- settings.py: Add URL constants
- core.py: Move json, io, hashlib imports; use URL constant
- vectors.py: Move io, warnings imports; use URL constant
- regions.py: Move io import; use URL constant
- datasets.py: Move re import; use URL constant
- hierarchy.py: Move re import to top
- intersect_geometry.py: Reorganize imports; use URL constant
- resilience.py: Move atexit import to top
All existing tests pass unchanged, confirming no behavioral impact.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The unconditional import of RPythonBridge was causing CI failures since the cross_validation module is only available locally. Wrap the import in try/except like other test files.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
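The guarded import described in this commit could look roughly like the sketch below. The import path `cross_validation` and the flag name `HAS_R_BRIDGE` are assumptions for illustration; the commit does not show the actual test file.

```python
import unittest

# Guarded import: the module is only available locally, so CI must not
# fail at import time. The module path here is an assumption.
try:
    from cross_validation import RPythonBridge
    HAS_R_BRIDGE = True
except ImportError:
    RPythonBridge = None
    HAS_R_BRIDGE = False


# Tests that need the bridge are skipped (not failed) when it is missing.
@unittest.skipUnless(HAS_R_BRIDGE, "cross_validation module not available")
class TestRBridge(unittest.TestCase):
    def test_bridge_is_importable(self):
        self.assertIsNotNone(RPythonBridge)
```

With pytest, the same flag could drive `pytest.mark.skipif(not HAS_R_BRIDGE, reason=...)` instead of the `unittest` decorator.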
3b1502e to e3d38ac
f7955c0 to 13ef466
- Update actions/checkout@v3 → v4 in validate_examples.yml
- Update actions/upload-artifact@v3 → v4 in validate_examples.yml
- Update codecov/codecov-action@v3 → v4 in ci.yml
These v3 actions are deprecated and cause CI warnings/failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace direct requests.get/post calls with get_session().get/post to leverage the existing resilience infrastructure that provides:
- Connection pooling for improved performance
- Automatic retries with exponential backoff
- Rate limiting (100ms minimum between requests)
- Consistent error handling
Changes:
- core.py: 3 requests.post calls → get_session().post
- datasets.py: 1 requests.get call → get_session().get
- vectors.py: 1 requests.get call → get_session().get
- regions.py: 1 requests.get call → get_session().get
- intersect_geometry.py: 1 requests.post call → get_session().post (kept 60s timeout for geometry operations)
Removed redundant raise_for_status() calls since ResilientSession handles HTTP errors internally.
Updated test mocks to patch get_session instead of requests module.
Added test_resilient_session_is_used to verify the session is called.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
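To illustrate the two behaviors this commit relies on (a minimum interval between requests and retries with exponential backoff), here is a minimal standard-library sketch. It is not the project's ResilientSession: the class name `ResilientCaller`, the injected `send` callable, the retry count, the backoff base, and the choice of `ConnectionError` as the retryable error are all assumptions. Only the 100 ms interval comes from the commit message.

```python
import time


class ResilientCaller:
    """Hypothetical stand-in for ResilientSession; not the project's code."""

    def __init__(self, send, min_interval=0.1, max_retries=3, backoff=0.5,
                 sleep=time.sleep, clock=time.monotonic):
        self._send = send                  # callable that performs one request
        self._min_interval = min_interval  # 100 ms spacing, per the commit text
        self._max_retries = max_retries    # assumed value
        self._backoff = backoff            # assumed base delay in seconds
        self._sleep = sleep
        self._clock = clock
        self._last_call = None

    def _throttle(self):
        # Wait until at least min_interval has passed since the last request.
        if self._last_call is not None:
            remaining = self._min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()

    def request(self, *args, **kwargs):
        for attempt in range(self._max_retries + 1):
            self._throttle()
            try:
                return self._send(*args, **kwargs)
            except ConnectionError:
                if attempt == self._max_retries:
                    raise  # retries exhausted: surface the error to the caller
                # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
                self._sleep(self._backoff * (2 ** attempt))
```

Injecting `sleep` and `clock` keeps the sketch testable without real delays, which mirrors the commit's approach of patching get_session in tests rather than the requests module.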
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
e3d38ac to f18c0c8
Consolidate duplicate code between _process_csv_response and
_process_geojson_response into a shared _normalize_census_dataframe()
function.
The new function handles:
- Census NA value conversion ('x', 'X', 'F', '...', '-', '')
- Numeric column dtype conversion (Population, Households, etc.)
- Categorical column dtype conversion (Type, Region Name)
- Vector metadata extraction
- Both CSV endpoint names (Population) and GeoJSON short names (pop)
This reduces code duplication by ~60 lines and ensures consistent
data handling across both CSV and GeoJSON endpoints.
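A shared normalizer along these lines might look like the following sketch. It assumes pandas and is not the PR's implementation: the column lists are illustrative (the commit names only a few columns), the function name drops the leading underscore, and vector metadata extraction is omitted because the commit does not describe it.

```python
import pandas as pd

# NA markers as listed in the commit message.
CENSUS_NA_VALUES = ["x", "X", "F", "...", "-", ""]

# Assumed column lists for illustration; the commit names only some of them.
NUMERIC_COLUMNS = ["Population", "Households", "pop"]
CATEGORICAL_COLUMNS = ["Type", "Region Name"]


def normalize_census_dataframe(df):
    """Return a copy with census NA markers cleared and dtypes converted."""
    out = df.copy()
    for col in NUMERIC_COLUMNS:
        if col in out.columns:
            # Blank out census NA markers, then parse what is left as numbers.
            cleaned = out[col].where(~out[col].isin(CENSUS_NA_VALUES))
            out[col] = pd.to_numeric(cleaned)
    for col in CATEGORICAL_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out


# The same function serves CSV-style and GeoJSON-style column names.
csv_like = pd.DataFrame({"Population": ["1200", "x", "35"],
                         "Type": ["CSD", "CSD", "CT"]})
geojson_like = pd.DataFrame({"pop": ["1200", "F", "35"]})

csv_out = normalize_census_dataframe(csv_like)
geojson_out = normalize_census_dataframe(geojson_like)
```

Routing both response types through one function is what makes an equivalence test such as test_normalize_produces_equivalent_results meaningful: the two paths can only diverge in how they build the input frame.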
Added 3 new unit tests:
- test_normalize_census_dataframe_census_na_values
- test_normalize_census_dataframe_geojson_short_names
- test_normalize_produces_equivalent_results
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
13ef466 to 1a67981
f18c0c8 to df550d4
Summary
Consolidate duplicate code between _process_csv_response and _process_geojson_response into a shared _normalize_census_dataframe() function.
Changes
- _normalize_census_dataframe() that handles:
  - Census NA values ('x', 'X', 'F', '...', '-', '')
  - CSV endpoint names (Population) and GeoJSON short names (pop)
- _process_csv_response from ~60 lines to ~10 lines
- _process_geojson_response from ~70 lines to ~10 lines
Why
Test plan
- test_normalize_census_dataframe_census_na_values
- test_normalize_census_dataframe_geojson_short_names
- test_normalize_produces_equivalent_results
🤖 Generated with Claude Code