Add bulk download 2a #28
Merged
This pull request introduces several enhancements to the `oda_reader` package, focusing on bulk downloading and reading of the DAC2A dataset, automatic file type and delimiter detection, and improved developer experience. The most significant changes are the addition of the `bulk_download_dac2a()` function, auto-detection of file types and delimiters, and deprecation of the `is_txt` parameter. Testing support is also improved with new dependencies and unit tests.

DAC2A Bulk Download Improvements

- Added the `bulk_download_dac2a()` function to enable bulk downloading of the full DAC2A dataset, with support for saving to disk or streaming as an iterator. (`src/oda_reader/dac2a.py`, `src/oda_reader/__init__.py`)
- Added the `get_full_dac2a_parquet_id()` helper to retrieve the correct file ID for the DAC2A bulk download. (`src/oda_reader/dac2a.py`, R1-R35)
- Added unit tests for the DAC2A bulk download. (`tests/datasets/dac2a/unit/test_dac2a_bulk.py`, R1-R98)

File Type and Delimiter Auto-Detection

- Bulk downloads now auto-detect the file type, removing the need for the `is_txt` parameter and supporting both formats transparently. (`src/oda_reader/download/download_tools.py`)
- Added the `_detect_delimiter()` utility to automatically detect CSV delimiters (comma, pipe, tab, semicolon) when reading txt files from bulk downloads. (`src/oda_reader/download/download_tools.py`, R55-R84)

API Changes and Deprecations

- Deprecated the `is_txt` parameter in `bulk_download_parquet()`, emitting a warning when it is used and updating the documentation to reflect auto-detection. (`src/oda_reader/download/download_tools.py`, L400-R526)
- Updated calls to `bulk_download_parquet()` in other modules to remove the `is_txt` argument. (`src/oda_reader/crs.py`, `src/oda_reader/multisystem.py`)

Developer Experience and Testing

- Added `pytest` and `pytest-mock` to development dependencies for improved testing support. (`pyproject.toml`, R45-R46)
- Updated the version to `1.4.0` to reflect new features and changes. (`CHANGELOG.md`, `pyproject.toml`)
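As an illustration of the delimiter auto-detection described in this PR, here is a minimal sketch of one way to pick among the four candidates (comma, pipe, tab, semicolon). It is not the `_detect_delimiter()` implementation from this PR, whose body is not shown here; the name `guess_delimiter`, its signature, and the "count candidates in the header line" heuristic are all assumptions made for the example.

```python
# Hypothetical sketch: NOT the PR's _detect_delimiter() implementation.
# Candidate delimiters, in tie-breaking order (earlier wins on equal counts).
CANDIDATES = [",", "|", "\t", ";"]


def guess_delimiter(sample: str, default: str = ",") -> str:
    """Guess the delimiter of delimited text from its first line.

    Counts each candidate in the header line and returns the most
    frequent one; falls back to `default` when none is present.
    """
    lines = sample.splitlines()
    if not lines:
        return default
    header = lines[0]
    counts = {d: header.count(d) for d in CANDIDATES}
    # max() returns the first maximal item, so ties follow CANDIDATES order.
    best = max(CANDIDATES, key=lambda d: counts[d])
    return best if counts[best] > 0 else default
```

A caller could read the first few kilobytes of a txt file, pass them to `guess_delimiter`, and hand the result to a CSV reader. Counting only the header line keeps the check cheap, but it can be misled by delimiters inside quoted header names.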
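The `is_txt` deprecation in this PR follows a common pattern: keep accepting the argument, ignore its value, and warn the caller. The sketch below shows that pattern under stated assumptions. The function `download_sketch`, its parameters, and the choice of `DeprecationWarning` are hypothetical; the actual signature of `bulk_download_parquet()` and the warning category it emits are not shown in this description.

```python
import warnings
from typing import Optional


def download_sketch(file_id: str, is_txt: Optional[bool] = None) -> str:
    """Hypothetical stand-in for a bulk download function.

    `is_txt` is accepted only for backward compatibility; its value is
    ignored because the file type is assumed to be detected automatically.
    """
    if is_txt is not None:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            "`is_txt` is deprecated and ignored; "
            "the file type is detected automatically.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Placeholder for the real download logic (not part of this sketch).
    return file_id
```

Defaulting the parameter to `None` rather than `False` lets the function tell "argument not passed" apart from "argument passed as `False`", so existing callers that pass either value see the warning.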