3 changes: 3 additions & 0 deletions .github/workflows/tests.yml
@@ -22,12 +22,15 @@ jobs:
      uses: astral-sh/setup-uv@v6
      with:
        python-version: ${{ matrix.python-version }}
+       enable-cache: false

    - name: Install dependencies
      run: uv sync --all-groups

    - name: Run unit tests
      run: uv run pytest tests/ -n auto -m "not integration" -v
+     env:
+       ODA_READER_CACHE_DIR: ${{ runner.temp }}/oda_cache

    - name: Integration Tests
      if: github.event_name == 'pull_request' && matrix.python-version == '3.12' && matrix.os == 'ubuntu-latest'
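The `ODA_READER_CACHE_DIR` variable added above redirects the cache during test runs. A minimal sketch of the same override for local use, assuming the library reads the variable the way this workflow implies:

```python
import os

# Assumption: ODA Reader honors ODA_READER_CACHE_DIR as the workflow suggests;
# set it before importing so the cache lands in a throwaway directory.
os.environ["ODA_READER_CACHE_DIR"] = "/tmp/oda_cache"

from oda_reader import download_dac1

# First call populates /tmp/oda_cache; identical re-runs hit the cache.
data = download_dac1(start_year=2022, end_year=2022)
```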
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog for oda_reader

+## 1.3.1 (2025-06-27)
+- Improves cache management for very large files. Introduces tests and improves documentation.

## 1.3.0 (2025-06-16)
- Improves cache management.

81 changes: 8 additions & 73 deletions docs/docs/advanced.md
@@ -45,9 +45,9 @@ OECD occasionally changes dataflow versions (schema updates). ODA Reader handles

When a dataflow version returns 404 (not found), ODA Reader automatically:

-1. Tries the configured version (e.g., `1.0`)
-2. If 404, retries with `0.9`
-3. Continues decrementing: `0.8`, `0.7`, `0.6`
+1. Tries the configured version (e.g., `1.5`)
+2. If 404, retries with `1.4`
+3. Continues decrementing: `1.3`, `1.2`, `1.1`
4. Returns data from the first successful version (up to 5 attempts)

This means your code keeps working even when OECD makes breaking schema changes.
@@ -58,9 +58,9 @@ This means your code keeps working even when OECD makes breaking schema changes.
from oda_reader import download_dac1

# ODA Reader will automatically try:
-# 1.0 -> 404
-# 0.9 -> 404
-# 0.8 -> Success! Returns data with version 0.8
+# 1.5 -> 404
+# 1.4 -> 404
+# 1.3 -> Success! Returns data with version 1.3
data = download_dac1(start_year=2022, end_year=2022)
```
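For intuition, a minimal sketch of this retry pattern; `fetch` and the other names below are illustrative, not ODA Reader internals:

```python
from requests.exceptions import HTTPError

def download_with_fallback(fetch, start_version: str = "1.5", attempts: int = 5):
    """Try successively older dataflow versions until one returns data."""
    version = float(start_version)
    for _ in range(attempts):
        try:
            return fetch(f"{version:.1f}")  # "1.5", then "1.4", ...
        except HTTPError as err:
            if err.response is None or err.response.status_code != 404:
                raise  # only a 404 triggers the fallback
            version = round(version - 0.1, 1)
    raise RuntimeError(f"No working dataflow version found in {attempts} attempts")
```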

@@ -71,11 +71,11 @@ You'll see a message indicating which version succeeded.
You can specify an exact dataflow version:

```python
-# Force use of version 0.8
+# Force use of version 1.3
data = download_dac1(
    start_year=2022,
    end_year=2022,
-    dataflow_version="0.8"
+    dataflow_version="1.3"
)
```

@@ -177,26 +177,6 @@ combined = pd.merge(
- Column names and codes must align
- Filter carefully to avoid double-counting

-## Custom Schema Handling
-
-If you need custom schema translation beyond built-in options:
-
-### Access Raw Data and Translate Manually
-
-```python
-# Get raw API data
-data = download_dac1(
-    start_year=2022,
-    end_year=2022,
-    pre_process=False,
-    dotstat_codes=False
-)
-
-# Apply custom transformations
-data = data.rename(columns={'DONOR': 'donor_custom'})
-data['donor_custom'] = data['donor_custom'].map(my_custom_mapping)
-```

### Load Schema Mapping Files

```python
@@ -234,51 +214,6 @@ def get_crs_data():
    return pd.read_parquet("/data/crs_full.parquet")
```

-### Refresh Strategy
-
-```python
-from pathlib import Path
-from datetime import datetime, timedelta
-
-def refresh_if_old(file_path, max_age_days=7):
-    """Re-download if file is older than max_age_days"""
-    path = Path(file_path)
-
-    if not path.exists():
-        print("File doesn't exist, downloading...")
-        bulk_download_crs(save_to_path=file_path)
-        return
-
-    file_age = datetime.now() - datetime.fromtimestamp(path.stat().st_mtime)
-
-    if file_age > timedelta(days=max_age_days):
-        print(f"File is {file_age.days} days old, refreshing...")
-        bulk_download_crs(save_to_path=file_path)
-    else:
-        print(f"File is recent ({file_age.days} days old), using cached version")
-
-# Use in pipeline
-refresh_if_old("/data/crs_full.parquet", max_age_days=7)
-crs_data = pd.read_parquet("/data/crs_full.parquet")
-```
-
-### Memory-Efficient Aggregation
-
-```python
-# Process bulk CRS in chunks, aggregate results
-sector_totals = {}
-
-for chunk in bulk_download_crs(as_iterator=True):
-    # Aggregate by sector
-    sector_sums = chunk.groupby('purpose_code')['usd_commitment'].sum()
-
-    # Accumulate
-    for sector, amount in sector_sums.items():
-        sector_totals[sector] = sector_totals.get(sector, 0) + amount
-
-print(f"Total sectors: {len(sector_totals)}")
-```

## Debugging Tips

### Enable Verbose Logging
33 changes: 0 additions & 33 deletions docs/docs/bulk-downloads.md
@@ -71,8 +71,6 @@ bulk_download_crs(
)
```

-The reduced version omits some descriptive columns but retains all flow amounts and key dimensions.

## Memory-Efficient Processing with Iterators

For very large files, process in chunks to avoid loading the entire dataset into memory:
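The chunked example that follows in the full docs is elided from this diff. A sketch of the pattern, reusing the `as_iterator=True` flag and the column names that appear in the aggregation example elsewhere in these docs:

```python
import pandas as pd
from oda_reader import bulk_download_crs

# Stream the bulk CRS file chunk by chunk, keeping only health-sector rows
# (purpose codes starting with 121/122) so the full file never sits in memory.
health_chunks = []
for chunk in bulk_download_crs(as_iterator=True):
    mask = chunk["purpose_code"].astype(str).str.startswith(("121", "122"))
    health_chunks.append(chunk.loc[mask])

health = pd.concat(health_chunks, ignore_index=True)
print(f"Health-sector rows kept: {len(health):,}")
```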
@@ -206,37 +204,6 @@ Bulk downloads already have:

See [Schema Translation](schema-translation.md) for detailed comparison.

-## Combining Bulk and API Downloads
-
-You can mix approaches:
-
-```python
-# Download full CRS as bulk file
-crs_full = bulk_download_crs()
-
-# Use API for recent updates or specific queries
-crs_recent = download_crs(
-    start_year=2023,
-    end_year=2023,
-    filters={"donor": "USA"}
-)
-
-# Combine if schemas match
-# (you may need to harmonize column names first)
-```
-
-## Performance Comparison
-
-Approximate times (varies by network speed and OECD server load):
-
-| Method | Dataset Size | Time |
-|--------|-------------|------|
-| API download (filtered) | 10,000 rows | 10-30 seconds |
-| API download (large query) | 100,000 rows | 2-5 minutes |
-| Bulk download CRS | ~2 million rows | 1-2 minutes |
-| Bulk + iterator (filter) | Process 2 million rows | 2-5 minutes |
-
-Bulk downloads are consistently fast regardless of query complexity, while API times vary significantly with query size.

## Troubleshooting

19 changes: 14 additions & 5 deletions docs/docs/datasets.md
@@ -7,23 +7,25 @@ ODA Reader provides access to five datasets covering official development assist
| Dataset | What It Contains | Use When |
|---------|------------------|----------|
| **DAC1** | Aggregate flows by donor | Analyzing overall ODA trends, donor performance |
-| **DAC2a** | Bilateral flows by donor-recipient | Recipient-level analysis, who gives to whom |
+| **DAC2a** | Bilateral flows by donor-recipient | Recipient-level analysis |
| **CRS** | Project-level microdata | Sector analysis, project details, activity-level data |
| **Multisystem** | Multilateral system usage | Analyzing multilateral channels and contributions |
-| **AidData** | Chinese development finance | Non-DAC donor analysis, Chinese aid flows |
+| **AidData** | Chinese development finance | Chinese aid flows |

## DAC1: Aggregate Flows

**What it contains**: Total ODA and OOF by donor, aggregated across all recipients and sectors. This is the highest-level view of development assistance.

**Key dimensions**:

- Donor (bilateral donors and multilateral organizations)
- Measure type (ODA, OOF, grants, loans, etc.)
- Flow type (commitments, disbursements, grant equivalents)
- Price base (current or constant prices)
- Unit measure (USD millions, national currency, etc.)

**Use when**:

- You need donor-level totals
- Analyzing overall ODA trends over time
- Comparing donor performance
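The dataset's own example is elided from this diff. A hedged sketch of a typical DAC1 query follows; the filter codes are assumptions to verify with `get_available_filters("dac1")`:

```python
from oda_reader import download_dac1

# Donor-level ODA totals for the United States, 2018-2023.
oda_usa = download_dac1(
    start_year=2018,
    end_year=2023,
    filters={"donor": "USA"},
)
print(oda_usa.head())
```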
@@ -56,12 +58,14 @@ oda_constant = download_dac1(
**What it contains**: Bilateral ODA and OOF flows broken down by both donor and recipient country. Shows who gives to whom.

**Key dimensions**:

- Donor (bilateral donors)
- Recipient (receiving countries and regions)
- Measure type (bilateral ODA, imputed multilateral, etc.)
- Price base (current or constant)

**Use when**:

- Analyzing flows to specific recipient countries
- Understanding bilateral relationships
- Studying geographic distribution of aid
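As with DAC1, the file's example is elided here. A sketch under assumptions; the recipient code is illustrative and should be checked with `get_available_filters("dac2a")`:

```python
from oda_reader import download_dac2a

# Bilateral flows from France to Uganda ("UGA" is an assumed recipient code).
fra_uga = download_dac2a(
    start_year=2019,
    end_year=2022,
    filters={"donor": "FRA", "recipient": "UGA"},
)
```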
@@ -95,6 +99,7 @@ germany_eastafrica = download_dac2a(
**What it contains**: Individual project and activity-level data with detailed information about each development assistance activity. This is the most granular dataset.

**Key dimensions**:

- Donor
- Recipient
- Sector (purpose codes at various levels of detail)
@@ -104,6 +109,7 @@ germany_eastafrica = download_dac2a(
- Microdata flag (True for project-level, False for semi-aggregates)

**Use when**:

- You need project-level details (descriptions, amounts, sectors)
- Analyzing sector-specific flows
- Understanding implementation channels
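A hedged sketch of a project-level CRS query; the `microdata` keyword is an assumption inferred from the dimension list above, not a confirmed parameter:

```python
from oda_reader import download_crs

# Project-level rows for basic health (purpose code "12220").
basic_health_projects = download_crs(
    start_year=2022,
    end_year=2022,
    filters={"sector": "12220"},
    microdata=True,  # assumption: exposed as a download_crs parameter
)
```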
@@ -153,13 +159,15 @@ semi_agg = download_crs(
**What it contains**: Data on how DAC members use the multilateral aid system, including core contributions to multilateral organizations and earmarked funding.

**Key dimensions**:

- Donor
- Recipient (multilateral organizations)
- Channel (specific multilateral organizations)
- Flow type (commitments, disbursements)
- Measure type

**Use when**:

- Analyzing multilateral contributions
- Understanding core vs. earmarked funding
- Studying specific multilateral channels (World Bank, UN agencies, etc.)
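A hedged sketch of a multisystem query. The channel code "44002" (World Bank IDA) appears in the filtering docs; treat any other codes as lookups rather than givens:

```python
from oda_reader import download_multisystem

# Contributions channelled through the World Bank's IDA.
ida = download_multisystem(
    start_year=2020,
    end_year=2023,
    filters={"channel": "44002"},
)
```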
@@ -191,16 +199,17 @@ ida_contributions = download_multisystem(
**What it contains**: Project-level data on Chinese development finance activities, compiled by AidData. Covers official finance from China that may not be reported to the OECD.

**Key dimensions**:

- Commitment year
- Recipient country
- Sector
- Project descriptions
- Flow amounts and types

**Use when**:

- Analyzing Chinese development finance
-- Comparing traditional DAC donors with China
-- Studying non-DAC donor activities
+- Comparing DAC donors with China

**Example**:

@@ -213,7 +222,7 @@
# AidData is downloaded as bulk file, filtered by year after download
```

-**Note**: AidData comes from Excel files, not the OECD API. It uses a different schema than DAC datasets.
+**Note**: AidData comes from Excel files published on the AidData website, not from the OECD API. It uses a different schema than DAC datasets.

## Discovering Available Filters

8 changes: 6 additions & 2 deletions docs/docs/filtering.md
@@ -101,12 +101,13 @@ multisystem_filters = get_available_filters("multisystem")
### DAC1 and DAC2a

Common dimensions:

- `donor` - Donor country (ISO3 codes like "USA", "GBR", "FRA")
- `recipient` - Recipient country or region (DAC2a only)
- `measure` - Type of flow (ODA, OOF, grants, loans, etc.)
- `flow_type` - Commitments, disbursements, net flows, etc.
- `price_base` - "V" for current prices, "Q" for constant prices
- `unit_measure` - "USD" for US dollars, "XDC" for national currency
- `unit_measure` - "USD" for US dollars

**Example**: Get net ODA disbursements in constant prices:
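The code for this example is elided from the diff. A sketch under stated assumptions: `price_base="Q"` comes from the list above, while the `flow_type` code is hypothetical and must be looked up:

```python
from oda_reader import download_dac1

# Net ODA disbursements in constant prices (hypothetical flow_type code).
net_oda_constant = download_dac1(
    start_year=2015,
    end_year=2022,
    filters={
        "flow_type": "1160",  # hypothetical code for net disbursements
        "price_base": "Q",    # constant prices
    },
)
```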

@@ -127,6 +128,7 @@
### CRS (Creditor Reporting System)

CRS has additional dimensions:

- `sector` - Purpose codes (5-digit codes like "12220" for basic health)
- `channel` - Implementing organization (government, NGO, multilateral, etc.)
- `modality` - Grant, loan, equity, etc.
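A sketch combining these CRS-specific dimensions; the sector code "12220" comes from these docs, while the modality code is an assumption:

```python
from oda_reader import download_crs

# Basic health activities delivered as project-type interventions.
basic_health_grants = download_crs(
    start_year=2021,
    end_year=2022,
    filters={
        "sector": "12220",  # basic health
        "modality": "C01",  # assumed code for project-type interventions
    },
)
```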
@@ -178,6 +180,7 @@ The `_T` suffix means "total" - it aggregates across that dimension to avoid dou
### Multisystem

Multisystem tracks multilateral contributions:

- `donor` - Contributing country
- `channel` - Specific multilateral organization (e.g., "44002" for World Bank IDA)
- `flow_type` - Commitments, disbursements
@@ -214,7 +217,8 @@ print(data['measure'].unique()) # See all measure codes

3. **Use trial and error**: Download a small query and examine column values
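For instance, a small sketch of that trial-and-error loop, assuming the default .Stat-style column names:

```python
from oda_reader import download_dac1

# Pull a single year, then inspect the distinct codes each dimension contains.
sample = download_dac1(start_year=2022, end_year=2022)
for col in ["donor", "measure", "flow_type"]:
    print(col, sample[col].unique()[:10])
```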

-**Note**: Codes differ between API schema and .Stat schema. By default, ODA Reader returns .Stat codes. See [Schema Translation](schema-translation.md) for details.
+**Note**: Codes differ between the API schema and the .Stat schema. When making API calls, you must use the API schema codes. However, by default, ODA Reader returns .Stat codes. See [Schema Translation](schema-translation.md) for details.

## Empty Filters

6 changes: 2 additions & 4 deletions docs/docs/getting-started.md
@@ -13,7 +13,7 @@ pip install oda-reader
Or using uv (recommended for faster installs):

```bash
-uv pip install oda-reader
+uv add oda-reader
```

That's it! ODA Reader and its dependencies (pandas, requests, pyarrow, etc.) are now installed.
@@ -121,6 +121,4 @@ Now that you've downloaded your first datasets, explore:

**Query is slow**: First-time queries can take 10-30 seconds as ODA Reader fetches from OECD's API. Subsequent identical queries are instant due to caching.

-**Rate limit errors**: By default, ODA Reader limits to 20 requests per 60 seconds. This should prevent rate limit errors. If you see them, your cache might have been cleared. Wait a minute and retry.
-
-**Import errors**: Make sure you installed with dependencies: `pip install oda-reader` (not just `oda_reader`).
+**Rate limit errors**: By default, ODA Reader limits to 20 requests per hour. This should prevent rate limit errors. If you see them, your cache might have been cleared. Wait and retry.
11 changes: 6 additions & 5 deletions docs/docs/index.md
@@ -1,12 +1,12 @@
# ODA Reader

-**Programmatic access to OECD DAC data without the headaches**
+**Programmatic access to OECD DAC data**

-Working with OECD Development Assistance Committee (DAC) data is frustrating. You need to navigate multiple datasets (DAC1, DAC2a, CRS), understand complex SDMX API syntax, manage rate limits, and reconcile different schema versions. The OECD doesn't provide any first-party Python library to help.
+Working with OECD Development Assistance Committee (DAC) data can be frustrating. You need to navigate multiple datasets (DAC1, DAC2a, CRS, and more), understand complex SDMX API syntax, manage tight rate limits, and reconcile different schema versions. The OECD doesn't provide any first-party Python library to help.

-Worse, the OECD has a habit of introducing undocumented schema changes, breaking link URLs, and making format changes without notice. What works today might break tomorrow, making it extremely difficult to build robust data pipelines for research and analysis.
+Unfortunately, the OECD has a habit of introducing undocumented schema changes, breaking link URLs, and making format changes without notice. What works today might break tomorrow, making it very difficult to build robust data pipelines for research and analysis.

-ODA Reader eliminates these headaches. It provides a unified Python interface that handles complexity for you: automatic version fallbacks when schemas change, consistent APIs across datasets, smart caching to reduce dependency on flaky endpoints, and schema translation between API and legacy formats.
+ODA Reader eliminates these headaches. It provides a unified Python interface that handles the complexity for you: automatic search for the latest schema version, consistent APIs across datasets, smart caching to reduce dependency on flaky endpoints, and schema translation between the Data Explorer API and OECD.Stat formats.

**Key features**:

@@ -15,7 +15,8 @@ ODA Reader eliminates these headaches. It provides a unified Python interface th
- **Bulk download large files** with memory-efficient streaming for the full CRS (1GB+)
- **Automatic rate limiting** and caching to work within API constraints
- **Schema translation** between Data Explorer API and OECD.Stat formats
-- **Version fallback** automatically retries with older schema versions when OECD makes breaking changes
+- **Version fallback** automatically searches for the most recent working schema version, since dataflow versions can change unexpectedly with new data releases

**Built for researchers, analysts, and developers** who need reliable, programmatic access to ODA data without fighting infrastructure.
