randsley/pyptine

pyptine - INE Portugal API Client

Requires Python 3.9+. License: MIT.

High-level Python client for Statistics Portugal (INE) API. Query and download statistical data from INE Portugal with a simple, intuitive interface.

Features

  • 🎯 High-level Convenience API: Simple interface for common data retrieval and analysis tasks.
  • Async Support: Non-blocking I/O with AsyncINE for concurrent requests using httpx.
  • 📊 Multiple Output Formats: Export data to pandas DataFrames, JSON, or CSV with ease.
  • 📈 Data Visualization: Interactive plotly charts (line, bar, area, scatter) directly from data.
  • 🔬 Statistical Analysis: Built-in YoY growth, MoM changes, moving averages, and EMA calculations.
  • 💾 Smart Caching: Disk-based caching reduces redundant API calls, speeding up repeated queries.
  • 🔍 Metadata Browsing: Search and discover indicators, themes, and dimensions.
  • 🖥️ Enhanced CLI: Rich formatting with progress bars, tables, and colored output.
  • 📑 True Pagination: Efficient streaming of large datasets with get_all_data().
  • 📖 Modern Python: Fully type-annotated for better developer experience and IDE support.
  • Well-Tested: Comprehensive test suite with 81% code coverage (239 tests).
  • 🔄 API Compatible: Supports both old and new INE API response formats seamlessly.

Installation

pip install pyptine

For development, install with all extra dependencies:

pip install "pyptine[dev,docs]"

Quick Start

from pyptine import INE

# Initialize the client
ine = INE(language="EN")

# 1. Search for an indicator
print("Searching for 'gdp' indicators...")
results = ine.search("gdp")
for indicator in results[:5]:  # Print top 5 results
    print(f"- {indicator.varcd}: {indicator.title}")

# 2. Get data for a specific indicator
varcd = "0004167"  # Resident population
print(f"\nFetching data for indicator {varcd}...")
response = ine.get_data(varcd)

# 3. Convert to a pandas DataFrame
df = response.to_dataframe()
print("\nData as DataFrame:")
print(df.head())

# 4. Export data to a CSV file
output_file = "population_data.csv"
print(f"\nExporting data to {output_file}...")
ine.export_csv(varcd, output_file)
print("Done!")

Async API

For concurrent requests and non-blocking I/O, use the AsyncINE client:

import asyncio
from pyptine import AsyncINE

async def main():
    async with AsyncINE(language="EN") as ine:
        # Fetch single indicator
        response = await ine.get_data("0004167")
        df = response.to_dataframe()
        print(df.head())

        # Fetch multiple indicators concurrently (asyncio is already imported above)
        responses = await asyncio.gather(
            ine.get_data("0004167"),
            ine.get_data("0004127"),
            ine.get_data("0008074")
        )

        # Stream large datasets
        async for chunk in ine.get_all_data("0004127", chunk_size=40000):
            df_chunk = chunk.to_dataframe()
            print(f"Processing {len(df_chunk)} rows...")

asyncio.run(main())

AsyncINE Features:

  • Non-blocking I/O for faster concurrent requests
  • Async iterator for memory-efficient pagination
  • Same API as the synchronous INE client
  • Automatic connection pooling and retries
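
The concurrent-fetch pattern can be tried without network access. The following is a minimal sketch: `fake_get_data` is a hypothetical stand-in for `AsyncINE.get_data` (it only sleeps and returns a dict), and the limit of two simultaneous requests is an arbitrary choice for illustration. Neither is part of pyptine.

```python
import asyncio


async def fake_get_data(varcd: str) -> dict:
    """Hypothetical stand-in for AsyncINE.get_data; performs no network I/O."""
    await asyncio.sleep(0.01)  # simulate request latency
    return {"varcd": varcd}


async def fetch_many(varcds: list[str], max_concurrent: int = 2) -> list[dict]:
    """Fetch several indicators concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(varcd: str) -> dict:
        async with semaphore:  # wait here if too many requests are in flight
            return await fake_get_data(varcd)

    # gather() returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(fetch_one(v) for v in varcds))


results = asyncio.run(fetch_many(["0004167", "0004127", "0008074"]))
print([r["varcd"] for r in results])
```

Bounding concurrency with a semaphore is one way to stay polite towards a public API while still overlapping requests.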

Command-Line Usage

The pyptine CLI provides a convenient way to access INE data from your terminal, with rich formatting and progress indicators for a better user experience.

# Search for indicators related to "pib" (GDP in Portuguese)
pyptine search "pib"

# Get detailed information about a specific indicator
pyptine info 0004127

# Download data for an indicator to a CSV file (with progress bar)
pyptine download 0004127 --output data.csv

# Download data and filter by dimensions
pyptine download 0004167 --output filtered_data.csv -d Dim1=S7A2023 -d Dim2=PT

# List all available statistical themes (in formatted table)
pyptine list-commands themes

# List all indicators (with pagination support)
pyptine list-commands indicators --limit 50

# View available dimensions for an indicator
pyptine dimensions 0004167

# Clear the local cache
pyptine cache clear

CLI Features:

  • Rich Formatting - Tables, panels, and colored output for better readability
  • Progress Indicators - Spinners and progress bars for long-running operations
  • Error Handling - Centralized, user-friendly error messages with context
  • Better Organization - Data displayed in well-formatted tables rather than plain text
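
The repeated `-d Dim1=S7A2023 -d Dim2=PT` flags map naturally onto the `dimensions` dictionary used by the Python API. The sketch below shows one way such flags could be parsed with the standard library's `argparse`. It is an illustration only: the parser, the long option name `--dimension`, and the helper `parse_dimensions` are hypothetical and do not describe how the pyptine CLI is actually implemented.

```python
import argparse


def parse_dimensions(pairs: list[str]) -> dict[str, str]:
    """Turn ["Dim1=S7A2023", "Dim2=PT"] into {"Dim1": "S7A2023", "Dim2": "PT"}."""
    dimensions = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"Expected KEY=VALUE, got {pair!r}")
        dimensions[key] = value
    return dimensions


# Hypothetical parser that mirrors the shape of `pyptine download`
parser = argparse.ArgumentParser(prog="download-sketch")
parser.add_argument("varcd")
parser.add_argument("--output", required=True)
parser.add_argument("-d", "--dimension", action="append", default=[])

args = parser.parse_args(
    ["0004167", "--output", "filtered_data.csv", "-d", "Dim1=S7A2023", "-d", "Dim2=PT"]
)
dimensions = parse_dimensions(args.dimension)
print(dimensions)
```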

Documentation

Initializing the Client

The INE class is the main entry point.

from pyptine import INE
from pathlib import Path

# Default client (language="EN", cache=True)
ine = INE()

# Client with Portuguese language
ine_pt = INE(language="PT")

# Disable caching
ine_no_cache = INE(cache=False)

# Use a custom cache directory
ine_custom_cache = INE(cache_dir=Path("/path/to/custom/cache"))

Working with Indicators

Searching for Indicators

You can search for indicators by keyword and filter by theme or sub-theme.

# Basic search
results = ine.search("unemployment rate")

# Search within a specific theme
results = ine.search("employment", theme="Labour market")

Getting Indicator Metadata

Retrieve detailed information about an indicator, including its dimensions.

metadata = ine.get_metadata("0004167")
print(f"Title: {metadata.title}")
print(f"Unit: {metadata.unit}")
print(f"Source: {metadata.source}")

# List available dimensions
dimensions = ine.get_dimensions("0004167")
for dim in dimensions:
    print(f"\nDimension: {dim.name}")
    for value in dim.values[:5]:  # Show first 5 values
        print(f"- {value.code}: {value.label}")

Fetching and Exporting Data

Getting Data

The get_data method returns a DataResponse object, which can be easily converted to different formats.

response = ine.get_data("0004127")

# Convert to pandas DataFrame
df = response.to_dataframe()

# Convert to a dictionary
data_dict = response.to_dict()

# Get data as a JSON string
json_str = response.to_json()

Filtering Data with Dimensions

Use the dimensions parameter to filter data before downloading.

# Get data for the year 2023 and region "Portugal"
# Note: Dimension values use specific codes (e.g., 'S7A2023' for year 2023)
filtered_response = ine.get_data(
    "0004167",
    dimensions={
        "Dim1": "S7A2023",  # Year 2023
        "Dim2": "PT"        # Geographic region 'Portugal'
    }
)
df_filtered = filtered_response.to_dataframe()
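
To make the filtering step more concrete, the sketch below shows how a `dimensions` dictionary could be turned into query parameters for a request. The base URL and the `op`, `varcd`, and `lang` parameter names are assumptions about INE's public JSON endpoint and are not stated in this README. `build_data_url` is a hypothetical helper, not a pyptine function.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; check INE's own API documentation.
BASE_URL = "https://www.ine.pt/ine/json_indicador/pindica.jsp"


def build_data_url(varcd, dimensions=None, language="EN"):
    """Build a request URL with one query parameter per dimension filter."""
    params = {"op": "2", "varcd": varcd, "lang": language}
    params.update(dimensions or {})  # e.g. {"Dim1": "S7A2023", "Dim2": "PT"}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_data_url("0004167", {"Dim1": "S7A2023", "Dim2": "PT"})
print(url)
```

Because each filter is sent with the request, the filtering would happen on the server in this sketch, which is consistent with filtering "before downloading".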

Exporting Data

You can export data directly to CSV or JSON files.

# Export to CSV
ine.export_csv("0004127", "output.csv")

# Export to JSON with pretty printing
ine.export_json("0004127", "output.json", pretty=True)

# Export filtered data
ine.export_csv(
    "0004167",
    "filtered_output.csv",
    dimensions={"Dim1": "S7A2023"}
)

Working with Large Datasets

For datasets that exceed the default limit of 40,000 data points, use the get_all_data() method, which handles pagination automatically:

from pyptine.client.data import DataClient

client = DataClient(language="EN")

# Fetch data in chunks (default chunk_size=40,000)
for chunk in client.get_all_data("0004127"):
    df = chunk.to_dataframe()
    print(f"Processed {len(df)} rows")
    # Process each chunk

# Custom chunk size
for chunk in client.get_all_data("0004127", chunk_size=5000):
    # Process smaller chunks
    pass

# Combine all chunks into a single dataset
all_chunks = list(client.get_all_data("0004127"))
all_data = [point for chunk in all_chunks for point in chunk.data]
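
The README does not describe how pagination works internally. As a rough mental model, a chunked iterator can be written as a generator that keeps requesting pages until one comes back short. In the sketch below, `iter_chunks`, `fake_fetch`, and the offset/limit scheme are all hypothetical and operate on an in-memory list rather than the INE API.

```python
from typing import Callable, Iterator


def iter_chunks(
    fetch_page: Callable[[int, int], list], chunk_size: int = 40000
) -> Iterator[list]:
    """Yield pages until a page is empty or shorter than chunk_size."""
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if not page:
            break
        yield page
        if len(page) < chunk_size:
            break  # a short page means there is nothing left to fetch
        offset += len(page)


# Hypothetical data source standing in for the remote API
DATASET = list(range(12))


def fake_fetch(offset: int, limit: int) -> list:
    return DATASET[offset:offset + limit]


chunks = list(iter_chunks(fake_fetch, chunk_size=5))
all_points = [point for chunk in chunks for point in chunk]
print([len(chunk) for chunk in chunks])  # [5, 5, 2]
```

Since the generator yields one page at a time, a caller that processes each chunk and discards it never holds the full dataset in memory.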

Visualizing Data

Create interactive visualizations directly from indicator data, without first converting it to a DataFrame:

# Get data and create interactive line chart
response = ine.get_data("0004127")
fig = response.plot(chart_type="line")
fig.show()

# Different chart types
fig_bar = response.plot_bar()
fig_area = response.plot_area()
fig_scatter = response.plot_scatter()

# Customize visualization
fig = response.plot_line(
    markers=True,
    x_column="Period",
    y_column="value"
)

# Color by dimensions (if data has dimension columns)
fig = response.plot_line(color_column="region")

# Save to HTML for sharing
fig.write_html("indicator_plot.html")

# Further customization with plotly
fig.update_layout(height=600, width=1200, title="Custom Title")
fig.show()

Available Visualization Methods:

  • plot(chart_type) - Generic plot with selectable chart type
  • plot_line() - Interactive line chart with optional markers
  • plot_bar() - Bar chart for categorical comparison
  • plot_area() - Stacked area chart for trends
  • plot_scatter() - Scatter plot with optional size and color dimensions

All methods support:

  • Interactive plotly charts with hover, zoom, and pan
  • Custom column selection for x/y axes
  • Color coding by dimension columns
  • Export to HTML, PNG, or other formats

Advanced Data Analysis

Perform statistical calculations on indicator data directly within the library:

# Get data and calculate year-over-year growth
response = ine.get_data("0004127")
yoy_response = response.calculate_yoy_growth()
df_yoy = yoy_response.to_dataframe()
print(df_yoy[['Period', 'value', 'yoy_growth']])

# Calculate month-over-month changes
mom_response = response.calculate_mom_change()
df_mom = mom_response.to_dataframe()

# Calculate simple moving average (3-period)
ma_response = response.calculate_moving_average(window=3)
df_ma = ma_response.to_dataframe()

# Calculate exponential moving average
ema_response = response.calculate_exponential_moving_average(span=5)
df_ema = ema_response.to_dataframe()

# Chain multiple analyses
result = response.calculate_yoy_growth().calculate_moving_average(window=2)
df = result.to_dataframe()
print(df[['Period', 'value', 'yoy_growth', 'moving_avg']])

Available analysis methods on DataResponse:

  • calculate_yoy_growth() - Year-over-year percentage change
  • calculate_mom_change() - Month-over-month percentage change
  • calculate_moving_average(window) - Simple moving average
  • calculate_exponential_moving_average(span) - Exponential weighted moving average

All methods support custom value_column and period_column parameters to work with different data structures.
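
For reference, the calculations named above can be written out in plain Python. This is a sketch of the standard formulas, not pyptine's implementation. In particular, the EMA uses the common convention alpha = 2 / (span + 1) seeded with the first value, which is an assumption about what `span` means here. Year-over-year growth corresponds to `lag=1` on annual data or `lag=12` on monthly data, and month-over-month change to `lag=1` on monthly data.

```python
def pct_change(values, lag):
    """Percentage change versus the value `lag` periods earlier (None where undefined)."""
    out = []
    for i, value in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)  # no earlier value, or division by zero
        else:
            previous = values[i - lag]
            out.append((value - previous) / previous * 100)
    return out


def moving_average(values, window):
    """Simple moving average over the last `window` values (None until enough data)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def exponential_moving_average(values, span):
    """Recursive EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2 / (span + 1)
    out = []
    for value in values:
        out.append(value if not out else alpha * value + (1 - alpha) * out[-1])
    return out


annual = [100.0, 110.0, 121.0, 133.1]
yoy = pct_change(annual, lag=1)              # roughly 10% growth each year
sma = moving_average(annual, window=3)
ema = exponential_moving_average(annual, span=3)
print(yoy, sma, ema)
```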

API Reference

INE Class

The main class for interacting with the INE API.

INE(language: str = "EN", cache: bool = True, cache_dir: Optional[Path] = None, cache_ttl: int = 86400)

Method                       Description
search(query, ...)           Search for indicators.
get_data(varcd, ...)         Get data for an indicator as a DataResponse object.
get_metadata(varcd)          Get detailed metadata for an indicator.
get_dimensions(varcd)        Get available dimensions for an indicator.
get_indicator(varcd)         Get catalogue information for a single indicator.
validate_indicator(varcd)    Check if an indicator code is valid.
list_themes()                Get a list of all available themes.
export_csv(varcd, ...)       Export indicator data to a CSV file.
export_json(varcd, ...)      Export indicator data to a JSON file.
clear_cache()                Clear all cached data.
get_cache_info()             Get statistics about the cache.
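
The constructor's `cache_ttl` argument defaults to 86400 seconds (24 hours). The sketch below illustrates the general idea of a disk cache with a time-to-live. `TTLDiskCache`, its on-disk JSON layout, and the cache key format are hypothetical and are not taken from pyptine's source.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class TTLDiskCache:
    """Hypothetical JSON file cache; entries older than `ttl` seconds count as misses."""

    def __init__(self, cache_dir: Path, ttl: int = 86400):
        self.cache_dir = cache_dir
        self.ttl = ttl
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so that any string yields a safe file name
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get(self, key: str):
        path = self._path(key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - entry["stored_at"] > self.ttl:
            path.unlink()  # expired: drop the stale file
            return None
        return entry["value"]

    def set(self, key: str, value) -> None:
        entry = {"stored_at": time.time(), "value": value}
        self._path(key).write_text(json.dumps(entry), encoding="utf-8")

    def clear(self) -> None:
        for path in self.cache_dir.glob("*.json"):
            path.unlink()


cache = TTLDiskCache(Path(tempfile.mkdtemp()), ttl=86400)
cache.set("0004167|EN", {"rows": 3})
hit = cache.get("0004167|EN")    # served from disk
cache.clear()
miss = cache.get("0004167|EN")   # None after clearing
print(hit, miss)
```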

Development

Setup

To set up the development environment:

# Clone the repository
git clone https://github.com/nigelrandsley/pyptine.git
cd pyptine

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks to ensure code quality
pre-commit install

Running Tests

# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/pyptine --cov-report=term-missing

Code Quality

This project uses black for formatting, ruff for linting, and mypy for type checking.

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/amazing-feature).
  3. Commit your changes (git commit -m 'Add amazing feature').
  4. Push to the branch (git push origin feature/amazing-feature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
