High-level Python client for the Statistics Portugal (INE) API. Query and download statistical data from INE Portugal through a simple, intuitive interface.
- 🎯 High-level Convenience API: Simple interface for common data retrieval and analysis tasks.
- ⚡ Async Support: Non-blocking I/O with `AsyncINE` for concurrent requests using httpx.
- 📊 Multiple Output Formats: Export data to pandas DataFrames, JSON, or CSV with ease.
- 📈 Data Visualization: Interactive plotly charts (line, bar, area, scatter) directly from data.
- 🔬 Statistical Analysis: Built-in YoY growth, MoM changes, moving averages, and EMA calculations.
- 💾 Smart Caching: Disk-based caching reduces redundant API calls, speeding up repeated queries.
- 🔍 Metadata Browsing: Search and discover indicators, themes, and dimensions.
- 🖥️ Enhanced CLI: Rich formatting with progress bars, tables, and colored output.
- 📑 True Pagination: Efficient streaming of large datasets with `get_all_data()`.
- 📖 Modern Python: Fully type-annotated for better developer experience and IDE support.
- ✅ Well-Tested: Comprehensive test suite with 81% code coverage (239 tests).
- 🔄 API Compatible: Supports both old and new INE API response formats seamlessly.
```bash
pip install pyptine
```

For development, install with all extra dependencies:

```bash
pip install "pyptine[dev,docs]"
```

```python
from pyptine import INE

# Initialize the client
ine = INE(language="EN")

# 1. Search for an indicator
print("Searching for 'gdp' indicators...")
results = ine.search("gdp")
for indicator in results[:5]:  # Print top 5 results
    print(f"- {indicator.varcd}: {indicator.title}")

# 2. Get data for a specific indicator
varcd = "0004167"  # Resident population
print(f"\nFetching data for indicator {varcd}...")
response = ine.get_data(varcd)

# 3. Convert to a pandas DataFrame
df = response.to_dataframe()
print("\nData as DataFrame:")
print(df.head())

# 4. Export data to a CSV file
output_file = "population_data.csv"
print(f"\nExporting data to {output_file}...")
ine.export_csv(varcd, output_file)
print("Done!")
```

For concurrent requests and non-blocking I/O, use the `AsyncINE` client:
```python
import asyncio
from pyptine import AsyncINE


async def main():
    async with AsyncINE(language="EN") as ine:
        # Fetch a single indicator
        response = await ine.get_data("0004167")
        df = response.to_dataframe()
        print(df.head())

        # Fetch multiple indicators concurrently
        responses = await asyncio.gather(
            ine.get_data("0004167"),
            ine.get_data("0004127"),
            ine.get_data("0008074"),
        )
        print(f"Fetched {len(responses)} indicators")

        # Stream large datasets chunk by chunk
        async for chunk in ine.get_all_data("0004127", chunk_size=40000):
            df_chunk = chunk.to_dataframe()
            print(f"Processing {len(df_chunk)} rows...")


asyncio.run(main())
```

AsyncINE Features:
- Non-blocking I/O for faster concurrent requests
- Async iterator for memory-efficient pagination
- Same API as the synchronous `INE` client
- Automatic connection pooling and retries
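The two async patterns in the example above, firing several requests at once with `asyncio.gather()` and consuming a large result page by page with `async for`, can be illustrated without network access. The sketch below is a stdlib-only approximation, not pyptine's internals: `FAKE_DATA`, `fetch_page()`, and `iter_pages()` are hypothetical stand-ins, and stopping when an empty page comes back is an assumed termination rule.

```python
import asyncio
from typing import AsyncIterator, Dict, List

# Hypothetical in-memory "server": indicator code -> list of data points.
FAKE_DATA: Dict[str, List[int]] = {
    "0004167": list(range(10)),
    "0004127": list(range(7)),
}


async def fetch_page(varcd: str, offset: int, limit: int) -> List[int]:
    """Stand-in for one HTTP request returning at most `limit` points."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_DATA[varcd][offset:offset + limit]


async def iter_pages(varcd: str, chunk_size: int) -> AsyncIterator[List[int]]:
    """Yield pages one at a time until an empty page signals the end."""
    offset = 0
    while True:
        page = await fetch_page(varcd, offset, chunk_size)
        if not page:
            break
        yield page
        offset += len(page)


async def main() -> None:
    # Both requests are in flight at once, so the total wait is roughly one latency.
    pages = await asyncio.gather(
        fetch_page("0004167", 0, 100),
        fetch_page("0004127", 0, 100),
    )
    print([len(p) for p in pages])

    # Only one page is held in memory per loop iteration.
    async for page in iter_pages("0004167", chunk_size=4):
        print(f"Processing {len(page)} points...")


asyncio.run(main())
```

The async generator keeps memory bounded by `chunk_size`, while `gather()` overlaps the waiting time of independent requests.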
The pyptine CLI provides a convenient way to access INE data from your terminal, with rich formatting and progress indicators for a better user experience.
```bash
# Search for indicators related to "pib" (GDP in Portuguese)
pyptine search "pib"

# Get detailed information about a specific indicator
pyptine info 0004127

# Download data for an indicator to a CSV file (with progress bar)
pyptine download 0004127 --output data.csv

# Download data and filter by dimensions
pyptine download 0004167 --output filtered_data.csv -d Dim1=S7A2023 -d Dim2=PT

# List all available statistical themes (in formatted table)
pyptine list-commands themes

# List all indicators (with pagination support)
pyptine list-commands indicators --limit 50

# View available dimensions for an indicator
pyptine dimensions 0004167

# Clear the local cache
pyptine cache clear
```

CLI Features:
- Rich Formatting - Tables, panels, and colored output for better readability
- Progress Indicators - Spinners and progress bars for long-running operations
- Error Handling - Centralized, user-friendly error messages with context
- Better Organization - Data displayed in well-formatted tables rather than plain text
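The repeated `-d Dim1=S7A2023 -d Dim2=PT` options in the download example carry the same information as a `dimensions={"Dim1": "S7A2023", "Dim2": "PT"}` dict passed to `get_data()`. The CLI's actual parsing code isn't shown here; as a hypothetical sketch using only the standard library's `argparse`, repeated `KEY=VALUE` options could be collected and converted like this (`parse_dimensions` and `build_parser` are illustrative names, not pyptine functions):

```python
import argparse
from typing import Dict, List


def parse_dimensions(pairs: List[str]) -> Dict[str, str]:
    """Convert ["Dim1=S7A2023", "Dim2=PT"] into {"Dim1": "S7A2023", "Dim2": "PT"}."""
    dims: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        dims[key] = value
    return dims


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring `pyptine download VARCD --output FILE -d KEY=VALUE ...`
    parser = argparse.ArgumentParser(prog="download-sketch")
    parser.add_argument("varcd")
    parser.add_argument("--output", required=True)
    # action="append" collects every -d occurrence into one list
    parser.add_argument("-d", dest="dimensions", action="append", metavar="KEY=VALUE")
    return parser


args = build_parser().parse_args(
    ["0004167", "--output", "filtered_data.csv", "-d", "Dim1=S7A2023", "-d", "Dim2=PT"]
)
# With no -d options, args.dimensions is None, hence the `or []`
dimensions = parse_dimensions(args.dimensions or [])
print(args.varcd, args.output, dimensions)
```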
The INE class is the main entry point.
```python
from pyptine import INE
from pathlib import Path

# Default client (language='EN', caching=True)
ine = INE()

# Client with Portuguese language
ine_pt = INE(language="PT")

# Disable caching
ine_no_cache = INE(cache=False)

# Use a custom cache directory
ine_custom_cache = INE(cache_dir=Path("/path/to/custom/cache"))
```

You can search for indicators by keyword and filter by theme or sub-theme.
```python
# Basic search
results = ine.search("unemployment rate")

# Search within a specific theme
results = ine.search("employment", theme="Labour market")
```

Retrieve detailed information about an indicator, including its dimensions.
```python
metadata = ine.get_metadata("0004167")
print(f"Title: {metadata.title}")
print(f"Unit: {metadata.unit}")
print(f"Source: {metadata.source}")

# List available dimensions
dimensions = ine.get_dimensions("0004167")
for dim in dimensions:
    print(f"\nDimension: {dim.name}")
    for value in dim.values[:5]:  # Show first 5 values
        print(f"- {value.code}: {value.label}")
```

The `get_data` method returns a `DataResponse` object, which can be converted to several formats.
```python
response = ine.get_data("0004127")

# Convert to a pandas DataFrame
df = response.to_dataframe()

# Convert to a dictionary
data_dict = response.to_dict()

# Get data as a JSON string
json_str = response.to_json()
```

Use the `dimensions` parameter to filter data before downloading.
```python
# Get data for the year 2023 and the region "Portugal"
# Note: dimension values use specific codes (e.g., 'S7A2023' for the year 2023)
filtered_response = ine.get_data(
    "0004167",
    dimensions={
        "Dim1": "S7A2023",  # Year 2023
        "Dim2": "PT",  # Geographic region: Portugal
    },
)
df_filtered = filtered_response.to_dataframe()
```

You can export data directly to CSV or JSON files.
```python
# Export to CSV
ine.export_csv("0004127", "output.csv")

# Export to JSON with pretty printing
ine.export_json("0004127", "output.json", pretty=True)

# Export filtered data
ine.export_csv(
    "0004167",
    "filtered_output.csv",
    dimensions={"Dim1": "S7A2023"},
)
```

For large datasets that exceed the default limit of 40,000 data points, use the `get_all_data()` method, which handles pagination automatically:
```python
from pyptine.client.data import DataClient

client = DataClient(language="EN")

# Fetch data in chunks (default chunk_size=40000)
for chunk in client.get_all_data("0004127"):
    df = chunk.to_dataframe()
    print(f"Processed {len(df)} rows")
    # Process each chunk here

# Custom chunk size
for chunk in client.get_all_data("0004127", chunk_size=5000):
    pass  # Process smaller chunks here

# Combine all chunks into a single dataset
all_chunks = list(client.get_all_data("0004127"))
all_data = [point for chunk in all_chunks for point in chunk.data]
```

Create interactive visualizations directly from indicator data, without first converting it to a DataFrame:
```python
# Get data and create an interactive line chart
response = ine.get_data("0004127")
fig = response.plot(chart_type="line")
fig.show()

# Different chart types
fig_bar = response.plot_bar()
fig_area = response.plot_area()
fig_scatter = response.plot_scatter()

# Customize the visualization
fig = response.plot_line(
    markers=True,
    x_column="Period",
    y_column="value",
)

# Color by dimension (if the data has dimension columns)
fig = response.plot_line(color_column="region")

# Save to HTML for sharing
fig.write_html("indicator_plot.html")

# Further customization with plotly
fig.update_layout(height=600, width=1200, title="Custom Title")
fig.show()
```

Available Visualization Methods:
- `plot(chart_type)` - Generic plot with selectable chart type
- `plot_line()` - Interactive line chart with optional markers
- `plot_bar()` - Bar chart for categorical comparison
- `plot_area()` - Stacked area chart for trends
- `plot_scatter()` - Scatter plot with optional size and color dimensions
All methods support:
- Interactive plotly charts with hover, zoom, and pan
- Custom column selection for x/y axes
- Color coding by dimension columns
- Export to HTML, PNG, or other formats
Perform statistical calculations on indicator data directly within the library:
```python
# Get data and calculate year-over-year growth
response = ine.get_data("0004127")
yoy_response = response.calculate_yoy_growth()
df_yoy = yoy_response.to_dataframe()
print(df_yoy[['Period', 'value', 'yoy_growth']])

# Calculate month-over-month changes
mom_response = response.calculate_mom_change()
df_mom = mom_response.to_dataframe()

# Calculate a simple moving average (3-period)
ma_response = response.calculate_moving_average(window=3)
df_ma = ma_response.to_dataframe()

# Calculate an exponential moving average
ema_response = response.calculate_exponential_moving_average(span=5)
df_ema = ema_response.to_dataframe()

# Chain multiple analyses
result = response.calculate_yoy_growth().calculate_moving_average(window=2)
df = result.to_dataframe()
print(df[['Period', 'value', 'yoy_growth', 'moving_avg']])
```

Available analysis methods on `DataResponse`:
- `calculate_yoy_growth()` - Year-over-year percentage change
- `calculate_mom_change()` - Month-over-month percentage change
- `calculate_moving_average(window)` - Simple moving average
- `calculate_exponential_moving_average(span)` - Exponentially weighted moving average

All methods accept custom `value_column` and `period_column` parameters, so they can work with differently structured data.
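The analysis methods are documented here only by name. Assuming they follow the usual definitions, the arithmetic can be sketched on plain Python lists. This is an illustration, not pyptine's implementation: the function names are hypothetical, values without enough history come back as `None`, and the EMA recursion (α = 2 / (span + 1), seeded with the first value, as in pandas `ewm(span=..., adjust=False)`) may differ from what `calculate_exponential_moving_average()` actually uses.

```python
from typing import List, Optional


def pct_change(values: List[float], lag: int) -> List[Optional[float]]:
    """Percent change versus the value `lag` periods earlier (None where undefined)."""
    out: List[Optional[float]] = []
    for i, v in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)  # no earlier value, or division by zero
        else:
            prev = values[i - lag]
            out.append((v - prev) / prev * 100)
    return out


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average of the last `window` values (None until the window fills)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def exponential_moving_average(values: List[float], span: int) -> List[float]:
    """EMA with alpha = 2 / (span + 1), seeded with the first observation."""
    alpha = 2 / (span + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


monthly = [100.0, 102.0, 101.0, 104.0, 108.0]  # made-up monthly values
print(pct_change(monthly, lag=1))  # month-over-month change
print(moving_average(monthly, window=3))
print(exponential_moving_average(monthly, span=5))
# Year-over-year on monthly data would use lag=12 (lag=4 for quarterly, lag=1 for annual).
```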
The `INE` class is the main class for interacting with the INE API.

```python
INE(language: str = "EN", cache: bool = True, cache_dir: Optional[Path] = None, cache_ttl: int = 86400)
```
| Method | Description |
|---|---|
| `search(query, ...)` | Search for indicators. |
| `get_data(varcd, ...)` | Get data for an indicator as a `DataResponse` object. |
| `get_metadata(varcd)` | Get detailed metadata for an indicator. |
| `get_dimensions(varcd)` | Get available dimensions for an indicator. |
| `get_indicator(varcd)` | Get catalogue information for a single indicator. |
| `validate_indicator(varcd)` | Check if an indicator code is valid. |
| `list_themes()` | Get a list of all available themes. |
| `export_csv(varcd, ...)` | Export indicator data to a CSV file. |
| `export_json(varcd, ...)` | Export indicator data to a JSON file. |
| `clear_cache()` | Clear all cached data. |
| `get_cache_info()` | Get statistics about the cache. |
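The constructor exposes `cache`, `cache_dir`, and `cache_ttl`, and the table lists `clear_cache()` and `get_cache_info()`, but how entries are stored and expired is not described. Assuming `cache_ttl` is measured in seconds (so the default 86400 is 24 hours), time-based expiry of a disk cache could work like the hypothetical sketch below. `TTLDiskCache` and `cached_fetch` are illustrative names, not pyptine classes, and pyptine may use a different storage backend entirely.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, List, Optional


class TTLDiskCache:
    """Hypothetical one-file-per-key cache; entries older than `ttl` seconds are misses."""

    def __init__(self, cache_dir: Path, ttl: int = 86400) -> None:
        self.cache_dir = cache_dir
        self.ttl = ttl
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key (e.g. a request URL) into a filesystem-safe name
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        if time.time() - path.stat().st_mtime > self.ttl:
            path.unlink()  # stale entry: delete it and report a miss
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def set(self, key: str, value: Any) -> None:
        self._path(key).write_text(json.dumps(value), encoding="utf-8")

    def clear(self) -> int:
        """Delete all entries and return how many were removed."""
        removed = 0
        for path in self.cache_dir.glob("*.json"):
            path.unlink()
            removed += 1
        return removed


def cached_fetch(cache: TTLDiskCache, key: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached value for `key`, calling `fetch()` only on a miss."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = fetch()
    cache.set(key, value)
    return value


calls: List[int] = []


def fake_fetch() -> dict:
    calls.append(1)  # count how often the "API" is actually hit
    return {"varcd": "0004167", "values": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    cache = TTLDiskCache(Path(tmp), ttl=86400)
    cached_fetch(cache, "data/0004167", fake_fetch)
    cached_fetch(cache, "data/0004167", fake_fetch)  # served from disk
    print(f"API calls: {len(calls)}, entries cleared: {cache.clear()}")
```

Storing the expiry check on read (via the file's modification time) avoids a background cleanup job; stale files are removed lazily the next time they are requested.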
- PyPI Package: https://pypi.org/project/pyptine/
- GitHub Repository: https://github.com/randsley/pyptine
- Issue Tracker: https://github.com/randsley/pyptine/issues
- INE Portal: https://www.ine.pt/
To set up the development environment:
```bash
# Clone the repository
git clone https://github.com/nigelrandsley/pyptine.git
cd pyptine

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks to ensure code quality
pre-commit install
```

```bash
# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/pyptine --cov-report=term-missing
```

This project uses black for formatting, ruff for linting, and mypy for type checking.

```bash
# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/
```

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository.
- Create your feature branch (`git checkout -b feature/amazing-feature`).
- Commit your changes (`git commit -m 'Add amazing feature'`).
- Push to the branch (`git push origin feature/amazing-feature`).
- Open a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.