Feel free to use this project, but you will likely run into issues.
High-performance Data Ingestion Pipeline for B3 Market Data
For MacBook/Local Setup:
# See comprehensive setup guide
→ [QUICKSTART_SETUP.md](QUICKSTART_SETUP.md) - 5 minute setup
→ [SETUP_LOCAL.md](SETUP_LOCAL.md) - Detailed local setup guide
# Quick version:
python3 -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
source .env.supabase # After adding your password
python scripts/verify_database.py
python scripts/ingest_cotahist_data.py --date 2024-12-20

For GitHub Actions Automation:
→ [SETUP_GITHUB_SECRETS.md](SETUP_GITHUB_SECRETS.md) - GitHub secrets configuration
# Quick version:
gh secret set DATABASE_URL -R PedroDnT/pyb3
# Paste connection string when prompted

# Activate environment
source venv/bin/activate
source .env.supabase
# Run ingestion scripts
python scripts/ingest_cotahist_data.py # Yesterday's stock data
python scripts/ingest_futures_data.py # Futures settlement
python scripts/ingest_yieldcurve_data.py # Yield curves
python scripts/ingest_indexes_data.py # Index values
python scripts/ingest_indexes_composition_data.py # Index holdings

import pyb3
from pyb3.db.client import Pyb3Client
# Access data directly from database
# (See docs/guides/DATA_INGESTION_WORKFLOWS.md for details)

Access Brazilian Stock Exchange (B3) data with:
- 📈 Stocks & Options - Historical prices, volume, 13 filter functions
- 📊 Futures - Settlement prices, maturity codes
- 💹 Yield Curves - BRL, IPCA, USD rates
- 📉 Indices - IBOV, composition, historical data
- 🏢 Funds - ETFs, FIIs, FIDCs, FIAGROs, BDRs
- 🔬 Analysis - Returns, volatility, quality checks
10-100x faster than pandas using Polars and PyArrow.
| Doc | Purpose |
|---|---|
| QUICKSTART.md | Get started in 5 minutes |
| API_REFERENCE.md | All functions and examples |
| DEPLOYMENT.md | Deploy the REST API |
| CONTRIBUTING.md | How to contribute |
| CHANGELOG.md | Version history |
import pyb3
import polars as pl
# Download data
pyb3.fetch_marketdata("b3-cotahist-yearly", year=2023)
# Query with filters
stocks = (
pyb3.cotahist_get("yearly")
.filter(pl.col("refdate").dt.year() == 2023)
.pipe(pyb3.cotahist_filter_equity)
.collect()
)

# Get multiple stocks
portfolio = pyb3.cotahist_get_symbols(
symbols=['PETR4', 'VALE3', 'ITUB4'],
start_date='2023-01-01'
).collect()
# Calculate returns
returns = pyb3.calculate_returns(portfolio, return_type='log')
# Check correlation
corr_matrix = pyb3.create_returns_matrix(portfolio)

# Download futures data
pyb3.fetch_marketdata("b3-futures-settlement-prices", refdate="2023-12-29")
# Get DI1 futures
futures = (
pyb3.futures_get()
.filter(pl.col("commodity") == "DI1")
.collect()
)
# Convert maturity codes
date = pyb3.maturitycode2date("F24") # → 2024-01-01

# Download yield curve
pyb3.fetch_marketdata("b3-reference-rates", refdate="2023-12-29", curve_name="PRE")
# Get BRL rates
yc = (
pyb3.yc_brl_get()
.filter(pl.col("refdate") == pl.date(2023, 12, 29)) # assumes refdate is a Date column
.collect()
)

| Function | Purpose |
|---|---|
| cotahist_get(type) | Get all data (yearly/daily) |
| cotahist_get_symbols(symbols, dates) | Get specific stocks |
| cotahist_filter_equity(df) | Filter to stocks only |
| cotahist_filter_options(df) | Filter to options |
| cotahist_filter_etf(df) | Filter to ETFs |
| cotahist_filter_fii(df) | Filter to REITs |
| cotahist_options_by_symbols_get(symbols) | Get options with underlying prices |

11 more filter functions - See API_REFERENCE.md

| Function | Purpose |
|---|---|
| futures_get() | Get all futures data |
| code2month(code) | Convert maturity code to month |
| maturitycode2date(code) | Convert code to date |
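
The maturity-code helpers listed above follow the usual futures convention of one letter per delivery month followed by a two-digit year. The sketch below illustrates that convention using only the standard library. The names `code_to_month` and `maturity_code_to_date` are hypothetical stand-ins, not pyb3's implementation, and two choices are assumptions: returning the first calendar day of the month (to mirror the `F24` example) and mapping two-digit years to 2000-2099.

```python
from datetime import date

# Standard futures month letters, January through December.
MONTH_CODES = "FGHJKMNQUVXZ"


def code_to_month(letter: str) -> int:
    """Return the calendar month (1-12) for a futures month letter."""
    index = MONTH_CODES.find(letter.upper())
    if len(letter) != 1 or index < 0:
        raise ValueError(f"unknown month letter: {letter!r}")
    return index + 1


def maturity_code_to_date(code: str, century: int = 2000) -> date:
    """Convert a three-character code such as 'F24' to the first day of that month.

    Assumption: the two digits are a year within the given century.
    """
    if len(code) != 3 or not code[1:].isdigit():
        raise ValueError(f"expected a code like 'F24', got {code!r}")
    return date(century + int(code[1:]), code_to_month(code[0]), 1)


print(maturity_code_to_date("F24"))  # 2024-01-01
```

The actual contract expiry is usually a specific business day within the month, so treat the returned date as a month marker rather than a settlement date.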

| Function | Purpose |
|---|---|
| yc_get() | All curves |
| yc_brl_get() | BRL nominal rates |
| yc_ipca_get() | Real rates (inflation-linked) |
| yc_usd_get() | USD rates |

| Function | Purpose |
|---|---|
| indexes_get() | List available indices |
| indexes_composition_get() | Index composition |
| indexes_current_portfolio_get() | Current weights |
| indexes_historical_data_get() | Historical values |

| Function | Purpose |
|---|---|
| calculate_returns(df) | Calculate returns |
| calculate_volatility(df) | Calculate volatility |
| create_returns_matrix(df) | Returns matrix for correlation |
| align_symbols(dfs) | Align multiple stocks by date |
| data_quality_check(df) | Check for issues |
| data_quality_fix(df) | Fix missing data |
| resample_ohlcv(df, freq) | Aggregate to different frequency |

40+ functions total - See API_REFERENCE.md for details
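
To make the return and volatility utilities more concrete, here is a minimal standard-library sketch of the arithmetic behind a log return (the `return_type='log'` option used earlier) and an annualized volatility. The functions `log_returns` and `annualized_volatility` are hypothetical illustrations and are not taken from pyb3. The 252-business-day year is an assumption, and the library's own defaults may differ.

```python
import math
from statistics import stdev


def log_returns(prices: list[float]) -> list[float]:
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def annualized_volatility(returns: list[float], periods_per_year: int = 252) -> float:
    """Sample standard deviation of returns, scaled by sqrt(periods per year).

    Assumption: 252 business days per year.
    """
    return stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical closing prices for one symbol.
closes = [100.0, 102.0, 101.0, 103.0]
rets = log_returns(closes)

print(rets)
print(annualized_volatility(rets))
```

Log returns add up across time, so the sum of `rets` equals the log return over the whole window, `ln(103 / 100)`. That makes them convenient for aggregation.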
pyb3/
├── src/
│ └── pyb3/ # Core library (installable package)
│ ├── core/ # Infrastructure (types, fields, templates)
│ ├── data/ # Data acquisition (downloaders, readers)
│ ├── api/ # Data access functions
│ ├── analysis/ # Analysis utilities
│ └── utils/ # Shared utilities
├── services/
│ └── rest-api/ # REST API microservice (FastAPI)
│ ├── app/ # Application code
│ ├── database/ # Migrations & scripts
│ └── tests/ # API tests
├── clients/
│ └── python/ # Python SDK client
├── examples/
│ ├── scripts/ # Example scripts
│ └── notebooks/ # Jupyter notebooks
├── demos/ # Demo applications
├── docs/ # Documentation
│ ├── guides/ # User guides
│ ├── api-reference/ # API reference
│ └── architecture/ # Architecture docs
├── tests/ # Library tests
└── scripts/ # Utility scripts
PyB3 includes a production-ready FastAPI server:
cd services/rest-api
make dev-docker # Start with Docker
# OR
uvicorn app.main:app --reload

Features:
- Authentication with API keys
- Usage-based billing
- Rate limiting
- Supabase PostgreSQL database (v2.0.0: simplified storage)
- Web dashboard
- Auto-generated docs
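
For readers unfamiliar with per-key rate limiting, the sketch below shows one common approach, a token bucket kept per API key. It is an illustration only: the REST service's actual limiter is not described here, and `TokenBucket`, `check_request`, and the capacity and refill values are all hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# One bucket per API key (hypothetical limits: burst of 5, 1 request/second).
buckets: dict[str, TokenBucket] = {}


def check_request(api_key: str) -> bool:
    bucket = buckets.get(api_key)
    if bucket is None:
        bucket = buckets[api_key] = TokenBucket(capacity=5, refill_per_second=1.0)
    return bucket.allow()


print(check_request("demo-key"))  # True
```

The injectable `clock` argument exists only to make the behavior easy to test without sleeping.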
v2.0.0 Update: Simplified architecture - all data stored in PostgreSQL. No S3/R2 needed.

v2.1.0 Update: Restructured codebase for better organization - see below.
→ Read DEPLOYMENT.md for full guide
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black pyb3/
# Check types
mypy pyb3/
# Run examples
python examples/03_test_cotahist.py

→ Read CONTRIBUTING.md for guidelines
Note: v2.1.0 introduces a cleaner structure. See RESTRUCTURE_PLAN.md for migration details and KNOWN_LIMITATIONS.md for current limitations.
✅ Complete (40+ functions)
- Stock & options data (COTAHIST)
- Futures settlement prices
- Yield curves (BRL, IPCA, USD)
- Market indices
- Investment funds (ETFs, FIIs, etc)
- Analysis utilities
✅ Production Ready
- Type hints throughout
- Comprehensive tests
- Full documentation
- REST API included
- Python SDK included
✅ High Performance
- Lazy evaluation with Polars
- Columnar storage (Parquet)
- 10-100x faster than pandas
- Efficient memory usage
- Python 3.9+
- polars >= 0.19.0
- pyarrow >= 10.0.0
- requests >= 2.28.0
- pyyaml >= 6.0
From source:
git clone https://github.com/PedroDnT/pyb3.git
cd pyb3
pip install -e .

PyPI (coming soon):
pip install pyb3

Downloaded data is cached in ~/.pyb3_cache/:
~/.pyb3_cache/
├── downloads/ # Raw files from B3
├── input/ # Parsed Parquet files
└── metadata.db # Download tracking (SQLite)
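
The layout above can be pictured as two folders plus a small SQLite file that records what has already been downloaded. The sketch below recreates that shape in a temporary directory. The `downloads` table schema and the helpers `init_cache`, `record_download`, and `is_cached` are assumptions made for illustration; pyb3's real `metadata.db` schema is not documented here.

```python
import sqlite3
import tempfile
from pathlib import Path


def init_cache(root: Path) -> sqlite3.Connection:
    """Create the downloads/ and input/ folders and open metadata.db."""
    (root / "downloads").mkdir(parents=True, exist_ok=True)
    (root / "input").mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(root / "metadata.db")
    # Hypothetical schema: one row per (template, arguments) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "template TEXT NOT NULL, args TEXT NOT NULL, path TEXT NOT NULL, "
        "PRIMARY KEY (template, args))"
    )
    return conn


def record_download(conn: sqlite3.Connection, template: str, args: str, path: str) -> None:
    """Remember where the raw file for this template and arguments was stored."""
    conn.execute("INSERT OR REPLACE INTO downloads VALUES (?, ?, ?)", (template, args, path))
    conn.commit()


def is_cached(conn: sqlite3.Connection, template: str, args: str) -> bool:
    """Return True if this template and arguments were already downloaded."""
    row = conn.execute(
        "SELECT 1 FROM downloads WHERE template = ? AND args = ?", (template, args)
    ).fetchone()
    return row is not None


# Use a temporary directory so the real ~/.pyb3_cache/ is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    conn = init_cache(Path(tmp) / ".pyb3_cache")
    record_download(conn, "b3-cotahist-yearly", "year=2023", "downloads/example.zip")
    print(is_cached(conn, "b3-cotahist-yearly", "year=2023"))  # True
    conn.close()
```

Tracking downloads in a database rather than by scanning the folder lets a repeated fetch for the same template and arguments be skipped cheaply.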
PyB3 uses YAML templates to define data sources. 9 templates included:
- b3-cotahist-yearly - Annual stock/options data
- b3-cotahist-daily - Daily stock/options data
- b3-futures-settlement-prices - Futures
- b3-reference-rates - Yield curves
- b3-indexes-* - 5 index templates
See pyb3/templates/ directory.
MIT License - see LICENSE file
Python port of the rb3 R package by:
- Wilson Freitas (@wilsonfreitas)
- Marcelo Perlin (@msperlin)
- GitHub: https://github.com/PedroDnT/pyb3
- Issues: https://github.com/PedroDnT/pyb3/issues
- Original R package: https://github.com/ropensci/rb3
Made with ❤️ for the Brazilian financial data community