Optimitron Data Sources Catalog
Complete directory of data needed across all packages, with sources, APIs, and estimated sizes.
🏛️ Government Spending & Budget
Dataset
Source
API?
Size
Frequency
Notes
Federal spending by category
USAspending.gov
✅ Free
~500 rows
Annual
Aggregated by budget function, not transaction-level
OECD government spending
OECD.Stat
✅ Free
~5K rows
Annual
38 countries × 12 categories × ~30 years
State/local spending (US)
Census Bureau
✅ Free
~10K rows
Annual
State-level by function
Military expenditure
SIPRI
CSV download
~10K rows
Annual
1949-present, all countries
IMF fiscal data
IMF Data
✅ Free
~20K rows
Annual
Government Finance Statistics
Dataset
Source
API?
Size
Frequency
Notes
GDP per capita (PPP)
World Bank WDI
✅ Free
~12K rows
Annual
All countries since 1960
Median household income
FRED
✅ Free (key)
~100 rows
Annual
US only; other countries via OECD
CPI / Inflation
FRED
✅ Free (key)
~1K rows
Monthly
Poverty rates
World Bank
✅ Free
~10K rows
Annual
$1.90, $3.20, $5.50/day thresholds
Employment / unemployment
BLS
✅ Free (key)
~5K rows
Monthly
US; OECD for international
Income inequality (Gini)
World Bank
✅ Free
~5K rows
Annual
Corruption Perception Index
Transparency Intl
CSV download
~2K rows
Annual
Economic Freedom Index
Heritage Foundation
CSV download
~3K rows
Annual
Dataset
Source
API?
Size
Frequency
Notes
Life expectancy
WHO GHO
✅ Free
~12K rows
Annual
All countries
Healthy life expectancy (HALE)
WHO GHO
✅ Free
~6K rows
Annual
Key welfare metric
Cause-of-death statistics
WHO GHO
✅ Free
~50K rows
Annual
By cause, country
Healthcare spending
WHO GHO
✅ Free
~12K rows
Annual
% GDP, per capita
Disease burden (DALYs)
IHME GBD
Download
~100K rows
Annual
Global Burden of Disease
Vaccination rates
WHO GHO
✅ Free
~20K rows
Annual
By vaccine, country
Smoking / alcohol / BMI
WHO GHO
✅ Free
~15K rows
Annual
Risk factors
US health surveys (BRFSS)
CDC
✅ Free
~50K rows
Annual
State-level health behaviors
💊 Drug/Treatment Outcomes (dFDA)
Dataset
Source
API?
Size
Frequency
Notes
FDA adverse event reports
openFDA
✅ Free
Huge (>15M)
Continuous
FAERS database
Drug labels / interactions
openFDA
✅ Free
~100K drugs
As updated
SPL labels
Clinical trial results
ClinicalTrials.gov
✅ Free
~400K trials
Continuous
Patient-reported outcomes
User-submitted
Custom
Varies
Continuous
Future: via Optimitron web app
Wearable/app health data
Various APIs
Varies
Varies
Continuous
Apple Health, Fitbit, etc.
🗳️ Political / Governance (RAPPA)
Dataset
Source
API?
Size
Frequency
Notes
US Congress votes
congress.gov
✅ Free (key)
~50K votes
Per session
Roll call votes
US Congress members
congress.gov
✅ Free (key)
~600 current
Per session
Bioguide IDs
Congress votes (alt)
ProPublica
✅ Free (key)
Same
Per session
Easier API
State legislatures
OpenStates
✅ Free (key)
~7K legislators
Per session
All 50 states
World governance indicators
World Bank
✅ Free
~3K rows
Annual
Rule of law, corruption, etc.
Election results
Various
Mixed
Varies
Per election
Would need country-specific sources
Citizen preferences
User-submitted
Custom
Varies
Continuous
Future: via RAPPA web app
Dataset
Source
API?
Size
Frequency
Notes
Tax rates by country
OECD Tax Database
CSV
~5K rows
Annual
Income, corporate, VAT
Education spending & outcomes
OECD PISA
CSV
~2K rows
Triennial
Test scores + spending
Environmental regulations
Yale EPI
CSV
~2K rows
Biennial
Environmental Performance Index
Drug policy (legalization dates)
Manual curation
❌
~200 rows
As changed
Cannabis, psychedelics, etc.
Policy change dates
Manual curation
❌
~1K rows
As changed
Key for causal analysis
📦 Mike's Existing Data (mikepsinn/economic-data)
65 CSVs in Gapminder format (countries × years), ~2.6 MB total:
Health (8 files): life expectancy, BMI, blood pressure, cholesterol, HIV deaths, suicide, smoking rates, alcohol, sugar intake
Economic (5 files): GDP per capita, national income, poverty rates (2 thresholds)
Energy (11 files): CO2, coal, electricity, oil, gas, nuclear, hydro — all per capita
Military (1 file): SIPRI data 1949-2022
Other: CPI, murder rates, population, country regions
→ Strategy: Git submodule or copy into repo (it's tiny)
Category
Strategy
Why
Mike's CSVs (2.6 MB)
In repo (submodule or copy)
Tiny, static, core reference data
Country metadata
In repo (JSON/CSV)
Small, rarely changes
Policy change dates
In repo (curated JSON)
Small, needs human curation
OECD, World Bank, WHO
API at runtime + cache
Large, frequently updated, free APIs
FRED, BLS, Census
API at runtime + cache
US-specific, needs API keys
USAspending
API at runtime
Only need aggregates, not 100GB bulk
Congress/OpenStates
API at runtime + cache
Updates per legislative session
openFDA, ClinicalTrials
API at runtime
Huge datasets, query as needed
User-submitted data
Database (Prisma)
Collected via web app