Team: Connor Watson, Stuart Holland, Francisco Munoz, Tony Gibbons
AI-driven forecasting for coffee & sugar futures to help Colombian traders optimize harvest sales.
Key Insight: Traders care about Coffee Price (USD) × COP/USD Rate, not just USD futures.
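To make the key insight concrete, here is a minimal sketch of the conversion a trader implicitly performs. It assumes the exchange rate is quoted as pesos per dollar; the function name `local_price_cop` and the sample numbers are illustrative, not project code or project data.

```python
def local_price_cop(price_usd: float, cop_per_usd: float) -> float:
    """Value of one unit of coffee in Colombian pesos (COP).

    Assumes the rate is quoted as COP per 1 USD.
    """
    return price_usd * cop_per_usd


# Hypothetical numbers: the USD price is flat, but the peso weakens.
day_1 = local_price_cop(price_usd=2.00, cop_per_usd=4000.0)
day_2 = local_price_cop(price_usd=2.00, cop_per_usd=4200.0)

# Relative change in what the trader receives, despite flat USD futures.
change = (day_2 - day_1) / day_1
print(f"{day_1:.0f} COP -> {day_2:.0f} COP ({change:+.1%})")
```

In this example the trader's revenue moves even though the USD futures price does not, which is why both series need to be forecast together.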
🤖 START HERE: CLAUDE.md
This is your primary entry point containing:
- Credential setup (AWS & Databricks)
- Development best practices
- Navigation to all key docs
- Current project state
- Quick reference for common tasks
Documentation Strategy: See docs/DOCUMENTATION_STRATEGY.md for our hierarchical documentation organization
# Project structure
ucberkeley-capstone/
├── README.md # Human entry point
├── CLAUDE.md # 🤖 AI agent entry point
├── docs/ # Core reference documentation
│ ├── DOCUMENTATION_STRATEGY.md # How we organize docs
│ ├── DATA_CONTRACTS.md # Database schemas (single source of truth)
│ ├── ARCHITECTURE.md # System architecture
│ ├── SECURITY.md # Credential management
│ └── EVALUATION_STRATEGY.md
├── research_agent/ # Data pipeline (Francisco)
├── forecast_agent/ # Time series forecasting (Connor)
├── trading_agent/ # Risk/trading signals (Tony)
└── data/ # Local snapshots (gitignored)

Research (Francisco) → Forecast (Connor) → Trading (Tony)
Research Agent: Creates commodity.silver.unified_data
- Lambda functions for data ingestion
- Bronze/Silver layers in Databricks
- See research_agent/README.md
Forecast Agent: Generates forecasts + distributions
- Time series models (SARIMAX, Prophet, XGBoost, ARIMA)
- Walk-forward evaluation framework
- See forecast_agent/README.md
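As a rough illustration of walk-forward evaluation, the sketch below re-fits on an expanding training window and scores each 14-day block that follows it. The helper names, the expanding-window choice, the naive placeholder model, and the synthetic price series are all assumptions made for this example; the actual framework is documented in forecast_agent/README.md.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_obs: int, initial_train: int, horizon: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


def naive_forecast(history: List[float], horizon: int) -> List[float]:
    """Placeholder model (hypothetical): repeat the last observed value."""
    return [history[-1]] * horizon


# Synthetic series of 60 daily prices, used only to exercise the loop.
prices = [float(p) for p in range(100, 160)]

errors = []
for train_idx, test_idx in walk_forward_splits(len(prices), 30, 14, 14):
    history = [prices[i] for i in train_idx]
    actual = [prices[i] for i in test_idx]
    predicted = naive_forecast(history, len(actual))
    # Mean absolute error for this fold.
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    errors.append(mae)

print(f"{len(errors)} folds, mean MAE = {sum(errors) / len(errors):.2f}")
```

Each fold only sees data that precedes its test window, so the scores approximate how a model would have performed if it had been run live.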
Trading Agent: Risk management + signals
- VaR, CVaR metrics
- Position sizing recommendations
- See trading_agent/README.md
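For readers unfamiliar with the risk metrics above, here is a minimal sketch of historical-simulation VaR and CVaR computed from a set of simulated returns. The function `var_cvar`, the simple no-interpolation quantile rule, and the sample returns are assumptions for illustration; the Trading Agent may use a different convention.

```python
from typing import List, Tuple


def var_cvar(returns: List[float], alpha: float = 0.95) -> Tuple[float, float]:
    """Return (VaR, CVaR) at confidence level alpha as positive loss numbers."""
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sorted(-r for r in returns)  # ascending: largest losses last
    # Empirical alpha-quantile of the loss distribution (no interpolation).
    k = min(int(alpha * len(losses)), len(losses) - 1)
    var = losses[k]
    tail = losses[k:]  # losses at or beyond VaR
    cvar = sum(tail) / len(tail)  # average loss in the tail
    return var, cvar


# Hypothetical end-of-horizon returns, e.g. one per simulated price path.
sample_returns = [-0.20, -0.10, -0.05, 0.00, 0.02, 0.03, 0.05, 0.06, 0.08, 0.10]
var_80, cvar_80 = var_cvar(sample_returns, alpha=0.80)
print(f"VaR(80%) = {var_80:.2f}, CVaR(80%) = {cvar_80:.2f}")
```

CVaR is never smaller than VaR at the same level, because it averages only the losses at or beyond the VaR threshold.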
commodity.silver.unified_data:
- Grain: (date, commodity, region)
- ~75k rows, 37 columns
- Market data + weather + macro + exchange rates
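A grain of (date, commodity, region) means each combination of those three columns should identify exactly one row. The check below is a small sketch of how that could be verified on a local snapshot loaded as a list of dicts; the `close` column, the region names, and the helper name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

GRAIN = ("date", "commodity", "region")


def duplicate_grain_keys(rows: Iterable[Dict[str, object]]) -> List[Tuple[object, ...]]:
    """Return grain keys that occur more than once (empty list: grain holds)."""
    counts = Counter(tuple(row[col] for col in GRAIN) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows; the third one repeats the grain key of the first.
rows = [
    {"date": "2024-01-02", "commodity": "Coffee", "region": "Huila", "close": 1.0},
    {"date": "2024-01-02", "commodity": "Sugar", "region": "Huila", "close": 2.0},
    {"date": "2024-01-02", "commodity": "Coffee", "region": "Huila", "close": 1.1},
]

print(duplicate_grain_keys(rows))
```

An empty result indicates the table respects its grain; any returned key points at rows that would double-count in downstream joins.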
Forecast tables:
- commodity.forecast.point_forecasts - 14-day forecasts with confidence intervals
- commodity.forecast.distributions - 2,000 Monte Carlo paths for risk analysis
- commodity.forecast.forecast_metadata - Model metadata and evaluation metrics
See docs/DATA_CONTRACTS.md for complete schemas.
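To show how a set of Monte Carlo paths relates to a point forecast with an interval, the sketch below collapses 2,000 simulated 14-day paths into a per-day median and a 90% band. The random-walk generator, the 90% level, and the output keys (`day`, `lower`, `median`, `upper`) are assumptions for illustration and do not reflect the actual table schemas.

```python
import random
import statistics
from typing import Dict, List

random.seed(0)
N_PATHS, HORIZON = 2000, 14

# Hypothetical paths: a simple multiplicative random walk starting at 100.
paths: List[List[float]] = []
for _ in range(N_PATHS):
    price, path = 100.0, []
    for _ in range(HORIZON):
        price *= 1.0 + random.gauss(0.0, 0.01)
        path.append(price)
    paths.append(path)


def summarize(paths: List[List[float]]) -> List[Dict[str, float]]:
    """Per forecast day: median plus 5th and 95th percentiles across paths."""
    summary = []
    for day in range(len(paths[0])):
        values = [p[day] for p in paths]
        cuts = statistics.quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
        summary.append(
            {
                "day": day + 1,
                "lower": cuts[0],
                "median": statistics.median(values),
                "upper": cuts[-1],
            }
        )
    return summary


summary = summarize(paths)
print(summary[0])
print(summary[-1])
```

The band typically widens with the forecast day, reflecting how uncertainty accumulates over the horizon.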
Production Tables (Databricks):
- ✅ commodity.landing.* - Raw incremental data (6 tables)
- ✅ commodity.bronze.* - Deduplicated views (6 views)
- ✅ commodity.silver.unified_data - Joined dataset (~75k rows)
- ✅ commodity.forecast.distributions - 22,000 rows (9 models, Coffee)
- ✅ commodity.forecast.point_forecasts - Point forecasts with confidence intervals
Infrastructure:
- Lambda Functions deployed in us-west-2
- EventBridge daily triggers
- Databricks Unity Catalog
Tech Stack:
- Platform: Databricks (PySpark)
- Storage: Delta Lake
- Modeling: statsmodels, Prophet, XGBoost
- Infrastructure: AWS Lambda, EventBridge
- Local Testing: Parquet snapshots
Core Reference:
- CLAUDE.md - AI agent entry point
- docs/DOCUMENTATION_STRATEGY.md - How we organize docs
- docs/DATA_CONTRACTS.md - Database schemas
- docs/ARCHITECTURE.md - System architecture
- docs/SECURITY.md - Credential management
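Agent-Specific:
- research_agent/README.md - Research Agent (data pipeline)
- forecast_agent/README.md - Forecast Agent (time series forecasting)
- trading_agent/README.md - Trading Agent (risk/trading signals)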
Note: All documentation follows a hierarchical web-graph structure. See docs/DOCUMENTATION_STRATEGY.md for details.
Last Updated: 2025-01-11