Formula Intelligence is a production-grade system for parsing, analyzing, and visualizing complex Excel spreadsheet dependencies in Zero-Based Costing (ZBC) models. It handles 50+ sheets with 100k+ formulas, resolves dynamic references, builds dependency graphs, and detects anomalies.
┌─────────────────────────────────────────────────────────────┐
│                  Frontend (React + D3.js)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ File Upload  │  │ Graph Viewer │  │ Anomaly List │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ REST API
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 Backend (FastAPI + Python)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Routes    │  │   Services   │  │    Models    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │
┌─────────────────────────────────────────────────────────────┐
│               Core Processing Engine (Python)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Parser    │  │  Dependency  │  │ DAG Builder  │       │
│  │              │  │   Resolver   │  │              │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Anomaly    │  │ Cost Driver  │  │    Cache     │       │
│  │   Detector   │  │   Analyzer   │  │              │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ PyO3 Bindings
                              ▼
┌─────────────────────────────────────────────────────────────┐
│               Excel Reader (Rust + Calamine)                │
│  ┌──────────────────────────────────────────────────┐       │
│  │     High-Performance Streaming Excel Parser      │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
formula-intelligence/
├── backend/
│   ├── rust_reader/              # Rust Excel reader with PyO3
│   │   ├── src/
│   │   │   └── lib.rs
│   │   ├── Cargo.toml
│   │   └── pyproject.toml
│   ├── app/
│   │   ├── api/                  # FastAPI routes
│   │   │   ├── __init__.py
│   │   │   ├── routes.py
│   │   │   └── dependencies.py
│   │   ├── core/                 # Core processing engine
│   │   │   ├── __init__.py
│   │   │   ├── parser.py         # Formula tokenization
│   │   │   ├── dependency_resolver.py
│   │   │   ├── dag_builder.py
│   │   │   ├── anomaly_detector.py
│   │   │   ├── cost_driver_analyzer.py
│   │   │   └── dynamic_resolver.py
│   │   ├── models/               # Pydantic models
│   │   │   ├── __init__.py
│   │   │   ├── schemas.py
│   │   │   └── graph_models.py
│   │   ├── services/             # Business logic
│   │   │   ├── __init__.py
│   │   │   ├── analysis_service.py
│   │   │   └── cache_service.py
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── logger.py
│   │   │   └── config.py
│   │   └── main.py
│   ├── tests/
│   │   ├── test_parser.py
│   │   ├── test_resolver.py
│   │   └── test_api.py
│   ├── requirements.txt
│   ├── Dockerfile
│   └── pytest.ini
├── frontend/
│   ├── public/
│   ├── src/
│   │   ├── components/
│   │   │   ├── FileUpload.jsx
│   │   │   ├── SankeyDiagram.jsx
│   │   │   ├── ForceDirectedGraph.jsx
│   │   │   ├── AnomalyList.jsx
│   │   │   └── MetricsDashboard.jsx
│   │   ├── services/
│   │   │   └── api.js
│   │   ├── hooks/
│   │   │   └── useGraphData.js
│   │   ├── utils/
│   │   │   └── graphHelpers.js
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   ├── vite.config.js
│   └── Dockerfile
├── docker-compose.yml
├── Makefile
├── .pre-commit-config.yaml
├── .github/
│   └── workflows/
│       └── ci.yml
└── README.md
- Rust: Calamine (Excel parsing), PyO3 (Python bindings)
- Python 3.11+: Core processing
- FastAPI: REST API framework
- NetworkX/igraph: Graph algorithms
- PyTorch: Graph Neural Networks (anomaly detection)
- Redis: Caching layer
- Pydantic: Data validation
- React 18: UI framework
- Vite: Build tool
- D3.js: Data visualization
- Axios: HTTP client
- Zustand: State management
- TailwindCSS: Styling
- Docker & Docker Compose: Containerization
- pytest: Testing
- GitHub Actions: CI/CD
- pre-commit: Code quality hooks
- Docker & Docker Compose
- Python 3.11+
- Node.js 18+
- Rust 1.70+ (for development)
# Clone the repository
git clone <repo-url>
cd formula-intelligence
# Start all services
make up
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r requirements.txt
# Build Rust extension
cd rust_reader
maturin develop --release
cd ..
# Run backend
uvicorn app.main:app --reload --port 8000
cd frontend
# Install dependencies
npm install
# Run development server
npm run dev
- Rust-based reader: 10x faster than openpyxl
- Streaming: O(1) memory for large files
- Parallel processing: Multi-core sheet parsing
- Static references: A1, Sheet2!B5, Named ranges
- Dynamic references: INDIRECT, OFFSET, INDEX
- Cross-sheet: Full workbook graph
- Array formulas: Spill ranges
- DAG construction: Directed Acyclic Graph of dependencies
- Cycle detection: Identify circular references
- Cost driver identification: Betweenness centrality
- Clustering: Semantic grouping by sheet/department
- Hard-coded overwrites: Formula cells replaced with values
- Broken references: #REF!, #NAME! errors
- Unused formulas: Dead logic detection
- Pattern deviation: GNN-based anomaly scoring
- Sankey diagrams: Cost flow visualization
- Force-directed graphs: Dependency networks
- Lazy loading: Render only visible nodes
- Zoom & pan: Explore large graphs
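The static-reference, DAG construction, and cycle detection items in the list above can be sketched with the standard library alone. This is a minimal illustration, not the project's `parser.py` or `dag_builder.py`: the names `REF_PATTERN`, `extract_refs`, `build_graph`, and `evaluation_order` are hypothetical, and the regex deliberately ignores named ranges, string literals, and dynamic functions such as INDIRECT or OFFSET.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Hypothetical pattern for static A1-style references with an optional sheet
# prefix (B5, $C$10, Sheet2!B5, 'Cost Model'!A1). The lookbehind stops matches
# from starting mid-identifier; the lookahead rejects names such as LOG10(.
REF_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_.!'])"
    r"(?:(?P<sheet>'[^']+'|[A-Za-z_][A-Za-z0-9_.]*)!)?"
    r"\$?(?P<col>[A-Za-z]{1,3})\$?(?P<row>[0-9]+)"
    r"(?![A-Za-z0-9_(])"
)


def extract_refs(formula: str, current_sheet: str) -> set[str]:
    """Return fully qualified static references (Sheet!A1) found in a formula.

    Ranges such as C2:C3 contribute only their two endpoint cells.
    """
    refs = set()
    for match in REF_PATTERN.finditer(formula):
        sheet = (match.group("sheet") or current_sheet).strip("'")
        refs.add(f"{sheet}!{match.group('col').upper()}{match.group('row')}")
    return refs


def build_graph(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Map each formula cell (keyed as Sheet!A1) to the cells it reads."""
    graph = {}
    for cell, formula in formulas.items():
        sheet = cell.split("!", 1)[0]
        graph[cell] = extract_refs(formula, sheet)
    return graph


def evaluation_order(graph: dict[str, set[str]]) -> list[str]:
    """Topologically sort the graph; raises CycleError on circular references."""
    return list(TopologicalSorter(graph).static_order())


# Made-up workbook: three formula cells across two sheets.
formulas = {
    "Summary!A1": "=Costs!B2+Costs!B3",
    "Costs!B2": "=$C$2*1.2",
    "Costs!B3": "=SUM(C2:C3)",
}
graph = build_graph(formulas)
order = evaluation_order(graph)
print(order)  # precedents always appear before the cells that read them

# A circular reference surfaces as CycleError instead of an ordering.
try:
    evaluation_order(build_graph({"S!A1": "=B1", "S!B1": "=A1"}))
except CycleError as exc:
    print("circular reference:", exc.args[1])
```

`TopologicalSorter` takes a mapping from each node to its predecessors, which matches the "cell reads these cells" direction directly. A graph library such as NetworkX would likely be the better fit at the 100k-formula scale, since it also provides centrality measures.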
# Run all tests
make test
# Run with coverage
make test-coverage
# Run specific test file
pytest tests/test_parser.py -v
| Metric | Value |
|---|---|
| Parse 500k rows | ~3.5s |
| Build dependency graph (100k formulas) | ~12s |
| Detect anomalies | ~2s |
| API response time (cached) | <100ms |
| Frontend render (10k nodes) | ~1.5s |
Edit backend/app/utils/config.py:
# Maximum file size (MB)
MAX_FILE_SIZE = 100
# Cache TTL (seconds)
CACHE_TTL = 3600
# Parallel workers
MAX_WORKERS = 8
# Graph rendering threshold
MAX_NODES_RENDER = 10000
POST /api/v1/analyze
Content-Type: multipart/form-data
{
"file": <excel_file>
}
Response:
{
"job_id": "uuid",
"status": "processing"
}
GET /api/v1/analysis/{job_id}
Response:
{
"graph": {...},
"anomalies": [...],
"cost_drivers": [...],
"metrics": {...}
}
Full API documentation: http://localhost:8000/docs
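As a rough client-side sketch of the two endpoints above, the following uses only the Python standard library. The helper names (`build_multipart`, `submit_workbook`, `fetch_analysis`) are hypothetical, the `application/octet-stream` part type is an assumption about what the server accepts, and the behaviour of the GET endpoint while a job is still processing is not specified here, so no polling loop is shown.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # backend address used elsewhere in this README


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def submit_workbook(path: str) -> str:
    """POST the workbook to /api/v1/analyze and return the job_id from the response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/analyze",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]


def fetch_analysis(job_id: str) -> dict:
    """GET /api/v1/analysis/{job_id} and return the decoded JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/v1/analysis/{job_id}") as response:
        return json.load(response)
```

With the backend running, `fetch_analysis(submit_workbook("model.xlsx"))` would exercise both calls in sequence; `model.xlsx` is a placeholder file name.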
# Build images
make build
# Start services
make up
# Stop services
make down
# View logs
make logs
# Restart services
make restart
Logs are emitted as structured JSON with correlation IDs:
{
"timestamp": "2026-01-26T16:30:00Z",
"level": "INFO",
"correlation_id": "abc-123",
"message": "Parsed 50 sheets in 3.2s",
"metadata": {
"sheets": 50,
"formulas": 125000
}
}
- Fork the repository
- Create feature branch (git checkout -b feature/amazing)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing)
- Open Pull Request
MIT License - see LICENSE file
- Calamine Rust library
- NetworkX team
- FastAPI framework
- D3.js community