Formula Intelligence is a production-grade system for parsing, analyzing, and visualizing complex Excel spreadsheet dependencies in Zero-Based Costing (ZBC) models. It handles 50+ sheets with 100k+ formulas, resolves dynamic references, builds dependency graphs, and detects anomalies.
┌─────────────────────────────────────────────────────────────┐
│ Frontend (React + D3.js) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ File Upload │ │ Graph Viewer │ │ Anomaly List │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│ REST API
▼
┌─────────────────────────────────────────────────────────────┐
│ Backend (FastAPI + Python) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Routes │ │ Services │ │ Models │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│
┌─────────────────────────────────────────────────────────────┐
│ Core Processing Engine (Python) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Parser │ │ Dependency │ │ DAG Builder │ │
│ │ │ │ Resolver │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Anomaly │ │ Cost Driver │ │ Cache │ │
│ │ Detector │ │ Analyzer │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│ PyO3 Bindings
▼
┌─────────────────────────────────────────────────────────────┐
│ Excel Reader (Rust + Calamine) │
│ ┌──────────────────────────────────────────────────┐ │
│ │ High-Performance Streaming Excel Parser │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
formula-intelligence/
├── backend/
│ ├── rust_reader/ # Rust Excel reader with PyO3
│ │ ├── src/
│ │ │ └── lib.rs
│ │ ├── Cargo.toml
│ │ └── pyproject.toml
│ ├── app/
│ │ ├── api/ # FastAPI routes
│ │ │ ├── __init__.py
│ │ │ ├── routes.py
│ │ │ └── dependencies.py
│ │ ├── core/ # Core processing engine
│ │ │ ├── __init__.py
│ │ │ ├── parser.py # Formula tokenization
│ │ │ ├── dependency_resolver.py
│ │ │ ├── dag_builder.py
│ │ │ ├── anomaly_detector.py
│ │ │ ├── cost_driver_analyzer.py
│ │ │ └── dynamic_resolver.py
│ │ ├── models/ # Pydantic models
│ │ │ ├── __init__.py
│ │ │ ├── schemas.py
│ │ │ └── graph_models.py
│ │ ├── services/ # Business logic
│ │ │ ├── __init__.py
│ │ │ ├── analysis_service.py
│ │ │ └── cache_service.py
│ │ ├── utils/
│ │ │ ├── __init__.py
│ │ │ ├── logger.py
│ │ │ └── config.py
│ │ └── main.py
│ ├── tests/
│ │ ├── test_parser.py
│ │ ├── test_resolver.py
│ │ └── test_api.py
│ ├── requirements.txt
│ ├── Dockerfile
│ └── pytest.ini
├── frontend/
│ ├── public/
│ ├── src/
│ │ ├── components/
│ │ │ ├── FileUpload.jsx
│ │ │ ├── SankeyDiagram.jsx
│ │ │ ├── ForceDirectedGraph.jsx
│ │ │ ├── AnomalyList.jsx
│ │ │ └── MetricsDashboard.jsx
│ │ ├── services/
│ │ │ └── api.js
│ │ ├── hooks/
│ │ │ └── useGraphData.js
│ │ ├── utils/
│ │ │ └── graphHelpers.js
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ ├── vite.config.js
│ └── Dockerfile
├── docker-compose.yml
├── Makefile
├── .pre-commit-config.yaml
├── .github/
│ └── workflows/
│ └── ci.yml
└── README.md
- Rust: Calamine (Excel parsing), PyO3 (Python bindings)
- Python 3.11+: Core processing
- FastAPI: REST API framework
- NetworkX/igraph: Graph algorithms
- PyTorch: Graph Neural Networks (anomaly detection)
- Redis: Caching layer
- Pydantic: Data validation
- React 18: UI framework
- Vite: Build tool
- D3.js: Data visualization
- Axios: HTTP client
- Zustand: State management
- TailwindCSS: Styling
- Docker & Docker Compose: Containerization
- pytest: Testing
- GitHub Actions: CI/CD
- pre-commit: Code quality hooks
- Docker & Docker Compose
- Python 3.11+
- Node.js 18+
- Rust 1.70+ (for development)
# Clone the repository
git clone <repo-url>
cd formula-intelligence
# Start all services
make up
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docscd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r requirements.txt
# Build Rust extension
cd rust_reader
maturin develop --release
cd ..
# Run backend
uvicorn app.main:app --reload --port 8000cd frontend
# Install dependencies
npm install
# Run development server
npm run dev- Rust-based reader: 10x faster than openpyxl
- Streaming: O(1) memory for large files
- Parallel processing: Multi-core sheet parsing
- Static references: A1, Sheet2!B5, Named ranges
- Dynamic references: INDIRECT, OFFSET, INDEX
- Cross-sheet: Full workbook graph
- Array formulas: Spill ranges
- DAG construction: Directed Acyclic Graph of dependencies
- Cycle detection: Identify circular references
- Cost driver identification: Betweenness centrality
- Clustering: Semantic grouping by sheet/department
- Hard-coded overwrites: Formula cells replaced with values
- Broken references: #REF!, #NAME! errors
- Unused formulas: Dead logic detection
- Pattern deviation: GNN-based anomaly scoring
- Sankey diagrams: Cost flow visualization
- Force-directed graphs: Dependency networks
- Lazy loading: Render only visible nodes
- Zoom & pan: Explore large graphs
# Run all tests
make test
# Run with coverage
make test-coverage
# Run specific test file
pytest tests/test_parser.py -v| Metric | Value |
|---|---|
| Parse 500k rows | ~3.5s |
| Build dependency graph (100k formulas) | ~12s |
| Detect anomalies | ~2s |
| API response time (cached) | <100ms |
| Frontend render (10k nodes) | ~1.5s |
Edit backend/app/utils/config.py:
# Maximum file size (MB)
MAX_FILE_SIZE = 100
# Cache TTL (seconds)
CACHE_TTL = 3600
# Parallel workers
MAX_WORKERS = 8
# Graph rendering threshold
MAX_NODES_RENDER = 10000POST /api/v1/analyze
Content-Type: multipart/form-data
{
"file": <excel_file>
}
Response:
{
"job_id": "uuid",
"status": "processing"
}GET /api/v1/analysis/{job_id}
Response:
{
"graph": {...},
"anomalies": [...],
"cost_drivers": [...],
"metrics": {...}
}Full API documentation: http://localhost:8000/docs
# Build images
make build
# Start services
make up
# Stop services
make down
# View logs
make logs
# Restart services
make restartLogs are structured JSON format with correlation IDs:
{
"timestamp": "2026-01-26T16:30:00Z",
"level": "INFO",
"correlation_id": "abc-123",
"message": "Parsed 50 sheets in 3.2s",
"metadata": {
"sheets": 50,
"formulas": 125000
}
}- Fork the repository
- Create feature branch (
git checkout -b feature/amazing) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing) - Open Pull Request
MIT License - see LICENSE file
- Calamine Rust library
- NetworkX team
- FastAPI framework
- D3.js community