Skip to content

Latest commit

Β 

History

History
369 lines (297 loc) Β· 11.5 KB

File metadata and controls

369 lines (297 loc) Β· 11.5 KB

Formula Intelligence - Production System

🎯 Overview

Formula Intelligence is a production-grade system for parsing, analyzing, and visualizing complex Excel spreadsheet dependencies in Zero-Based Costing (ZBC) models. It handles 50+ sheets with 100k+ formulas, resolves dynamic references, builds dependency graphs, and detects anomalies.

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Frontend (React + D3.js)                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ File Upload  β”‚  β”‚ Graph Viewer β”‚  β”‚ Anomaly List β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–²
                            β”‚ REST API
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Backend (FastAPI + Python)                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Routes     β”‚  β”‚   Services   β”‚  β”‚   Models     β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–²
                            β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Core Processing Engine (Python)                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Parser     β”‚  β”‚  Dependency  β”‚  β”‚ DAG Builder  β”‚     β”‚
β”‚  β”‚              β”‚  β”‚  Resolver    β”‚  β”‚              β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Anomaly    β”‚  β”‚ Cost Driver  β”‚  β”‚    Cache     β”‚     β”‚
β”‚  β”‚   Detector   β”‚  β”‚  Analyzer    β”‚  β”‚              β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β–²
                            β”‚ PyO3 Bindings
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Excel Reader (Rust + Calamine)                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚
β”‚  β”‚  High-Performance Streaming Excel Parser         β”‚      β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“ Project Structure

formula-intelligence/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ rust_reader/              # Rust Excel reader with PyO3
β”‚   β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”‚   └── lib.rs
β”‚   β”‚   β”œβ”€β”€ Cargo.toml
β”‚   β”‚   └── pyproject.toml
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ api/                  # FastAPI routes
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ routes.py
β”‚   β”‚   β”‚   └── dependencies.py
β”‚   β”‚   β”œβ”€β”€ core/                 # Core processing engine
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ parser.py         # Formula tokenization
β”‚   β”‚   β”‚   β”œβ”€β”€ dependency_resolver.py
β”‚   β”‚   β”‚   β”œβ”€β”€ dag_builder.py
β”‚   β”‚   β”‚   β”œβ”€β”€ anomaly_detector.py
β”‚   β”‚   β”‚   β”œβ”€β”€ cost_driver_analyzer.py
β”‚   β”‚   β”‚   └── dynamic_resolver.py
β”‚   β”‚   β”œβ”€β”€ models/               # Pydantic models
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ schemas.py
β”‚   β”‚   β”‚   └── graph_models.py
β”‚   β”‚   β”œβ”€β”€ services/             # Business logic
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ analysis_service.py
β”‚   β”‚   β”‚   └── cache_service.py
β”‚   β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ logger.py
β”‚   β”‚   β”‚   └── config.py
β”‚   β”‚   └── main.py
β”‚   β”œβ”€β”€ tests/
β”‚   β”‚   β”œβ”€β”€ test_parser.py
β”‚   β”‚   β”œβ”€β”€ test_resolver.py
β”‚   β”‚   └── test_api.py
β”‚   β”œβ”€β”€ requirements.txt
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── pytest.ini
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ public/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ FileUpload.jsx
β”‚   β”‚   β”‚   β”œβ”€β”€ SankeyDiagram.jsx
β”‚   β”‚   β”‚   β”œβ”€β”€ ForceDirectedGraph.jsx
β”‚   β”‚   β”‚   β”œβ”€β”€ AnomalyList.jsx
β”‚   β”‚   β”‚   └── MetricsDashboard.jsx
β”‚   β”‚   β”œβ”€β”€ services/
β”‚   β”‚   β”‚   └── api.js
β”‚   β”‚   β”œβ”€β”€ hooks/
β”‚   β”‚   β”‚   └── useGraphData.js
β”‚   β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   β”‚   └── graphHelpers.js
β”‚   β”‚   β”œβ”€β”€ App.jsx
β”‚   β”‚   └── main.jsx
β”‚   β”œβ”€β”€ package.json
β”‚   β”œβ”€β”€ vite.config.js
β”‚   └── Dockerfile
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Makefile
β”œβ”€β”€ .pre-commit-config.yaml
β”œβ”€β”€ .github/
β”‚   └── workflows/
β”‚       └── ci.yml
└── README.md

πŸš€ Tech Stack

Backend

  • Rust: Calamine (Excel parsing), PyO3 (Python bindings)
  • Python 3.11+: Core processing
  • FastAPI: REST API framework
  • NetworkX/igraph: Graph algorithms
  • PyTorch: Graph Neural Networks (anomaly detection)
  • Redis: Caching layer
  • Pydantic: Data validation

Frontend

  • React 18: UI framework
  • Vite: Build tool
  • D3.js: Data visualization
  • Axios: HTTP client
  • Zustand: State management
  • TailwindCSS: Styling

DevOps

  • Docker & Docker Compose: Containerization
  • pytest: Testing
  • GitHub Actions: CI/CD
  • pre-commit: Code quality hooks

πŸ› οΈ Setup & Installation

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • Node.js 18+
  • Rust 1.70+ (for development)

Quick Start with Docker

# Clone the repository
git clone <repo-url>
cd formula-intelligence

# Start all services
make up

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Local Development Setup

Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

# Build Rust extension
cd rust_reader
maturin develop --release
cd ..

# Run backend
uvicorn app.main:app --reload --port 8000

Frontend Setup

cd frontend

# Install dependencies
npm install

# Run development server
npm run dev

πŸ“Š Key Features

1. High-Performance Excel Parsing

  • Rust-based reader: 10x faster than openpyxl
  • Streaming: O(1) memory for large files
  • Parallel processing: Multi-core sheet parsing

2. Comprehensive Dependency Resolution

  • Static references: A1, Sheet2!B5, Named ranges
  • Dynamic references: INDIRECT, OFFSET, INDEX
  • Cross-sheet: Full workbook graph
  • Array formulas: Spill ranges

3. Graph Analysis

  • DAG construction: Directed Acyclic Graph of dependencies
  • Cycle detection: Identify circular references
  • Cost driver identification: Betweenness centrality
  • Clustering: Semantic grouping by sheet/department

4. Anomaly Detection

  • Hard-coded overwrites: Formula cells replaced with values
  • Broken references: #REF!, #NAME! errors
  • Unused formulas: Dead logic detection
  • Pattern deviation: GNN-based anomaly scoring

5. Interactive Visualization

  • Sankey diagrams: Cost flow visualization
  • Force-directed graphs: Dependency networks
  • Lazy loading: Render only visible nodes
  • Zoom & pan: Explore large graphs

πŸ§ͺ Testing

# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test file
pytest tests/test_parser.py -v

πŸ“ˆ Performance Benchmarks

Metric Value
Parse 500k rows ~3.5s
Build dependency graph (100k formulas) ~12s
Detect anomalies ~2s
API response time (cached) <100ms
Frontend render (10k nodes) ~1.5s

πŸ”§ Configuration

Edit backend/app/utils/config.py:

# Maximum file size (MB)
MAX_FILE_SIZE = 100

# Cache TTL (seconds)
CACHE_TTL = 3600

# Parallel workers
MAX_WORKERS = 8

# Graph rendering threshold
MAX_NODES_RENDER = 10000

πŸ“ API Documentation

Upload & Analyze

POST /api/v1/analyze
Content-Type: multipart/form-data

{
  "file": <excel_file>
}

Response:
{
  "job_id": "uuid",
  "status": "processing"
}

Get Analysis Results

GET /api/v1/analysis/{job_id}

Response:
{
  "graph": {...},
  "anomalies": [...],
  "cost_drivers": [...],
  "metrics": {...}
}

Full API documentation: http://localhost:8000/docs

🐳 Docker Commands

# Build images
make build

# Start services
make up

# Stop services
make down

# View logs
make logs

# Restart services
make restart

πŸ” Monitoring & Logging

Logs are structured JSON format with correlation IDs:

{
  "timestamp": "2026-01-26T16:30:00Z",
  "level": "INFO",
  "correlation_id": "abc-123",
  "message": "Parsed 50 sheets in 3.2s",
  "metadata": {
    "sheets": 50,
    "formulas": 125000
  }
}

🀝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open Pull Request

πŸ“„ License

MIT License - see LICENSE file

πŸ™ Acknowledgments

  • Calamine Rust library
  • NetworkX team
  • FastAPI framework
  • D3.js community