Advanced Automated Survey Estimation & Reporting Platform
StatFlow AI is a production-ready web application designed to automate the lifecycle of official survey statistics. It streamlines the complex process of ingestion, cleaning, weighted estimation, and report generation into a unified, secure, and user-friendly interface.
- Multi-Format Support: Ingests CSV, Excel (.xlsx), SPSS (.sav), and SAS (.sas7bdat) files via pyreadstat.
- Smart Batching: Drag-and-drop up to 5 files simultaneously with folder-rejection logic.
- Schema Mapping: Interactive UI to map inconsistent headers to standard official schemas.
- Granular Duplicate Detection: Identifies specific row indices and exact content matches.
- Automated Cleaning: Missing value imputation (Mean/Median/Drop) and Outlier removal (IQR Method).
- Rule-Based Validation: Enforces logical constraints (e.g., Age > 0, Income >= 0).
- In-Browser Pivot Tables: Dynamic grouping, aggregation (Sum, Mean, Count), and summarization.
- Rigorous Estimation: Calculates Weighted Means, Standard Errors (SE), Margins of Error (MOE), and 95% Confidence Intervals.
- Interactive Visuals: Real-time distribution charts and outlier boxplots using Chart.js.
- AI Executive Summary: Auto-generates natural language insights from statistical tables.
- Government-Style Reports: One-click download of standardized PDF/HTML reports.
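
The cleaning and validation features listed above (median imputation, IQR outlier removal, and rules such as Age > 0 and Income >= 0) can be sketched in plain Python. This is a minimal, dependency-free illustration and not StatFlow's actual code: the function names, the `RULES` table, and the conventional 1.5 × IQR fence are assumptions. In a Pandas pipeline the same steps would typically use `fillna` and `quantile`.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]

# Hypothetical rule table mirroring the examples "Age > 0" and "Income >= 0".
RULES = {
    "Age > 0": lambda row: row["Age"] > 0,
    "Income >= 0": lambda row: row["Income"] >= 0,
}

def failed_rules(row):
    """Return the names of the rules that a single record violates."""
    return [name for name, check in RULES.items() if not check(row)]

incomes = impute_median([30, 32, None, 35, 31, 400])
cleaned = remove_outliers(incomes)
```

Imputing before outlier removal, as done here, is one possible ordering. Because the median is robust to extreme values, the 400 entry does not distort the imputed value.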
| Component | Technology |
|---|---|
| Frontend | HTML5, CSS3 (Glassmorphism), Vanilla JS, Chart.js, Lucide Icons |
| Backend | Python (Flask), Flask-Session, Gunicorn |
| Data Core | Pandas, NumPy, Pyreadstat (SAS/SPSS), OpenPyXL |
| Security | Werkzeug Security, Role-Based Access Control (RBAC) |
| Deployment | Vercel / Render (Serverless compatible) |
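
As a rough illustration of how the Data Core libraries in the table could fit together, the sketch below dispatches on file extension to choose a reader. The names `READERS`, `reader_for`, and `load_survey` are hypothetical and not taken from the StatFlow source. The sketch assumes `pandas.read_csv` and `pandas.read_excel` (with the OpenPyXL engine) for CSV and Excel, and `pyreadstat.read_sav` and `pyreadstat.read_sas7bdat` for SPSS and SAS.

```python
from pathlib import Path

# Extension -> reader key. The keys are illustrative, not StatFlow's own names.
READERS = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".sav": "spss",
    ".sas7bdat": "sas",
}

def reader_for(filename):
    """Return the reader key for a filename, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]

def load_survey(path):
    """Load a survey file into a pandas DataFrame based on its extension."""
    kind = reader_for(path)
    # Imported lazily so the dispatch logic works without the data stack installed.
    import pandas as pd
    if kind == "csv":
        return pd.read_csv(path)
    if kind == "excel":
        return pd.read_excel(path, engine="openpyxl")
    import pyreadstat
    if kind == "spss":
        df, _meta = pyreadstat.read_sav(str(path))
    else:
        df, _meta = pyreadstat.read_sas7bdat(str(path))
    return df
```

Rejecting unknown extensions before any parsing keeps the failure mode explicit. A path with no extension, such as a folder name, raises `ValueError` instead of reaching a reader.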
git clone https://github.com/YOUR_USERNAME/statflow-ai.git
cd statflow-ai
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
pip install -r requirements.txt
python app.py
Access the portal at: http://127.0.0.1:5000
- User: admin
- Pass: admin123
- Fork this repo.
- Create a New Web Service on Render.
- Connect your repo.
- Build Command: pip install -r requirements.txt
- Start Command: gunicorn app:app
- Import project to Vercel.
- Framework Preset: Other.
- Deploy (the vercel.json is pre-configured for Python).
The estimation engine uses the variance formula for weighted survey data:
Where:
- $w_i$ = Survey Design Weight
- $y_i$ = Variable of Interest
- $n$ = Sample Size
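
One common form such a variance formula takes is the linearization estimator for a weighted mean. It is assumed here and not taken from the StatFlow source, and it supposes a single-stage, with-replacement design without strata or clusters. Here $w_i$ is the survey design weight, $y_i$ the variable of interest, and $n$ the sample size:

$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}, \qquad \widehat{\mathrm{Var}}(\bar{y}_w) = \frac{n}{n-1} \sum_{i=1}^{n} \left( \frac{w_i \, (y_i - \bar{y}_w)}{\sum_{j=1}^{n} w_j} \right)^2$$

A minimal sketch of that estimator follows, with the standard error, the margin of error, and a 95% confidence interval based on the normal critical value 1.96. The function name and return keys are hypothetical.

```python
import math

def weighted_estimate(values, weights, z=1.96):
    """Weighted mean with linearization SE, MOE, and confidence interval.

    Assumes a single-stage, with-replacement design (no strata or clusters).
    """
    n = len(values)
    if n != len(weights):
        raise ValueError("values and weights must have the same length")
    if n < 2:
        raise ValueError("at least two observations are required")

    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w

    # Linearized residuals: each unit's contribution to the weighted mean.
    residuals = [w * (y - mean) / total_w for w, y in zip(weights, values)]
    variance = n / (n - 1) * sum(r * r for r in residuals)

    se = math.sqrt(variance)
    moe = z * se
    return {"mean": mean, "se": se, "moe": moe, "ci": (mean - moe, mean + moe)}

result = weighted_estimate([10, 20], [1, 3])
```

With equal weights the variance reduces to the familiar $s^2 / n$, which is a convenient sanity check on any implementation.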
- Team Members: Alan S, Hussain Mustafa Ali, Janani E
Built for Statathon 2025