A comprehensive Data Science project analyzing historical air quality data from Delhi (2023-2025) to identify seasonal pollution trends, forecast future AQI levels, and recommend data-driven policy interventions.
Air pollution in major Indian cities follows a predictable but severe seasonal cycle. This project aims to:
- Analyze historical AQI data to visualize the "Winter Smog" phenomenon.
- Forecast future pollution levels with time-series models, comparing ARIMA against Holt-Winters.
- Visualize live trends using an interactive Streamlit dashboard.
- Suggest actionable policy interventions based on predictive data.
- Severe Winter Spikes: AQI consistently exceeds 350 from November to January, driven by temperature inversions and stubble burning.
- Monsoon Relief: Air quality improves significantly (AQI < 60) during July-September.
- Model Selection: Holt-Winters visually captured the seasonal peaks better, but ARIMA achieved the lower RMSE and proved more stable for general trend forecasting.
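The seasonal pattern above comes down to grouping daily readings by calendar month. Here is a minimal sketch in plain Python. It assumes the data is available as hypothetical `(date, aqi)` pairs rather than the project's actual CSV columns, and it takes the 350 and 60 thresholds from the findings above:

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean_aqi(records):
    """Average daily AQI readings by calendar month (1-12), pooling all years."""
    by_month = defaultdict(list)
    for day, aqi in records:
        by_month[day.month].append(aqi)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def classify_months(monthly, spike_threshold=350, relief_threshold=60):
    """Label each month as a winter-style spike, monsoon-style relief, or moderate."""
    labels = {}
    for month, avg in monthly.items():
        if avg >= spike_threshold:
            labels[month] = "spike"
        elif avg < relief_threshold:
            labels[month] = "relief"
        else:
            labels[month] = "moderate"
    return labels


# Synthetic, illustrative readings only (not project data).
records = [
    (date(2024, 1, 5), 380),
    (date(2024, 1, 6), 360),
    (date(2024, 4, 10), 180),
    (date(2024, 8, 1), 50),
    (date(2024, 8, 2), 58),
]
monthly = monthly_mean_aqi(records)
labels = classify_months(monthly)
```

Pooling all years by month is what makes the November-January and July-September bands stand out. A per-year breakdown would need the year in the grouping key as well.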
- Data Engineering: Python, Pandas, Open-Meteo API (Real-time data fetching)
- Visualization: Matplotlib (Static Reports), Plotly & Streamlit (Interactive Dashboard)
- Machine Learning & Forecasting:
  - ARIMA: Autoregressive Integrated Moving Average (trend-focused).
  - Holt-Winters: Exponential Smoothing (seasonality-focused).
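For readers unfamiliar with the seasonal model, here is a minimal from-scratch sketch of additive Holt-Winters, which updates level, trend, and seasonal components by exponential smoothing. The function name, the simple initialization, and the smoothing parameters are illustrative assumptions, not the project's code. In practice a library implementation such as statsmodels' `ExponentialSmoothing(endog, trend="add", seasonal="add", seasonal_periods=12)` would normally be used, and its `fit()` estimates the smoothing parameters automatically.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Fit additive Holt-Winters to `series` and return `horizon` forecasts past its end."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season.
    level = sum(series[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    # Initial seasonal offsets: deviation of each first-season point from that mean.
    seasonals = [series[i] - level for i in range(m)]

    for t, y in enumerate(series):
        s = seasonals[t % m]
        prev_level = level
        # Level: deseasonalized observation blended with the previous level + trend.
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        # Trend: blend of the latest level change and the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal: blend of the latest deviation from level and the old offset.
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    # Forecast index n + h is (h + 1) steps past the last fitted point.
    return [level + (h + 1) * trend + seasonals[(n + h) % m] for h in range(horizon)]
```

With monthly AQI, `season_length=12` would model the yearly winter peak. A series that repeats exactly with no trend is forecast back unchanged, which makes a handy sanity check.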
AQI_Analysis/
│
├── fetch_data_v2.py # Data Pipeline: Fetches real historical data from Open-Meteo API
├── analyze_forecast.py # Analysis: Performs EDA and trains the ARIMA model
├── compare_models.py # Evaluation: Compares ARIMA vs Holt-Winters (RMSE scores)
├── dashboard.py # UI: Interactive Web Dashboard (Streamlit + Plotly)
├── requirements.txt # Project Dependencies
├── README.md # Documentation & Policy Report
│
└── (Generated Output)
    ├── india_aqi_data.csv # The dataset
    ├── aqi_trend.png # Static Trend Graph
    ├── aqi_forecast.png # Static Forecast Graph
    └── model_comparison.png # Model Comparison Graph
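The tree above describes `fetch_data_v2.py` only as fetching historical data from Open-Meteo. As a rough sketch of what such a request could look like, the standard library is enough to build a query against Open-Meteo's Air Quality API. Verify the endpoint, the hourly variable names (`pm10`, `pm2_5`, `us_aqi`), the response shape, and the approximate Delhi coordinates (about 28.61 N, 77.21 E) against the current Open-Meteo documentation. Both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Open-Meteo Air Quality endpoint; confirm against the official docs.
AIR_QUALITY_URL = "https://air-quality-api.open-meteo.com/v1/air-quality"


def build_air_quality_url(latitude, longitude, start_date, end_date,
                          variables=("pm10", "pm2_5", "us_aqi")):
    """Build a query URL for hourly pollutant data between two ISO dates."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(variables),
        "start_date": start_date,
        "end_date": end_date,
        "timezone": "Asia/Kolkata",
    }
    return f"{AIR_QUALITY_URL}?{urlencode(params)}"


def fetch_hourly_aqi(url, timeout=30):
    """Download the JSON payload and pair each timestamp with its `us_aqi` value.

    Assumes the response has an "hourly" object with parallel "time" and
    "us_aqi" arrays; requires network access.
    """
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    hourly = payload["hourly"]
    return list(zip(hourly["time"], hourly["us_aqi"]))
```

For example, `fetch_hourly_aqi(build_air_quality_url(28.61, 77.21, "2023-01-01", "2023-01-31"))` would return hourly readings for January 2023, which could then be resampled to daily means before modelling.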
git clone https://github.com/yourusername/aqi-forecasting-project.git
cd aqi-forecasting-project
pip install -r requirements.txt
Pull the latest historical data for Delhi:
python fetch_data_v2.py
Generate static graphs for reports (trend and forecast):
python analyze_forecast.py
Compare ARIMA against seasonal Holt-Winters:
python compare_models.py
Open the web interface to explore the data:
streamlit run dashboard.py

| Model | RMSE Score | Strength |
|---|---|---|
| ARIMA | Lower (Better) | Excellent at following the general yearly trend without overreacting to noise. |
| Holt-Winters | Higher | Better at capturing the extreme volatility of winter smog spikes. |
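The table reports only which model scored lower. RMSE is the square root of the mean squared forecast error over a held-out window. Because errors are squared, large misses such as an underpredicted smog spike are penalized heavily. Here is a minimal sketch of the scoring step, in which the forecast lists stand in for hypothetical ARIMA and Holt-Winters outputs:

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_models(actual, forecasts):
    """Return (model_name, rmse) pairs sorted best (lowest RMSE) first."""
    scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_models(test_aqi, {"ARIMA": arima_forecast, "Holt-Winters": hw_forecast})` would list the lower-RMSE model first. Here `test_aqi`, `arima_forecast`, and `hw_forecast` are placeholders for the held-out observations and each model's predictions over the same dates.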
Technical Note: We chose Holt-Winters (Exponential Smoothing) over Facebook Prophet for the seasonal model to keep deployment lightweight and to avoid Prophet's C++ (Stan) build dependencies on Windows.
Based on our time-series analysis identifying severe winter spikes (AQI 350+), we recommend:
- Automated GRAP Enforcement: Trigger the Graded Response Action Plan (GRAP) automatically when the model forecasts AQI > 300 for 3 consecutive days.
- Smart Odd-Even Rule: Apply the Odd-Even vehicle rule only during the "Red Zone" weeks identified by the Holt-Winters model, rather than on arbitrary dates.
- Stubble Burning Subsidies: Concentrate financial aid for "Happy Seeder" machines in October-November to curb the stubble burning behind the initial winter spike.
- School Timings: Shift school start times to 10:00 AM or switch to online classes when the ARIMA model predicts morning smog.
- Early Warning System: Issue health advisories 48 hours in advance based on model predictions, allowing hospitals to prepare for respiratory cases.
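The GRAP recommendation reduces to a simple rule on the daily forecast: flag any run of three or more consecutive days with predicted AQI above 300. Here is a minimal sketch, assuming the forecast is a plain list of daily AQI values. The function name is hypothetical.

```python
def grap_trigger_days(forecast, threshold=300, run_length=3):
    """Return the start index of each run of `run_length`+ consecutive days above `threshold`.

    AQI must be strictly greater than `threshold`, matching "AQI > 300".
    """
    triggers = []
    streak = 0
    for i, aqi in enumerate(forecast):
        streak = streak + 1 if aqi > threshold else 0
        # Record each qualifying run once, at the moment it reaches `run_length`.
        if streak == run_length:
            triggers.append(i - run_length + 1)
    return triggers
```

For the forecast `[280, 310, 320, 330, 340, 200, 305, 310, 315]` this returns `[1, 6]`: one run starts on day 1 and a second on day 6. Each start index marks where GRAP would be triggered. Subtracting two days from it would give the issue date for the 48-hour health advisory.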
Dev Pandey
- Role: Software Engineer
This project is open-source and available for educational purposes.