
codemuggle09/AquaRisk


Groundwater Fluoride Prediction Using Machine Learning & Fuzzy Logic

A data-driven framework for analyzing groundwater fluoride contamination across India using machine-learning classifiers, regression models, and a Fuzzy Inference System (FIS). It is designed to support early detection of fluoride-vulnerable regions and to help government agencies and water-resource managers make informed decisions.


Project Objectives

  • Large-Scale Analysis: Analyzes 16,776 groundwater samples from districts across multiple Indian states.
  • Predictive Modeling: Uses regression models to estimate fluoride concentration (mg/L) as a continuous value.
  • Automated Classification: Assigns each sample to a Safe, Moderate, or High-risk class using machine-learning classifiers.
  • Interpretability: Uses a Mamdani fuzzy inference system to turn physicochemical measurements into human-readable risk scores.
  • Spatial Visualization: Generates state-level heatmaps and regional analyses of fluoride risk.

Dataset Architecture

The dataset contains physicochemical parameters known to influence fluoride mobility in aquifers.

| Feature category | Parameters included |
|---|---|
| Physicochemical | pH, EC, TDS, Na⁺, Ca²⁺, Mg²⁺, K⁺, Cl⁻, SO₄²⁻, NO₃⁻, HCO₃⁻ |
| Target variable | Fluoride concentration (mg/L) |
| Geospatial | State and district identifiers |

Data Preprocessing Pipeline

  1. Column-name normalization: Column names are cleaned to short, consistent labels (e.g., “EC µS/cm” becomes “EC”).
  2. Imputation: Invalid entries are converted to null values, which are then filled with the column median so that every feature is numeric; the median is less sensitive to outliers than the mean.
  3. Risk Labeling: Samples are labeled using thresholds anchored to the WHO drinking-water guideline value for fluoride (1.5 mg/L):
    • Class 0 (< 1.5 mg/L): Safe
    • Class 1 (1.5–2.5 mg/L): Moderate Risk
    • Class 2 (> 2.5 mg/L): High Risk
  4. Scaling: Features are rescaled to the 0–1 range with min-max scaling.
  5. Class Balancing: SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic minority-class samples until all three risk classes are equally represented.
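The preprocessing steps above can be sketched in plain, dependency-free Python. This is a minimal illustration under stated assumptions, not the project's actual code: the column-name rule (keep the text before the first space), treating exactly 2.5 mg/L as Moderate, and all function names are hypothetical. The `smote_oversample` helper is a simplified SMOTE; in practice the `SMOTE` class from the imbalanced-learn library is the usual choice.

```python
import math
import random
import re
import statistics


def normalize_column(name: str) -> str:
    """Hypothetical rule: keep the text before the first space, e.g. "EC µS/cm" -> "EC"."""
    return re.split(r"\s+", name.strip(), maxsplit=1)[0]


def to_float(value):
    """Coerce a raw cell to float; invalid entries (text, blanks, NaN) become None."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def median_impute(column):
    """Replace missing (None) values with the median of the valid values."""
    med = statistics.median(v for v in column if v is not None)
    return [med if v is None else v for v in column]


def risk_class(fluoride_mg_l: float) -> int:
    """0 = Safe (< 1.5), 1 = Moderate (1.5-2.5), 2 = High (> 2.5).
    Assumption: exactly 2.5 mg/L is counted as Moderate."""
    if fluoride_mg_l < 1.5:
        return 0
    if fluoride_mg_l <= 2.5:
        return 1
    return 2


def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def smote_oversample(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

For example, `risk_class(1.8)` returns 1 (Moderate), and `median_impute([1.0, None, 3.0])` returns `[1.0, 2.0, 3.0]`.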

Machine Learning Performance

Seven models were evaluated to identify the most effective classifier for fluoride risk; five of them are summarized below.

| Model | Type | Reported accuracy / notes |
|---|---|---|
| Random Forest | Ensemble learning | 93% (top performer) |
| XGBoost | Gradient boosting | High accuracy |
| LightGBM | Gradient boosting | Efficient at scale |
| ANN | Neural network | Pattern recognition |
| SVM (RBF kernel) | Kernel-based | Nonlinear mapping |
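Accuracy, the metric reported for the classifiers, is the fraction of samples whose predicted risk class matches the true class. The sketch below computes it together with a confusion matrix, which shows which risk classes a model mixes up. It assumes integer labels 0–2 as defined in the preprocessing pipeline; the function names are illustrative, not taken from the repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes
    (0 = Safe, 1 = Moderate, 2 = High)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

For example, `accuracy([0, 1, 2, 2], [0, 1, 1, 2])` returns 0.75; the matching confusion matrix records one High sample predicted as Moderate.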

Regression Analysis (Continuous Prediction)

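| Model | R² | RMSE |
|---|---|---|
| Random Forest Regressor | 0.273 | 0.684 |
| Linear Regression | 0.218 | 0.709 |
| SVR | 0.174 | 0.729 |

Random Forest Regressor performs best, but an R² of 0.273 means it explains only about 27% of the variance in fluoride concentration, so continuous estimates should be treated as approximate.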
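The R² and RMSE values in the regression table follow standard definitions: R² = 1 − SS_res / SS_tot and RMSE = √(mean squared error). The sketch below implements both directly on plain Python lists of observed and predicted fluoride values; it is an illustration, not the repository's evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Predicting the mean for every sample gives R² = 0, so an R² of 0.273 is a modest improvement over that baseline.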

Fuzzy Logic Inference System (FIS)

The system uses a Mamdani-type FIS to handle uncertainty in environmental measurements and produce interpretable risk scores.

  • Input membership functions: Very Low, Low, Normal, High, Very High.
  • Output risk scores: Low Risk (< 33), Medium Risk (33 to < 66), High Risk (≥ 66).
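A Mamdani FIS fuzzifies each input with membership functions, clips each rule's output set at the rule's firing strength (min implication), combines the clipped sets with max aggregation, and defuzzifies by centroid. The sketch below illustrates that pipeline for a single hypothetical input (one feature already min-max scaled to 0–1) using the five input terms listed above. The triangular shapes, the 0–100 output universe, and the rule base are assumptions, not the project's actual configuration.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical input sets on a feature scaled to [0, 1].
INPUT_TERMS = {
    "Very Low": (0.0, 0.0, 0.25),
    "Low": (0.0, 0.25, 0.5),
    "Normal": (0.25, 0.5, 0.75),
    "High": (0.5, 0.75, 1.0),
    "Very High": (0.75, 1.0, 1.0),
}
# Hypothetical output sets on an assumed 0-100 risk-score universe.
OUTPUT_TERMS = {
    "Low Risk": (0.0, 0.0, 50.0),
    "Medium Risk": (25.0, 50.0, 75.0),
    "High Risk": (50.0, 100.0, 100.0),
}
# Hypothetical rule base: IF input IS <term> THEN risk IS <output>.
RULES = [
    ("Very Low", "Low Risk"),
    ("Low", "Low Risk"),
    ("Normal", "Medium Risk"),
    ("High", "High Risk"),
    ("Very High", "High Risk"),
]


def fuzzy_risk_score(x):
    """Mamdani inference for one input: fuzzify, min-implication, max-aggregation, centroid."""
    strengths = {term: trimf(x, *abc) for term, abc in INPUT_TERMS.items()}
    num = den = 0.0
    for y in range(101):  # discretized output universe 0, 1, ..., 100
        mu = max(min(strengths[term], trimf(y, *OUTPUT_TERMS[out])) for term, out in RULES)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired; input may lie outside [0, 1]")
    return num / den


def risk_category(score):
    """Map a crisp score to the README's bands: < 33, 33 to < 66, >= 66."""
    if score < 33:
        return "Low Risk"
    if score < 66:
        return "Medium Risk"
    return "High Risk"
```

With these assumptions, an input of 0.5 activates only the Normal term, giving a score of about 50 (Medium Risk).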

Limitations & Future Scope

Current Constraints:

  • No seasonal or other temporal data.
  • Only fluoride risk is modeled; combined effects with heavy metals or nitrate contamination are not assessed.
  • Spatial hydrogeological layers are not included.

Future Directions:

  • Implementation of GIS-based real-time heatmaps.
  • Integration of Deep Learning for enhanced predictive precision.
  • Incorporation of SHAP/LIME for model explainability and transparency.

Installation & Usage

# Clone the repository
git clone https://github.com/codemuggle09/AquaRisk

# Navigate into project folder
cd AquaRisk

# Install dependencies
pip install -r requirements.txt

# Launch the dashboard
python -m streamlit run webapp.py
