Groundwater Fluoride Prediction Using Machine Learning & Fuzzy Logic

A framework for analyzing groundwater fluoride contamination across India using machine-learning classifiers, regression models, and a Fuzzy Inference System (FIS). It supports early detection of fluoride-vulnerable regions and helps government agencies and water-resource managers make informed decisions.

Project Objectives

Large-Scale Analysis: Evaluates 16,776 groundwater samples across various Indian states and districts.
Predictive Modeling: Uses regression models to estimate continuous fluoride concentrations (mg/L).
Automated Classification: Categorizes water quality into Safe, Moderate, and High-risk zones using optimized classifiers.
Interpretability: Employs Mamdani Fuzzy Logic to convert technical data into human-readable risk scores.
Spatial Visualization: Generates state-level heatmaps and regional analysis for spatial awareness.

Dataset Architecture

The dataset comprises physicochemical parameters that significantly influence fluoride mobility within aquifers.

Feature Category	Parameters Included
Physicochemical	pH, EC, TDS, Na⁺, Ca²⁺, Mg²⁺, K⁺, Cl⁻, SO₄²⁻, NO₃⁻, HCO₃⁻
Target Variable	Fluoride concentration (mg/L)
Geospatial	State and District identifiers

Data Preprocessing Pipeline

Standardization: Normalization of column nomenclature (e.g., standardizing “EC µS/cm” to “EC”).
Imputation: Conversion of invalid entries to null values followed by Median Imputation to maintain numeric stability.
Risk Labeling: Assignment of risk classes from fluoride concentration, anchored on the WHO drinking-water guideline value of 1.5 mg/L.
- Class 0 (< 1.5 mg/L): Safe
- Class 1 (1.5–2.5 mg/L): Moderate Risk
- Class 2 (> 2.5 mg/L): High Risk
Scaling: Application of Min-Max scaling to a standard 0–1 range.
Class Balancing: Utilization of SMOTE (Synthetic Minority Over-sampling Technique) to oversample the minority classes until all three risk classes are equally represented.
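The steps above (excluding SMOTE) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the sample rows, the "F" column name, and every function name here are hypothetical. Only the "EC µS/cm" to "EC" rename, median imputation, the class thresholds, and the 0–1 scaling come from the description above.

```python
import statistics

# Hypothetical raw rows; column names and values are illustrative only.
RAW_ROWS = [
    {"EC µS/cm": "850", "pH": "7.2", "F": "0.8"},
    {"EC µS/cm": "ND", "pH": "8.1", "F": "1.9"},
    {"EC µS/cm": "2100", "pH": "", "F": "3.4"},
]

# Assumed mapping from raw headers to standardized names.
COLUMN_ALIASES = {"EC µS/cm": "EC"}


def to_float(value):
    """Return a float, or None when the entry cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def standardize(rows):
    """Rename columns and convert every entry to float or None."""
    return [
        {COLUMN_ALIASES.get(k, k): to_float(v) for k, v in row.items()}
        for row in rows
    ]


def impute_median(rows, column):
    """Replace None in one column with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = median


def risk_class(fluoride_mg_l):
    """Map a fluoride concentration (mg/L) to class 0, 1, or 2."""
    if fluoride_mg_l < 1.5:
        return 0  # Safe
    if fluoride_mg_l <= 2.5:
        return 1  # Moderate Risk
    return 2  # High Risk


def min_max_scale(values):
    """Scale values linearly to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = standardize(RAW_ROWS)
for col in ("EC", "pH"):
    impute_median(rows, col)
labels = [risk_class(r["F"]) for r in rows]
ec_scaled = min_max_scale([r["EC"] for r in rows])
print(labels, ec_scaled)
```

The unparseable entries ("ND" and the empty string) become nulls first and are only then filled, which mirrors the order described in the Imputation step.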

Machine Learning Performance

Seven distinct models were evaluated to determine the most effective classifier for fluoride risk. Five of them are listed below; only Random Forest has a reported accuracy figure.

Model	Classification Type	Accuracy / Notes
Random Forest	Ensemble Learning	93% (Top Performer)
XGBoost	Gradient Boosting	High Accuracy
LightGBM	Boosting	Efficiency at Scale
ANN	Neural Network	Pattern Recognition
SVM (RBF)	Kernel-based	Nonlinear Mapping

Regression Analysis (Continuous Prediction)

Model	R² Score	RMSE
Random Forest Regressor	0.273	0.684
Linear Regression	0.218	0.709
SVR	0.174	0.729
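For readers interpreting the table, R² is the share of variance in the observed fluoride values that the model explains, so 0.273 corresponds to roughly 27%, and RMSE is the root of the mean squared prediction error. The sketch below shows both definitions in plain Python; the sample values are made up for illustration and are not taken from the project's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical observations and predictions).
observed = [0.5, 1.0, 2.0, 3.5]
predicted = [1.0, 1.0, 2.0, 3.0]

print(round(rmse(observed, predicted), 3))
print(round(r2_score(observed, predicted), 3))
```

A model that always predicts the mean of the observed values scores an R² of exactly 0, which gives a baseline for reading the figures in the table.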

Fuzzy Logic Inference System (FIS)

The system utilizes a Mamdani-type FIS to handle environmental uncertainty and provide interpretable results.

Input Memberships: Very Low, Low, Normal, High, Very High.
Output Risk Scores: Low Risk (< 33), Medium Risk (33 to < 66), High Risk (>= 66).
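A Mamdani pipeline of this kind can be sketched as follows: fuzzify the input, clip each output set by its rule's firing strength (min), aggregate with max, and defuzzify by centroid. Everything specific here is an assumption made for illustration: a single input scaled to 0–1, triangular membership shapes and their breakpoints, the rule table, and a 0–100 output universe. Only the five input labels, the three risk labels, and the 33/66 score cut-offs come from the description above.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed input sets on a 0-1 scaled feature.
INPUT_SETS = {
    "Very Low": (0.0, 0.0, 0.25),
    "Low": (0.0, 0.25, 0.5),
    "Normal": (0.25, 0.5, 0.75),
    "High": (0.5, 0.75, 1.0),
    "Very High": (0.75, 1.0, 1.0),
}

# Assumed output sets on a 0-100 risk score.
OUTPUT_SETS = {
    "Low Risk": (0.0, 0.0, 40.0),
    "Medium Risk": (20.0, 50.0, 80.0),
    "High Risk": (60.0, 100.0, 100.0),
}

# Hypothetical rule base: input label -> output label.
RULES = {
    "Very Low": "Low Risk",
    "Low": "Low Risk",
    "Normal": "Medium Risk",
    "High": "High Risk",
    "Very High": "High Risk",
}


def infer(x, steps=1000):
    """Return a crisp 0-100 risk score for an input clamped to 0-1."""
    x = min(1.0, max(0.0, x))
    # Firing strength per output set: max over the rules that point to it.
    strength = {name: 0.0 for name in OUTPUT_SETS}
    for in_name, out_name in RULES.items():
        strength[out_name] = max(strength[out_name], tri(x, *INPUT_SETS[in_name]))
    # Clip each output set (min), aggregate (max), then take the discrete centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = 100.0 * i / steps
        mu = max(min(strength[n], tri(y, *p)) for n, p in OUTPUT_SETS.items())
        num += y * mu
        den += mu
    return num / den


def risk_label(score):
    """Map a crisp score to the risk bands listed above."""
    if score < 33:
        return "Low Risk"
    if score < 66:
        return "Medium Risk"
    return "High Risk"


for value in (0.0, 0.5, 1.0):
    print(value, risk_label(infer(value)))
```

Because the assumed input sets overlap and cover the whole 0–1 range, at least one rule always fires, so the centroid is always defined.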

Limitations & Future Scope

Current Constraints:

Absence of seasonal temporal data.
Limited to fluoride without accounting for heavy metal or nitrate interactions.
Exclusion of complex spatial hydrogeological layers.

Future Directions:

Implementation of GIS-based real-time heatmaps.
Integration of Deep Learning for enhanced predictive precision.
Incorporation of SHAP/LIME for model explainability and transparency.

Installation & Usage

# Clone the repository
git clone https://github.com/codemuggle09/AquaRisk

# Navigate into project folder
cd AquaRisk

# Install dependencies
pip install -r requirements.txt

# Launch the dashboard
python -m streamlit run webapp.py
