Azm1ne/Climate-Change-Predictor

🌍 Climate Change Data Analysis & Sea Level Rise Prediction

A Data Science Project using Machine Learning & Exploratory Analysis



📌 Overview

This project explores the impacts of various climate indicators—such as temperature, CO₂ emissions, precipitation, humidity, and wind speed—on sea level rise. Using a combination of exploratory data analysis (EDA), feature engineering, outlier treatment, and multiple machine-learning models, the project predicts sea-level variations and examines which environmental factors influence them most strongly.

This repository includes a complete end-to-end workflow:

  • Data loading
  • Cleaning & feature engineering
  • Exploratory data analysis
  • Model training (Linear Regression, Random Forest, Decision Tree, SVR)
  • Model evaluation
  • A prediction interface for generating sea-level rise estimates

📁 Dataset

The dataset contains the following climate-related fields:

| Column | Description |
|---|---|
| Date | Daily timestamp |
| Location | City or locality |
| Country | Country identifier |
| Temperature | Temperature (°C) |
| CO₂ Emissions | Carbon emissions (tons/year) |
| Sea Level Rise | Rise in sea level (meters) |
| Precipitation | Precipitation (mm) |
| Humidity | Air humidity (%) |
| Wind Speed | Wind speed (m/s) |

🔧 Features & Engineering

To support time-series and seasonal analysis, the following features were extracted from the Date column:

  • year_month (YYYY-MM)
  • year
  • month

These engineered features are used later in the model training pipeline, along with one-hot encoding of categorical fields such as Country.
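As a minimal sketch of this step, the date-derived features and the one-hot encoding could be produced with pandas as follows. The tiny frame is made-up illustration data, and the column names (`Date`, `Country`, `Temperature`) are assumed to match the dataset table above; the notebook's actual code may differ.

```python
import pandas as pd

# Made-up rows standing in for climate_change_data.csv (column names are assumed).
df = pd.DataFrame({
    "Date": ["2000-01-15", "2000-02-20", "2001-07-04"],
    "Country": ["Norway", "Kenya", "Norway"],
    "Temperature": [1.5, 24.0, 16.2],
})

# Parse the timestamp once, then derive the three engineered features.
df["Date"] = pd.to_datetime(df["Date"])
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["year_month"] = df["Date"].dt.strftime("%Y-%m")

# One-hot encode the categorical Country column.
encoded = pd.get_dummies(df, columns=["Country"], prefix="Country")
print(encoded.columns.tolist())
```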


🧹 Data Cleaning

Outliers were removed using the Interquartile Range (IQR) method for:

  • CO₂ Emissions
  • Sea Level Rise
  • Temperature

A total of 218 rows were removed, improving model stability and reducing noise.
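The IQR rule can be sketched as below. This assumes the common 1.5 × IQR fences and that the bounds for every column are computed on the original data before any row is dropped; the helper name `remove_iqr_outliers` and the sample values are hypothetical, and the notebook may apply the filter differently (for example, column by column in sequence).

```python
import pandas as pd

def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Made-up sample: the 60.0 °C reading is the only value outside the fences.
sample = pd.DataFrame({
    "Temperature": [14.0, 15.0, 15.5, 16.0, 14.5, 15.2, 60.0],
    "Sea Level Rise": [0.01, 0.02, 0.00, -0.01, 0.015, 0.005, 0.01],
})
cleaned = remove_iqr_outliers(sample, ["Temperature", "Sea Level Rise"])
print(len(sample) - len(cleaned), "rows removed")
```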


📊 Exploratory Data Analysis (EDA)

The notebook includes detailed EDA through:

  • Correlation heatmaps
  • Scatter plots showing relationships with sea-level rise
  • Histograms & boxplots for distribution analysis
  • Seasonal and temporal trend visualizations

These analyses reveal which climate indicators correlate most strongly with sea-level variations.
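One way to rank indicators by their correlation with the target is sketched below. The data is synthetic, generated so that the target depends on temperature, purely to show the mechanics; it says nothing about the real dataset's correlations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(15, 5, n)

# Synthetic frame: the target is deliberately tied to Temperature, Humidity is independent.
demo = pd.DataFrame({
    "Temperature": temperature,
    "Humidity": rng.uniform(20, 90, n),
    "Sea Level Rise": 0.01 * temperature + rng.normal(0, 0.01, n),
})

# Pearson correlation matrix, then absolute correlation of each feature with the target.
corr = demo.corr()
ranking = (
    corr["Sea Level Rise"]
    .drop("Sea Level Rise")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

The `corr` matrix is the kind of input that `seaborn.heatmap(corr, annot=True)` draws as a heatmap.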


🤖 Machine Learning Models Used

A variety of regression models were trained and evaluated:

1. Linear Regression

A baseline model for detecting linear relationships.

2. Random Forest Regressor

An ensemble learning method capable of modeling complex non-linear interactions.

3. Decision Tree Regressor

A simple, interpretable tree-based model using recursive splitting.

4. Support Vector Regression (SVR)

A margin-based model that can capture non-linear relationships through kernel functions; it requires feature scaling to work well.
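A minimal sketch of fitting the four model types with scikit-learn is shown here. The feature matrix is random synthetic data, and the hyperparameters (100 trees, RBF kernel, an 80/20 split) are illustrative assumptions rather than the notebook's settings. SVR is wrapped in a pipeline with `StandardScaler` so that scaling is learned from the training split only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared feature matrix and target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    # Scaling lives inside the pipeline, so it is fit on X_train only.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
}

fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}
for name, model in fitted.items():
    print(name, model.predict(X_test[:1]))
```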


Evaluation Metrics

Each model is assessed using:

  • Mean Squared Error (MSE)
  • R² Score
  • Actual vs. Predicted Scatter Plots
  • Actual vs. Predicted Trend Plots
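For reference, the two numeric metrics can be computed with scikit-learn and checked against their definitions, MSE = mean((y − ŷ)²) and R² = 1 − SS_res / SS_tot. The values below are made up for illustration and are not results from the project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values.
y_true = np.array([0.10, 0.20, 0.30, 0.40])
y_pred = np.array([0.12, 0.18, 0.33, 0.37])

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# The same quantities written out from their definitions.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
mse_manual = ss_res / len(y_true)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE = {mse:.5f}, R2 = {r2:.3f}")
```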

🔮 Prediction System

The notebook includes an interactive prediction system that accepts user input for:

  • Temperature
  • CO₂ emissions
  • Precipitation
  • Humidity
  • Wind speed
  • Year
  • Month
  • Country

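These inputs go through the same preprocessing pipeline used during training (including scaling and encoding), so each prediction is made on features transformed exactly as the training data was. The scaler and encoder are reused as fitted on the training data and are not refit on user input.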
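One way to guarantee that user input is transformed identically to the training data is to bundle preprocessing and model in a single scikit-learn `Pipeline`. The sketch below assumes this design; the column names, training rows, and input values are made up, and the notebook may wire its preprocessing differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["Temperature", "CO2 Emissions", "Precipitation",
           "Humidity", "Wind Speed", "year", "month"]
categorical = ["Country"]

# Made-up training rows (column names are assumptions).
train = pd.DataFrame({
    "Temperature": [14.0, 16.5, 15.2, 17.1, 13.8, 16.0],
    "CO2 Emissions": [400.0, 410.0, 395.0, 420.0, 390.0, 405.0],
    "Precipitation": [50.0, 30.0, 45.0, 20.0, 60.0, 35.0],
    "Humidity": [60.0, 55.0, 70.0, 50.0, 75.0, 58.0],
    "Wind Speed": [10.0, 12.0, 8.0, 15.0, 9.0, 11.0],
    "year": [2000, 2005, 2010, 2015, 2020, 2022],
    "month": [1, 6, 3, 9, 12, 7],
    "Country": ["Norway", "Kenya", "Norway", "Kenya", "Norway", "Kenya"],
})
target = pd.Series([0.00, 0.02, 0.01, 0.03, -0.01, 0.02])

# Scaling and encoding are fit once, on the training data, inside the pipeline.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(train, target)

# A single user-supplied record goes through the already-fitted transformers.
user_input = pd.DataFrame([{
    "Temperature": 15.5, "CO2 Emissions": 402.0, "Precipitation": 40.0,
    "Humidity": 62.0, "Wind Speed": 10.5, "year": 2024, "month": 5,
    "Country": "Norway",
}])
prediction = model.predict(user_input)[0]
print(f"Predicted sea level rise: {prediction:.5f}")
```

With `handle_unknown="ignore"`, a country not seen during training is encoded as all zeros instead of raising an error.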

Example Output:

| Model | Predicted Sea Level Rise |
|---|---|
| Linear Regression | 0.08829 |
| Random Forest | 0.12365 |
| Decision Tree | 0.53591 |
| SVR | -0.06596 |

📈 Results Summary

  • Outlier removal improved distribution smoothness and reduced skewness.
  • Random Forest performed best overall in accuracy and robustness.
  • SVR predictions varied significantly, showing sensitivity to scaling and kernel parameters.
  • Using multiple models provides a more comprehensive understanding of possible sea-level rise outcomes.

🚀 Next Steps

Potential enhancements include:

  • Hyperparameter tuning (GridSearchCV / RandomizedSearchCV)
  • Deep learning models (LSTM, GRU) for time-series prediction
  • GIS or geographical visualization dashboards
  • Country-level climate trend analytics
  • Feature importance analysis (SHAP, permutation importance)

🗂 Project Structure

├── climate_change_data.csv   # Dataset
├── Climate_Analysis.ipynb    # Main notebook (EDA, models, predictions)
└── README.md                 # Project documentation

🧰 Technologies Used

  • Python 3.10+
  • pandas
  • numpy
  • seaborn / matplotlib
  • scikit-learn
  • wordcloud
  • Jupyter Notebook / Google Colab

🙌 Acknowledgments

This project aims to deepen understanding of the global climate crisis using data-driven methods. Special thanks to the providers of open climate datasets and the open-source community.