This repository was archived by the owner on Feb 6, 2026. It is now read-only.
Oghenetega Idogun edited this page Aug 10, 2024 · 2 revisions

Data Analysis - BlueGAP Wiki

Welcome to the Data Analysis BlueGAP project wiki! This page describes the project's structure, its key components, and how to run it.

Project Overview

The Data Analysis BlueGAP project analyzes water quality data, focusing on nitrogen (NO23, a label commonly used for nitrate plus nitrite) concentrations recorded at monitoring stations. It supports environmental assessments and regulatory compliance work by identifying trends over time, geographical patterns, and outliers in the data.

Project Structure

Data

  • data/: Contains the raw and cleaned water quality data files.
    • cleaned_water_quality.csv: Cleaned data prepared for analysis.
    • original_water_quality.csv: The initial raw dataset.
    • water_quality.csv: The dataset before any cleaning processes.

Notebooks

  • notebooks/: Jupyter notebooks for data analysis and visualization.
    • 01_Data_Analysis.ipynb: Analyzes the cleaned water quality data and identifies key metrics.
    • 02_Visualization.ipynb: Generates static visualizations to illustrate trends and patterns in the data.
    • 03_Interactive_Visualizations.ipynb: Provides interactive visualizations for dynamic data exploration.

Reports

  • reports/: Contains the analysis report and generated figures.
    • analysis_report.md: Markdown version of the analysis report.
    • analysis_report.html: HTML version of the analysis report.
    • figures/: Directory with figures and interactive maps.
      • mean_nitrogen_by_year.png: Visualization of mean nitrogen concentration by year.
      • spatial_distribution_nitrogen.png: Spatial distribution of nitrogen concentration.
      • cleaned_data_map.html: Interactive map of cleaned data.
      • original_data_map.html: Interactive map of original data.

Source Code

  • src/: Contains the source code for data processing and analysis.
    • analysis.py: Functions for calculating mean nitrogen concentrations and other metrics.
    • data_loader.py: Functions for loading and preprocessing data.
    • prepare_data.py: Functions for preparing and cleaning data.
    • visualization.py: Functions for creating visualizations.

Main Script

  • main.py: The entry point for running the project.

Key Components

Data Cleaning and Preprocessing

The project includes steps for cleaning the data:

  • Removal of unnecessary columns.
  • Outlier detection and removal using the Interquartile Range (IQR) method.
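
As an illustration of the IQR step, here is a minimal sketch using pandas. It is not the code in prepare_data.py: the function name remove_iqr_outliers, the column name "NO23", and the conventional multiplier k = 1.5 are all assumptions made for this example, and the sample values are synthetic.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; rows with missing values in
    `column` are dropped as well, because `between` returns False for NaN.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Synthetic values for illustration only; "NO23" is an assumed column name.
sample = pd.DataFrame({"NO23": [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 25.0]})
cleaned = remove_iqr_outliers(sample, "NO23")
print(cleaned)
```

With these sample values the fences work out to roughly 0.525 and 1.925, so only the 25.0 reading is removed. A larger k makes the filter more permissive.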

Analysis and Visualization

The project performs several analyses:

  • Calculation of mean nitrogen concentrations by year.
  • Creation of static and interactive visualizations to explore data trends and distributions.
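
The yearly mean calculation can be sketched as a pandas group-by. This is an assumption-laden example rather than the contents of analysis.py: the function name mean_nitrogen_by_year and the column names "SampleDate" and "NO23" are hypothetical, and the data is synthetic.

```python
import pandas as pd


def mean_nitrogen_by_year(
    df: pd.DataFrame,
    date_col: str = "SampleDate",  # assumed column name
    value_col: str = "NO23",       # assumed column name
) -> pd.Series:
    """Return the mean of `value_col` for each calendar year in `date_col`."""
    years = pd.to_datetime(df[date_col]).dt.year
    return df.groupby(years)[value_col].mean().rename_axis("Year")


# Synthetic readings for illustration only.
readings = pd.DataFrame(
    {
        "SampleDate": ["2020-01-15", "2020-06-15", "2021-03-01", "2021-09-01"],
        "NO23": [1.0, 2.0, 3.0, 5.0],
    }
)
yearly = mean_nitrogen_by_year(readings)
print(yearly)
```

The resulting Series is indexed by year, so it can be passed straight to a plotting call such as yearly.plot(kind="bar") to produce a chart like mean_nitrogen_by_year.png.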

Interactive Visualizations

Explore dynamic visualizations in the Interactive Visualizations notebook (notebooks/03_Interactive_Visualizations.ipynb), which provides interactive tools for exploring the data in more depth.

How to Run the Project

  1. Install Dependencies: Ensure all dependencies listed in requirements.txt are installed.
  2. Run Analysis: Execute the Jupyter notebooks in numerical order (01, 02, 03) to perform the data analysis and visualization.
  3. View Reports: Access the Markdown and HTML reports in the reports/ directory for detailed insights.

Conclusion

The analysis in this project shows how nitrogen concentrations vary over time and across monitoring locations. This information supports environmental monitoring and remediation efforts.

Recommendations

  • Monitoring: Continue monitoring nitrogen concentrations at identified hotspots.
  • Advanced Analysis: Consider implementing advanced techniques, such as time series forecasting, for future trend predictions.
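
As a starting point for the forecasting recommendation, the sketch below fits a straight-line trend to yearly means with NumPy and extrapolates it. This is a deliberately simple baseline, not a full time series model and not part of the project code; the function name linear_trend_forecast and the input values are made up for illustration.

```python
import numpy as np


def linear_trend_forecast(years, means, future_years):
    """Fit a least-squares line to (years, means) and evaluate it at future_years.

    Hypothetical helper for illustration. Years are centered on their mean
    before fitting to keep the least-squares problem well conditioned.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(means, dtype=float)
    x0 = x.mean()
    slope, intercept = np.polyfit(x - x0, y, deg=1)
    return slope * (np.asarray(future_years, dtype=float) - x0) + intercept


# Synthetic yearly means for illustration only.
years = [2018, 2019, 2020, 2021]
means = [1.0, 1.2, 1.4, 1.6]
forecast = linear_trend_forecast(years, means, [2022, 2023])
print(forecast)
```

A linear fit ignores seasonality and autocorrelation, so it is best treated as a reference point against which more advanced forecasting methods can be compared.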

Notes