Skip to content

fctonio/Smart-meters-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ironhack Logo

Statistical-Analysis-Project

Paul Musco, Mayowa Akinyele and Anton Neike

Data-ber-08-20, Berlin & 16/09/2020

Content

Project Description

The goal of this project is to create and interprete different types of visualizations using real world data as well as statistical analysis. For this project we worked in a group of three people and had to choose our own subject and data. The goal is to hold a presentation of 5 minutes at the end of the week and present our finding in a way anyone could understand.

Dataset

For this project, we used a dataset sourced from Kaggle looking at energy consumption in London, weather conditions over the same period and Acorn groups. We downloaded the following datasets from Kaggle:

  • acorn_details.csv
  • daily_dataset.csv
  • informations_households.csv
  • uk_bank_holidays.csv
  • weather_daily_darksky.csv

The original datasets can be downloaded from the following link: https://www.kaggle.com/jeanmidev/smart-meters-in-london

We used statistical analysis skills to analyze these datasets, the resulting CSV files are:

- acorn_demog (daily_dataset.csv & informations_households.csv merged, ACORN groups as index)
- daily_dataset_clean (cleaned version of Kaggle original)
- daily_group_households_merge (daily_dataset.csv (grouped by ‘LCLid’) & informations_households.csv merged)
- daily_weather_merge ('weather_daily_darksky_clean.csv' & 'daily_dataset_clean.csv' merged, needed for regression)
- households_accom_merge (daily_dataset.csv & informations_households.csv merged, categories as index)
- weather_daily_darksky_clean (cleaned version of Kaggle original)

The created CSV files can be found here: https://drive.google.com/drive/folders/1Ku0PFKlWstaLbY0ine2Q_LnbC5SLzctG?usp=sharing

Workflow

  • We chose the our data
  • We used basic pandas functions to familiarise ourselves with the data. (head(), shape, isnull().sum() ...)
  • Made some basic visualisations with tableau to see if there was anything obvious.
  • Merged the datframes with each other.
  • Checked the correlation of energy usage with different variables.
  • We then made:
    -> a scatter plot and deleted the outliers.
    -> an OLS model and checked the statistical relevance.

Repository organisation

  • Notebooks (Folder) :
    -> average_weekday_consumption.ipynb : calculate and plot average energy consumption by weekday
    -> concat_blocks.ipynb : look at half hourly meter readings for each household (NOT USED IN FINAL ANALYSIS)
    -> dataset_analysis_cleaning.ipynb : initial cleaning of original datasets downloaded from Kaggle
    -> linear_regression_visualisation.ipynb : look at energy consumption by ACORN group and which variables have the strongest correlation with energy consumption
    -> merge_dailydata_households_acorn.ipynb : merge daily_dataset.csv & informations_households.csv
    -> merge_weather-daily_data.ipynb: merge weather_daily_darksky.csv with daily_dataset.csv
    -> modified_Regression_model.ipynb : linear regression analysis
  • Tableau (Folder) :
    -> Tableau_merged_weather_Daily.twb : Tableau file with the graphs for the presentation.
  • Presentation (Folder) :
    -> Contains final version of the presentation as a PDF file, along with presentation notes.
  • .gitignore file
  • README file

Results/findings

  • Those with a higher income (ACORN-A) consume more energy on average than those with a lower income (ACORN-Q)
  • Weekends saw the highest levels of energy consumption
  • There’s a clear seasonal pattern, more energy consumption in colder months
  • Negative correlation between the number of daylight hours and energy consumption
  • Negative correlation between the average daily temperature and energy consumption (0.84 R-squared value)
  • Carried out regression on the model (R-squared value = 0.68)
  • Overall energy consumption in London during the time period shows a slight downward trend over time

Potential improvements/future analysis

  • Look further into potential outliers
  • Exclude dates at the start of the testing period where the number of installed meters is small
  • Dive deeper into consumer demographics
  • Look at energy consumption by individual households
  • Try quadratic regression to see if energy consumption increases in high temperatures

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors