sanjeevtw/exercise-co2-vs-temperature-databricks

 
 

CO2 vs. Temperature Exercise (Databricks)

This repository contains Databricks exercises that ingest Global Temperature and Global Temperature By Country data from Kaggle and CO2 Emissions data from OWID, then transform them. The goal of these exercises is to teach some basics of data wrangling and Spark by working through real-world questions:

  • Which countries are worst hit (highest temperature anomalies)?
  • Which countries are the biggest emitters?
  • What are some sensible ways to rank the “biggest polluters”?
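One way to approach the last question is to compare a ranking by total emissions with a ranking by emissions per capita, since the two can disagree. The sketch below uses plain Python with toy numbers. The country labels `A`, `B`, `C`, the field names, and all values are illustrative assumptions, not figures from the OWID dataset. In the notebooks the same idea would be expressed with Spark DataFrames.

```python
from typing import NamedTuple


class CountryEmissions(NamedTuple):
    country: str        # hypothetical country label
    co2_tonnes: float   # total annual CO2 emissions (toy value)
    population: int     # population (toy value)


def rank_by_total(rows):
    """Country names sorted by total emissions, largest first."""
    ordered = sorted(rows, key=lambda r: r.co2_tonnes, reverse=True)
    return [r.country for r in ordered]


def rank_per_capita(rows):
    """Country names sorted by emissions per person, largest first."""
    ordered = sorted(rows, key=lambda r: r.co2_tonnes / r.population, reverse=True)
    return [r.country for r in ordered]


# Toy rows for illustration only; not taken from any dataset.
toy_rows = [
    CountryEmissions("A", co2_tonnes=1000.0, population=100),
    CountryEmissions("B", co2_tonnes=400.0, population=10),
    CountryEmissions("C", co2_tonnes=90.0, population=30),
]

print(rank_by_total(toy_rows))    # ['A', 'B', 'C']
print(rank_per_capita(toy_rows))  # ['B', 'A', 'C']
```

In this toy example, country `A` emits the most in total, but `B` emits the most per person, which is why the choice of metric matters when ranking polluters.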

Data Sources

To answer some of the questions in this exercise, we picked openly available data from Our World in Data (OWID) and Kaggle.

The specific datasets:

Data Sources (Modified!)

Since the point of this exercise is to learn how to work with data, and the datasets from OWID and Kaggle are both too clean and curated for that purpose, a deliberately dirtied version of the data is provided.
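As a rough illustration of the kind of wrangling that dirtied data calls for, the sketch below cleans a small CSV sample using only the Python standard library. The column names, country names, and values are invented for this example and are not taken from the provided files. The actual notebooks use Spark, so treat this as the idea only, not the exercise solution.

```python
import csv
import io

# Hypothetical dirtied sample; column names and values are invented for illustration.
RAW = """Country , Year,AvgTemperature
 Testland ,2000, 14.5
testland,2001,
OTHERIA,2000,n/a
Otheria ,2001,9.25
"""


def clean_rows(text):
    """Parse CSV text, normalise headers and country names, drop unusable rows."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]  # remove stray spaces in headers
    cleaned = []
    for raw in reader:
        row = dict(zip(header, (v.strip() for v in raw)))
        try:
            temperature = float(row["AvgTemperature"])
        except ValueError:
            continue  # skip rows with missing or non-numeric temperatures
        cleaned.append(
            {
                "Country": row["Country"].title(),  # unify capitalisation
                "Year": int(row["Year"]),
                "AvgTemperature": temperature,
            }
        )
    return cleaned


for row in clean_rows(RAW):
    print(row)
```

Dropping unusable rows is only one option. Depending on the question being asked, filling or flagging missing values may be more appropriate.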

The dirtied datasets can be found at:

Prerequisites

Data Ingestion

  1. Open Data Ingestion CO2 vs Temperature.dbc in Databricks Community Edition.
  2. Follow the instructions, and move on to the next exercise once all tests pass.
  3. Solutions can be found here.

Data Transformation

  1. Open Data Transformation CO2 vs Temperature.dbc in Databricks Community Edition.
  2. Follow the instructions, and move on to the next exercise once all tests pass.
  3. Solutions can be found here.
