This repository is a dbt package containing data extracted from
FiveThirtyEight's data repository.
The package is intended to be used as a way to rapidly load interesting, curated
data sets into your database of choice.
To load data from this package, install it into your dbt project
just like any other package: add it to your packages.yml file and run dbt deps.
packages:
  - git: "https://github.com/stkbailey/fivethirtyeight-open-data.git"
    revision: 0.1.0
Afterwards, you'll need to indicate which projects you'd like to load by specifying the folder
name in the seeds config block of dbt_project.yml. (Example below.) The next time you run
dbt seed, the data will load!
seeds:
  fivethirtyeight:
    bob_ross:
      enabled: true
    fandango:
      enabled: true
    tarantino:
      enabled: true
Data in this package are pulled from FiveThirtyEight's data repository, then minimally processed
to make them compliant with dbt. This includes, for each project:
- Reformatting the README.md file into a schema.yml file.
- Renaming all csv files to be <project_name>_<file_name>.csv.
- Trimming large files (of a customizable size).
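The renaming and trimming steps above can be sketched roughly as follows. This is a minimal illustration, not the code in download_and_process_files.py: the function names `seed_file_name` and `trim_csv`, the `max_rows` parameter, the choice to trim by row count, and the rule of turning hyphens and spaces into underscores are all assumptions made for the example.

```python
import csv
from pathlib import Path


def seed_file_name(project_name: str, file_name: str) -> str:
    """Return '<project_name>_<file_name>.csv' for a raw CSV file."""

    def normalize(text: str) -> str:
        # Assumption: lowercase, with hyphens and spaces turned into underscores.
        return text.strip().lower().replace("-", "_").replace(" ", "_")

    return f"{normalize(project_name)}_{normalize(Path(file_name).stem)}.csv"


def trim_csv(source: Path, target: Path, max_rows: int = 10_000) -> int:
    """Copy the header plus at most max_rows data rows; return rows written.

    Assumption: "size" is measured in data rows rather than bytes.
    """
    written = 0
    with source.open(newline="", encoding="utf-8") as src, \
            target.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty input file: nothing to copy.
            return 0
        writer.writerow(header)
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

Under these assumptions, `seed_file_name("bob-ross", "elements-by-episode.csv")` returns `"bob_ross_elements_by_episode.csv"`, and `trim_csv(raw_path, seed_path, max_rows=5000)` writes the header and the first 5000 data rows of `raw_path` to `seed_path`.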
The code for re-downloading files is found in download_and_process_files.py.
See https://data.fivethirtyeight.com/ for a list of the data and code FiveThirtyEight has published.
Unless otherwise noted, these data sets are available under the Creative Commons Attribution 4.0 International License, and the code is available under the MIT License. If you find this information useful, please let us know.