- data - folder containing Udacity's sample sparkify dataset for the project
- sql_queries.py - file created based on the project's template that contains the queries for drops, creates and inserts
- create_tables.py - file created based on the project's template that resets the PostgreSQL database (always run before etl.py)
- etl.py - file created based on the project's template that processes the data
- requirements.txt - library requirements for the docker image that runs the data processing steps
- run_etl.sh - simple shell script to execute create_tables.py, then etl.py
- Dockerfile - Dockerfile for a docker image that has all the requirements to run the etl process
- docker-compose.yml - docker-compose file that creates two containers: one with a PostgreSQL database to store the sparkify data and another with a Metabase server to allow for data exploration
- metabase - set of files required to create the metabase container correctly and restore a backup of its database after the creation of a set of dashboards
- dashboards - examples of analyses that can be performed on the provided data, with dashboards and queries
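The ordering that run_etl.sh enforces (create_tables.py first, then etl.py) can be sketched in Python as follows. This is only an illustration and not the contents of run_etl.sh. The names `ETL_STEPS` and `run_etl` are hypothetical, and it assumes both scripts are run from the project folder with the current Python interpreter.

```python
import subprocess
import sys

# Order taken from the file list above: create_tables.py resets the
# database, so it must run before etl.py loads any data.
ETL_STEPS = ["create_tables.py", "etl.py"]


def run_etl(steps=ETL_STEPS, runner=subprocess.run):
    """Run each script in order, stopping at the first failure.

    `runner` is injectable so the ordering can be checked without
    actually touching a database (hypothetical helper, not project code).
    """
    for script in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so etl.py is never started if the table reset failed.
        runner([sys.executable, script], check=True)
```

Calling `run_etl()` from the project folder would execute both scripts in sequence. Stopping on the first error keeps etl.py from inserting into tables that were not recreated.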
To run this project, you will need Docker and docker-compose. If you don't have them installed, follow the installation guides for Docker and docker-compose.
Once you have them installed, run the following command in the project folder to start your Metabase server and your PostgreSQL database containers (the first run will trigger a build of the Metabase image):

    docker-compose up

After this is up and running, open another terminal and build the image that will be responsible for the etl process:

    docker build . -t project-1

Then, perform a docker run to execute the etl:

    docker run --network host project-1

Those commands will set up the environment for you to access the data. Now, if you want to explore the database, access the Metabase server on your localhost at http://localhost:3000
To log in, use the credentials:
email: udacity_student@example.com
password: Yd72!2da5oAHsJY0ICIJtrCf
If you followed the steps above, you can find the songplay_metrics dashboard here and the user_base dashboard here
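Because the etl container depends on the database container already being up, it can help to confirm that the services accept connections before running it. The sketch below is one possible check, not part of the project. The helpers `port_open` and `wait_for` are hypothetical names. Port 3000 comes from the Metabase URL above, while 5432 is only the PostgreSQL default and is an assumption, so check docker-compose.yml for the port actually published.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_for(host, port, attempts=30, delay=1.0):
    """Poll host:port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for("localhost", 3000)` would poll the Metabase port and `wait_for("localhost", 5432)` the assumed PostgreSQL port. An open port only shows that something is listening. The service behind it may still be initializing, so treat a `True` result as a hint and not a guarantee.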