Data Modeling in Postgres for Music Startup, Sparkify

Overview: Database, business context, analytical goals

This Postgres database is designed to give easy access to data from Sparkify's new music streaming app, so the team can gain further insight into its customers' habits and listening patterns. The data covers users, songs, and user activity.

How to: Running Python scripts, Jupyter

Run the scripts in the Jupyter notebook by selecting the cell containing the query you are interested in and clicking 'Run' in the panel above, or call the functions individually via the 'etl.py' file.
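
For running outside the notebook, the following is a minimal sketch of a helper that runs the scripts in sequence. The helper name 'run_scripts' and the order (creating the tables before loading data) are assumptions, not something the repository defines.

```python
import subprocess
import sys

# Assumed order: create the tables first, then load data into them.
SCRIPTS = ["create_tables.py", "etl.py"]


def run_scripts(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script with the current Python interpreter.

    `runner` is injectable so the helper can be exercised without
    actually launching a process (hypothetical convenience).
    """
    for script in scripts:
        # check=True raises CalledProcessError if a script exits non-zero,
        # so later scripts do not run after a failure.
        runner([sys.executable, script], check=True)


# From the repository root, with Postgres running:
# run_scripts()
```

Using 'sys.executable' keeps the scripts on the same interpreter (and virtual environment) as the caller.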

Files:

'data' - The current data that we are working with.
'etl.ipynb' - The notebook that analyzes the data and walks through the process of understanding the code.
'test.ipynb' - Code tests ensuring functionality.
'create_tables.py' - Functions for creating our tables.
'etl.py' - Python functions to run our code independent of the notebook.
'sql_queries.py' - SQL queries for creating tables and inserting data into them.
'README.md' - Description of project.

Design: Database schema design, ETL pipeline, and justification

The database schema has the following dimension tables: 'users', 'songs', 'artists', and 'time', as well as one fact table called 'songplays'. The ETL pipeline pulls up-to-date data provided by Sparkify and combines the 'song_data' and 'log_data' files to tie users and their listening times to the song data for user activity insights. These tables have been designed to optimize queries for song play analysis of Sparkify's users.
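
As an illustration of the star schema described above, here is a minimal sketch of the kind of statements 'sql_queries.py' could hold. Only the five table names come from this README; every column name, type, and key is an assumption, as are the names 'CREATE_TABLE_QUERIES', 'PLAYS_PER_SONG', and the body of 'create_tables'. So that the sketch runs without a Postgres server, it uses an in-memory SQLite database as a stand-in; the same function should work with any DB-API connection, such as one from psycopg2.

```python
import sqlite3

# Dimension tables first, fact table last. Columns are illustrative
# assumptions; only the table names come from the README.
CREATE_TABLE_QUERIES = {
    "users": """CREATE TABLE IF NOT EXISTS users (
        user_id INT PRIMARY KEY, first_name VARCHAR,
        last_name VARCHAR, level VARCHAR)""",
    "artists": """CREATE TABLE IF NOT EXISTS artists (
        artist_id VARCHAR PRIMARY KEY, name VARCHAR, location VARCHAR)""",
    "songs": """CREATE TABLE IF NOT EXISTS songs (
        song_id VARCHAR PRIMARY KEY, title VARCHAR,
        artist_id VARCHAR REFERENCES artists (artist_id),
        year INT, duration NUMERIC)""",
    "time": """CREATE TABLE IF NOT EXISTS time (
        start_time TIMESTAMP PRIMARY KEY,
        hour INT, day INT, month INT, year INT)""",
    "songplays": """CREATE TABLE IF NOT EXISTS songplays (
        songplay_id INT PRIMARY KEY,
        start_time TIMESTAMP REFERENCES time (start_time),
        user_id INT REFERENCES users (user_id),
        song_id VARCHAR REFERENCES songs (song_id),
        artist_id VARCHAR REFERENCES artists (artist_id))""",
}

# Example song play analysis: how many times each song was played.
PLAYS_PER_SONG = """
SELECT s.title, COUNT(*) AS plays
FROM songplays sp
JOIN songs s ON s.song_id = sp.song_id
GROUP BY s.title
ORDER BY plays DESC, s.title
"""


def create_tables(conn):
    """Create every table in CREATE_TABLE_QUERIES on a DB-API connection."""
    cur = conn.cursor()
    for query in CREATE_TABLE_QUERIES.values():
        cur.execute(query)
    conn.commit()


# SQLite stand-in for Postgres so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
create_tables(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Keeping the play events in one fact table and the descriptive attributes in the dimension tables means an analysis query like 'PLAYS_PER_SONG' needs only a single join per dimension it touches.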

BryanHolbrook/spark-startup-data-modeling
