- Clone this GitHub repo
- Download, unzip, and put `USCensus1990.data.txt` in the main directory of this repo: https://archive.ics.uci.edu/dataset/116/us+census+data+1990
- Install Docker and deploy the databases
  - Make sure that Docker is installed on your system: https://docs.docker.com/engine/install/
  - Deploy the databases by running `docker compose up -d`
  - Verify that the database containers are running with `docker ps`
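The repo ships its own `docker-compose.yml`, which is what `docker compose up -d` reads. As a rough illustration only (the service names, image tags, and placeholder password below are assumptions, not the repo's actual configuration), a minimal compose file for a MongoDB and PostgreSQL pair looks like:

```yaml
# Hypothetical sketch -- the repo's docker-compose.yml is authoritative.
services:
  mongo:
    image: mongo:7          # assumed version tag
    ports:
      - "27017:27017"       # expose MongoDB's default port
  postgres:
    image: postgres:16      # assumed version tag
    environment:
      POSTGRES_PASSWORD: example   # placeholder credential
    ports:
      - "5432:5432"         # expose PostgreSQL's default port
```

`docker compose up -d` starts both services in the background; `docker compose down` stops and removes them.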
- Make a Python virtual environment (recommended) with `python -m venv`
- Install the dependencies of this repo with `pip install -r requirements.txt`
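After installing the requirements, a quick sanity check is to confirm the expected packages import cleanly. The package names below (`pymongo`, `psycopg2`, `streamlit`) are guesses at what `requirements.txt` contains, not a confirmed list:

```python
# check_env.py -- sanity-check that likely dependencies are importable.
# The package list is an assumption; adjust it to match requirements.txt.
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(["pymongo", "psycopg2", "streamlit"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages found.")
```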
- Populate the databases with the sample data
  - Run `ingestion_test.py` and `census_ingest_mongo.py` to load in the Census dataset
  - Run `ecommerce_ingest.py` and `ecommerce_ingest_postgres.py` to load the ecommerce dataset into each database
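`census_ingest_mongo.py` is the script that actually performs the MongoDB load. As a rough sketch of the idea only, assuming `USCensus1990.data.txt` is a comma-separated file with a header row and assuming a local MongoDB on the default port, ingestion amounts to turning each row into a document and bulk-inserting in batches:

```python
# Sketch of a CSV-to-MongoDB ingest; see census_ingest_mongo.py for the
# repo's actual implementation. Connection details are assumptions.
import csv
from itertools import islice

def rows_to_docs(lines):
    """Yield one dict per CSV data row (the header row supplies the keys)."""
    for row in csv.DictReader(lines):
        yield dict(row)

def batched(iterable, size):
    """Yield lists of up to `size` items, suitable for bulk insert_many calls."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def main():
    """Load the census file into MongoDB. Requires pymongo and the
    running container from the Docker step; not called automatically."""
    from pymongo import MongoClient
    coll = MongoClient("mongodb://localhost:27017")["census"]["people"]
    with open("USCensus1990.data.txt", newline="") as f:
        for batch in batched(rows_to_docs(f), 10_000):
            coll.insert_many(batch)
```

Batching keeps memory bounded on a multi-million-row file instead of materializing every document at once.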
- Run the streamlit application with `streamlit run app.py`
  - streamlit will list the URLs where you can reach the app
  - The default is `localhost:8501`, and the app is also served on your local subnet
- Populate the test data to see the differences in database performance via `python populate_query_logs.py`
- Evaluate results!
This repo also includes a series of individual performance tests that you can run. For example, to see the improvement from indexing a SQL database, run `python indexing_test.py`. Some of my results are in the results.md document.
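`indexing_test.py` runs the real experiment against the repo's databases. The general shape of such a test, sketched here with the stdlib's sqlite3 rather than the repo's setup, is to time the same selective query before and after adding an index:

```python
# Minimal index-speedup demo using stdlib sqlite3 (a standalone sketch,
# not the repo's indexing_test.py).
import sqlite3
import time

def build_db(n_rows=200_000):
    """Create an in-memory table of n_rows users with cycling ages 0..89."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        ((i, i % 90) for i in range(n_rows)),
    )
    return conn

def timed_lookup(conn):
    """Count users with age = 42 and report how long the query took."""
    start = time.perf_counter()
    count = conn.execute(
        "SELECT COUNT(*) FROM users WHERE age = 42"
    ).fetchone()[0]
    return count, time.perf_counter() - start

if __name__ == "__main__":
    conn = build_db()
    _, scan_t = timed_lookup(conn)                  # full table scan
    conn.execute("CREATE INDEX idx_age ON users (age)")
    _, index_t = timed_lookup(conn)                 # index lookup
    print(f"scan: {scan_t:.4f}s  indexed: {index_t:.4f}s")
```

The indexed query answers from the `idx_age` B-tree instead of scanning every row, which is the effect the repo's indexing test measures at larger scale.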
The presentation for this workshop is available as a PowerPoint in this repo; it gives a general motivation for understanding your data flow and database selection. Database selection is an active choice that needs to be made by the discerning data scientist, and understanding the actual mechanics of data storage, transport, and processing is a key piece of education that is often missed in data science programs.
- A good overview of database systems (and the source of some of the images): https://cs186berkeley.net/notes/note17/