The goal of this project is to practice creating and interpreting different types of visualizations using real-world data.
You will be working individually for this project, but we'll be guiding you along the process and helping you as you go.
The technical requirements for this project are as follows:
- Select a dataset from a public source.
- Create a Jupyter notebook to analyze the data.
- Using your data, create a minimum of one scatter plot, one histogram, one box plot and one bar graph (you can add more than one visualization of each type if you choose). Each graph should have a title and, when appropriate, properly labeled x and y axes.
- Explain what insight or information is inferred from these visualizations. The explanation should be in the notebook in markdown cells.
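As a rough illustration of the plotting requirement, here is a minimal sketch that draws the four required chart types with titles and axis labels. It assumes pandas and matplotlib are the tools in use, which the project does not mandate. The DataFrame and its columns (`height_cm`, `weight_kg`, `group`) are hypothetical placeholders filled with random numbers only so the example runs; swap in your own dataset and columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data (assumption): random values used only so the sketch runs.
# Replace this with your own dataset, e.g. loaded from your data folder.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 10, 200),
    "weight_kg": rng.normal(70, 12, 200),
    "group": rng.choice(["A", "B", "C"], 200),
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Scatter plot: relationship between two numeric columns.
axes[0, 0].scatter(df["height_cm"], df["weight_kg"], alpha=0.6)
axes[0, 0].set_title("Weight vs. height")
axes[0, 0].set_xlabel("Height (cm)")
axes[0, 0].set_ylabel("Weight (kg)")

# Histogram: distribution of a single numeric column.
axes[0, 1].hist(df["height_cm"], bins=20)
axes[0, 1].set_title("Distribution of height")
axes[0, 1].set_xlabel("Height (cm)")
axes[0, 1].set_ylabel("Number of rows")

# Box plot: spread of a numeric column within each category.
groups = sorted(df["group"].unique())
box_data = [df.loc[df["group"] == g, "weight_kg"].to_numpy() for g in groups]
axes[1, 0].boxplot(box_data)
axes[1, 0].set_xticks(range(1, len(groups) + 1))
axes[1, 0].set_xticklabels(groups)
axes[1, 0].set_title("Weight by group")
axes[1, 0].set_xlabel("Group")
axes[1, 0].set_ylabel("Weight (kg)")

# Bar graph: number of rows in each category.
counts = df["group"].value_counts().sort_index()
axes[1, 1].bar(counts.index, counts.values)
axes[1, 1].set_title("Rows per group")
axes[1, 1].set_xlabel("Group")
axes[1, 1].set_ylabel("Count")

fig.tight_layout()
```

In a notebook the figure should render below the cell. Follow each chart with a markdown cell describing what it suggests about the data.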
The following deliverables should be pushed to your GitHub repo for this chapter:
- A Jupyter notebook containing your analysis and the code you used to obtain this analysis.
- A data folder containing your data set.
- Find a data set to process - a great place to start looking would be Awesome Public Data Sets, Kaggle Data Sets, or the UCI Machine Learning Repository. Another great source is Google Dataset Search.
- Perform Preliminary Analysis - use functions like `describe` to help guide you to the correct insight and data visualization.
- Use the tools in your tool kit - your knowledge of the different types of visualizations and when to use them should come in handy with this assignment.
- Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.
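The preliminary analysis step might look something like the sketch below, assuming the data is loaded into a pandas DataFrame. The small table and its column names (`price`, `category`) are hypothetical stand-ins for a real dataset, which you would more likely load with something like `pd.read_csv` on a file in your data folder.

```python
import pandas as pd

# Placeholder table (assumption): stands in for your real dataset.
df = pd.DataFrame({
    "price": [10.5, 12.0, 9.75, 30.0, 11.25],
    "category": ["a", "b", "a", "c", "b"],
})

# Summary statistics for numeric columns: count, mean, std, min, quartiles, max.
summary = df.describe()

# Frequency of each value in a categorical column (a candidate for a bar graph).
category_counts = df["category"].value_counts()

# Number of missing values per column, worth checking before plotting.
missing = df.isna().sum()

print(summary)
print(category_counts)
print(missing)
```

A large gap between the maximum and the upper quartile in the summary, for example, hints at outliers that a box plot would show clearly.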
Because of their large size, the datasets were saved in Google Drive.
The following link includes the raw datasets inside a zip file and the processed datasets as CSV files: https://drive.google.com/drive/folders/115_ZmI6tIrBlVRyfpRdbRzy232njNdd6?usp=sharing
