Submission:
- Please submit your project via GitHub and send a private message on Slack to both Dan and Ivan with a link to it.
Exploratory data analysis is a crucial and informative step in the data science process. It helps you confirm or reject your initial hypotheses and visualize the relationships among your variables. Your exploratory data analysis also informs the kinds of data transformations you will need to prepare your data for machine learning models.
In this assignment, you will explore and visualize your initial analysis in order to effectively tell your data's story. You will create a Jupyter notebook that explores your data mathematically, using a visualization package such as matplotlib.
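As a minimal sketch of what exploring data mathematically can look like, the snippet below computes summary statistics and a correlation table. It assumes pandas is available; the DataFrame and its column names (`temperature`, `humidity`, `rained`) are made up for illustration and should be replaced with your own project data.

```python
import pandas as pd

# Hypothetical toy data; replace with your own project dataset.
df = pd.DataFrame({
    "temperature": [55, 61, 68, 72, 75, 80, 84, 90],
    "humidity": [90, 85, 70, 65, 80, 40, 35, 30],
    "rained": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Summary statistics (count, mean, std, quartiles) for every numeric column.
summary = df.describe()

# Pairwise Pearson correlations; values near 0 are worth reporting too.
correlations = df.corr()

print(summary)
print(correlations)
```

Both results are ordinary DataFrames, so they render as tables when they are the last expression in a notebook cell.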
Objective: Confirm your data and create an exploratory data analysis notebook with statistical analysis and visualization.
- Requirements:
- A well-organized Jupyter notebook with code that has been fully run from top to bottom.
- At least one visual for each independent variable and, if possible, its relationship to your dependent variable.
- It's just as important to show what's not correlated as it is to show any actual correlations found.
- Visuals should be well labeled and intuitive based on the data types.
- For example, if your x variable is temperature and y is "did it rain," a reasonable visual would be two histograms of temperature, one where it rained, and one where it didn't.
- Tables are a perfectly valid visualization tool! Interweave them into your work.
- Provide insight about your dataset and its impact on your hypothesis.
- Keep the project simple! The "cool" part of the analysis will come; just looking at simple relationships between variables can be incredibly insightful.
- Consider building some helper functions that help you quickly visualize and interpret data.
- Exploratory data analysis should be formulaic; the code should not be holding you back. There are plenty of "starter code" examples from class materials.
- DRY: Don't Repeat Yourself! If you find yourself copying and pasting code a lot, turn it into a function and use the function instead!
- This deliverable should be similar to the work you did for Unit Project 2 earlier in the course.
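To illustrate the helper-function suggestion and the temperature versus "did it rain" example above, here is one possible sketch, not a required approach. The function names (`summarize_by_outcome`, `plot_hist_by_outcome`) and the toy `weather` data are hypothetical; only standard pandas, NumPy, and matplotlib calls are used.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def summarize_by_outcome(df, feature, outcome):
    """Return summary statistics for `feature`, one row per value of `outcome`."""
    return df.groupby(outcome)[feature].describe()


def plot_hist_by_outcome(df, feature, outcome, bins=10):
    """Draw one labeled histogram of `feature` for each value of `outcome`."""
    groups = list(df.groupby(outcome))
    # Use the same bin edges in every panel so the histograms are comparable.
    edges = np.histogram_bin_edges(df[feature], bins=bins)
    fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True,
                             figsize=(5 * len(groups), 4), squeeze=False)
    for ax, (value, subset) in zip(axes[0], groups):
        ax.hist(subset[feature], bins=edges)
        ax.set_title(f"{outcome} = {value}")
        ax.set_xlabel(feature)
    axes[0][0].set_ylabel("count")
    fig.tight_layout()
    return fig, axes[0]


# Hypothetical toy data; replace with your own project dataset.
weather = pd.DataFrame({
    "temperature": [55, 61, 68, 72, 75, 80, 84, 90],
    "rained": [True, True, False, False, True, False, False, False],
})

table = summarize_by_outcome(weather, "temperature", "rained")
fig, axes = plot_hist_by_outcome(weather, "temperature", "rained", bins=5)
```

Because both helpers take the column names as arguments, the same two calls can be reused for every independent variable instead of copying and pasting plotting code.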
The rubric is available here.
