The goal of this project is to derive insights from a real-world dataset of your choice by analyzing it with the pandas library.
You will work individually on this project, but we'll guide you through the process and help you as you go.
The technical requirements for this project are as follows:
- Select a real-world dataset.
- Create a Jupyter notebook to perform your analysis of the dataset.
- Incorporate at least three elements from the topics covered in this chapter in your analysis of the dataset.
- Include your thoughts and conclusions from the analysis in markdown cells in the notebook.
The following deliverables should be pushed to your GitHub repo for this chapter:
- A Jupyter notebook containing your analysis and the code you used to obtain this analysis.
- A data folder containing the CSV file you chose.
- Find a dataset to process. Good places to start looking are Awesome Public Data Sets, Kaggle Data Sets, and the UCI Machine Learning Repository. Google Dataset Search is another great source.
- Perform preliminary analysis: use functions like `describe` to help uncover interesting insights about your dataset.
- Use the tools in your tool kit: using tools like calculated columns and pivot tables, discover and describe the insights you have found.
- Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.
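
As a rough illustration of the steps above, here is a minimal sketch that runs `describe`, adds a calculated column, and builds a pivot table. The toy data and the column names (`region`, `product`, `units`, `unit_price`, `revenue`) are hypothetical stand-ins. In your project you would load your own CSV, and your analysis will likely look quite different.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset. In your project you
# would load your own file instead, e.g. pd.read_csv("data/<your_file>.csv").
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 5, 8, 12],
    "unit_price": [2.0, 4.0, 2.0, 4.0],
})

# Preliminary analysis: summary statistics for the numeric columns.
summary = df.describe()

# Calculated column: revenue derived from two existing columns.
df["revenue"] = df["units"] * df["unit_price"]

# Pivot table: total revenue by region (rows) and product (columns).
revenue_pivot = df.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)

print(summary)
print(revenue_pivot)
```

In a notebook, you would typically follow each of these cells with a markdown cell explaining what the output tells you about the data.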
- Technical Requirements: Did you deliver a project that met all the technical requirements? Given what the class has covered so far, did you build something that was reasonably complex?
- Creativity: Did you add a personal spin or creative element to your project submission? Did you incorporate domain knowledge or a unique perspective into your analysis?
- Code Quality: Did you follow the code style guidance and best practices covered in class?
- Total: Your instructors will give you a total score on your project, from 0 to 2:

| Score | Expectations |
| --- | --- |
| 0 | Does not meet expectations |
| 1 | Meets expectations, good job! |
| 2 | Exceeds expectations, you wonderful creature, you! |

This will be useful as an overall gauge of whether you met the project goals, but the more important scores are described in the specs above, which can help you identify where to focus your efforts for the next project!
- Presentation Time: 6 minutes
- Q & A: 3 minutes
- Total Time: 9 minutes
- DRESS TO IMPRESS: Smart casual would be great
- Present your findings using the Jupyter notebook.
- The presentation and demo will be executed on a class computer (instead of your own).
- Be prepared to explain your entire notebook and thought process.
- Short presentation of yourself:
- Who are you?
- What background did you have before taking this course?
- Note: we are getting you ready for the final presentation!
- Introduction
- The dataset you chose
- What drew you to this dataset?
- The most important thing you learned.
- One technical challenge you faced:
- Explain the challenge.
- Explain what you did to overcome it, and how.
- Show your code and explain in markdown cells in your notebook.
- The code
- Be prepared to explain your code as well as your markdown comments.
- Your insights
- Explain what insight you derived from your analysis and what this tells us about the data.
- What recommendations would you make given these insights?
