Guided Project: Web Data Pipeline

Overview

The goal of this project is for you to practice what you have learned in the Intermediate Python and Data Engineering chapter of this program. For this project, you will start with a data set of your choice. You will need to import it and use your newly acquired skills to build a data pipeline that processes the data and produces a result. Your pipeline should demonstrate your proficiency with the tools we covered (functions, list comprehensions, string operations, and error handling).

You will be working individually for this project, but we'll be guiding you along the process and helping you as you go. Show us what you've got!


Technical Requirements

The technical requirements for this project are as follows:

  • You must construct a data pipeline with the majority of your code wrapped in functions.
  • Each data pipeline stage should be covered: acquisition, wrangling, analysis, and reporting.
  • You must demonstrate all the topics we covered in the chapter (functions, list comprehensions, string operations, and error handling) in your processing of the data.
  • There should be some data set that gets imported and some result that gets exported.
  • Your code should be saved in a Python script file (.py), your data should be saved in a folder named data, and your results should be saved in a folder named output.
  • You should also include a README.md file that describes the steps you took and your thought process as you built your data pipeline.
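
As a rough illustration of the requirements above, here is a minimal sketch of a pipeline with one function per stage. The file names (`data/sales.csv`, `output/totals.csv`) and the column names (`category`, `amount`) are hypothetical placeholders, not part of the brief; your own data set will need different ones.

```python
import csv
from pathlib import Path


def acquire(path):
    """Acquisition: read the raw CSV into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def clean_row(row):
    """Return a tidied row, or None when the row cannot be parsed."""
    try:
        return {
            # String operations: trim whitespace and normalize case.
            "category": row["category"].strip().lower(),
            "amount": float(row["amount"]),
        }
    except (KeyError, AttributeError, TypeError, ValueError):
        # Error handling: flag rows with missing or malformed fields.
        return None


def wrangle(rows):
    """Wrangling: clean every row and drop the ones that failed."""
    cleaned = [clean_row(row) for row in rows]  # list comprehension
    return [row for row in cleaned if row is not None]


def analyze(rows):
    """Analysis: total amount per category."""
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0.0) + row["amount"]
    return totals


def report(totals, path):
    """Reporting: write the totals to a CSV file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["category", "total"])
        writer.writerows(sorted(totals.items()))


def run_pipeline(source="data/sales.csv", target="output/totals.csv"):
    """Run the four stages in order; both paths are hypothetical defaults."""
    totals = analyze(wrangle(acquire(source)))
    report(totals, target)
    return totals


if __name__ == "__main__":
    try:
        print(run_pipeline())
    except FileNotFoundError as error:
        print(f"Input file not found: {error.filename}")
```

Keeping each stage in its own function makes it easy to test the stages separately and to swap in a different data source later.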

Necessary Deliverables

The following deliverables should be pushed to your GitHub repo for this chapter.

  • A Python (.py) code file that contains the code for your data pipeline.
  • A data folder containing your data set.
  • An output folder containing the output of your data pipeline.
  • A README.md file containing a detailed explanation of the process you followed to design and build your pipeline, how you incorporated intermediate Python concepts, and your results, obstacles encountered, and lessons learned.
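
Before pushing, you may want to confirm that every deliverable listed above is in place. The sketch below is one possible self-check, not an official grading script; the function name `missing_deliverables` is made up for this example, and it assumes the script, folders, and README sit at the top level of your repo.

```python
from pathlib import Path


def missing_deliverables(repo_root="."):
    """List the deliverables from the brief that are absent under repo_root."""
    root = Path(repo_root)
    missing = []
    if not any(root.glob("*.py")):
        missing.append("Python (.py) code file")
    for folder in ("data", "output"):
        path = root / folder
        # Git does not track empty folders, so require at least one entry inside.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(f"{folder} folder with content")
    if not (root / "README.md").is_file():
        missing.append("README.md")
    return missing


if __name__ == "__main__":
    for item in missing_deliverables():
        print(f"Missing: {item}")
```

Running the script from the root of your repo prints one line per missing item and nothing when everything is present.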

Suggested Ways to Get Started

  • Find a data set to process - a great place to start looking would be Awesome Public Data Sets and Kaggle Data Sets.
  • Examine the data and come up with a deliverable before diving in and applying any methods to it.
  • Break the project down into different steps - leverage the stages of the data pipeline covered in the pipelines lesson and answer the appropriate questions for each stage.
  • Use the tools in your tool kit - your knowledge of intermediate Python as well as some of the things you've learned in previous chapters. This is a great way to start tying everything you've learned together!
  • Work through the lessons in class & ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... procrastinating.
  • Commit early and commit often. Don't be afraid of doing something incorrectly, because you can always roll back to a previous version.
  • Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.
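
For the "examine the data" suggestion above, a quick profile of the file is often enough to decide on a deliverable. The sketch below assumes a CSV input and uses a tiny made-up sample (`city`, `temp`) in place of a real data set; `summarize` is a hypothetical helper name.

```python
import csv
import io


def summarize(handle):
    """Return the row count, the column names, and blank values per column."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = reader.fieldnames or []
    blank_counts = {
        # Treat missing (None) and whitespace-only values as blank.
        column: len([row for row in rows if not (row[column] or "").strip()])
        for column in columns
    }
    return {"rows": len(rows), "columns": list(columns), "blank_counts": blank_counts}


if __name__ == "__main__":
    # Made-up sample standing in for a file opened from the data folder.
    sample = io.StringIO("city,temp\nMadrid,31\nParis,\n ,18\n")
    print(summarize(sample))
```

To profile a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("data/your_file.csv", newline="", encoding="utf-8")` with your own file name.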

Useful Resources

Project Feedback + Evaluation

  • Technical Requirements: Did you deliver a project that met all the technical requirements? Given what the class has covered so far, did you build something that was reasonably complex?

  • Creativity: Did you add a personal spin or creative element to your project submission? Did you incorporate domain knowledge or a unique perspective into your analysis?

  • Code Quality: Did you follow code style guidance and best practices covered in class?

  • Total: Your instructors will give you a total score on your project on the following scale:

    Score   Expectations
    0       Does not meet expectations
    1       Meets expectations, good job!
    2       Exceeds expectations, you wonderful creature, you!

This will be useful as an overall gauge of whether you met the project goals, but the more important scores are described in the specs above, which can help you identify where to focus your efforts for the next project!

Presentation Guideline and Criteria

Format

  • Presentation Time: 6 minutes
  • Q & A: 3 minutes
  • Total Time: 9 minutes

Attire

Outputs

  • A presentation on slides.com
  • A demo deployed on GitHub Pages
  • The presentation and demo will be run on a class computer (instead of your own)
  • Be ready to explain some of your code on GitHub

Things you might want to talk about

  • Short presentation of yourself:
    • Who are you?
    • A hobby you have.
    • Note: we are getting you ready for the final presentation!
  • Elevator pitch:
    • Data set you chose.
    • Why did you choose that data set?
    • The most important thing you learned.
  • One technical challenge you faced:
    • Explain the challenge.
    • Explain what you did to overcome it and how you did it.
    • Show and explain code snippets in your presentation slides.
  • Git:
    • Display a screenshot of your GitHub graphs to show your commit frequency and how much work you did.
  • Data Pipeline Walkthrough:
    • Walk the audience through the data set you chose, providing an overview of some of the fields and other information contained in the data.
    • Walk the audience through your process of designing and constructing your data pipeline including what tools and techniques you employed, what avenues you decided to pursue and why, and what lessons you learned.
  • One important mistake you made:
    • Did you make a mistake in the construction of your pipeline? Did you perform one of the operations incorrectly?