Course materials for General Assembly's Data Science course in San Francisco (11/30/15 - 3/2/16).
Instructor: Rob Hall
TA: Justin Breucop
Once you've received the invitation to Slack, please log in and add your picture! Slack will be the primary way we communicate with each other.
Installation and Setup Checklist
| Monday | Wednesday |
|---|---|
| 11/30: Course Overview, Introduction to Data Science | 12/2: Version Control, Intro to Python |
| 12/7: Intro to Machine Learning, KNN | 12/9: Data Reading and Cleaning |
| 12/14: Data Exploration | 12/16: Scikit-learn and Model Evaluation (Project Question & Dataset Due) |
| 12/21: No Class (Holiday Break) | 12/23: No Class (Holiday Break) |
| 12/28: No Class (Holiday Break) | 12/30: No Class (Holiday Break) |
| 1/4: Linear Regression | 1/6: Logistic Regression |
| 1/11: Naive Bayes | 1/13: Advanced Model Evaluation |
| 1/18: No Class (MLK Day) | 1/20: Clustering (Project First Draft Due) |
| 1/25: Decision Trees | 1/27: Ensembling Techniques |
| 2/1: Dimensionality Reduction | 2/3: Support Vector Machines |
| 2/8: Recommender Systems | 2/10: SQL, Databases (Optional Project Second Draft Due) |
| 2/15: No Class (Presidents' Day) | 2/17: Advanced Topic or Guest Speaker |
| 2/22: Advanced Topic or Guest Speaker | 2/24: Course Review |
| 2/29: Project Presentations & Project Due | 3/2: Project Presentations & Project Due |
Syllabus last updated: 12/2/2015
- Welcome from General Assembly staff
- Course overview (slides)
- Introduction to data science (slides)
- Command line & exercise (code)
- Exit tickets
Homework:
- Work through GA's friendly command line tutorial using Terminal (Linux/Mac) or Git Bash (Windows), and then browse through this command line reference.
- Watch videos 1 through 8 (21 minutes) of Introduction to Git and GitHub.
- If your laptop has any setup issues, please work with us to resolve them by Wednesday.
Resources:
- For a useful look at the different types of data scientists, read Analyzing the Analyzers (32 pages).
- For some thoughts on what it's like to be a data scientist, read these short posts from Win-Vector and Datascope Analytics.
- Quora has a data science topic FAQ with lots of interesting Q&A.
- Final project presentations from other classes
- Q&A on course project expectations & schedule
- Version Control with Git and GitHub (slides)
- Git configuration and GitHub setup
- Intro to Python (slides)
- Exit tickets
Homework:
- If you haven't already, complete the homework exercise listed in the command line introduction. Create a Markdown document that includes your answers and the code you used to arrive at those answers. Add this file to a GitHub repo that you'll use for all of your coursework, and submit a link to your repo using the homework submission form.
Git and Markdown Resources:
- Pro Git is an excellent book for learning Git. Read the first two chapters to gain a deeper understanding of version control and basic commands.
- GitHub's Mastering Markdown is a good starting point for learning GitHub Flavored Markdown.
Command Line Resources:
- If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.