Assignments
This page contains information about the assignments and final project.
One important thing is (of course) when the various assignments are due. Below is an overview:
- Assignment 1
- Posted: During lecture 4.
- Due: Sunday Feb 28th, 2016 at 23:55.
- Peergrading due: Monday Feb 29th, 2016 at 23:55.
- Assignment 2
- Posted: During lecture 8.
- Due: Sunday April 3rd, 2016 at 23:55.
- Peergrading due: Tuesday Apr 5th, 2016 at 23:55.
- Project Assignment A
- Due: Monday April 18th, 2016 at 23:55.
- Peergrading due: Friday Apr 22nd, 2016 at 23:55.
- Project Assignment B
- Due: Tuesday May 17th, 2016 at 23:55.
- Peergrading due: Friday May 20th, 2016 at 23:55.
The lectures in this class run over 8 weeks. Each week, we will post a number of exercises. After a set of lectures, we will post an assignment. The assignment is a subset of the exercises. This means that, if you solve the exercises each week, the assignments will be easy. Assignments 1 and 2 will be written reports (IPython notebooks), each summarizing the work contained in the exercises preceding it.
- Assignment 1 is available here. (see details regarding Formalia below)
- Assignment 2 is available here. (see details regarding Formalia below)
- Assignments should be handed in in groups.
- Groups should have 3 members.
- It's preferred that you work in the same groups throughout the semester.
- All group members should be familiar with every aspect of the assignment. That means that you can split up the writing, etc., but everyone in the group should be able to solve every exercise. If there's an exercise that you can't solve, talk to your fellow group members, the professor, or one of the TAs about how it's done ... otherwise, you will be missing out.
- It is possible to have fewer than 3 group members, but we judge all reports the same, so being 3 in a group decreases the amount of writing you have to do.
We will be grading your .ipynb file, which should be uploaded via http://peergrade.io/
- For the delivery:
- give the file any name, e.g. Assignment1.ipynb
- make sure that your code runs and renders all images, prints, etc. before you save your file and upload. We recommend restarting the kernel under 'Kernel' and then clicking Cell --> Run all before uploading.
- double check that your file renders correctly once you've uploaded it to http://peergrade.io/ (you will be able to do this as part of the upload procedure). Remember that you'll be annoyed to get bad evaluations because no-one could see your plots.
- Remember that the Notebook should be anonymous, so don't include your name and student ID.
- To help us navigate the Notebook, it's a good idea to repeat the question you're answering.
- Try to control the length of your notebook. While grading, we look at how you prioritize material and express yourself clearly and succinctly.
- Read the text carefully - make sure you understand the question. And make sure that you answer all sub-questions, etc. (It's easy to miss something, so be thorough).
- Do not solve all exercises in a single code cell. Split your code according to the questions.
- The notebook is designed to contain your code, so do include it. But do keep it short & neat (minimize long outputs, etc)
- Format your plots properly. Axes must be labeled, make sure there's text explaining the figure, use %matplotlib inline, etc.
- Make sure that you use references when they're needed and follow academic standards.
- Be precise, write in objective language (avoid: "I think ...", "In my opinion ...", etc.) - if you make an observation, support it with data.
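The delivery checks above (restart the kernel, run all cells, make sure nothing fails) can be partly automated before uploading. Below is a minimal sketch, assuming the notebook file uses the nbformat 4 JSON layout. `find_notebook_problems` is a hypothetical helper written for this illustration; it is not part of Jupyter or peergrade, and it does not replace looking at the rendered file after upload.

```python
import json


def find_notebook_problems(notebook_json):
    """Return a list of warnings for a notebook given as JSON text.

    Assumes the nbformat 4 layout: a top-level "cells" list in which each
    code cell has "source", "execution_count" and "outputs" fields.
    """
    notebook = json.loads(notebook_json)
    code_cells = [
        cell for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        and "".join(cell.get("source", [])).strip()  # ignore empty cells
    ]

    problems = []

    # After a kernel restart followed by "Run all", the non-empty code
    # cells are expected to be numbered 1, 2, ..., N from top to bottom.
    counts = [cell.get("execution_count") for cell in code_cells]
    if counts != list(range(1, len(counts) + 1)):
        problems.append("cells were not run in order from a fresh kernel")

    # A cell whose outputs include an error record failed during the last run.
    for position, cell in enumerate(code_cells, start=1):
        outputs = cell.get("outputs", [])
        if any(out.get("output_type") == "error" for out in outputs):
            problems.append(f"code cell {position} ended with an error")

    return problems
```

Passing the text of the saved .ipynb file to this function and getting an empty list back suggests that the notebook was run top to bottom without errors. It says nothing about whether the plots are readable, so the visual check on peergrade is still needed.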
The point of the Project Assignments is to try out the skills you've learned in the course on your own dataset. We will be working with Open City Data. Here are some examples.
- New York City
- San Francisco
- Copenhagen
- Melbourne
- Helsinki
- And there are many more awesome data sets out there.
So what can you do with Open Data? For inspiration, I recommend you sit down and listen to a podcast to get in the right mindset. It's an interview with Ben Wellington, the author of the site http://iquantny.tumblr.com/
Check out this podcast for inspiration. It's 40 minutes, but well worth your time. After listening to this I predict that you'll be brimming with ideas for what to start working on. (Or at least have an idea of where to get started).
Summary: The overall idea is to take a deep dive into some aspect of a city dataset and try to understand that data using the tools you've learned in this class. Once you understand the data, you should tell the story of what you've found on your own website using D3 visualizations (and whichever other tools you need).
The first part of the final project is a 5-minute movie, which should explain the central idea/concept that you will investigate in your final project. You're making the movie so that the TAs and I can give you feedback, and so that other groups can steal your ideas (and you can steal ideas from them). The movie must contain the following:
- An explanation of the central idea behind your final project (which city, what is the idea? which datasets do you need to explore the idea?, why is it interesting?)
- An outline of the elements you'll need to get to your goal.
- The implementation plan.
- A walk-through of your preliminary data-analysis, addressing
- What is the total size of your data? (MB, number of rows, number of variables, etc)
- What are other properties? (What is the date range? Is it geo-data? If so, show a quick plot of locations, etc.)
- Show the fundamental distributions of the data (similar to the work we did on SF crime data for lecture 3)
But other than that, there are no constraints. And we do appreciate funny/inventive/beautiful movies, although the academic content is most important. Note that we'll display the movie to the entire class.
(The maximum length is 5 minutes, but it's OK if the movie is shorter.)
Handing in the assignment: Simply upload your video to youtube (the higher the resolution the better) and submit the link to peergrade.
Note that since Project Assignment A now requires significant data-work, you have 2 weeks to create the video presentation.
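The preliminary data-analysis questions listed above (total size, number of rows and variables, date range, fundamental distributions) can be answered with a few lines of code. Below is a minimal sketch using only the Python standard library. The `summarize` helper, its parameters, and the column names in the usage note are hypothetical and chosen for illustration; a real city dataset will have its own columns and date format.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def summarize(csv_text, date_column, date_format, category_column):
    """Compute basic facts about a dataset given as CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Parse the date column so the date range can be reported.
    dates = [datetime.strptime(row[date_column], date_format) for row in rows]
    return {
        "size_bytes": len(csv_text.encode("utf-8")),
        "n_rows": len(rows),
        "n_variables": len(rows[0]) if rows else 0,
        "first_date": min(dates) if dates else None,
        "last_date": max(dates) if dates else None,
        # How often each value of the chosen column occurs.
        "category_counts": Counter(row[category_column] for row in rows),
    }
```

For a file with columns named, say, `Date` and `Category`, a call such as `summarize(text, "Date", "%m/%d/%Y", "Category")` returns the numbers for the walk-through, and `category_counts.most_common(10)` gives a first distribution to plot.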
The deliverables for the Final project will be
- A website. The website should contain your analysis and tell the story about the data that you're interested in getting across. The website should not be technical, but rather aim at using visualization and explanation to get your insights across to a non-scientific reader.
- An explainer Notebook. The Notebook should contain all the behind-the-scenes stuff: details on the dataset, why you've selected these particular visualizations, explanations of your machine learning methodology, etc. You should link to the Notebook from the site.
The idea is that you can create much more complex, dynamic and interactive analyses (and visualizations) online. So the website is a way for you to present your work so that everyone can understand it, including dynamic visualizations, interactive analysis, etc. that would not work on a piece of paper.
The main point of the website is to present your idea/analyses to the world in a way that showcases your use of what you've learned in class. The website should be self-contained and tell the story without the need for the Explainer Notebook (the purpose of the notebook is to provide additional details for interested readers). Here are some requirements:
- The page should say clearly what the dataset is and give the reader some idea of its most important properties.
- The page must contain at least one D3 visualization (and not the barchart/scatter plot from the book - consider something interesting from Chapter 11-12 or this official page of cool examples). Interactive visualizations that allow users to explore the data are better! Other visualizations (plots from python, etc.) can, of course, also be included.
- Somewhere in your analysis, you must use at least 2 of the machine learning methods we've covered in class (standard linear regression doesn't count). It is OK to use other machine learning methods as well (or instead).
- There should be download options for data sets (so the user can play around).
- You must link to the Explainer Notebook (more details below) that explains the details of your analysis (including all of the machine learning, the model selection, etc.). You can achieve this with a link to an IPython notebook displayed on nbviewer.
For hosting, I recommend you use whatever you've already set up during the exercises.
The notebook should contain your analysis and code. Please structure it into the following 5 sections:
- Motivation
- What is your dataset?
- Why did you choose this/these particular dataset(s)?
- What was your goal for the end user's experience?
- Basic stats. Let's understand the dataset better
- Write about your choices in data cleaning and preprocessing
- Write a short section that discusses the dataset stats (here you can recycle the work you did for Project Assignment A)
- Theory. Which theoretical tools did you use?
- Describe which machine learning tools you use and why the tools you've chosen are right for the problem you're solving.
- Talk about your model selection. How did you split the data into test/training sets? Did you use cross validation?
- Explain the model performance. How did you measure it? Are your results what you expected?
- Visualizations
- Explain the visualizations you've chosen.
- Why are they right for the story you want to tell?
- Discussion. Think critically about your creation
- What went well?
- What is still missing? What could be improved, and why?
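The model-selection questions under Theory (how the data was split into test/training sets, and whether cross validation was used) can be made concrete with a small sketch of k-fold cross validation. This is an illustration only: `k_fold_indices` is a hypothetical helper written in plain Python, and it is not necessarily the procedure used in the lectures. Libraries such as scikit-learn provide equivalents (for example `sklearn.model_selection.KFold`).

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Every sample lands in the test set of exactly one fold and in the
    training set of the other k - 1 folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Shuffle once with a fixed seed so the split is reproducible.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In the Explainer Notebook, a model would be fitted on each `train` set and scored on the matching `test` set, and the reported performance would be the average over the k folds. Stating `k` and the seed makes the model selection reproducible for the reader.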
Some additional notes:
- Make sure that you use references when they're needed and follow academic standards.
This class has been hand crafted for you by Sune Lehmann
This work is licensed under a Creative Commons Attribution 4.0 International License.
