Final Project
This page contains information about the final project.
The point of the Project Assignments is to try out the skills you've learned in the course on your own dataset.
Here is a video from Sune that summarizes some of the concepts for the Final Project. Make sure to watch it and to carefully read the guidelines below!
This year I am going to set one constraint on your project: your application should be green! It should be related to an environmental topic, so that you could also submit an abstract to the Green Challenge if you'd like to try (go and check it out). There are many topics related to Climate Change, Environmental issues, and Sustainability, so you should have enough to choose from. However, keep in mind that this course is about Social Data, so if you choose to look at temperature, weather, etc., it should be in relation to some social activity (what is the impact of temperature on X? how does weather change behavior Y? etc.).
- Note: you can use multiple data sources in your project.
In previous versions of the class, I've limited the projects to be about city data. I still think that's a great choice of data source to work on if it fits your idea. Below I link some city datasets you can check out (some of them have environmental data too):
- New York City https://nycopendata.socrata.com
- San Francisco https://data.sfgov.org
- Copenhagen http://data.kk.dk (better: https://portal.opendata.dk)
- Melbourne https://data.melbourne.vic.gov.au
- Helsinki http://www.hri.fi/en/
But there are many more awesome data sets out there that are waiting for you!
For inspiration for what to do with data like this, I recommend you sit down and listen to a podcast to get in the right mindset. It's an interview with Ben Wellington, the author of the site http://iquantny.tumblr.com/ (it's a super cool blog, so remember to scroll through that page to check out some of the many projects Ben has worked on).
Check out this podcast for inspiration. It's a 40-minute listen, but well worth your time. After listening to it, I predict that you'll be brimming with ideas for what to start working on (or at least have an idea of where to get started).
The overall idea is to take a deep dive into some aspect of a city/environmental dataset and try to understand that data using the tools you've learned in this class. Once you understand the data, you should tell the story of what you've found on your own website using visualizations, with Bokeh and whichever other tools you need.
So it's not just cool data viz, it's:
- Data analysis and understanding, then
- Using narrative data viz to communicate what you've learned.
The first part of the final project is a 1-minute movie, which should quickly pitch the central idea/concept that you will investigate in your final project. You're making the movie so that the TAs and I can give you feedback, and so that other groups can steal your ideas (and you can steal ideas from them). The movie must contain the following:
- An explanation of the central idea behind your final project, e.g. think about questions such as
- What is the idea?
- Which datasets do you need to explore the idea?
- Why is it interesting?
- A mock-up of the visualization that you wish to build. (Anything is fine here: pen and paper, MS Paint, Inkscape, Bokeh, anything.)
- Make sure you answer the questions
- What genre is it? (for Genres, see section 4.3 of the Segel and Heer paper)
- Why is that genre right for telling the story you want to communicate with the data?
- A walk-through of your preliminary data-analysis, addressing
- What is the total size of your data? (MB, number of rows, number of variables, etc)
- What are other properties? (What is the date range? Is it geo-data? If so, show a quick plot of locations, etc.)
- Show the fundamental aspects of the data (similar to the work we did on SF crime data for week 3)
But other than that, there are no constraints. And we do appreciate funny/inventive/beautiful movies, although the academic content is most important. Note that we'll display the movie to the entire class. Here you can find links to the videos from last year!
Important: The maximum length is 1 minute, so you will need to be disciplined & speedy about what you put in there. Movies longer than 1 minute will be stopped at the 1 minute mark!
Handing in the assignment: Simply upload your video to YouTube or another video hosting site (the higher the resolution the better) and submit the link to Peergrade.
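For the preliminary data-analysis walk-through in the movie (total size, number of rows and variables, date range), a few lines of Python can produce the numbers you need. The following is only a minimal sketch using the standard library: the sample CSV, its column names, and the `summarize` helper are made-up placeholders for illustration, not part of the course material. With a real dataset you would read your own file instead, and pandas would work just as well.

```python
import csv
import io
from datetime import datetime

# Made-up placeholder data standing in for a real file.
# With a real dataset, read the text from your own CSV file instead.
SAMPLE_CSV = """date,district,bike_trips,temperature_c
2023-01-05,North,120,2.5
2023-03-17,South,340,9.0
2023-07-21,North,910,24.5
2023-11-02,East,410,8.0
"""


def summarize(csv_text, date_column="date"):
    """Return basic size and date-range facts about CSV text (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Assumes ISO-formatted dates (YYYY-MM-DD) in the chosen column.
    dates = [datetime.strptime(row[date_column], "%Y-%m-%d") for row in rows]
    return {
        "size_bytes": len(csv_text.encode("utf-8")),
        "n_rows": len(rows),
        "n_variables": len(columns),
        "variables": columns,
        "date_min": min(dates).date().isoformat() if dates else None,
        "date_max": max(dates).date().isoformat() if dates else None,
    }


summary = summarize(SAMPLE_CSV)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The resulting dictionary maps directly onto the questions in the list above, so it can double as a checklist when you script the movie.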
The deliverables for the Final project are:
A website with your visualizations and accompanying text. I recommend you structure it as a kind of narrative data story (cf. the Segel paper we read during Lecture 8). The website should tell the story about the data that you're interested in getting across. In the simplest, most minimalist case, the website can be a very nice Jupyter Notebook hosted on nbviewer.
- It should contain visualizations to let the reader explore the data that you're interested in getting across and to explain the findings you want to communicate. It is a plus if some of them are interactive.
Your analysis behind the scenes can be as technical and advanced as you like (in fact the goal is to show you can combine data analysis, machine learning, and data visualization), but the website itself should not be technical. Rather, it should use visualization and explanation to get your data-driven insights across to a non-scientific reader.
The idea is that you can create much more complex, dynamic and interactive analysis (and visualizations) using the possibilities available when you're creating a website. So it is a way for you to present your work in a way that everyone can understand (like something you could show your parents).
An explainer Jupyter Notebook. The explainer notebook should contain all the behind-the-scenes data-analysis stuff, details on the dataset, why you've selected these particular visualizations, explanations of methodology, etc.
The main point of the website is to present your idea/analyses to the world in a way that showcases your use of what you've learned in class. The website should be self-contained and tell the story without the need for the details in the explainer notebook (the purpose of the explainer notebook is to provide additional details for interested/scientific readers).
Here, you can find a tutorial to create a website with Hugo and GitHub Pages!
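One possible way to get an interactive Bokeh plot onto such a website is to export it as a standalone HTML page and then link or embed that page from your site. The sketch below assumes Bokeh is installed. The data values, labels, and the output file name `bike_trips.html` are made-up placeholders, and this is only one of several ways to embed Bokeh output.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up placeholder values for illustration only; use your own data here.
months = ["Jan", "Feb", "Mar", "Apr"]
bike_trips = [120, 150, 340, 560]

# A categorical bar chart with hover tooltips for some basic interactivity.
plot = figure(
    x_range=months,
    title="Bike trips per month (placeholder data)",
    x_axis_label="Month",
    y_axis_label="Trips",
    tools="hover,pan,wheel_zoom,reset",
    tooltips=[("month", "@x"), ("trips", "@top")],
)
plot.vbar(x=months, top=bike_trips, width=0.8)

# Render the plot as a complete HTML document that loads BokehJS from the CDN.
html = file_html(plot, CDN, "Bike trips per month")

# Hypothetical output file name; place the file wherever your site serves static pages.
with open("bike_trips.html", "w", encoding="utf-8") as f:
    f.write(html)
```

The exported file is self-contained apart from the BokehJS scripts it loads from the CDN, so it can be served as a static page, for example alongside a Hugo site.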
The notebook should contain your analysis and code. Please structure it into the following sections:
- Motivation.
- What is your dataset?
- Why did you choose this/these particular dataset(s)?
- What was your goal for the end user's experience?
- Basic stats. Let's understand the dataset better.
- Write about your choices in data cleaning and preprocessing
- Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis.
- Data Analysis.
- Describe your data analysis and explain what you've learned about the dataset.
- If relevant, talk about your machine-learning.
- Genre.
- Which genre of data story did you use?
- Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer)? Why?
- Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer)? Why?
- Visualizations.
- Explain the visualizations you've chosen.
- Why are they right for the story you want to tell?
- Discussion. Think critically about your creation.
- What went well?
- What is still missing? What could be improved? Why?
- Contributions. Who did what?
- You should write (just briefly) which group member was mainly responsible for which elements of the assignment. (I want you to understand every part of the assignment, but usually someone took the lead role on certain portions of the work. That's what you should explain.)
- It is not OK simply to write "All group members contributed equally".
- Make sure that you use references when they're needed and follow academic standards.
Handing in the assignment: Simply upload the link to your website to Peergrade.
This class has been hand crafted for you by Sune Lehmann and Anna Sapienza
This work is licensed under a Creative Commons Attribution 4.0 International License.
