The aim of the project is to simulate the real-world process of conceptualizing a data analytics project and deriving unique insights using statistical modeling. More specifically, the project component of this course allows you to explore a dataset and asks you to report your experience of building a statistical model to achieve some (business) goal.
There are two due dates for project deliverables: one intermediate and one final. See the course logistics page for the exact dates.
The intermediate deliverable is:
- Project Scope and Plan: In at most 2 pages (12 point, single column), you should explain the candidate project idea (more than one idea is acceptable, but keep them related), its suitability for a machine learning workflow, and an exploratory analysis of the corresponding dataset (number of features, number of samples, class imbalance, marginal distributions of the features). A detailed project plan (what work will be accomplished by what date, and by which team member) should also be included.
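As an illustration of the exploratory quantities listed above, here is a minimal, dependency-free sketch. The toy rows, the `churned` target column, and the `summarize` helper are all hypothetical and are not part of the course material; for a real dataset you would likely use a dataframe library and plots instead.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical toy dataset (assumption): replace with rows loaded from your own data.
rows = [
    {"age": 34, "income": 52000.0, "churned": 0},
    {"age": 45, "income": 61000.0, "churned": 0},
    {"age": 23, "income": 28000.0, "churned": 1},
    {"age": 51, "income": 75000.0, "churned": 0},
    {"age": 38, "income": 48000.0, "churned": 0},
    {"age": 29, "income": 39000.0, "churned": 1},
    {"age": 62, "income": 83000.0, "churned": 0},
    {"age": 41, "income": 57000.0, "churned": 0},
]
TARGET = "churned"  # hypothetical name of the label column


def summarize(rows, target):
    """Return sample count, feature count, class balance, and per-feature marginals."""
    features = [name for name in rows[0] if name != target]

    # Class imbalance: label counts and the majority-to-minority ratio.
    class_counts = Counter(row[target] for row in rows)
    imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

    # Marginal distribution of each numeric feature, summarized by a few statistics.
    marginals = {}
    for name in features:
        values = [row[name] for row in rows]
        marginals[name] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": stdev(values),
        }

    return {
        "n_samples": len(rows),
        "n_features": len(features),
        "class_counts": dict(class_counts),
        "imbalance_ratio": imbalance_ratio,
        "marginals": marginals,
    }


summary = summarize(rows, TARGET)
for key, value in summary.items():
    print(key, value)
```

A summary like this only covers the numeric basics; histograms or density plots of each feature usually communicate the marginal distributions better in the written plan.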
The final deliverables are:
- Project Report: In at most 8 pages (12 point, single column), you should explain your creative contributions to the project (modeling assumptions, data curation, exploration, model diagnostics, insights, etc.). You may include an appendix for supplementary material, which may or may not be checked.
- For example, describe how you built one or more statistical models tailored to your dataset.
- You should also include inferences and a discussion of what went wrong, what went right, and what can be improved (be technical here).
- The report can optionally be combined with code as a Jupyter notebook, and should be uploaded to Blackboard.
- Code and data: The code (e.g., Jupyter notebook, if not combined with the report above) and a small sample of the data should be provided along with the report.
- Presentation: You should also prepare a 10-minute presentation for the end of the semester (see the date on the course syllabus page) that explains the whole project, using your Jupyter notebook or slides.
The intermediate deliverable will be graded based on whether a complete project plan has been sufficiently described, as well as the strengths and weaknesses of the project scope and initial exploratory analysis.
The final deliverable will be graded based on the creativity shown in handling the data and the insights drawn. The report and code should be very clearly written and presented, and will be evaluated based on correctness, content, creativity and clarity:
- Correctness will be assessed based on the evaluation metrics used for the results, valid experimental setup and experimentation, technical correctness and the assumptions laid out, etc.
- Content will be assessed based on the novel contributions made in the project and on project depth (e.g., why this data, why this problem, what you did, visualizations and interesting conclusions, insights, discussion of methodology, etc.). You should try to demonstrate your understanding of the relevant topics and their use in your innovative, non-trivial project.
- Creativity will be assessed based on how non-obvious your solution or contribution is and how different choices were made in the execution of the project.
- Clarity will be assessed based on the language used, the structure of the report, the references cited, and how clearly and professionally you explain and discuss your work.
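To illustrate what a valid experimental setup and a sensible choice of evaluation metric can look like, here is a minimal sketch, assuming a binary classification task with imbalanced labels. The synthetic labels and the helper functions are hypothetical and are not a required method; the sketch only shows why a held-out, stratified test set and a trivial baseline are worth reporting.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into train/test while preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is not inflated by the majority class."""
    recalls = []
    for cls in set(y_true):
        positions = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in positions) / len(positions))
    return sum(recalls) / len(recalls)


# Synthetic, imbalanced labels (assumption): 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

# Trivial baseline: always predict the majority class seen in the training set.
majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
y_test = [labels[i] for i in test_idx]
y_pred = [majority] * len(y_test)

baseline_acc = accuracy(y_test, y_pred)
baseline_bal = balanced_accuracy(y_test, y_pred)
print(f"accuracy={baseline_acc:.2f}, balanced accuracy={baseline_bal:.2f}")
```

The baseline that never predicts the minority class still scores high plain accuracy on such data, so a report that quotes only accuracy would overstate its results; comparing against a baseline under an imbalance-aware metric makes the claim checkable.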
Note: All external material and sources (code, ideas, theory, insights) that you use must be cited without fail. Use of pre-trained models, databases, web servers, frontend frameworks, visualization tools, etc. for your report or presentation demo is allowed and encouraged, although use of proprietary software (such as MATLAB or Mathematica) is discouraged. This project cannot be used as part of any other course or requirement.