# About this Course {.unnumbered}
```{r}
#| echo: false
#| eval: false
#| results: 'asis'
source("_common.R")
status("complete")
```
<!--## Course Description {.unnumbered}-->
Model building and evaluation are necessary but not sufficient skills for the effective practice of data science. In this module you will develop the technical and personal skills that are required to work successfully as a data scientist within an organisation.
<!--
::: small_right
<img src="images/EDS-logo.jpg" alt="Logo"/>
:::
test line that will not appear because it is a comment
-->
During this module you will critically explore how to:
- effectively scope and manage a data science project;
- work openly and reproducibly;
- efficiently acquire, manipulate, and present data;
- interpret and explain your work for a variety of stakeholders;
- ensure that your work can be put into production;
- assess the ethical implications of your work as a data scientist.
This interdisciplinary course draws on fields including statistics, computing, management science and data ethics. Each topic will be investigated through a selection of lecture videos, conference presentations and academic papers, supported by hands-on lab exercises and readings on industry best practices published by recognised professional bodies.
## Schedule {-}
These notes are intended for students on the course **MATH70076: Data Science** in the academic year 2024/25.
As the course is scheduled to take place over five weeks, the suggested schedule is:
- 1st week: effective data science workflows;
- 2nd week: acquiring and sharing data;
- 3rd week: exploratory data analysis and visualisation;
- 4th week: preparing for production;
- 5th week: ethics and context of data science.
An alternative PDF version of these notes may be downloaded [here](./Effective-Data-Science.pdf). Please be aware that the PDF version is secondary to this course webpage and will be updated less frequently.
## Learning outcomes {-}
On successful completion of this module students should be able to:
1. Independently scope and manage a data science project;
2. Source data from the internet through web scraping and APIs;
3. Clean, explore and visualise data, justifying and documenting the decisions made;
4. Evaluate the need for (and implement) approaches that are explainable, reproducible and scalable;
5. Appraise the ethical implications of a data science project, particularly the risks of compromising privacy or fairness and the potential to cause harm.
## Allocation of Study Hours {-}
**Lectures:** 10 hours (2 hours per week)
**Group Teaching:** 5 hours (1 hour per week)
**Lab / Practical:** 10 hours (2 hours per week)
**Independent Study:** 100 hours (11 hours per week + 45 hours coursework)
**Drop-In Sessions:** Each week there will be an optional drop-in session to address any questions about the course or material. This is where you can get support from the course lecturer or GTA on the topics covered each week, either individually or in small groups.
These will be held on Fridays 14:00-15:00 in Huxley 711C.
<!--**Office Hours:** Additionally, there will be an office hour each week. This is a weekly opportunity for 1-1 discussion with the course lecturer to address any individual questions, concerns or problems that you might have. These meetings can be in person or on Teams and can be academic (relating to course content or progress) or pastoral (relating to student well-being) in nature. To book a 1-1 meeting please use the link on the course blackboard page.
Office hours will be held on Mondays 15:00-16:00 in Huxley 6M20. (Week 3 alteration, 14:00-15:00) -->
## Assessment Structure {-}
The course will be assessed entirely by coursework, reflecting the practical and pragmatic nature of the course material.
**Coursework 1 (30%):** A reproducible data journalism task, to be completed during one week of the course.
**Coursework 2 (70%):** A student-led development of a data product, e.g. an in-depth statistical analysis, portfolio website, R package or dashboard. This will be released during the course and is to be submitted following the examination period in Summer term.
## Acknowledgements {-}
These notes were created by Dr Zak Varty. They were inspired by a previous lecture series by Dr Purvasha Chakravarti at Imperial College London and draw on many resources made available by the R community, which are attributed throughout.
```{r setup}
#| include: false
#| message: false
#library(tidyverse)
#library(lax)
#library(ismev)
#library(evir)
#library(lubridate)
#library(xts)
#library(qrmdata)
#library(tseries)
#source("../labs/extremes-functions.R")
# Default chunk options for figures and output
knitr::opts_chunk$set(
  fig.path = "images/",
  echo = FALSE,
  out.width = "85%",
  fig.align = "center",
  message = FALSE,
  fig.width = 8.5,
  fig.asp = 0.7
)

# Use a black-and-white ggplot2 theme with 12pt base font
ggplot2::theme_set(ggplot2::theme_bw(12))

# Colour text appropriately for LaTeX or HTML output; return it unchanged otherwise
colorise <- function(x, color = "red") {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, x)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color, x)
  } else {
    x
  }
}
```
```{r}
#| include: false
# automatically create a bib database for R packages
knitr::write_bib(
  c(.packages(), 'bookdown', 'knitr', 'rmarkdown'),
  'packages.bib'
)
```