| title | Reproducible Research: Peer Assessment 1 | ||||||
|---|---|---|---|---|---|---|---|
| output |
|
I begin by loading the necessary packages and importing the data.
library(dplyr)
library(ggplot2)
data = read.csv(unz("activity.zip", "activity.csv"), colClasses = c("integer", "Date", "integer"))Here is the distribution of the total number of steps taken in a day.
dailysteps = summarize(group_by(data,date), numsteps = sum(steps,na.rm = TRUE))
qplot(dailysteps$numsteps)## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
The mean number of steps taken in a day is:
mean(dailysteps$numsteps)## [1] 9354.23
and the median number of steps is:
median(dailysteps$numsteps)## [1] 10395
Below is a plot of the average steps taken in each five minute interval across all days.
intervalsteps = summarize(group_by(data,interval), avesteps = mean(steps,na.rm = TRUE))
qplot(interval,avesteps, data = intervalsteps, geom = "path")The maximum average number of steps occurs during this five minute interval:
intervalsteps$interval[max(intervalsteps$avesteps)==intervalsteps$avesteps]## [1] 835
There are:
sum(is.na(data$steps))## [1] 2304
intervals with missing values.
To impute these missing values I use the average number of steps taken in the five minute interval across all days when that interval is not missing.
data_imp = data %>%
group_by(interval) %>%
mutate(imputesteps = mean(steps ,na.rm = TRUE))
data_imp$steps[is.na(data_imp$steps)] = data_imp$imputesteps[is.na(data_imp$steps)]
data_imp = select(data_imp,-imputesteps)Now that the data is imputed we can view the histogram for the total steps taken in a day.
dailysteps_imp = summarize(group_by(data_imp,date), numsteps = sum(steps,na.rm = TRUE))
qplot(dailysteps_imp$numsteps)## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Note that the imputation has increased the estimates of the average and median number of steps taken on each day:
mean(dailysteps_imp$numsteps)## [1] 10766.19
median(dailysteps_imp$numsteps)## [1] 10766.19
Unsurprisingly, there are differences between weekday and weekends in terms of steps taken in any five minute interval.
data_imp$weekday = weekdays(data_imp$date)
data_imp$weekend = "Weekday"
data_imp$weekend[data_imp$weekday == "Saturday" | data_imp$weekday == "Sunday"] = "Weekend"
data_imp$weekend = as.factor(data_imp$weekend)
intervalsteps = summarize(group_by(data_imp,interval,weekend), avesteps = mean(steps,na.rm = TRUE))
qplot(interval,avesteps, data = intervalsteps, facets = weekend ~ ., geom = "path")


