This codebook breaks down the algorithm of the script `run_analysis.R`. The algorithm has the following key steps:
- Downloading and unzipping the data from a given link into the folder UCI HAR Dataset. Before downloading, the script checks whether the data has already been downloaded.
- Using the `read.table` command, reading the following files into R objects:
    - features.txt into `data_names` (chr [1:561]), which is later used to name the columns of the output data. The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ.
    - activity_labels.txt into the data frame `activity_label` (6x2 df). This data frame links the class labels with their activity names.
    - subject_test.txt and subject_train.txt into the data frames `subject_test` (2947x1 df) and `subject_train` (7352x1 df), respectively. Each row identifies the subject who performed the activity for each window sample; subject IDs range from 1 to 30.
    - X_test.txt and X_train.txt into the data frames `x_test` (2947x561 df) and `x_train` (7352x561 df), respectively. These hold the test and training set measurements; column names are taken from `data_names`.
    - Y_test.txt and Y_train.txt into the data frames `y_test` (2947x1 df) and `y_train` (7352x1 df), respectively. These contain the activity number of each observation.
- Using the `factor` and `sapply` commands on the `activity_label`, `y_test` and `y_train` data frames, character labels are assigned to each observation of both the test and training sets, creating new vectors called `y_test_labeled` (chr [1:2947]) and `y_train_labeled` (chr [1:7352]).
- Using the `cbind` command, these vectors are column-bound to the `subject_test` and `subject_train` data sets, respectively, creating data frames that hold the subject number and the activity performed in each observation for both the test and training sets. The new data frames are named `test_subj_act_label` (2947x2 df) and `train_subj_act_label` (7352x2 df).
- Using the `rbind` command, the `x_test` and `x_train` data frames are row-bound together to form a new data frame called `x_data` (10299x561 df). In the same way, `y_data` (10299x2 df) is created from the test and training subject/activity data frames (`test_subj_act_label` and `train_subj_act_label`).
- Again using the `cbind` command, `merged_data` (10299x563 df) is obtained by binding `y_data` and `x_data` together.
- Using `select`, only the first two columns and the columns whose names contain mean or std are kept, forming a new data frame called `mean_sd_data` (10299x88 df).
- Using the `gsub` command, the names of almost all columns are cleaned according to the patterns specified in the code.
- Then the following code:
  `mean_sd_data %>% group_by(Subject, Activity) %>% summarise_all(.funs = mean)`
  is used to obtain a new data frame, `average_data` (180x88 df), containing the mean of each variable (via `summarise_all`) for each combination of Subject and Activity (via `group_by`), i.e. 30 subjects x 6 activities = 180 rows.
- Finally, `average_data` is written to the file average_data.txt using the `write.table` command.
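The first step (download only when the data is missing, then unzip) can be sketched in base R as follows. The helper name `fetch_dataset`, its arguments and the zip file name are hypothetical; the actual link used by `run_analysis.R` is not reproduced in this codebook, so the URL must be supplied by the caller.

```r
# Hypothetical helper: download and unzip the data set only if it is not present yet.
# `url` is a placeholder; the real link used by run_analysis.R is not shown here.
fetch_dataset <- function(url, zip_file = "dataset.zip", data_dir = "UCI HAR Dataset") {
  # Skip everything if the extracted folder already exists.
  if (dir.exists(data_dir)) {
    message("Data already present in '", data_dir, "'; skipping download.")
    return(invisible(FALSE))
  }
  # Download the archive only if it has not been downloaded before.
  if (!file.exists(zip_file)) {
    download.file(url, destfile = zip_file, mode = "wb")
  }
  # Extract into the working directory, which creates data_dir.
  unzip(zip_file)
  invisible(TRUE)
}
```

The function returns `FALSE` when nothing had to be fetched and `TRUE` after a fresh download and extraction, so a calling script can log which path was taken.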
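The activity-labelling step maps each numeric code in `y_test`/`y_train` to its name in `activity_label`. A minimal sketch on toy data, assuming the default `read.table` column names `V1` (code) and `V2` (name); it uses `factor` alone, so it is not necessarily how `run_analysis.R` combines `factor` with `sapply`.

```r
# Toy stand-ins (hypothetical values): in the real data, activity_labels.txt
# has 6 rows and Y_test.txt has 2947 rows.
activity_label <- data.frame(V1 = 1:3,
                             V2 = c("WALKING", "SITTING", "LAYING"),
                             stringsAsFactors = FALSE)
y_test <- data.frame(V1 = c(2, 1, 3, 3))

# factor() matches each code against `levels` and replaces it with the label
# at the same position; as.character() turns the factor into a chr vector.
y_test_labeled <- as.character(factor(y_test$V1,
                                      levels = activity_label$V1,
                                      labels = activity_label$V2))
print(y_test_labeled)
```

The same call with `y_train$V1` yields `y_train_labeled`.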
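The combining and column-selection steps (`rbind`, `cbind`, then keeping the mean/std columns) can be illustrated on a tiny toy data set. The feature names and values are hypothetical, and base R `grep()` stands in for the dplyr `select()` call used by the script so that the sketch needs no packages.

```r
# Hypothetical toy data: 2 test rows, 3 training rows, 3 feature columns.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
x_test  <- data.frame(matrix(1:6,  nrow = 2))
x_train <- data.frame(matrix(7:15, nrow = 3))
names(x_test)  <- feature_names
names(x_train) <- feature_names

test_subj_act_label  <- data.frame(Subject = c(2, 4),
                                   Activity = c("WALKING", "LAYING"),
                                   stringsAsFactors = FALSE)
train_subj_act_label <- data.frame(Subject = c(1, 1, 3),
                                   Activity = c("SITTING", "WALKING", "LAYING"),
                                   stringsAsFactors = FALSE)

# rbind stacks the test rows on top of the training rows (column names must match).
x_data <- rbind(x_test, x_train)
y_data <- rbind(test_subj_act_label, train_subj_act_label)

# cbind places Subject and Activity in front of the measurement columns.
merged_data <- cbind(y_data, x_data)

# Keep the first two columns plus every column whose name mentions mean or std
# (grep() is a stand-in for the dplyr select() call used by the script).
keep <- c(1, 2, grep("mean|std", names(merged_data), ignore.case = TRUE))
mean_sd_data <- merged_data[, keep]
str(mean_sd_data)
```

With the real data the same operations give 10299 rows, and the selection keeps 86 of the 561 feature columns plus Subject and Activity, hence 88 columns.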
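The last two steps reduce the data to one row per subject and activity and save the result. A toy sketch using dplyr, the package whose `group_by` and `summarise_all` the script relies on; the feature columns `f1` and `f2` and the `row.names = FALSE` setting are hypothetical. With the real data, 30 subjects and 6 activities give the 180 rows of `average_data`.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy version of mean_sd_data: subject 1 has two WALKING windows.
mean_sd_data <- data.frame(Subject  = c(1, 1, 1, 2),
                           Activity = c("WALKING", "WALKING", "LAYING", "WALKING"),
                           f1 = c(1, 3, 5, 7),
                           f2 = c(2, 4, 6, 8),
                           stringsAsFactors = FALSE)

# One output row per (Subject, Activity) pair; every non-grouping column is
# replaced by its mean within that pair.
average_data <- mean_sd_data %>%
  group_by(Subject, Activity) %>%
  summarise_all(.funs = mean)

# row.names = FALSE is an assumption; the codebook only says write.table is used.
out_file <- file.path(tempdir(), "average_data.txt")
write.table(average_data, file = out_file, row.names = FALSE)
print(average_data)
```

In dplyr 1.0 and later, `summarise_all()` is superseded; `summarise(across(everything(), mean))` on the grouped data is the equivalent form.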