From d363d9d09c7ad87ed830150d0cbc3e313662a02e Mon Sep 17 00:00:00 2001
From: Sathvik Thogaru <89151053+ThogaruSathvik@users.noreply.github.com>
Date: Mon, 29 Aug 2022 18:09:34 -0400
Subject: [PATCH 1/4] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1f07cf3..b6632d6 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ This is the generic DACSS course blog hosted on GitHub pages. Students will crea
 # Setup R Course blog (for Students)
 If this is your first time setting up a DACSS course blog repository, or if you have forgotten how, please consult the youtube playlist on setting up your GitHub account, linking to R, and setting up a course blog.
 
-# Weekly Workflow (for Students)
+# Weekly Workflow (for Students) [Video Link](https://www.loom.com/share/6c15f27ed592423c96613f8f876548cf)
 - Create a New Post
 - Render the .qmd file
 - Commit and Push Changes to Github

From 821b5cc28eac03308c00020389c954e450363085 Mon Sep 17 00:00:00 2001
From: Sathvik Thogaru <89151053+ThogaruSathvik@users.noreply.github.com>
Date: Mon, 29 Aug 2022 18:23:18 -0400
Subject: [PATCH 2/4] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b6632d6..fded441 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 This is the generic DACSS course blog hosted on GitHub pages. Students will create a templated repository, work in RStudio to create new posts, and then commit and push the changes prior to submitting a pull request to main repository.
 
 # Setup R Course blog (for Students)
-If this is your first time setting up a DACSS course blog repository, or if you have forgotten how, please consult the youtube playlist on setting up your GitHub account, linking to R, and setting up a course blog.
+If this is your first time setting up a DACSS course blog repository, or if you have forgotten how, please consult the [youtube playlist](https://www.youtube.com/watch?v=8ozMX5V_ESk&list=PL8U9JlL13ieeR7QqDM1R8dpvvFWBjNY4N) on setting up your GitHub account, linking to R, and setting up a course blog.
 
 # Weekly Workflow (for Students) [Video Link](https://www.loom.com/share/6c15f27ed592423c96613f8f876548cf)
 - Create a New Post

From 4c542078ae850bbe25365cfe9a8a8b9e59f4549a Mon Sep 17 00:00:00 2001
From: Sathvik Thogaru <89151053+ThogaruSathvik@users.noreply.github.com>
Date: Tue, 30 Aug 2022 00:18:43 -0400
Subject: [PATCH 3/4] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fded441..7a369d3 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,8 @@ This is the generic DACSS course blog hosted on GitHub pages. Students will crea
 # Setup R Course blog (for Students)
 If this is your first time setting up a DACSS course blog repository, or if you have forgotten how, please consult the [youtube playlist](https://www.youtube.com/watch?v=8ozMX5V_ESk&list=PL8U9JlL13ieeR7QqDM1R8dpvvFWBjNY4N) on setting up your GitHub account, linking to R, and setting up a course blog.
 
-# Weekly Workflow (for Students) [Video Link](https://www.loom.com/share/6c15f27ed592423c96613f8f876548cf)
+# Weekly Workflow (for Students)
+[Video Link](https://www.loom.com/share/6c15f27ed592423c96613f8f876548cf)
 - Create a New Post
 - Render the .qmd file
 - Commit and Push Changes to Github

From c9a5146403606f89f60302457fc8154b3edbc82c Mon Sep 17 00:00:00 2001
From: kkimble25 <98501637+kkimble25@users.noreply.github.com>
Date: Mon, 3 Oct 2022 15:46:47 -0400
Subject: [PATCH 4/4] Kimble HW 1

---
 DACSS 603 HW 1.qmd | 225 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 225 insertions(+)
 create mode 100644 DACSS 603 HW 1.qmd

diff --git a/DACSS 603 HW 1.qmd b/DACSS 603 HW 1.qmd
new file mode 100644
index 0000000..27b4019
--- /dev/null
+++ b/DACSS 603 HW 1.qmd
@@ -0,0 +1,225 @@
+---
+title: "DACSS 603 HW 1 Kimble"
+author: "Karen Kimble"
+description: "DACSS 603 HW 1"
+date: "10/03/2022"
+format:
+  html:
+    toc: true
+    code-fold: true
+    code-copy: true
+    code-tools: true
+editor: visual
+---
+
+# Question 1: Lung Capacity
+
+### Setup
+
+```{r}
+library(dplyr)
+library(readxl)
+library(tidyverse)
+knitr::opts_chunk$set(echo = TRUE)
+
+# Reading in File
+LungCapData <- read_excel("LungCapData.xls")
+```
+
+## Part A: Distribution
+
+```{r}
+hist(LungCapData$LungCap)
+```
+
+The histogram above suggests that the lung capacity data are roughly normally distributed: the distribution is approximately bell-shaped, with most observations clustered near the center and fewer observations in the tails.
+
+## Part B: Probability Distribution of LungCap (Males vs. Females)
+
+```{r}
+boxplot(LungCap ~ Gender, data = LungCapData, main = "Lung Capacity by Gender",
+        xlab = "Gender", ylab = "Lung Capacity")
+```
+
+From the boxplots above, it appears that males in this study had slightly higher lung capacities than females, with a median of about 9 for males and about 8 for females. Both genders had large ranges, but these ranges reflect the same overall pattern of males having slightly higher lung capacities.
+
+## Part C: Smokers vs. Non-Smokers
+
+```{r}
+smokers <- filter(LungCapData, Smoke == "yes")
+mean(smokers$LungCap)
+
+nonsmokers <- filter(LungCapData, Smoke == "no")
+mean(nonsmokers$LungCap)
+```
+
+The mean lung capacity for smokers (8.65) is higher than the mean lung capacity for non-smokers (7.77). Given what is known about how smoking affects the lungs, this result is counterintuitive. One possibility is that smokers are more used to deep inhales and exhales and could therefore have better lung capacity until smoking has more of an effect on their lungs. Other factors that are not yet clear from the data, such as age, may also have contributed to this result.
+
+## Part D: Lung Capacity by Smoker/Non-Smoker and Age
+
+```{r}
+LungCapData <- within(LungCapData, {
+  Age.group <- NA
+  Age.group[Age <= 13] <- "13 and Under"
+  Age.group[Age >= 14 & Age <= 15] <- "14-15"
+  Age.group[Age >= 16 & Age <= 17] <- "16-17"
+  Age.group[Age >= 18] <- "18 and Over"
+} )
+```
+
+### Smokers
+
+```{r}
+# Boxplots
+
+smoking_age <- filter(LungCapData, Smoke == "yes")
+
+boxplot(LungCap ~ Age.group, data = smoking_age,
+        main = "Lung Capacity of Smokers by Age Group",
+        xlab = "Age Group", ylab = "Lung Capacity")
+```
+
+From the boxplot above, smokers' maximum lung capacities level off at about 12 as age increases, with little improvement in the maximums between age groups. The medians rise somewhat more with age, but not dramatically after the 14-15 group. Smokers who are 18 and over have higher lung capacities overall, but this may simply reflect natural growth and development.
+
+```{r}
+# Means
+
+smoking_age %>%
+  group_by(Age.group) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+We see the same trend in the means as in the medians: mean lung capacity increases as age increases.
+
+### Non-Smokers
+
+```{r}
+# Boxplot
+
+nonsmoking_age <- filter(LungCapData, Smoke == "no")
+
+boxplot(LungCap ~ Age.group, data = nonsmoking_age,
+        main = "Lung Capacity of Non-Smokers by Age Group",
+        xlab = "Age Group", ylab = "Lung Capacity")
+```
+
+In non-smokers, we see the same trend of lung capacity increasing with age, but the median lung capacities of the two oldest age groups are higher for non-smokers than for smokers. There are also more outliers among non-smokers, especially in the 14-15 group.
+
+```{r}
+# Means
+
+nonsmoking_age %>%
+  group_by(Age.group) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+The means of the non-smoking group follow the same trend as their medians and as the means of the smoking group: they increase with age. However, the mean lung capacities of the two oldest age groups are higher for non-smokers than for smokers.
+
+## Part E: Lung Capacities for Smokers and Non-Smokers within Age Group
+
+### 13 and Under
+
+```{r}
+LungCapData %>%
+  filter(Age.group == "13 and Under") %>%
+  group_by(Smoke) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+In the 13 and Under group, the mean lung capacity for smokers is higher than the mean lung capacity for non-smokers, which mirrors the overall means found in Part C. The boxplot of smokers by age group also shows a very low outlier in this age group, which pulls the mean for this group, and for smokers overall, below what it would otherwise be.
+
+### 14-15
+
+```{r}
+LungCapData %>%
+  filter(Age.group == "14-15") %>%
+  group_by(Smoke) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+In this age group, the mean lung capacity for non-smokers is higher than the mean lung capacity for smokers, unlike in the 13 and Under group.
+
+### 16-17
+
+```{r}
+LungCapData %>%
+  filter(Age.group == "16-17") %>%
+  group_by(Smoke) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+The same trend continues in this age group: the mean lung capacity of non-smokers ages 16 and 17 is higher than that of smokers in this group. As age increases, the mean lung capacities of non-smokers and smokers both rise by about the same amount (roughly 1).
+
+### 18 and Over
+
+```{r}
+LungCapData %>%
+  filter(Age.group == "18 and Over") %>%
+  group_by(Smoke) %>%
+  summarise_at(vars(LungCap), list(name = mean))
+```
+
+In this oldest age group, the same trend continues: the mean lung capacity for non-smokers is higher than that of smokers. The pattern in the 14-15, 16-17, and 18 and Over groups does not appear in the overall means, where smokers had the higher mean. A low outlier in the 13 and Under group would lower the smokers' overall mean rather than raise it, so it cannot explain this reversal. A more likely explanation is age composition: if smokers in this sample are concentrated in the older age groups, where lung capacity is higher for everyone, their overall mean would be higher even though they trail non-smokers within most age groups.
+
+## Part F: Correlation & Covariance
+
+```{r}
+# Correlation
+
+cor(LungCapData$Age, LungCapData$LungCap, use = "everything")
+```
+
+The correlation between lung capacity and age is positive and strong: as age increases, lung capacity also tends to increase. A value of about 0.8 is close to 1, indicating a strong linear relationship between the two variables.
+
+```{r}
+# Covariance
+
+cov(LungCapData$Age, LungCapData$LungCap, use = "everything")
+```
+
+The covariance is positive, meaning that there is a positive relationship between the variables. This agrees with the correlation, since the correlation coefficient is the covariance divided by the product of the two standard deviations. Unlike the correlation, the covariance depends on the units of the variables, so its size alone does not show how strong the relationship is. Age and lung capacity have an overall positive relationship: as age increases, so does lung capacity.
+
+# Question 2: Prisoner Convictions
+
+## Part A
+
+```{r}
+160/810
+```
+
+## Part B
+
+```{r}
+(434 + 128)/810
+```
+
+## Part C
+
+```{r}
+(160 + 434 + 128)/810
+```
+
+## Part D
+
+```{r}
+(64 + 24)/810
+```
+
+## Part E
+
+```{r}
+# Creating vector: one entry per prisoner, holding that prisoner's number of prior convictions
+convict <- c(rep(0, 128), rep(1, 434), rep(2, 160), rep(3, 64), rep(4, 24))
+
+mean(convict)
+```
+
+The expected value for the number of prior convictions is about 1.29 (1042/810). Although each prisoner's number of prior convictions is a whole number, the expected value is a long-run average, so it does not need to be rounded.
+
+## Part F
+
+```{r}
+var(convict)
+
+sd(convict)
+```