Suhi's analysis #4

Draft

suhihanif wants to merge 6 commits into MethodsForReproducibleHealthResearch:main from suhihanif:suhi

1-data/.DS_Store

Binary file not shown.

0 raw-data/cohort.csv → 1-data/raw-data/cohort.csv

File renamed without changes.

2-analysis/assignment1_analysis.R

@@ -0,0 +1,105 @@
+    # clear environment
+    rm(list = ls())
+    # load libraries
+    library(tidyverse)
+    library(GGally)
+    library(ggplot2)
+    # read in cohort data
+    d <- read.csv("1-data/raw-data/cohort.csv")
+    # class(d$smoke)
+    # class(d$female)
+    # class(d$age)
+    # class(d$cost)
+    # class(d$cardiac)
+    # check if any rows have missing values
+    missing <- d[rowSums(is.na(d)) > 0,] # no missing values
+    ###############################################################################
+    # plot density of cost by cardiac event status
+    ###############################################################################
+    # create categorical variable for cardiac status
+    d <- d %>%
+      mutate(cardiac_cat = case_when(cardiac == 0 ~ "No cardiac event",
+                                     cardiac == 1 ~ "Cardiac event"))
+    # create density plot of cost by cardiac event status
+    plot <- ggplot(d, aes(x=cost, fill=cardiac_cat)) +
+      geom_density(alpha = 0.7) +
+      ggtitle("Density plot of cost by cardiac event status") +
+      labs(fill="") +
+      xlab("Cost") +
+      ylab("Density") +
+      theme_minimal()
+    # save density plot
+    ggsave(filename = "density_plot_cost_cardiac.jpg", plot = plot, path = "3-output")
+    # remove categorical variable for cardiac event
+    d <- d %>% select(-cardiac_cat)
+    ###############################################################################
+    # create table to describe variables
+    ###############################################################################
+    # calculate stats for those with cardiac events
+    d_cardiac <- d %>% filter(cardiac == 1) %>%
+      select(-cardiac) %>%
+      summarise(across(everything(), mean)) %>%
+      pivot_longer(cols = everything(), names_to = "Variable", values_to = "Mean/% (In those with cardiac events)")
+    d_cardiac <- d_cardiac %>% mutate("N (Cardiac events)" = nrow(d %>% filter(cardiac == 1)))
+    # variables that should be percentages
+    percentage_vars <- c("smoke", "female")
+    # convert Mean/% column values from mean to percentage if relevant
+    d_cardiac <- d_cardiac %>%
+      mutate("Mean/% (In those with cardiac events)" = ifelse(Variable %in% percentage_vars,
+                               paste0(sprintf("%.1f", `Mean/% (In those with cardiac events)` * 100), "%"),
+                               sprintf("%.1f", `Mean/% (In those with cardiac events)`)))
+    # calculate stats for those without cardiac events
+    d_control <- d %>% filter(cardiac == 0) %>%
+      select(-cardiac) %>%
+      summarise(across(everything(), mean)) %>%
+      pivot_longer(cols = everything(), names_to = "Variable", values_to = "Mean/% (In those without cardiac events)")
+    d_control <- d_control %>% mutate("N (No cardiac events)" = nrow(d %>% filter(cardiac == 0)))
+    # convert Mean/% column values from mean to percentage if relevant
+    d_control <- d_control %>%
+      mutate("Mean/% (In those without cardiac events)" = ifelse(Variable %in% percentage_vars,
+                                                              paste0(sprintf("%.1f", `Mean/% (In those without cardiac events)` * 100), "%"),
+                                                              sprintf("%.1f", `Mean/% (In those without cardiac events)`)))
+    # join columns
+    stats <- d_control %>%
+      left_join(d_cardiac, by = "Variable")
+    # label variables
+    labels <- c("Smoker", "Is female", "Age in years", "Cost of treatment (in dollars)")
+    stats <- stats %>%
+      mutate(Variable = labels)
+    # save table 1
+    write.csv(stats, "3-output/table1.csv", row.names = FALSE) # relative path, consistent with read.csv() and ggsave() above
+    ###############################################################################
+    # regression analysis
+    ###############################################################################
+    # run logistic regression
+    glm.fit <- glm(cardiac ~ smoke + female + age, data = d, family = "binomial")
+    summary(glm.fit)
+    # print OR and CI for cardiac events by smoking status
+    print(OR_smoke <- exp(glm.fit$coefficients["smoke"]))
+    print(CI <- exp(confint(glm.fit)))
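
The last two lines of the script exponentiate the coefficient and the interval separately. Below is a minimal sketch of collecting the odds ratios and their intervals in one table. It runs on simulated data rather than `cohort.csv`; the data frame `sim`, its assumed coefficients, and the names `fit` and `or_table` are illustrative only. It uses Wald intervals from `confint.default()`, which can differ slightly from the profile-likelihood intervals that `confint()` returns for a `glm`.

```r
# Simulated stand-in for cohort.csv (all values are made up for illustration)
set.seed(1)
n <- 2000
sim <- data.frame(
  smoke  = rbinom(n, 1, 0.2),
  female = rbinom(n, 1, 0.5),
  age    = round(rnorm(n, 45, 10))
)

# Assumed true log-odds: smoking raises the odds of a cardiac event
lin_pred <- -3 + 2 * sim$smoke - 1 * sim$female + 0.01 * sim$age
sim$cardiac <- rbinom(n, 1, plogis(lin_pred))

# Same model form as the script: logistic regression adjusted for female and age
fit <- glm(cardiac ~ smoke + female + age, data = sim, family = binomial)

# Odds ratios with Wald 95% intervals, one row per coefficient
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))
```

The row named `smoke` holds the adjusted odds ratio for smoking together with its lower and upper bounds.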

3-output/.DS_Store

Binary file not shown.

3-output/density_plot_cost_cardiac.jpg

File not displayed.

3-output/table1.csv

@@ -0,0 +1,5 @@
+    "Variable","Mean/% (In those without cardiac events)","N (No cardiac events)","Mean/% (In those with cardiac events)","N (Cardiac events)"
+    "Smoker","13.8%",4750,"53.2%",250
+    "Is female","57.8%",4750,"11.6%",250
+    "Age in years","43.9",4750,"45.5",250
+    "Cost of treatment (in dollars)","9127.5",4750,"9892.3",250
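
The script builds this table with two near-identical pipelines and attaches the row labels by position. As a hedged alternative, here is a minimal base-R sketch that computes the group means in one call and looks labels up by variable name. It is not the author's method; the toy data frame and the names `toy`, `group_means`, `tab`, and `group_n` are hypothetical, and the values are made up.

```r
# Toy stand-in for cohort.csv; every value here is made up for illustration
toy <- data.frame(
  smoke   = c(1, 0, 0, 1, 0, 1),
  female  = c(0, 1, 1, 0, 1, 0),
  age     = c(50, 40, 45, 60, 35, 55),
  cost    = c(9000, 8000, 8500, 10000, 7500, 9500),
  cardiac = c(1, 0, 0, 1, 0, 0)
)

# Mean of every other column within each cardiac group (one row per group)
group_means <- aggregate(. ~ cardiac, data = toy, FUN = mean)

# Labels keyed by variable name, so column order in the CSV does not matter
labels <- c(smoke = "Smoker", female = "Is female",
            age = "Age in years", cost = "Cost of treatment (in dollars)")

# Transpose to one row per variable and one column per group
tab <- t(as.matrix(group_means[, names(labels)]))
colnames(tab) <- paste0("cardiac=", group_means$cardiac)
rownames(tab) <- unname(labels[rownames(tab)])

# Group sizes, matching the "N" columns of Table 1
group_n <- table(toy$cardiac)
print(tab)
print(group_n)
```

Keying the labels by name means a reordered or extended CSV would produce a lookup failure rather than silently mislabelled rows.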

Assignment2.Rproj

@@ -0,0 +1,14 @@
+    Version: 1.0
+    ProjectId: 2fe8802a-f9b3-4f07-ad26-7ccd6f52f3c7
+    RestoreWorkspace: Default
+    SaveWorkspace: Default
+    AlwaysSaveHistory: Default
+    EnableCodeIndexing: Yes
+    UseSpacesForTab: Yes
+    NumSpacesForTab: 2
+    Encoding: UTF-8
+    RnwWeave: Sweave
+    LaTeX: pdfLaTeX

README.md

            
                  
    @@ -1,5 +1,34 @@
  
    # Assignment #2 Repository

    This repository includes the simulated data for Assignment #2. Fork this repository and add your analysis as described in the canvas assignment.

    This repository uses the simulated data for Assignment #2 from the public repository https://github.com/MethodsForReproducibleHealthResearch/Assignment2. 

    The csv file for `cohort` in the `raw-data` folder includes 5,000 observations with variables `smoke`, `female`, `age`, `cardiac`, and `cost`.

    Variables in the dataset (my interpretation):

      - smoke: binary variable indicating that the patient is a smoker (1) or non-smoker (0)

      - female: binary variable indicating that the patient is female (1) or male (0)

      - age: continuous variable indicating patient's age in years 

      - cardiac: binary variable indicating any cardiac event (1) and no event (0)

      - cost: continuous variable indicating cost of hospitalization in dollars 

    Below is a description of the repository structure:

              1-data: contains raw data in the "raw-data" folder

              2-analysis: contains the analysis script

              3-output: contains tables and figures generated by the analysis script

    Methods:

    I used logistic regression (a generalized linear model with a binomial family) to evaluate the association between 

    being a smoker and having any cardiac event. The model was adjusted for the potential confounders 

    gender (the `female` variable) and age. 
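
Written out, the model described above has the following form. The symbols $p_i$ and $\beta_0, \dots, \beta_3$ are notation introduced here for illustration and do not appear in the analysis script.

```latex
\[
\log\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1\,\mathrm{smoke}_i + \beta_2\,\mathrm{female}_i + \beta_3\,\mathrm{age}_i,
\qquad p_i = \Pr(\mathrm{cardiac}_i = 1)
\]
\[
\mathrm{OR}_{\mathrm{smoke}} = e^{\beta_1}
\]
```

Under this form, the adjusted odds ratio for smoking is the exponentiated `smoke` coefficient, which is what `exp(glm.fit$coefficients["smoke"])` computes in the script.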

    Summary of results:

    - Table 1 contains descriptive statistics for variables grouped by cardiac event status. There was no missing data. 

    - The logistic regression results show that being a smoker is associated with an 8.3-fold higher adjusted odds of 

      having a cardiac event compared to not being a smoker (95% CI: 6.3, 10.9). The model was adjusted for gender and age. 

    - The density plot "density_plot_cost_cardiac" shows that on average those who experienced any cardiac event 

      have higher costs compared to those who did not. 
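
Table 1 gives a rough numeric check of the cost statement. A small sketch using the two mean costs reported in `3-output/table1.csv`; the variable names are illustrative only.

```r
# Mean costs copied from 3-output/table1.csv
mean_cost_no_event <- 9127.5
mean_cost_event    <- 9892.3

# Absolute and relative difference in mean cost between the two groups
diff_cost <- mean_cost_event - mean_cost_no_event
pct_diff  <- 100 * diff_cost / mean_cost_no_event

print(round(diff_cost, 1))
print(round(pct_diff, 1))
```

The difference in means is about 765 dollars, roughly 8% of the mean cost in the group without cardiac events.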

    Statement:

    I did not use generative AI technology (e.g., ChatGPT) to complete any portion of the work.
