jschoof25/Data_Science_ML_Assignments

Data Science Assignments in R

Data analysis homework assignments in R from four graduate courses in the biostatistics department at the University of Washington, completed in Fall 2020 and Spring 2021.

Biomedical Data Science (BIOST 544)

Homework 1

This analysis describes and evaluates the effects of the treatment, TFD725, on the probability of surviving more than 400 days across age groups in a Phase II clinical trial of 188 patients with non-small cell lung cancer.

Methods Used:

  • Simulation using random draws from a binomial distribution
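
As a rough illustration of how random binomial draws can be used to compare survival proportions between two arms, here is a minimal sketch. It is not the code from the assignment: the arm sizes and survivor counts are made-up toy values, and the one-sided comparison against a pooled survival probability is an assumption about how such an analysis might be set up.

```r
set.seed(544)  # for reproducibility

# Hypothetical inputs (toy values, not taken from the trial data)
n_treat <- 50
n_ctrl <- 50
obs_treat <- 30  # treated patients surviving more than 400 days
obs_ctrl <- 22   # control patients surviving more than 400 days

# Observed difference in survival proportions
obs_diff <- obs_treat / n_treat - obs_ctrl / n_ctrl

# Under the null, both arms share one survival probability
p_pooled <- (obs_treat + obs_ctrl) / (n_treat + n_ctrl)

# Simulate survivor counts in each arm under the null
n_sim <- 10000
sim_treat <- rbinom(n_sim, size = n_treat, prob = p_pooled)
sim_ctrl <- rbinom(n_sim, size = n_ctrl, prob = p_pooled)
sim_diff <- sim_treat / n_treat - sim_ctrl / n_ctrl

# Share of simulated differences at least as large as the observed one
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diff >= obs_diff - 1e-12)
p_value
```

To look at age groups separately, the same steps could be repeated within each group.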

Files:

  • Code: "HW1_Biost544_Schoof_20Oct2020.Rmd"
  • Dataset: "nsclc-modified.txt"
  • Final Output: "HW1_Biost544_Schoof_20Oct2020.html"

How to replicate:

  • Download the dataset.
  • Change the working directory on line 11 of the code to the folder where you saved the dataset.
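
For reference, setting the working directory and reading the dataset might look like the sketch below. The folder path is a placeholder, and `header = TRUE` is an assumption about the file layout, so check it against the actual file.

```r
# Hypothetical folder; replace with the path where you saved the dataset
data_dir <- "path/to/your/data"

if (dir.exists(data_dir)) {
  setwd(data_dir)
  # header = TRUE is an assumption about the file layout
  nsclc <- read.table("nsclc-modified.txt", header = TRUE)
  str(nsclc)
} else {
  message("Folder not found: ", data_dir)
}
```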

Homework 2

The purpose of this analysis is to evaluate the effectiveness of a treatment being studied in a randomized trial using simulation.

Methods Used:

  • Rerandomization and simulation to conduct a data-driven t test.
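
The idea behind a rerandomization test with a t statistic can be sketched as follows. This is a toy example, not the assignment's code: the data are simulated, the variable names are hypothetical, and it assumes simple complete randomization, whereas an adaptive trial would need the labels redrawn under its own assignment scheme.

```r
set.seed(544)  # for reproducibility

# Toy data standing in for the trial (not the real dataset)
n <- 60
treat <- rep(c(0, 1), each = n / 2)
outcome <- rnorm(n, mean = 0.5 * treat)

# Welch t statistic comparing treated and control outcomes
t_stat <- function(y, arm) {
  unname(t.test(y[arm == 1], y[arm == 0])$statistic)
}

obs_t <- t_stat(outcome, treat)

# Re-randomize the treatment labels and recompute the statistic each time
n_perm <- 2000
perm_t <- replicate(n_perm, t_stat(outcome, sample(treat)))

# Two-sided p-value: share of re-randomized statistics at least as extreme
p_value <- mean(abs(perm_t) >= abs(obs_t))
p_value
```

Because the reference distribution comes from the re-randomized data rather than from the t distribution, the test does not rely on normality of the outcome.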

Files:

  • Code: "HW2_Biost544_Schoof_29Oct2020.Rmd"
  • Dataset: "HW2-adaptive-trial.txt"
  • Final Output: "HW2_Biost544_Schoof_29Oct2020.html"

How to replicate:

  • Download the dataset.
  • Change the working directory in the code to the folder where you saved the dataset.

Homework 3

The purpose of this analysis is to investigate the relationship between gene expression, as defined by 54,675 different probe sets, and the percentage of necrotic tissue in a tumor. The data sets being analyzed include genetic and clinical data on 152 patients.

Methods Used:

  • Lasso regression for feature selection, with k-fold cross-validation, to determine which gene probes are predictive of the percentage of necrotic tissue.
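
A minimal sketch of lasso feature selection with k-fold cross-validation is shown below. It assumes the `glmnet` package is installed, uses simulated toy data with hypothetical probe names in place of the real expression matrix, and picks 10 folds and `lambda.min` as illustrative choices that may differ from the assignment.

```r
library(glmnet)  # assumed to be installed

set.seed(544)  # for reproducibility

# Toy data: 100 samples and 200 "probes" (far fewer than in the assignment)
n <- 100
p <- 200
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("probe_", seq_len(p))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)  # only two probes carry signal

# 10-fold cross-validation over the lasso path (alpha = 1 is the lasso penalty)
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# Coefficients at the lambda with the lowest cross-validated error
beta <- as.matrix(coef(cv_fit, s = "lambda.min"))

# Probes with non-zero coefficients are the ones the lasso kept
selected <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")
selected
```

Using `s = "lambda.1se"` instead would typically give a sparser set of selected probes.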

Files:

  • Code: "HW2_Biost544_Schoof_29Oct2020.Rmd"
  • Dataset: "HW2-adaptive-trial.txt"
  • Final Output: "HW2_Biost544_Schoof_29Oct2020.html"

How to replicate:

  • Download the dataset.
  • Change the working directory in the code to the folder where you saved the dataset.

Homework 4

The purpose of this analysis is to assess the effect of smoking on bone mass density (BMD) in middle-aged women.

Methods Used:

  • Inverse probability weighting, with 95% confidence intervals for each estimate
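
Inverse probability weighting can be sketched roughly as below. This is an illustration under stated assumptions, not the assignment's code: the data are simulated, age is the only confounder, all variable names are hypothetical, and the percentile bootstrap is one possible way to obtain a 95% confidence interval.

```r
set.seed(544)  # for reproducibility

# Toy data (not the assignment's dataset): age confounds smoking and BMD
n <- 500
age <- rnorm(n, mean = 50, sd = 5)
smoker <- rbinom(n, 1, plogis(-0.5 + 0.08 * (age - 50)))
bmd <- 1.0 - 0.01 * (age - 50) - 0.05 * smoker + rnorm(n, sd = 0.1)
dat <- data.frame(age, smoker, bmd)

ipw_effect <- function(d) {
  # Propensity score: estimated P(smoker = 1 | age)
  ps <- fitted(glm(smoker ~ age, family = binomial, data = d))
  # Weight each person by the inverse probability of the exposure they had
  w <- ifelse(d$smoker == 1, 1 / ps, 1 / (1 - ps))
  # Difference in weighted mean BMD, smokers minus non-smokers
  weighted.mean(d$bmd[d$smoker == 1], w[d$smoker == 1]) -
    weighted.mean(d$bmd[d$smoker == 0], w[d$smoker == 0])
}

estimate <- ipw_effect(dat)

# Percentile bootstrap for a 95% confidence interval
boot_est <- replicate(1000, ipw_effect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot_est, c(0.025, 0.975))

estimate
ci
```

Refitting the propensity model inside each bootstrap replicate lets the interval reflect the uncertainty in the estimated weights as well as in the outcome.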

Files:

  • Code: "HW2_Biost544_Schoof_29Oct2020.Rmd"
  • Dataset: "HW2-adaptive-trial.txt"
  • Final Output: "HW2_Biost544_Schoof_29Oct2020.html"

How to replicate:

  • Download the dataset.
  • Change the working directory in the code to the folder where you saved the dataset.

Machine Learning for Public Health Big Data

Categorical Data Analysis

Longitudinal and Multilevel Data Analysis
