A comprehensive 10-day intensive curriculum on Statistics and R for Biological Sciences. Covers data wrangling, probability distributions, hypothesis testing (t-test, ANOVA), and linear regression.
This repository documents an intensive 10-day curriculum designed to build a professional foundation in Data Science and Statistical Inference using R.
- Days 1-2: R Environment, Data Structures (Vectors, Data Frames), and I/O (CSV/Excel).
- Days 3-4: Descriptive Statistics (SD, IQR) and the Grammar of Graphics with `ggplot2`.
- Day 5: Probability & Simulations (Normal Distribution, Z-scores, and `pnorm`/`qnorm`).
- Day 6: Central Limit Theorem (CLT) simulations and Sampling Theory.
- Day 7: Hypothesis Testing: One-sample, Independent Two-sample, and Paired t-tests.
- Day 8: Categorical Analysis (Chi-square) and Analysis of Variance (ANOVA with Tukey HSD Post-hoc Tests).
- Day 9: Correlation (Pearson/Spearman) and Linear Regression modeling.
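
For Days 1-2, here is a minimal base-R sketch of a data frame round trip through a CSV file. The data frame `plants` and its columns are hypothetical, and Excel import is not shown.

```r
# Hypothetical example data: a small data frame of plant measurements
plants <- data.frame(
  id        = 1:4,
  treatment = c("control", "control", "fertilizer", "fertilizer"),
  height_cm = c(12.1, 13.4, 15.2, 16.0)
)

# Write to a temporary CSV file, then read it back in
path <- tempfile(fileext = ".csv")
write.csv(plants, path, row.names = FALSE)
plants_in <- read.csv(path)

str(plants_in)  # inspect column names and types after import
```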
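
For Days 3-4, the sketch below computes the descriptive statistics named above on a small illustrative vector. If `ggplot2` is installed, it also builds a boxplot from a data layer, an aesthetic mapping and a geometry layer. The grouping into "A" and "B" is an assumption made only for the plot.

```r
# Hypothetical sample of eight measurements
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)     # 5
median(x)   # 4.5
sd(x)       # sample SD (n - 1 denominator), about 2.138
IQR(x)      # Q3 - Q1 with R's default quantile type 7: 5.5 - 4 = 1.5

# Optional: a boxplot built layer by layer, only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(group = rep(c("A", "B"), each = 4), value = x)
  p <- ggplot(df, aes(x = group, y = value)) +  # data + aesthetic mapping
    geom_boxplot()                                # geometry layer
  # print(p) draws the plot
}
```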
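
For Day 5, this example relates Z-scores to `pnorm` and `qnorm`. The population mean of 50 cm and SD of 8 cm are illustrative assumptions, not values from the curriculum.

```r
mu    <- 50   # assumed population mean plant height (cm), illustrative
sigma <- 8    # assumed population SD (cm), illustrative
x     <- 62   # an observed plant height (cm)

z <- (x - mu) / sigma                            # Z-score: 1.5
p_below     <- pnorm(z)                          # P(Z < 1.5), about 0.9332
p_below_raw <- pnorm(x, mean = mu, sd = sigma)   # same probability on the cm scale
cutoff_95   <- qnorm(0.95, mean = mu, sd = sigma) # height exceeded by only 5% of plants

c(z = z, p_below = p_below, cutoff_95 = cutoff_95)
```

`pnorm` maps a value to a cumulative probability and `qnorm` is its inverse, mapping a probability back to a value.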
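
For Day 6, a minimal Central Limit Theorem simulation with `replicate()` might look like the sketch below. It assumes a skewed exponential population (via `rexp`, chosen here for illustration) with mean 1 and SD 1, and samples of size 30. The sample means still centre on 1, and their SD approaches the standard error 1/sqrt(30).

```r
set.seed(42)
n_reps <- 10000   # number of simulated samples
n      <- 30      # size of each sample

# Each replicate: draw n values from Exp(rate = 1) and keep their mean
sample_means <- replicate(n_reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean, 1
sd(sample_means)     # close to the standard error, 1 / sqrt(30), about 0.183
# hist(sample_means) shows a roughly bell-shaped distribution despite the skewed population
```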
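
For Day 7, the three t-test variants can be sketched on simulated data. All group means, SDs and sample sizes below are hypothetical. Note that R's two-sample `t.test` uses Welch's test by default (`var.equal = FALSE`).

```r
set.seed(1)
# Hypothetical data: plant heights (cm) under two treatments
control    <- rnorm(20, mean = 50, sd = 5)
fertilized <- rnorm(20, mean = 55, sd = 5)

# One-sample t-test: does the control mean differ from 50 cm?
one_sample <- t.test(control, mu = 50)

# Independent two-sample t-test (Welch's version by default)
two_sample <- t.test(fertilized, control)

# Paired t-test: the same 20 plants measured before and after a treatment
before <- rnorm(20, mean = 50, sd = 5)
after  <- before + rnorm(20, mean = 2, sd = 1)
paired <- t.test(after, before, paired = TRUE)

c(one_sample = one_sample$p.value,
  two_sample = two_sample$p.value,
  paired     = paired$p.value)
```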
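
For Day 8, here is a sketch of both analyses. The 2x2 germination counts and the three-group plant weights are hypothetical. `chisq.test` applies Yates' continuity correction to 2x2 tables by default. `TukeyHSD` then compares every pair of treatment means while controlling the family-wise error rate.

```r
# Chi-square test of independence on a hypothetical 2x2 table
germ <- matrix(c(30, 10,
                 18, 22),
               nrow = 2, byrow = TRUE,
               dimnames = list(treatment  = c("primed", "untreated"),
                               germinated = c("yes", "no")))
chisq.test(germ)                   # with Yates' correction (default for 2x2)
chisq.test(germ, correct = FALSE)  # uncorrected Pearson statistic: 7.5

# One-way ANOVA followed by Tukey HSD on simulated weights
set.seed(2)
plants <- data.frame(
  treatment = factor(rep(c("control", "low_N", "high_N"), each = 10)),
  weight    = c(rnorm(10, 20, 2), rnorm(10, 22, 2), rnorm(10, 26, 2))
)
fit <- aov(weight ~ treatment, data = plants)
summary(fit)    # overall F-test: do any treatment means differ?
TukeyHSD(fit)   # which pairs differ, with adjusted p-values
```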
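
For Day 9, the sketch below simulates a hypothetical linear relationship (true slope 0.5). It then compares Pearson and Spearman correlation before fitting a simple linear regression with `lm`.

```r
set.seed(3)
height <- runif(40, 10, 30)                       # hypothetical heights (cm)
weight <- 2 + 0.5 * height + rnorm(40, sd = 1)    # simulated weights, true slope 0.5

cor(height, weight, method = "pearson")    # strength of linear association
cor(height, weight, method = "spearman")   # rank-based, captures any monotone trend
cor.test(height, weight)                   # Pearson test of H0: rho = 0

model <- lm(weight ~ height)
summary(model)   # slope estimate should be near the simulated 0.5
```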
The final day (Day 10) focuses on an end-to-end biological analysis:
- Data Cleaning: Converting treatment labels to factors and handling missing values.
- Comparative Analysis: Running ANOVA to determine treatment efficacy.
- Predictive Modeling: Building multiple linear regression models to predict plant weight based on height and leaf count.
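
The three Day 10 steps can be sketched end to end as follows. The dataset is entirely simulated: the treatment names (`control`, `fert_A`, `fert_B`), column names and effect sizes are hypothetical stand-ins for the real plant data. Dropping incomplete rows with `na.omit` is only one simple strategy for missing values.

```r
set.seed(10)
n <- 60
# Hypothetical plant dataset
plants <- data.frame(
  treatment  = rep(c("control", "fert_A", "fert_B"), each = 20),
  height     = rnorm(n, mean = 30, sd = 4),
  leaf_count = rpois(n, lambda = 12)
)
plants$weight <- 5 + 0.4 * plants$height + 0.3 * plants$leaf_count +
  ifelse(plants$treatment == "fert_B", 2, 0) + rnorm(n, sd = 1)
plants$weight[c(3, 17, 41)] <- NA   # introduce missing values to clean

# 1. Data cleaning: treatment as a factor with "control" as the reference level
plants$treatment <- factor(plants$treatment,
                           levels = c("control", "fert_A", "fert_B"))
clean <- na.omit(plants)            # drop rows with any missing value

# 2. Comparative analysis: does mean weight differ among treatments?
anova_fit <- aov(weight ~ treatment, data = clean)
summary(anova_fit)
TukeyHSD(anova_fit)

# 3. Predictive modeling: multiple linear regression
mlr <- lm(weight ~ height + leaf_count, data = clean)
summary(mlr)

# Predicted weight for a new plant (hypothetical values)
predict(mlr, newdata = data.frame(height = 32, leaf_count = 14))
```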
Skills and tools used throughout the curriculum:
- Statistical Tests: t-test, ANOVA, Chi-square, Tukey HSD.
- Modeling: Linear Regression (`lm`).
- Visualization: `ggplot2` (Boxplots, Histograms, Scatter plots).
- Simulations: `replicate()`, `rnorm()`, `rchisq()`.
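
The three simulation functions can be combined in one small check. A chi-square variable with k degrees of freedom is the sum of k squared standard normals. Simulating that sum with `replicate()` and `rnorm()` should therefore match direct draws from `rchisq()`, with mean k and variance 2k. The choice k = 3 is arbitrary.

```r
set.seed(7)
k <- 3   # degrees of freedom (arbitrary choice)

# Sum of k squared standard normals, simulated 10,000 times
sim_stat <- replicate(10000, sum(rnorm(k)^2))

# Direct draws from the chi-square distribution with k degrees of freedom
direct <- rchisq(10000, df = k)

c(mean(sim_stat), mean(direct))   # both close to k = 3
c(var(sim_stat), var(direct))     # both close to 2k = 6
```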