datasci-harris/DAP2-final-project

Data Skills 2 - R

Fall Quarter 2024

Final Project: Reproducible Research

Due: December 7

Template for Final Project in PPHA 30536, Data and Programming in R 2. Current version: fall 2024.

Fork this assignment and use it to submit the code and writeup portion of the project. See "Final Project Instructions.pdf" for full instructions.

Coding (70%)

The code for the project should have the following components:

1. Data wrangling (20%)

You must use a minimum of three datasets, at least one of which should be retrieved automatically from the web using APIs or web scraping. All processing of the data should be handled by your code, including all merging and reshaping. Any automatic data retrieval must have an option to toggle accessing the web off if the data is already downloaded. This is where you can showcase your abilities practiced in homework 1.
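
One way to implement the web-access toggle is sketched below in base R. The helper name `load_or_download`, the `use_web` flag, the example URL, and the cache file are all hypothetical; this is an illustration under those assumptions, not a required design.

```r
# Hypothetical helper: download a CSV only when `use_web` is TRUE;
# otherwise read the copy that is already on disk.
load_or_download <- function(url, path, use_web = FALSE) {
  if (use_web) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  } else if (!file.exists(path)) {
    stop("No cached copy at '", path, "'; rerun with use_web = TRUE.")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Offline demonstration: create a small cached file, then load it
# with web access switched off. The URL is a placeholder and is never hit.
cache_path <- file.path(tempdir(), "example_cache.csv")
write.csv(
  data.frame(state = c("IL", "WI"), value = c(1.5, 2.5)),
  cache_path,
  row.names = FALSE
)
cached <- load_or_download("https://example.com/data.csv", cache_path, use_web = FALSE)
print(cached)
```

Keeping the flag as a single variable near the top of the script makes it easy for a grader to switch between a fresh download and the cached copy.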

2. Plotting (20%)

From that data, you will create a minimum of two static plots using ggplot, and two interactive Shiny plots. Your Shiny app does not have to be shared on shinyapps.io. The skills used here will roughly correspond to your work on homework 2.
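
As a rough illustration of the static-plot half of this requirement, the sketch below builds one ggplot and saves it to disk. It assumes the `ggplot2` package is installed and uses the built-in `mtcars` data as a stand-in for the project's own merged data; the output file name is hypothetical.

```r
library(ggplot2)

# mtcars is a placeholder for the project's wrangled data.
static_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    title = "Fuel efficiency by vehicle weight"
  )

# Save the figure so it can be committed alongside the code.
# tempdir() is used here only so the sketch runs anywhere.
plot_path <- file.path(tempdir(), "static_plot.png")
ggsave(plot_path, plot = static_plot, width = 6, height = 4, dpi = 150)
```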

3. Text processing (10%)

You will now introduce some form of text analysis, similar to that of homework 3. While it should relate to your broader question, this may be distinct data from what you created in part 1, and the results of it may be used in your plotting or analysis.
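
A very small example of text analysis is a word-frequency count. The sketch below uses only base R; the two sentences and the stop-word list are invented for illustration, and the methods from homework 3 may well use a dedicated text package instead.

```r
# Invented example documents standing in for real text data.
docs <- c(
  "The city budget increased school funding.",
  "School funding and transit funding were debated."
)

# A tiny, hand-picked stop-word list (an assumption, not a standard list).
stop_words <- c("the", "and", "were")

# Lowercase, split on anything that is not a letter, drop empty tokens
# and stop words.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# Count how often each remaining word appears, most frequent first.
word_counts <- sort(table(tokens), decreasing = TRUE)
print(word_counts)
```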

4. Analysis (10%)

Then you will fit a model to your data and report basic results. As this is not a statistics or econometrics class, the model you choose and the validity of your results are not terribly important; fitting an OLS model with fixed effects that has insignificant p-values will not be penalized. The goal is to show you can prepare your data through the previous steps to have it ready for model fitting.
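
For reference, an OLS model with fixed effects can be fit in base R by adding the grouping variable as a factor. The sketch below uses simulated data; the variable names `unit`, `x`, and `y` and the true coefficient values are assumptions made up for the example.

```r
set.seed(1)

# Simulated panel: three units with ten observations each.
panel <- data.frame(
  unit = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30)
)

# Each unit gets its own intercept; the true slope on x is 2.
unit_effect <- unname(c(A = 1, B = 3, C = 5)[panel$unit])
panel$y <- 2 * panel$x + unit_effect + rnorm(30, sd = 0.5)

# factor(unit) adds one dummy per unit (minus a baseline),
# which is the fixed-effects specification.
fit <- lm(y ~ x + factor(unit), data = panel)

# Basic results: estimates, standard errors, t-values, p-values.
print(summary(fit)$coefficients)
```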

5. Reproducibility (10%)

The project and files should be structured and documented so that someone could fork your repository and reproduce your results. This means that your README should document the order in which the scripts should be run, and what the user needs to edit (e.g., where to set their path). If the dataset is retrieved automatically, then the final results do not have to reproduce exactly, but the code should run smoothly even if the underlying data changes. Any plots or other output produced by your code should be included in your repository as well.
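
One possible way to keep user-specific edits in a single place is a short settings block at the top of the first script. Everything in the sketch below is a placeholder: the directory names, the `use_web` flag, and the results table are hypothetical, and `tempdir()` stands in for the path to a local clone.

```r
# --- User settings: the only lines a user should need to edit ---
project_dir <- tempdir()  # placeholder; replace with the path to your clone
use_web <- FALSE          # set to TRUE to re-download data

# Build all other paths from project_dir so nothing else is hard-coded.
data_dir <- file.path(project_dir, "data")
output_dir <- file.path(project_dir, "output")
for (d in c(data_dir, output_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Write output into the repository so it can be committed with the code.
# This table is a dummy stand-in for real model output.
results <- data.frame(term = "x", estimate = 0.5)
results_path <- file.path(output_dir, "model_results.csv")
write.csv(results, results_path, row.names = FALSE)
```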

Writeup (15%)

You will then write up your project in no more than 2-3 pages. You should describe your research question, then discuss the approach you took and the coding involved, including any weaknesses or difficulties encountered. Finish with a brief discussion of results and how they could be fleshed out in future research. The primary purpose of this writeup is to inform me of what I am reading before I look at your code.
