Self-reported LLM usage and outcomes on a data science project: Evidence from two undergraduate data science courses

Overview

To help understand the effect of Large Language Models (LLMs) on data science practice, we examine the extent to which self-reported LLM usage is correlated with the mark that a student received on a final paper in a classroom data science setting. We find some mild evidence from this observational study that LLM usage may be associated with better scores, especially for students who are not native English speakers. Self-reported usage also increased considerably between the two classes: 41 per cent of students reported extensive LLM usage in the January-April 2024 class, compared with 69 per cent in the September-December 2024 class. Although the evaluation took place in a classroom, the task of interest is similar to the work done by professional data scientists. Our findings suggest the need for more extensive work evaluating how LLMs can be integrated into the data science workflow in a way that provides value in both the classroom and the workplace.

File structure

The repo is structured as:

data contains the data.
models contains fitted models, as well as full model outputs and diagnostics in pp_check_and_diagnostics.pdf.
other contains example_student_submissions, which holds three examples of student submissions, and instructions, which holds the instructions and rubric provided to students.
paper contains the files used to generate the paper, including the Quarto and BibTeX files, as well as the PDF of the paper.
scripts contains the R scripts used to clean and model the data.

Statement on LLM usage

Aspects of the code were written with the help of GitHub Copilot and GPT-4o.
