A paper studying the relationship between LLM use and academic achievement in Canadian undergraduates.

lcarnegie/llms-achievement

Self-reported LLM usage and outcomes on a data science project: Evidence from two undergraduate data science courses

Overview

To help understand the effect of Large Language Models (LLMs) on data science practice, we examine the extent to which self-reported LLM usage is correlated with the mark a student received on a final paper in a classroom data science setting. This observational study provides some mild evidence that LLM usage may be associated with better scores, especially for students who are not native English speakers. Self-reported usage also increased considerably between classes: 41 per cent of students reported extensive LLM usage in the January-April 2024 class, compared with 69 per cent in the September-December 2024 class. Although the evaluation took place in a classroom, the task of interest is similar to the work done by professional data scientists. Our findings suggest the need for more extensive work evaluating how LLMs can be integrated into the data science workflow in a way that provides value in both the classroom and the workplace.

File structure

The repo is structured as:

  • data contains the data.
  • models contains fitted models, as well as full model outputs and diagnostics in pp_check_and_diagnostics.pdf.
  • other contains example_student_submissions, which holds three examples of student submissions, and instructions, which details the instructions and rubric provided to students.
  • paper contains the files used to generate the paper, including the Quarto and BibTeX files, as well as the PDF of the paper.
  • scripts contains the R scripts used to clean and model the data.

Statement on LLM usage

Aspects of the code were written with the help of GitHub Copilot and GPT-4o.
