Materials for the paper "'I am no sure, but...': Expert Practices that Enable Effective Code Comprehension in Data Science". This repository contains the user study data for the paper.
(Screenshot of our research results.)
This research studies effective methods that novice data scientists can adopt to enhance their understanding of pre-written data analytical programs. We conducted user studies with five novice data scientists and four expert data scientists. In each study, participants were presented with a pre-written data analytical program and asked to use the think-aloud method to explain their thought processes. Based on their responses, we performed both quantitative and qualitative analyses. The materials in this repository include the pre-written code we asked our participants to analyze.
- `buoy.txt`: the dataset that the program interacts with
- `data_analytical_problem.ipynb`: the pre-written data analytical program
- `saver_func.py`: a helper function that saves the entire DataFrame and visualizes every entry in HTML
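The repository does not spell out the helper's exact signature here; the following is a minimal sketch of what a `save()`-style helper might look like, built on pandas' `to_html`. The function name, parameters, and behavior are assumptions for illustration, not the actual `saver_func.py` implementation:

```python
import pandas as pd

def save(df: pd.DataFrame, path: str = "demo_table.html") -> str:
    """Hypothetical sketch: render every row and column of a DataFrame
    to an HTML file, avoiding pandas' default display truncation."""
    html = df.to_html(max_rows=None, max_cols=None)  # no row/column truncation
    with open(path, "w") as f:
        f.write(html)
    return html

# Example usage with a toy buoy-like DataFrame (columns are made up)
demo = pd.DataFrame({"wind_speed": [4.2, None, 6.1],
                     "wave_height": [0.8, 1.1, None]})
page = save(demo, "demo_table.html")
```

This matches how the study materials describe the helper being used: participants opened the generated HTML file to inspect the full table rather than relying on the abbreviated default pandas output.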
- Clone the repo:

```shell
git clone https://github.com/dstl-lab/Code-Comprehension-User-Study.git
```

- Install the environment using conda:

```shell
conda env create -f environment.yaml
```
| Task | Duration (min) | Window (min of 60) | Description |
|---|---|---|---|
| Intro | 5 | 0 - 5 | Open data_analytical_problem.ipynb and introduce the scenario. |
| Task 0 & Task 1 | 10 | 5 - 15 | Participants spent 10 minutes understanding Task 0 and Task 1. |
| Rating & Follow-up 1 | 5 | 15 - 20 | Participants rated the difficulties and answered follow-up questions for Task 0 and Task 1. |
| Task 2 | 10 | 20 - 30 | Participants spent 10 minutes understanding Task 2. |
| Rating & Follow-up 2 | 5 | 30 - 35 | Participants rated the difficulties and answered follow-up questions for Task 2. |
| Task 3 | 10 | 35 - 45 | Participants spent 10 minutes understanding Task 3. |
| Rating & Follow-up 3 | 5 | 45 - 50 | Participants rated the difficulties and answered follow-up questions for Task 3. |
| Interview | 10 | 50 - 60 | Concluding interview to gather additional comments and feedback. |
Regardless of their level of expertise, participants were presented with the same notebook. Each task in the notebook represents a different stage of the data analysis pipeline: Task 0 represents data cleaning; Task 1 represents missing value assessment; Task 2 represents data imputation; and Task 3 represents evaluating the imputation results.
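The notebook's actual code is not reproduced in this README; the snippet below is only an illustrative sketch of the four pipeline stages on toy data (the column name, sentinel value, and imputation strategy are hypothetical, not the notebook's):

```python
import pandas as pd

# Toy buoy-like data (hypothetical; the real dataset is buoy.txt)
df = pd.DataFrame({"wave_height": [0.8, 1.1, None, 99.0, 1.3]})

# Task 0: data cleaning -- treat a sentinel value (here 99.0) as missing
cleaned = df.mask(df == 99.0)

# Task 1: missing value assessment -- count missing entries per column
missing_counts = cleaned.isna().sum()

# Task 2: data imputation -- fill missing values with the column mean
imputed = cleaned.fillna(cleaned["wave_height"].mean())

# Task 3: evaluate the imputation -- compare a summary statistic before/after
before = cleaned["wave_height"].mean()
after = imputed["wave_height"].mean()
```

Mean imputation preserves the column mean, so `before` and `after` coincide here; the notebook's tasks ask participants to reason about exactly this kind of before/after comparison.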
Participants were asked the following questions during the interview phase of the study (due to time constraints, only a subset of the questions was asked):
- What information are you trying to gather that you couldn’t from just the default Pandas output?
- Which rows and columns would you add to the smaller table in order to not need to refer to the larger table? (pick as many as you see fit)
- What are you thinking about here? What additional information would make this immediate problem easier to solve?
- What did you find easy or difficult about this task?
- What did you find to be the most effective way to understand the code?
- When you looked broadly at the full table from the save() function, how did you know what to look for?
- Let’s look at two of your HTML tables. How did you know what to look at on these tables specifically?
- Do you have any feedback, comments, or questions?
After each task, participants were asked to complete a survey in a Google Form (Template) describing their feelings about the task. We also asked two anonymous data scientists to evaluate participants' responses based on the rubric we provided.
All of the data related to the research can be found in the study_data folder.
- Demographic Information: participants' demographic information
- Self Evaluation Information: participants' self-reports on the tasks they completed
- Assessment on Participants: performance assessments of participants from two other data scientists
Sam Lau - @github_profile - lau@ucsd.edu
Christopher Lum - @github_profile - cslum@ucsd.edu
Guoxuan Xu - @github_profile - g7xu@ucsd.edu
- We would like to thank all nine participants who voluntarily joined our user study, contributing valuable insights that enriched our research.
- We appreciate the contributors of this README template.