Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation
This repository accompanies the research study and provides the code, dataset, and analysis artifacts referenced in the paper (see Associated Publication).
- Koorosh Nobakhtfar — Technical Analyst [GitHub | LinkedIn]
- Kenan Çakılcı — Dataset Architect [GitHub | LinkedIn]
- Ruken Zilan — Research Supervisor [LinkedIn]
- Authored by us; uses the `tiktoken` library for tokenization.
- Installation instructions for `tiktoken` are available in the official repository: https://github.com/openai/tiktoken.
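A minimal token-counting sketch using `tiktoken` (the `cl100k_base` encoding and the file name below are illustrative assumptions, not values prescribed by the study):

```python
import tiktoken

# Load a BPE encoding; "cl100k_base" is only an example -- substitute the
# encoding that matches the model being analyzed.
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Return the number of tokens in `text` under the chosen encoding."""
    return len(encoding.encode(text))

if __name__ == "__main__":
    # Hypothetical file name following the dataset naming convention.
    with open("task_031_prompt.txt", encoding="utf-8") as f:
        prompt = f.read()
    print(f"Prompt tokens: {count_tokens(prompt)}")
```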
- Contains the raw data for prompts, responses, and human evaluations.
- Includes basic analytics for quick inspection.
- ANOVA conducted using the Analysis ToolPak add-in.
- Effect sizes (eta-squared, partial eta-squared, and omega-squared) were calculated manually using standard definitions (see Effect Sizes).
- To enable the add-in in Excel:
- File → Options → Add-ins
- From Manage, select Excel Add-ins, click Go…
- Check Analysis ToolPak, click OK.
- Organized by the combination of Reasoning-Style (CoT vs. Non-CoT) and Example-Context (Zero-Shot vs. Few-Shot).
- Each combination contains 20 tasks (cases).
- Each task has three files:
- Prompt file: the prompt authored by the LLM.
- Response file: the LLM’s response to that prompt.
- Data file: structured metadata, evaluation results, and other task-level information.
Example layout:
Dataset/
├─ CoT Few-Shot (CFS)/
│ ├─ CFS 1/
│ │ ├─ task_031_data.json
│ │ ├─ task_031_prompt.txt
│ │ └─ task_031_response.txt
│ ├─ CFS 2/
│ │ ├─ ...
│ │ └─ ...
│ └─ ...
├─ CoT Zero-Shot (CZS)/
│ ├─ CZS 1/
│ │ ├─ ...
│ │ └─ ...
│ └─ ...
├─ Non-CoT Few-Shot (NCFS)/
│ ├─ NCFS 1/
│ │ ├─ ...
│ │ └─ ...
│ └─ ...
└─ Non-CoT Zero-Shot (NCZS)/
├─ NCZS 1/
│ ├─ ...
│ └─ ...
└─ ...
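A rough sketch of how the layout above can be traversed in Python (the condition folder names follow the example tree; the helper itself and any assumptions about the JSON contents are illustrative, not part of the released tooling):

```python
import json
from pathlib import Path

DATASET_ROOT = Path("Dataset")
CONDITIONS = [
    "CoT Few-Shot (CFS)",
    "CoT Zero-Shot (CZS)",
    "Non-CoT Few-Shot (NCFS)",
    "Non-CoT Zero-Shot (NCZS)",
]

def iter_tasks(condition: str):
    """Yield (prompt, response, data) triples for every task folder in a condition."""
    for task_dir in sorted((DATASET_ROOT / condition).iterdir()):
        if not task_dir.is_dir():
            continue
        prompt = next(task_dir.glob("*_prompt.txt")).read_text(encoding="utf-8")
        response = next(task_dir.glob("*_response.txt")).read_text(encoding="utf-8")
        data = json.loads(next(task_dir.glob("*_data.json")).read_text(encoding="utf-8"))
        yield prompt, response, data

for condition in CONDITIONS:
    print(condition, sum(1 for _ in iter_tasks(condition)), "tasks")
```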
- Data files store structured information about each prompt and response.
- These files were originally produced by the LLM and then edited by humans to correct metadata and add missing information where necessary, ensuring accuracy and completeness.
Notes on stored evaluations:
- Self-evaluation refers to the model’s own assessment (values typically on a 1–10 scale). These were retained for completeness but disregarded in analysis due to unreliability.
- Supervised evaluations were performed by human evaluators. The rubric fields are:
| field | weight | range | explanation |
|---|---|---|---|
| factual_correctness | 25% | 1 to 5 | Are the facts and steps correct? |
| reasoning_quality | 25% | 1 to 5 | Is the logic transparent? |
| coherency_and_clarity | 20% | 1 to 5 | Is the response clear and easy to follow? |
| completeness | 20% | 1 to 5 | Does it cover all required aspects? |
| understanding_depth | 10% | 1 to 5 | Does it show insight beyond surface level? |
| weighted_total | N/A | 0 to 100 (%) | Final composite score computed from the weights |
For more information regarding the evaluation of accuracy, see the Accuracy Evaluation Process and Criteria file.
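For orientation, a hypothetical sketch of the composite score implied by the rubric (the weights come from the table above; the flat `scores` dictionary and the 1-to-5 → 0-to-100 normalization are assumptions and may differ from the actual data files):

```python
# Weights taken from the rubric table above.
RUBRIC_WEIGHTS = {
    "factual_correctness": 0.25,
    "reasoning_quality": 0.25,
    "coherency_and_clarity": 0.20,
    "completeness": 0.20,
    "understanding_depth": 0.10,
}

def weighted_total(scores: dict) -> float:
    """Combine 1-5 rubric scores into a 0-100 composite using the rubric weights."""
    weighted = sum(RUBRIC_WEIGHTS[field] * scores[field] for field in RUBRIC_WEIGHTS)
    # Assumed normalization: divide by the maximum score (5) and express as a
    # percentage; the study's exact mapping to 0-100 may differ.
    return weighted / 5 * 100

example = {
    "factual_correctness": 5,
    "reasoning_quality": 4,
    "coherency_and_clarity": 4,
    "completeness": 5,
    "understanding_depth": 3,
}
print(weighted_total(example))  # 87.0
```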
- Eta-squared (η²), partial eta-squared (ηp²), and omega-squared (ω²) were derived from the ANOVA results using their standard formulas based on sums-of-squares (SS), mean-squares (MS) and degrees-of-freedom (df).
- Some formulas are not presented in the paper.
- For the full definitions, see the reference below; a short computational sketch follows it.
Reference for effect-size formulas
B. G. Tabachnick and L. S. Fidell, Using Multivariate Statistics, 6th ed., Upper Saddle River, NJ: Pearson Education, 2013, pp. 54–55.
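As a rough computational sketch of those standard definitions (the variable names are illustrative; the SS, df, and MS values should be taken from the ANOVA output):

```python
def effect_sizes(ss_effect: float, df_effect: int,
                 ss_error: float, df_error: int,
                 ss_total: float) -> dict:
    """Standard sums-of-squares effect-size estimates for an ANOVA effect.

    eta-squared         = SS_effect / SS_total
    partial eta-squared = SS_effect / (SS_effect + SS_error)
    omega-squared       = (SS_effect - df_effect * MS_error) / (SS_total + MS_error)
    """
    ms_error = ss_error / df_error
    return {
        "eta_squared": ss_effect / ss_total,
        "partial_eta_squared": ss_effect / (ss_effect + ss_error),
        "omega_squared": (ss_effect - df_effect * ms_error) / (ss_total + ms_error),
    }

# Illustrative numbers only, not results from the paper.
print(effect_sizes(ss_effect=12.0, df_effect=1, ss_error=60.0, df_error=76, ss_total=80.0))
```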
This repository contains the source code and materials for the work described in our paper, "Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation," by K. Nobakhtfar, K. Çakılcı, and R. Zilan.
Current Status: Accepted by the 5th International Informatics and Software Engineering Conference (IISEC 2026).
Note: The final, peer-reviewed version of the paper may contain minor changes. We will update this section upon publication.