This is the official code base for reproducing the self-training method in *Self-Training Large Language Models with Confident Reasoning*. The training pipeline (1) samples math/science questions, (2) generates multiple answers with a LoRA-tuned Llama 3.1, (3) scores them with an internal judge, and (4) runs Direct Preference Optimization (DPO) on the resulting preference pairs.
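The four stages can be sketched end-to-end as follows. This is a minimal illustration, not the repository's actual API: the function names, the stand-in generator, and the random judge scores are all assumptions, and the real judge evaluates reasoning as well as final answers.

```python
import random

def generate_answers(question, k=8):
    # Stand-in for sampling k candidate answers from the LoRA-tuned
    # Llama 3.1 policy; here we fabricate distinct placeholder strings.
    return [f"answer_{i} to {question!r}" for i in range(k)]

def judge_confidence(question, answer):
    # Stand-in for the internal judge; returns a confidence in [0, 1].
    # The real scorer inspects the reasoning chain and the final answer.
    return random.random()

def build_preference_pair(question):
    """Turn sampled answers into one (chosen, rejected) pair for DPO."""
    answers = generate_answers(question)
    scored = sorted(answers, key=lambda a: judge_confidence(question, a))
    # Highest-confidence answer becomes "chosen", lowest "rejected".
    return {"prompt": question, "chosen": scored[-1], "rejected": scored[0]}

pair = build_preference_pair("What is 17 * 24?")
```

The `{"prompt", "chosen", "rejected"}` dictionary matches the preference-pair format that standard DPO trainers consume.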
## Repository structure

- `main.py`
- `core_po/`
  - `arguments.py`
  - `__init__.py`
  - `data.py` – loaders for GSM8K, ARC, MATH, and GPQA with rank-aware sharding.
  - `generation.py` – prompt templates and sampling.
  - `judge.py` – confidence scorer that evaluates reasoning and final answers.
  - `models.py` – loader of LoRA-adapted Llama 3.1 weights (8-bit or bf16).
  - `trainer.py` – CORE-PO DPO training loop.
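The rank-aware sharding in `data.py` presumably gives each GPU a disjoint slice of the dataset. A minimal sketch of that idea, assuming a simple strided scheme (the function name and scheme are illustrative, not the file's actual code):

```python
def shard_for_rank(examples, rank, world_size):
    # Strided split: rank r takes items r, r + world_size, r + 2*world_size, ...
    # so the union over all ranks covers the dataset with no overlap.
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return examples[rank::world_size]

# Example: 10 questions across 4 processes, matching the 4-GPU setup.
data = list(range(10))
shards = [shard_for_rank(data, r, 4) for r in range(4)]
```

A strided split keeps shard sizes within one example of each other even when the dataset size is not divisible by the number of ranks.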
Training requires four NVIDIA A100 GPUs. Launch with:
```shell
accelerate launch --num_processes 4 main.py \
    --save_directory ./dpo_saved \
    --save_name ours_run \
    --learning_rate 5e-6 \
    --batch_size 4
```

## Citation

```bibtex
@inproceedings{jang-etal-2025-self,
    title = {Self-Training Large Language Models with Confident Reasoning},
    author = {Jang, Hyosoon and Jang, Yunhui and Lee, Sungjae and Ok, Jungseul and Ahn, Sungsoo},
    editor = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
    month = {nov},
    year = {2025},
    address = {Suzhou, China},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2025.findings-emnlp.806/},
    doi = {10.18653/v1/2025.findings-emnlp.806},
    pages = {14925--14939},
    isbn = {979-8-89176-335-7}
}
```