Q*-Enhanced Learning

Overview

This repository contains the implementation and experimental setup for the Q*-Enhanced Learning algorithm, which combines Q-learning with A* search to guide the training of large language models (LLMs). The experiments are built on nanoGPT. The goal is to make training more efficient and more adaptable in complex training environments.

Project Lead

Parag Ghorpade
Northeastern University

Abstract

The Q*-Enhanced Learning project investigates an algorithm that combines the value-based decision-making of Q-learning with the heuristic path search of A*. The primary goal is to make the training of large language models such as GPT-2 more efficient and effective, with a focus on reducing computational cost and improving adaptability on complex tasks.
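As a reminder of the Q-learning half of the idea, here is a minimal sketch of the standard tabular update rule. The state and action names (`"s0"`, `"raise_lr"`, and so on) and the values of `alpha` and `gamma` are purely illustrative assumptions; they are not taken from this repository's code.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state
    # (0.0 if the next state is terminal and offers no actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: all Q-values start at zero.
q = defaultdict(float)
new_value = q_update(q, "s0", "raise_lr", reward=1.0,
                     next_state="s1", next_actions=["raise_lr", "lower_lr"])
```

With every estimate starting at zero, a single update with reward 1.0 moves `Q("s0", "raise_lr")` a fraction `alpha` of the way toward the target.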

nanoGPT Integration

This project builds on nanoGPT, a minimal and fast repository for training medium-sized GPTs, which serves as the base for implementing and testing the Q* algorithm. nanoGPT is a rewrite of minGPT and is designed to be simple and hackable.

Installation

Dependencies include PyTorch, NumPy, and several other libraries for data processing and training:

pip install torch numpy transformers datasets tiktoken wandb tqdm

Experimental Setup with nanoGPT

Before integrating the Q* algorithm, it is essential to establish baseline performance using nanoGPT alone. Training can be conducted on various datasets, including the provided Shakespeare character-level dataset as a quick start.

Baseline Training

Prepare the Shakespeare character-level data, then train a baseline GPT model using the configuration provided by nanoGPT:

python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py

Evaluate Baseline

After training, sample from the baseline model to inspect its output. The training and validation losses logged by train.py provide the quantitative baseline:

python sample.py --out_dir=out-shakespeare-char
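One common way to compare runs is to convert the mean validation cross-entropy loss into perplexity or bits per token. The sketch below assumes the loss is reported in nats, as PyTorch's cross-entropy does; the helper names are illustrative and are not part of nanoGPT.

```python
import math


def perplexity(mean_loss_nats):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_loss_nats)


def bits_per_token(mean_loss_nats):
    """Convert a mean cross-entropy loss (in nats) to bits per token."""
    return mean_loss_nats / math.log(2)


# Hypothetical validation loss, used only to show the calls.
example_loss = 1.5
example_ppl = perplexity(example_loss)
example_bits = bits_per_token(example_loss)
```

For a character-level model the "token" is a single character, so these figures are per character and are not directly comparable with BPE-tokenized runs.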

Q* Algorithm Integration

The Q* algorithm is integrated into the nanoGPT training loop by modifying the train.py script to include the Q-learning and A* search enhancements.
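For readers unfamiliar with the A* half, the following is a minimal, generic A* search over a small weighted graph. It is a sketch only: the graph, the heuristic table, and the function name `a_star` are assumptions made for illustration, and how this repository couples learned Q-values to the search is not shown here.

```python
import heapq


def a_star(graph, start, goal, heuristic):
    """Return (cost, path) for the cheapest path from start to goal, or None."""
    # Frontier entries are (f, g, node, path), where f = g + heuristic(node).
    frontier = [(heuristic(start), 0.0, start, [start])]
    best_g = {start: 0.0}  # cheapest known cost to reach each node
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for neighbor, step_cost in graph.get(node, {}).items():
            new_g = g + step_cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + heuristic(neighbor), new_g, neighbor, path + [neighbor]),
                )
    return None


# Hypothetical graph: edge weights are step costs.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
# Hypothetical heuristic: never overestimates the true remaining cost to "D".
estimate = {"A": 2.0, "B": 2.0, "C": 1.0, "D": 0.0}
result = a_star(graph, "A", "D", lambda n: estimate[n])
```

The heuristic is passed in as a function, so a table lookup like the one above could in principle be swapped for any other cost estimate, including a learned one.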

Q* Training Command

python train.py config/train_q_star.py
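nanoGPT configuration files are plain Python files whose top-level assignments override the defaults in train.py. The fragment below is a hypothetical sketch of what config/train_q_star.py could contain: the variable names are standard nanoGPT settings, the values are copied from the Shakespeare quick start as placeholders, and any Q*-specific settings are deliberately omitted because they are not documented here.

```python
# Hypothetical sketch of config/train_q_star.py (values are assumptions).

# Checkpoints are written here; matches the --out_dir used when sampling.
out_dir = "out-q-star-model"

# Dataset directory under data/ and evaluation cadence.
dataset = "shakespeare_char"
eval_interval = 250
eval_iters = 200

# Small model, as in the character-level quick start.
batch_size = 64
block_size = 256
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

# Optimizer schedule.
learning_rate = 1e-3
max_iters = 5000
lr_decay_iters = 5000
```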

Evaluation of Q* Model

python sample.py --out_dir=out-q-star-model

Dataset Selection for Q* Testing

We will test the Q* algorithm across several key datasets to validate the enhancement:

WikiText-103
BookCorpus
OpenWebText
Penn Treebank (PTB)
Common Crawl News
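Each dataset has to be tokenized and split into training and validation parts before training. The sketch below shows the idea with a character-level vocabulary and a 90/10 split, in the spirit of the Shakespeare quick start. The function names and the sample text are illustrative assumptions, and the larger corpora listed above would more typically be tokenized with a BPE tokenizer such as tiktoken.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)


def train_val_split(ids, train_fraction=0.9):
    """Split token ids into a leading training part and a trailing validation part."""
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical sample text standing in for a real corpus.
text = "to be or not to be"
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
train_ids, val_ids = train_val_split(ids)
```

Keeping the validation part as a contiguous tail, rather than sampling it at random, avoids leaking near-duplicate context between the two splits.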

Contributions and Contact

Contributions to this project are welcome. Please feel free to fork the repository, make improvements, and submit pull requests.

For questions or further information, please contact:

Parag Ghorpade
Email: ghorpade.p@northeastern.edu

Acknowledgements

This project utilizes resources from the nanoGPT repository and is conducted with support from Northeastern University.
