This is an experiment in improving the query understanding of LLMs so that they issue better-formed queries during agentic search.
It begins with a reward model trained on Google's query-wellformedness dataset, which consists of annotated questions from the Paralex corpus scraped from WikiAnswers.
https://github.com/google-research-datasets/query-wellformedness https://knowitall.cs.washington.edu/paralex/
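The dataset pairs each question with a well-formedness rating between 0 and 1 (the fraction of five human raters who judged it well formed). A minimal loader might look like the sketch below; the file names follow the dataset repo, but treat the exact file layout as an assumption and check the repo's README.

```python
from pathlib import Path

def load_wellformedness_split(path: str) -> list[tuple[str, float]]:
    """Load one TSV split of the query-wellformedness dataset.

    Each row is `question<TAB>rating`, where the rating is the fraction of
    annotators (out of 5) who judged the question to be well formed.
    """
    examples = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        question, rating = line.split("\t")
        examples.append((question, float(rating)))
    return examples

# Adjust the path to wherever the dataset repo is cloned.
train = load_wellformedness_split("query-wellformedness/train.tsv")
print(len(train), train[0])
```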
The reward model is a fine-tune of ModernBERT from Answer.AI.
https://github.com/AnswerDotAI/ModernBERT
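Because the labels are continuous scores in [0, 1], the natural setup is ModernBERT with a single-output regression head. A minimal sketch using Hugging Face transformers follows; the actual architecture, loss, and training loop live in reward_model.py, so this only illustrates the idea.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "answerdotai/ModernBERT-base"  # the -large variant would also work

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=1 gives a single regression output; transformers falls back to an
# MSE loss during training when a single label is provided as a float.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=1)

batch = tokenizer(
    ["population india", "What is the population of India?"],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)
print(scores)  # meaningless until the head is fine-tuned
```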
Training uses PyTorch, with Optuna running a Bayesian hyperparameter optimization study.
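Optuna's default TPE sampler provides the Bayesian search: the objective wraps a full training run and returns a validation metric. The sketch below is illustrative; the search space shown (learning rate, batch size, epochs) and the `train_and_evaluate` helper are assumptions, not necessarily what reward_model.py tunes.

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the real script may tune different knobs.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    epochs = trial.suggest_int("epochs", 1, 4)

    # train_and_evaluate is a hypothetical helper: fine-tune ModernBERT with
    # these settings and return validation MSE against the human ratings.
    return train_and_evaluate(lr=lr, batch_size=batch_size, epochs=epochs)

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=20)
print(study.best_params)
```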
The goal is to use this reward model in RL-based post-training and to measure performance on an agentic search benchmark before and after. I haven't yet decided which benchmark(s) I will use, but I'll update this when I do.
Currently the only thing to do here is run:

```bash
uv run reward_model.py
```
This starts the Optuna study and trains the reward model, saving an optimization history and the best model. With default settings this takes a few hours on an RTX 3090.
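Once training finishes, the saved model can be loaded and used to score arbitrary queries. The checkpoint path below is hypothetical; check reward_model.py for where it actually writes the best model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical output directory; substitute wherever reward_model.py saves
# the best checkpoint from the Optuna study.
CHECKPOINT = "outputs/best_model"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

queries = [
    "population india 2024",
    "What was the population of India in 2024?",
]
batch = tokenizer(queries, padding=True, return_tensors="pt")
with torch.no_grad():
    # Raw logits should approximate the 0-1 well-formedness rating; apply a
    # sigmoid or clamp depending on how the regression head was trained.
    scores = model(**batch).logits.squeeze(-1)
for query, score in zip(queries, scores.tolist()):
    print(f"{score:.2f}  {query}")
```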