quTrainer

This is an experiment in improving the query understanding of LLMs so that they produce better-formed queries in agentic search.

It begins with a reward model trained on Google's query-wellformedness dataset, which consists of annotated questions from the Paralex corpus, scraped from WikiAnswers.

https://github.com/google-research-datasets/query-wellformedness
https://knowitall.cs.washington.edu/paralex/
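
For reference, the dataset ships as tab-separated splits, each row pairing a question with a well-formedness score in [0, 1] (the mean of five binary annotator ratings). A minimal loading sketch, assuming the dataset repo has been cloned locally; the directory path, file names, and column names here are assumptions for illustration:

```python
import pandas as pd

# Assumed local path after cloning google-research-datasets/query-wellformedness
DATA_DIR = "query-wellformedness"

def load_split(split: str) -> pd.DataFrame:
    # Each row is <question>\t<score>; the score is the mean of five
    # binary well-formedness annotations, so it lies in [0, 1].
    return pd.read_csv(
        f"{DATA_DIR}/{split}.tsv",
        sep="\t",
        names=["question", "score"],
        header=None,
    )

train_df = load_split("train")
print(train_df.head())
```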

This reward model is a fine-tune of ModernBERT from Answer.AI.

https://github.com/AnswerDotAI/ModernBERT
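
A sketch of how the fine-tune might be set up with Hugging Face transformers, treating well-formedness as single-output regression. The regression framing is an assumption on my part (see reward_model.py for the actual configuration); answerdotai/ModernBERT-base is the published base checkpoint and needs a recent transformers release with ModernBERT support:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "answerdotai/ModernBERT-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=1,               # single scalar: well-formedness score
    problem_type="regression",  # MSE loss against the [0, 1] target
)

batch = tokenizer(
    ["who wrote the book the origin of species?"],
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    score = model(**batch).logits.squeeze(-1)
print(score)
```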

It uses PyTorch for training and Optuna to run a Bayesian hyperparameter optimization study.
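
Optuna's default TPE sampler is what gives the search its Bayesian flavor. A minimal sketch of what the study might look like; the search space is illustrative and train_and_evaluate is a hypothetical stand-in for the real training loop in reward_model.py:

```python
import optuna

def train_and_evaluate(lr: float, batch_size: int, epochs: int) -> float:
    """Hypothetical stand-in for the training loop in reward_model.py.

    The real implementation would fine-tune ModernBERT with these
    hyperparameters and return a validation metric (e.g. dev-set MSE).
    """
    raise NotImplementedError

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the actual one lives in reward_model.py.
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    epochs = trial.suggest_int("epochs", 1, 5)
    return train_and_evaluate(lr, batch_size, epochs)

# TPE (Optuna's default sampler) is the Bayesian piece of the search.
study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=50)
print(study.best_params)
```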

The goal is to use this reward model in RL-based post-training and to measure performance on an agentic search benchmark before and after. I haven't yet decided which benchmark(s) to use, but I'll update this when I do.

Currently the only thing to do here is run:

```
uv run reward_model.py
```

This will start the Optuna study and train the reward model, saving the optimization history and the best model. With default settings, this takes a few hours on an RTX 3090.
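
Once training finishes, the saved model can act as the scalar reward on arbitrary query strings, which is the shape an RL post-training loop would consume. A usage sketch, assuming the best checkpoint is saved in Hugging Face format; the ./best_model path is an assumption, so check reward_model.py for where artifacts actually land:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed output directory; reward_model.py determines the real location.
CHECKPOINT = "./best_model"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

def wellformedness_reward(query: str) -> float:
    """Score a single query; higher means better formed."""
    batch = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**batch).logits.squeeze().item()

print(wellformedness_reward("what year did darwin publish origin of species"))
```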
