Feat/reranking by Yokto13 · Pull Request #25 · Yokto13/mel

Yokto13 · 2025-10-12T08:20:26Z

No description provided.

- Updated BruteForceSearcher to use register_buffer for kb_embs and added eval mode with gradient disabling.
- Modified DPBruteForceSearcherPT to compile the searcher and improve inference performance.
- Enhanced create_binary_dataset to sort embeddings and qids, and adjusted data types for better memory efficiency.
- Refactored PairwiseMLPReranker to streamline the forward pass and ensure base model remains non-trainable.
- Introduced validation during training in the trainer module and improved logging for better monitoring.
- Added a new reranking2 script for improved entity linking functionality.
- Removed deprecated reranking.py script to clean up the codebase.
- Updated test cases for pairwise MLP and training configurations to reflect recent changes.
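As a rough illustration of the sorting and data-type change described for create_binary_dataset, the sketch below reorders embeddings by qid and downcasts both arrays. It is not the repository's implementation: the function names `sort_by_qid` and `lookup`, the float16/int32 dtypes, and the binary-search lookup are all assumptions chosen for the example.

```python
import numpy as np


def sort_by_qid(embs: np.ndarray, qids: np.ndarray):
    """Reorder embeddings and qids together so that qids are ascending."""
    order = np.argsort(qids, kind="stable")
    # Assumed dtypes: float16 embeddings and int32 qids to reduce memory use.
    return embs[order].astype(np.float16), qids[order].astype(np.int32)


def lookup(sorted_embs: np.ndarray, sorted_qids: np.ndarray, qid: int):
    """Binary-search a qid; return its embedding row, or None if it is absent."""
    i = np.searchsorted(sorted_qids, qid)
    if i < len(sorted_qids) and sorted_qids[i] == qid:
        return sorted_embs[i]
    return None


# Toy data: three 2-dimensional embeddings with unsorted qids.
embs = np.array([[0.5, 1.0], [2.0, 3.0], [4.0, 5.0]], dtype=np.float32)
qids = np.array([42, 7, 19], dtype=np.int64)
sorted_embs, sorted_qids = sort_by_qid(embs, qids)
```

Keeping qids sorted is what makes `np.searchsorted` usable for lookups here; whether the project relies on that property is not stated in the PR.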

- Updated `trainer.py` to support dynamic configuration loading and improved logging with Weights & Biases integration.
- Introduced `trainer_simple.py` for simplified single-device training.
- Enhanced `training_configs.py` with additional model configurations and parameters.
- Modified `reranking2.py` and added `reranking3.py` for improved reranking functionality and model evaluation.
- Implemented a new test suite for `PairwiseMLPRerankerWithRetrievalScore` and `RerankingIterableDataset`.
- Added support for new models including `FullLEALLAReranker` and `PairwiseMLPRerankerWithLargeContextEmb`.
- Improved data handling and validation logic in training scripts.

- Implemented `map_qids_to_token_matrix` function in `loaders.py` to create a sparse matrix mapping qids to their token vectors.
- Added unit tests for `map_qids_to_token_matrix` in `test_loaders.py` to verify correct mapping and handling of non-existent qids.
- Created a new test file `test_change_dataset_tokens.py` to test the `update_tokens_in_file` and `process_directory` functions.
- Updated `uv.lock` to include `scipy` as a dependency.
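To make the idea of a sparse qid-to-token mapping concrete, here is a minimal sketch using `scipy.sparse.csr_matrix`. The real signature and behavior of `map_qids_to_token_matrix` are not shown in this PR page, so the function name `qids_to_token_matrix_sketch`, its parameters, and the choice to index rows directly by qid are hypothetical. It also assumes each qid appears at most once, since duplicate coordinates would be summed.

```python
import numpy as np
from scipy.sparse import csr_matrix


def qids_to_token_matrix_sketch(
    tokens: np.ndarray, qids: np.ndarray, n_rows: int
) -> csr_matrix:
    """Build a sparse (n_rows, seq_len) matrix whose row `qid` holds that qid's tokens.

    Hypothetical helper: assumes unique qids, all smaller than n_rows.
    """
    n_items, seq_len = tokens.shape
    rows = np.repeat(qids, seq_len)              # row index for every token
    cols = np.tile(np.arange(seq_len), n_items)  # position within the token vector
    return csr_matrix((tokens.ravel(), (rows, cols)), shape=(n_rows, seq_len))


# Two entities (qids 3 and 1), each with a 3-token vector.
tokens = np.array([[101, 7, 8], [101, 9, 0]])
qids = np.array([3, 1])
matrix = qids_to_token_matrix_sketch(tokens, qids, n_rows=5)

row_for_qid_3 = matrix[3].toarray().ravel()
row_for_missing_qid = matrix[2].toarray().ravel()  # qid 2 was never provided
```

In this sketch a qid that was never provided simply yields an all-zero row; how the repository's function treats non-existent qids may differ.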

Yokto13 added 28 commits September 30, 2025 17:45

feat: ✨ add verbose option to load_tokens_qids_from_dir for progress tracking

fe17fc0

feat: ✨ add new scripts for Qwen reranking and processing inputs

8fbe834

feat: ✨ add max_items_to_load parameter to load_tokens_qids_from_dir for limiting loaded items

a8038d8

feat: :wip: reranking simple model

266fe7a

fix(rerank): 🐛 calling super in .train override

0585685

feat(rerank): 🚧 new trainer code that will work more easily with bunch of different models

8ed60da

feat(reranking): ✨ add training configs

b4e8ed7

feat(models): ✨ add searcher that expects inputs to be tensor on the same device as the module

45364df

feat(utils): ✨ benchmark whether DataParallel really helps

78e4462

skip unused tests

d22eebd

add ipython and pytest-cov

41cc070

feat(rerank): add binary dataset creation functions

aeb0dff

feat(dependencies): add einops package to project dependencies

725576e

feat(config): add run_kb_creator.langs to multilingual dataset configuration

b3ec79c

feat(rerank): add create_default_binary_dataset and reranking_train actions

0707fec

feat(creator): enhance dataset creation functions with detailed docstrings

7e801b6

feat(rerank): enhance create_binary_dataset function with improved link token loading and dataset creation logic

3584f42

feat(config): add model path for run_damuel_description in paraphrase configuration

ea5ee20

feat(embeddings): update model loading to include output type in embs_from_tokens_model_name_and_state_dict

d1d5ea8

feat(scripts): add another finetuning script

1b2bba8

feat(train): add ManualSyncBruteForceSearcher for improved CUDA support and parallel processing

d7a91e7

feat(train): enable fused optimization in AdamW for improved performance

f681cfc

fix(reranking): 🐛 make updating dataset tokens work with different shapes

2e655c6

feat(dataset): add multiclass dataset creation and corresponding iterable dataset for training

70f3cb5

fix tests

05c8532

Yokto13 merged commit 0d448af into main Oct 21, 2025
1 check failed

Yokto13 merged 28 commits into main from feat/reranking
