Quhaoh233/TokenRec

TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendations

An LLM-based recommender system with user and item tokenizers and a generative retrieval paradigm. The overall TokenRec framework consists of a masked vector-quantized tokenizer with a K-way encoder for item ID tokenization, and a generative retrieval paradigm for recommendation generation.

If you find this project useful, please give us a star🌟.

Example of Implementation

Setup

First, please install the required dependencies using the following commands:

conda create -n tokenrec python=3.9
conda activate tokenrec
pip install -r requirements.txt

The environment requires only seven packages: torch, torchmetrics, tqdm, transformers, pandas, numpy, and kmeans_pytorch.

[Optional] For inference-only use, please download the checkpoints from Google Drive and place them in the "checkpoints/" directory.

Training

Enter the "code" directory:

cd code

A simple example that finetunes the LLM on the small LastFM dataset with the default configuration:

python main.py

To run the whole pipeline (tokenizer + backbone):

python main.py --vq --train_vq

For other datasets, set the appropriate token number (codebook size) and codebook number:

python main.py --dataset=ML1M --vq --train_vq --vq_model=MQ --n_token=256 --n_book=3
python main.py --dataset=Beauty --vq --train_vq --vq_model=MQ --n_token=512 --n_book=3
python main.py --dataset=Clothing --vq --train_vq --vq_model=MQ --n_token=512 --n_book=3
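
To illustrate what --n_token and --n_book control, here is a minimal sketch of multi-codebook ID quantization: each of the n_book codebooks holds n_token entries, and an item embedding is mapped to the nearest entry in each codebook, yielding one discrete token per codebook. This is only a schematic of the idea, not the repository's actual masked vector-quantized (MQ) tokenizer; the names and shapes below are assumptions for illustration.

```python
import numpy as np

def quantize(item_emb, codebooks):
    """Map one item embedding to a list of discrete tokens:
    the index of the nearest entry (L2 distance) in each codebook."""
    tokens = []
    for book in codebooks:                          # book shape: (n_token, dim)
        dists = np.linalg.norm(book - item_emb, axis=1)
        tokens.append(int(np.argmin(dists)))
    return tokens

# Hypothetical sizes matching the ML1M setting above: 3 codebooks of 256 entries.
rng = np.random.default_rng(0)
n_book, n_token, dim = 3, 256, 64
codebooks = rng.normal(size=(n_book, n_token, dim))
item_emb = rng.normal(size=dim)

print(quantize(item_emb, codebooks))  # three token indices, each in [0, 256)
```

A larger catalog generally needs a larger codebook (hence n_token=512 for Beauty and Clothing), since n_book codebooks of n_token entries can address at most n_token**n_book distinct items.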

To finetune the model from an existing checkpoint:

python main.py --dataset=LastFM --n_token=256 --n_book=3 --train_from_checkpoint

Evaluation

python main.py --dataset=LastFM --no_train
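
Evaluation of top-K recommendation typically reports metrics such as Recall@K and NDCG@K. The sketch below shows how these are commonly computed with binary relevance; it is an illustration of the standard metrics, not the repository's evaluation code (which uses torchmetrics).

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the ground-truth items that appear in the top-k list."""
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain of hits over the ideal gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2)
               for i in range(min(len(relevant), k)))
    return dcg / idcg

ranked = [5, 2, 9, 1, 7]    # a model's top-5 item IDs for one user (toy data)
relevant = {2, 7}           # held-out ground-truth items for that user

print(recall_at_k(ranked, relevant, 5))  # 1.0 (both ground-truth items in top 5)
print(ndcg_at_k(ranked, relevant, 5))    # < 1.0 (hits are not ranked first)
```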

More configurations can be found in the "./code/parse.py" file.
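
For orientation, an argparse setup along these lines could define the flags used in the commands above. The flag names are taken from those commands, but the defaults, choices, and help strings here are assumptions; consult "./code/parse.py" for the actual definitions.

```python
import argparse

def build_parser():
    # Flag names mirror the commands shown above; defaults are illustrative only.
    p = argparse.ArgumentParser(description="TokenRec training/evaluation")
    p.add_argument("--dataset", default="LastFM",
                   choices=["LastFM", "ML1M", "Beauty", "Clothing"])
    p.add_argument("--vq", action="store_true", help="enable the tokenizer stage")
    p.add_argument("--train_vq", action="store_true", help="train the tokenizer")
    p.add_argument("--vq_model", default="MQ", help="tokenizer variant")
    p.add_argument("--n_token", type=int, default=256, help="codebook size")
    p.add_argument("--n_book", type=int, default=3, help="number of codebooks")
    p.add_argument("--train_from_checkpoint", action="store_true")
    p.add_argument("--no_train", action="store_true", help="evaluation only")
    return p

args = build_parser().parse_args(["--dataset", "ML1M", "--vq", "--train_vq"])
print(args.dataset, args.n_token, args.n_book)  # ML1M 256 3
```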

Citation

If you find this repository useful, please star🌟 this repo and cite🖇️ our paper.

@article{qu2025tokenrec,
  title={TokenRec: Learning to Tokenize ID for LLM-Based Generative Recommendations},
  author={Qu, Haohao and Fan, Wenqi and Zhao, Zihuai and Li, Qing},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  volume={37},
  number={10},
  pages={6216--6231},
  year={2025},
  publisher={IEEE Computer Society}
}
