TokenRec is an LLM-based recommender system with user and item tokenizers and a generative retrieval paradigm. The overall framework consists of a masked vector-quantized tokenizer with a K-way encoder for user/item ID tokenization, and a generative retrieval paradigm for recommendation generation.
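To give a feel for multi-codebook ID tokenization, here is a minimal sketch of a generic residual-style quantizer that maps a continuous item embedding to a short sequence of discrete tokens, one per codebook. This is an illustration only, not the paper's masked vector-quantized tokenizer; all names and shapes are illustrative.

```python
import numpy as np

def tokenize(embedding, codebooks):
    """Map one continuous embedding to K discrete tokens, one per codebook.

    Illustrative multi-codebook quantization: each codebook picks its
    nearest code, and the residual is passed on to the next codebook.
    """
    tokens = []
    residual = embedding.astype(float)
    for book in codebooks:  # book: (n_token, dim)
        dists = np.linalg.norm(book - residual, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        residual = residual - book[idx]
    return tokens

rng = np.random.default_rng(0)
dim, n_token, n_book = 8, 16, 3
codebooks = [rng.normal(size=(n_token, dim)) for _ in range(n_book)]
item_emb = rng.normal(size=dim)
print(tokenize(item_emb, codebooks))  # three token IDs, one per codebook
```

An LLM can then treat these small token IDs as vocabulary entries and generate them autoregressively, which is the essence of the generative retrieval paradigm.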
First, please install the required dependencies using the following command:
conda create -n tokenrec python=3.9
conda activate tokenrec
pip install -r requirements.txt
The environment requires only seven packages: torch, torchmetrics, tqdm, transformers, pandas, numpy, and kmeans_pytorch.
[Optional] Please download the checkpoints from Google Drive and place them in the "checkpoints/" path for the inference-only implementation.
Get into the "code" directory:
cd code
A simple example on the small LastFM dataset, running the LLM fine-tuning with the default configuration:
python main.py
To run the whole pipeline (tokenizer + backbone):
python main.py --vq --train_vq
For other datasets, set the appropriate numbers of tokens and codebooks:
python main.py --dataset=ML1M --vq --train_vq --vq_model=MQ --n_token=256 --n_book=3
python main.py --dataset=Beauty --vq --train_vq --vq_model=MQ --n_token=512 --n_book=3
python main.py --dataset=Clothing --vq --train_vq --vq_model=MQ --n_token=512 --n_book=3
To fine-tune the model from a given checkpoint:
python main.py --dataset=LastFM --n_token=256 --n_book=3 --train_from_checkpoint
To run inference only with a trained model:
python main.py --dataset=LastFM --no_train
More configurations can be found in the "./code/parse.py" file.
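One way to think about choosing --n_token and --n_book: with n_book codebooks of n_token codes each, the tokenizer can express at most n_token ** n_book distinct code combinations, so larger catalogues need a larger budget. This upper bound assumes combinations are distinct; the effective capacity of the trained tokenizer may be lower. A quick arithmetic check:

```python
# Rough ID-space upper bound: n_token codes per book, n_book books.
def capacity(n_token: int, n_book: int) -> int:
    return n_token ** n_book

print(capacity(256, 3))  # 16777216
print(capacity(512, 3))  # 134217728
```

Both settings above comfortably exceed the item counts of the corresponding datasets.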
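The authoritative flag definitions live in "./code/parse.py". As a rough illustration of the subset used in the commands above, a parser might look like the following; the defaults and help strings here are assumptions, not the repository's actual values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Illustrative subset of the flags used in the example commands;
    # defaults are assumptions -- see ./code/parse.py for the real set.
    p = argparse.ArgumentParser()
    p.add_argument("--dataset", default="LastFM")
    p.add_argument("--vq", action="store_true", help="build the item tokenizer")
    p.add_argument("--train_vq", action="store_true", help="train the tokenizer")
    p.add_argument("--vq_model", default="MQ")
    p.add_argument("--n_token", type=int, default=256)
    p.add_argument("--n_book", type=int, default=3)
    p.add_argument("--train_from_checkpoint", action="store_true")
    p.add_argument("--no_train", action="store_true", help="inference only")
    return p

args = build_parser().parse_args(
    ["--dataset=Beauty", "--vq", "--train_vq", "--n_token=512"]
)
print(args.dataset, args.n_token, args.vq)  # Beauty 512 True
```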
If you find this repository useful, please star🌟 this repo and cite🖇️ our paper:
@article{qu2025tokenrec,
title={TokenRec: Learning to Tokenize ID for LLM-Based Generative Recommendations},
author={Qu, Haohao and Fan, Wenqi and Zhao, Zihuai and Li, Qing},
journal={IEEE Transactions on Knowledge \& Data Engineering},
volume={37},
number={10},
pages={6216--6231},
year={2025},
publisher={IEEE Computer Society}
}