SETRec

This is the PyTorch implementation of our paper

Order-agnostic Identifier for Large Language Model-based Generative Recommendation

Xinyu Lin, Haihan Shi, Wenjie Wang, Fuli Feng, Qifan Wang, See-Kiong Ng, Tat-Seng Chua

Environment

  • Anaconda 3
  • Python 3.8.0
  • PyTorch 2.0.1
  • transformers 4.41.0
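
A minimal setup sketch, assuming conda is available; the environment name setrec and the pip-based install are illustrative, not part of the repository:

conda create -n setrec python=3.8.0
conda activate setrec
pip install torch==2.0.1 transformers==4.41.0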

Usage

Data

The experimental data are in the './data' folder, including Toys, Beauty, Sports, and Steam.

▶️ CF Tokenizer

A pre-trained CF tokenizer is used in SETRec to obtain the CF tokens. You can either

  • Approach 1 - train your own CF tokenizer (e.g., SASRec)

  • Approach 2 - directly use our provided CF tokens (e.g., SASRec_item_embed.pkl under each specific dataset folder)

Skip this step if you directly use our provided ones (Approach 2).
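
If you go with Approach 2, a quick sanity check like the one below can confirm that the provided CF tokens load correctly. This is a minimal sketch: the dataset path and the pickle's internal layout (assumed here to be an array-like of per-item embeddings) are assumptions, not guarantees.

import pickle

# Path to the provided CF tokens; adjust to the actual dataset folder.
path = "./data/toys/SASRec_item_embed.pkl"

with open(path, "rb") as f:
    item_embed = pickle.load(f)  # assumed shape: [num_items, embed_dim]

print(type(item_embed))
print(getattr(item_embed, "shape", None))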

▶️ Semantic Tokenizer

A semantic tokenizer is created and trained during SETRec training. Before that, however, item semantic representations are needed for tokenization. To obtain them, you can either

  • Approach 1 - extract them manually. The item semantic representations are extracted by a pre-trained language model (e.g., T5 or Qwen). We provide the extraction script, extract_item_semantic_rep.py, in the './data' folder.

  • Approach 2 - use our provided semantic representations under each specific dataset folder.

Skip this step if you directly use our provided ones (Approach 2).
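
For reference, the sketch below illustrates the general idea of Approach 1 with a T5 encoder. It is not the provided extract_item_semantic_rep.py: the model size, the item-text input, and the mean pooling over encoder outputs are all assumptions.

import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small").eval()

# Placeholder item texts; the actual script defines how item text is built.
item_texts = ["item title and description go here"]

with torch.no_grad():
    inputs = tokenizer(item_texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs)
    # Mean-pool encoder hidden states over valid tokens (one pooling choice among many).
    mask = inputs["attention_mask"].unsqueeze(-1)
    item_rep = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)

print(item_rep.shape)  # [num_items, hidden_dim]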

🔴 Training

First, change to the './code' folder

cd code

For the T5 backend, run the command

bash scripts/train_t5.sh <dataset> <lr> <n_sem> <alpha>

For the Qwen backend, run the command

bash scripts/train_qwen.sh <dataset> <lr> <n_sem> <alpha>
  • The log file will be in the './log/' folder.
  • The explanation of hyper-parameters can be found in './code/parse_utils.py'.

🔵 Inference

Get the results of SETRec by running inference.py:

  • Infer with T5 backend
bash scripts/inference_t5.sh <dataset> <n_sem> <ckpt_path>
  • Infer with Qwen backend (please stay tuned for the Qwen inference script; currently, you can obtain the evaluation results at the end of training)
bash scripts/inference_qwen.sh <dataset> <n_sem> <ckpt_path>

⚪ Examples

  1. Train SETRec (T5) on the Toys dataset
cd ./code
bash scripts/train_t5.sh toys 1e-3 4 0.7
  2. Inference
cd ./code
bash scripts/inference_t5.sh toys 4 <ckpt_path>

Citation

If you find our work useful for your research, please consider citing:

@inproceedings{lin2025order,
  title={Order-agnostic Identifier for Large Language Model-based Generative Recommendation},
  author={Lin, Xinyu and Shi, Haihan and Wang, Wenjie and Feng, Fuli and Wang, Qifan and Ng, See-Kiong and Chua, Tat-Seng},
  booktitle={SIGIR},
  year={2025}
}

License

NUS © NExT++
