Codebase for the paper Improving Minimax Group Fairness in Sequential Recommendation
The source code is in the `src` directory and the data in the `data` directory.
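At a glance, the paths referenced in this README are laid out as follows (derived from the descriptions below):

```
src/
├── recommenders/    # SASRec model; loss, sequencer, metric, and checkpoint utils
├── training/        # local training entry point (train_sasrec_dp.py)
├── sagemaker/       # SageMaker launcher and training_configs/
├── preprocess/      # dataset preprocessing and user-group assignment
└── improvements/    # analysis notebooks (RR_improvement.ipynb, ML1m_improvement.ipynb)
data/
├── raw/             # downloaded ml1m and retailrocket files
└── processed/       # preprocessed datasets with user groups
```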
- The implementation of the sequential recommender and the training methods is in `src/recommenders`. `src/recommenders/SASRec.py` contains the SASRec recommender, and `src/recommenders/utils/sequencer.py` contains the sequencer used for padding/truncating each user's item sequence.
- The distributionally robust methods and the baselines from the paper are available in `src/recommenders/utils/loss.py` (see the sketch after this list). `src/recommenders/utils/metric_computer.py` and `src/recommenders/utils/checkpointer.py` are used for computing metrics and for checkpointing the best model, respectively.
- `src/training/train_sasrec_dp.py` and `src/sagemaker/sagemaker_training.py` are used to launch jobs locally and on AWS SageMaker, respectively.
- Finally, the improvements over standard training are in `src/improvements`. The `RR_improvement.ipynb` and `ML1m_improvement.ipynb` notebooks contain the analysis of the RetailRocket and Movielens1M experiments.
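For orientation, here is a minimal sketch of the group DRO update that the group/streaming DRO losses follow in spirit: exponentiated-gradient ascent on the group weights, then descent on the weighted group losses. The function name, tensor layout, and update details are illustrative assumptions, not the repository's actual API in `loss.py`.

```python
import torch

def group_dro_step(per_example_loss, group_ids, q, step_size):
    """One group-DRO update: exponentiated-gradient ascent on the group
    weights q, followed by a q-weighted sum of the per-group losses.
    Illustrative sketch only -- names and shapes are assumptions, not the
    repository's actual API."""
    n_groups = q.shape[0]
    group_losses = torch.zeros(n_groups, device=per_example_loss.device)
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            # Mean loss of the examples from group g in this batch.
            group_losses[g] = per_example_loss[mask].mean()
    # Ascent step on the group weights (the `gdro-stepsize` parameter).
    q = q * torch.exp(step_size * group_losses.detach())
    q = q / q.sum()  # project back onto the probability simplex
    # The model then descends on the worst-case (q-weighted) loss.
    return (q * group_losses).sum(), q
```

A training loop would call this once per batch, backpropagate through the returned loss, and carry `q` over to the next batch.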
We use Movielens1M and RetailRocket, which are popular open datasets for movies and e-commerce.
- Download the processed ML1m from https://github.com/FeiSun/BERT4Rec/blob/master/data/ml-1m.txt and place it in `data/raw/ml1m`. Download `events.csv` from https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset and place it in `data/raw/retailrocket`.
- We have already preprocessed the two datasets by running `src/preprocess/preprocess_movielens1m.py` and `src/preprocess/preprocess_retailrocket.py`, respectively; the processed data is in `data/processed`. We have also added user groups by running `src/preprocess/addgroups_dsplit.py`. (Commands to regenerate these files are sketched below.)
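If you need to regenerate the processed files, the scripts can be run directly. This assumes `PYTHONPATH` is set as described below and that the raw files are in place; we have not verified whether the scripts take additional arguments.

```
python src/preprocess/preprocess_movielens1m.py
python src/preprocess/preprocess_retailrocket.py
python src/preprocess/addgroups_dsplit.py
```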
Note that `PYTHONPATH` in your conda environment should be set to the root of this repository.
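For example, one way to do this:

```
# From the root of the repository:
export PYTHONPATH="$(pwd)"
# Or persist it in the conda environment (takes effect after re-activation):
conda env config vars set PYTHONPATH=/path/to/this/repo
```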
We suggest first running `python src/training/train_sasrec_dp.py`; the script provides default values for all training parameters. The key parameters we vary across experiments (see the example command after this list):
- The loss type is configurable to one of `["joint_dro", "erm", "cb", "cb_log", "s_dro", "group_dro", "ipw", "ipw_log"]`.
- `joint-dro-alpha` is the alpha level for CVaR DRO.
- `gdro-stepsize` is the stepsize for the ascent step in the group and streaming DRO losses.
- `stream-lr` is the streaming learning rate for SDRO.
- `groups` selects which user grouping and group sizes we use in the experiment: `popdsplit_balanced`, `popdsplit_0.2_0.6_0.2`, and `popdsplit_0.1_0.8_0.1` are G_pop33, G_pop2060, and G_pop1080 from the paper; `seqdsplit_balanced`, `seqdsplit_0.2_0.6_0.2`, and `seqdsplit_0.1_0.8_0.1` are G_seq33, G_seq2060, and G_seq1080 from the paper; `popseq_bal` assigns users to intersecting popularity and sequence-length groups (used with `subgroup-for-loss` below).
- `subgroup-for-loss` selects which subgroup to use for the loss computation in GDRO, SDRO, and IPW when users belong to both popularity and sequence-length groups: `0` uses the popularity-based groups, `1` uses the sequence-length groups.
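A typical local run might combine these parameters as follows. The flag spellings here are assumed from the parameter names above; check the argument parser in `train_sasrec_dp.py` for the exact ones.

```
# Illustrative invocation -- flag names are assumptions, not verified.
python src/training/train_sasrec_dp.py \
    --loss-type group_dro \
    --gdro-stepsize 0.01 \
    --groups popseq_bal \
    --subgroup-for-loss 0
```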
Launching jobs on SageMaker is straightforward: run `python src/sagemaker/sagemaker_training.py <config path>` with the appropriate config path.
The configs for all jobs in our experiments are available in `src/sagemaker/training_configs`. For example, the command
`python src/sagemaker/sagemaker_training.py 'src/sagemaker/training_configs/RR/sasrec/popseq/rr-sasrec-jdro-popseq.json'`
launches CVaR DRO (also called joint DRO) training jobs with 10 different alpha values on the RetailRocket dataset, for users belonging to intersecting groups (popularity- and sequence-length-based groups).
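As a rough illustration of what such a config carries, a file of this kind might look like the following. All key names and values here are hypothetical guesses; consult the actual files in `src/sagemaker/training_configs`.

```json
{
  "_note": "Hypothetical example -- key names are guesses; check the real configs.",
  "dataset": "retailrocket",
  "model": "sasrec",
  "loss": "joint_dro",
  "joint-dro-alpha": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
  "groups": "popseq_bal"
}
```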