Source code and baselines for the RecGURU paper "RecGURU: Adversarial Learning of Generalized User Representations for Cross-Domain Recommendation (WSDM 2022)"
```
RecGURU
├── README.md                 Read me file
├── data_process              Data processing methods
│   ├── __init__.py           Package initialization file
│   ├── amazon_csv.py         Code for processing the Amazon data (in .csv format)
│   ├── business_process.py   Code for processing the collected data
│   ├── item_frequency.py     Calculates item frequency in each domain
│   └── run.sh                Shell script to perform data processing
├── GURU                      Scripts for modeling, training, and testing
│   ├── data                  Dataloader package
│   │   ├── __init__.py       Package initialization file
│   │   └── data_loader.py    Customized dataloaders
│   ├── tools                 Tools such as loss functions, evaluation metrics, etc.
│   │   ├── __init__.py       Package initialization file
│   │   ├── lossfunction.py   Customized loss functions
│   │   ├── metrics.py        Evaluation metrics
│   │   ├── plot.py           Plot functions
│   │   └── utils.py          Other tools
│   ├── Transformer           Transformer package
│   │   ├── __init__.py       Package initialization file
│   │   └── transformer.py    Transformer module
│   ├── AutoEnc4Rec.py        Autoencoder-based sequential recommender
│   ├── AutoEnc4Rec_cross.py  Cross-domain recommender modules
│   ├── config_auto4rec.py    Model configuration file
│   ├── gan_training.py       Training methods of the GAN framework
│   ├── train_auto.py         Main script for training and testing the single-domain sequential recommender
│   └── train_gan.py          Main script for training and testing the cross-domain sequential recommender
└── .gitignore                gitignore file
```
- Public dataset: Amazon review dataset: https://nijianmo.github.io/amazon/index.html
- Collected datasets: https://drive.google.com/file/d/1Eszu-mApyzvVj6tAunPuYql6aIJlQmhg/view?usp=sharing (tar.gz) or https://drive.google.com/file/d/1NbP48emGPr80nL49oeDtPDR3R8YEfn4J/view (.gz file; use 7z to unzip)
- Data processing:

Process the Amazon data:
```shell
cd ../data_process
python amazon_csv.py
```

Process the collected data:
```shell
cd ../data_process
python business_process.py --rate 0.1  # portion of overlapping users = 0.1
```
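The `--rate` flag sets what fraction of the users shared by the two domains is kept as overlapping. A minimal sketch of that sampling step (the function name and data layout are assumptions for illustration, not the repo's actual code):

```python
import random

def sample_overlap(users_a, users_b, rate, seed=42):
    """Return the subset of users common to both domains that is kept
    as 'overlapping', at the requested rate (e.g. 0.1).
    Hypothetical helper; the repo's business_process.py may differ."""
    common = sorted(set(users_a) & set(users_b))  # users present in both domains
    rng = random.Random(seed)                     # fixed seed for reproducibility
    k = int(len(common) * rate)                   # keep only this many as overlapping
    return set(rng.sample(common, k))
```

With `rate=0.1`, one tenth of the shared users are treated as overlapping; the remainder can be assigned to a single domain.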
After data processing, each cross-domain scenario has a dataset folder:
```
"a_domain"-"b_domain"
├── a_only.pickle   # users in domain a only
├── b_only.pickle   # users in domain b only
├── a.pickle        # all users in domain a
├── b.pickle        # all users in domain b
└── a_b.pickle      # overlapping users of domains a and b
```
Note: see the code for processing details and make modifications accordingly.
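The exact schema of these pickles is defined by the processing scripts; as a minimal sketch (assuming each file is a standard pickle, e.g. a dict mapping a user id to an item-id sequence), they can be inspected like this:

```python
import pickle

def inspect_pickle(path, n=3):
    """Load a dataset pickle and print a few entries.
    The dict-of-sequences layout is an assumption; adjust to the
    actual structure produced by the processing scripts."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    print(type(data), "with", len(data), "entries")
    if isinstance(data, dict):
        for user, seq in list(data.items())[:n]:  # peek at the first n users
            print(user, "->", seq)
    return data
```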
- Single-domain Methods:
```shell
# SAS
python train_auto.py --sas "True"
# AutoRec (ours)
python train_auto.py
```
- Cross-Domain Methods:
```shell
# RecGURU
python train_gan.py --cross "True"
```
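The evaluation metrics live in `GURU/tools/metrics.py`. The repo's exact code is not reproduced here, but hit ratio and NDCG at cutoff K, as commonly reported for sequential recommendation, can be sketched as:

```python
import math

def hit_at_k(rank, k):
    """1.0 if the ground-truth item's 0-based rank is within the top-k, else 0.0."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank, k):
    """NDCG@k for a single ground-truth item: 1/log2(rank + 2) inside top-k, else 0.0."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0
```

Averaging these per-user scores over the test set gives HR@K and NDCG@K; check `metrics.py` for the metrics actually used.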