!!! For safety reasons, all config files and some model versions have been removed. !!!
YoYo Recommendation Model is a TensorFlow-based ranking and conversion prediction stack that powers multi-task ad recommendation research. The repo contains production-style pipelines for downloading features from MaxCompute/ODPS, training several neural architectures (e.g., TIGER, RankMixer, DeepFM variants, sequence-aware HSTU models), evaluating performance, exporting SavedModels, and syncing artifacts to OSS.
The code base is organized so that a single `MODEL_TASK` (for example, `O35_mutil_cvr_v10`) encapsulates everything it needs: dataset schema, feature configuration, model hyper-parameters, and cron-style shell wrappers. The same runner can execute training, evaluation, inference, and export on GPU or CPU with identical business logic.
- `bin/`: automation scripts used in production (train/eval/export/infer, TF-Serving smoke tests, ODPS downloads).
- `common/`: shared utilities for data ingestion, feature parsing, TF dataset helpers, ODPS/OSS bridges, metric logging, etc.
- `config/`: per-model configuration packages containing `train_config.py`, HTTP body templates, cron samples, and DataWorks scripts.
- `dataset/`: dataset readers and preprocessing utilities.
- `layers/`, `models/`: model architecture definitions (MMoE, RankMixer blocks, TIGER encoder, HSTU sequence modules, etc.).
- `scripts/` and `test/`: demo runners plus TF-Serving regression tests.
- Multi-task modeling: CTR, CVR, CTCVR, and auxiliary heads running on a unified parameter server via Estimator.
- Feature-rich input pipeline: supports dense/sparse features, scripted slot selection, and sequential histories with custom padding logic.
- Cluster-friendly execution: `TF_CONFIG`-driven distribution, mirrored strategy on GPU, and job orchestration through `bin/run.sh` (see the sketch after this list).
- Artifact management: hooks that write predictions/metrics back to ODPS tables and keep checkpoints/exported models in OSS.
- Serving preview: ready-to-use TF-Serving scripts (`test/run_tfserving_test_http.sh`) and sample request bodies for sanity checks.
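For concreteness, here is a minimal sketch of the `TF_CONFIG`-driven setup. The host and port are placeholders, not values from this repo; `tfconfig.py` emits an equivalent single-worker structure by default.

```bash
# Minimal single-worker TF_CONFIG (placeholder host/port). Multi-worker runs add
# more entries to the "worker" list and give each process its own task index.
export TF_CONFIG='{
  "cluster": {"worker": ["localhost:2222"]},
  "task": {"type": "worker", "index": 0}
}'
```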
- Environment
  - Python 3.8+ with TensorFlow 2.x, `tensorflow-recommenders-addons`, `pandas`, `redis`, `oss2`, `pyodps`, etc.
  - (Optional) GPU runtime with CUDA/cuDNN if you want to enable multi-GPU training.
- Install dependencies

  ```bash
  pip install -r requirements.txt  # update the file to match your environment
  ```
- Select a task
  - Copy one of the `config/<task>/train_config*.py` templates and adjust schema paths, feature settings, optimization params, and ODPS/OSS destinations.
  - Set the task name before running scripts: `export MODEL_TASK=O35_mutil_cvr_v10` (for example).
- Download data + train

  ```bash
  cd bin
  bash run.sh ${MODEL_TASK} 20250101 20250102  # <start_date> <end_date> [batch_days]
  ```

  `run.sh` orchestrates ODPS downloads, trains via `main.py`, optionally evaluates, exports, and triggers inference jobs.
- Evaluate/Export/Infer separately

  ```bash
  bash eval.sh ${MODEL_TASK} 20250102
  bash export.sh ${MODEL_TASK} 20250102
  bash infer.sh ${MODEL_TASK} 20250102
  ```
- Serving smoke test
  - Point `MODEL_ROOT` and `MODEL_TASK` at your artifacts, then run `test/run_tfserving_test_http.sh` to start a local TF-Serving container and issue sample HTTP requests (a manual equivalent is sketched after this list).
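To reproduce the smoke test by hand, the sketch below approximates what the script does. The image tag, REST port, and mount layout are assumptions, and TF-Serving expects `export_dir` to contain a numeric version subdirectory.

```bash
# Start TF-Serving against the exported model (paths and names are assumptions).
docker run -d --rm -p 8501:8501 \
  -v "${MODEL_ROOT}/${MODEL_TASK}/export_dir:/models/${MODEL_TASK}" \
  -e MODEL_NAME="${MODEL_TASK}" tensorflow/serving

# Send one of the sample request bodies over the REST API.
curl -s -X POST "http://localhost:8501/v1/models/${MODEL_TASK}:predict" \
  -d @"config/${MODEL_TASK}/body.json"
```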
All credentials and server addresses have been replaced with placeholders so that nothing sensitive ships with the repo. Populate them via environment variables before executing any job.
| Purpose | Environment variables | Description |
|---|---|---|
| OSS uploads | `UPLOAD_OSS_ACCESS_KEY_ID`, `UPLOAD_OSS_ACCESS_KEY_SECRET`, `UPLOAD_OSS_ENDPOINT`, `UPLOAD_OSS_BUCKET` | Credentials and endpoint used by `common/upload.py` when syncing SavedModels. |
| ODPS connectivity | `ODPS_ACCESS_KEY_ID`, `ODPS_ACCESS_KEY_SECRET`, `ODPS_ENDPOINT`, `ODPS_TUNNEL_ENDPOINT`, `ODPS_DEFAULT_PROJECT` | Required for any ODPS read/write (data download, metrics upload, SQL execution). |
| Generic OSS access | `OSS_TEST_ACCESS_KEY_ID`, `OSS_TEST_ACCESS_KEY_SECRET`, `OSS_TEST_BUCKET`, `OSS_TEST_ENDPOINT`, `OSS_TEST_INTERNAL_ENDPOINT`, `OSS_SOURCE_BUCKET` | Used in helper scripts (`common/aliyun.py`, DataWorks checkers) and for feature sync jobs. |
| Redis instances | `YOYO_REDIS_HOST`, `YOYO_REDIS_PORT`, `YOYO_REDIS_USERNAME`, `YOYO_REDIS_PASSWORD`, `YOYO_REDIS_DB` | Primary Redis instance used for online feature pushes. |
| Feature Redis | `YOYO_FEATURE_REDIS_HOST`, `YOYO_FEATURE_REDIS_PORT`, `YOYO_FEATURE_REDIS_USERNAME`, `YOYO_FEATURE_REDIS_PASSWORD`, `YOYO_FEATURE_REDIS_DB` | Secondary Redis instance that caches feature bins. |
Keep these variables in a secure vault or CI secret store and never commit actual values. The defaults inside the repo are non-functional placeholders such as `YOUR_OSS_ACCESS_KEY_ID`, so running the code without overrides will raise authentication errors; this is intentional, to prevent accidental leakage.
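As one example of injecting them at job start, here with HashiCorp Vault (the secret paths, field names, and endpoint below are placeholders, not values used anywhere in this repo):

```bash
# Pull ODPS credentials from a secret store before running any job (paths are hypothetical).
export ODPS_ACCESS_KEY_ID="$(vault kv get -field=access_key_id secret/yoyo/odps)"
export ODPS_ACCESS_KEY_SECRET="$(vault kv get -field=access_key_secret secret/yoyo/odps)"
export ODPS_ENDPOINT="https://service.example-region.maxcompute.aliyun.com/api"  # example endpoint
export ODPS_DEFAULT_PROJECT="your_project"
```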
- Configure schema & slots: Update the CSV/JSON assets referenced in `train_config.py` (schema definitions, slot mappings, feature groups, sequence configs).
- Schedule jobs: Use the sample crontabs under each `config/<task>/crontab` as a starting point for production scheduling.
- Monitor metrics: `common/save_eval_metric.py` can push evaluation results to ODPS tables, which you can visualize downstream.
- Manage artifacts: After export, models land in `${MODEL_ROOT}/${MODEL_TASK}/export_dir`; `common/upload.py` then uploads them to the OSS path configured in the train config (a manual equivalent is sketched after this list).
- Serve online: Deploy the exported SavedModel via TF-Serving (Docker example under `test/`). Use the provided HTTP bodies under `config/*/body.json` or adapt them to your business requests.
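`common/upload.py` performs the OSS sync in the normal flow; for a one-off manual mirror, an `ossutil` copy along these lines (bucket name and prefix are assumptions) achieves the same effect:

```bash
# Manually mirror the exported SavedModel to OSS (bucket/prefix are placeholders).
ossutil cp -r "${MODEL_ROOT}/${MODEL_TASK}/export_dir" \
  "oss://your-model-bucket/yoyo/${MODEL_TASK}/"
```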
- Internal server URLs, Redis hosts, ODPS endpoints, and tokens have been replaced with placeholders. Double-check `common/connect_config.py`, `common/aliyun.py`, and the DataWorks scripts in `config/*/run_datawork_*.py` to confirm your own secrets are injected via env vars before deployment.
- The `bin/logs` directory contains historical log snapshots. Remove or git-ignore it if you do not want to publish operational traces (see the snippet after this list).
- Make sure any additional artifacts (datasets, dumps, screenshots) undergo the same sanitization pass before sharing this project.
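One way to stop tracking the logs while keeping them on disk:

```bash
# Ignore future logs and untrack the existing snapshots without deleting them locally.
echo "bin/logs/" >> .gitignore
git rm -r --cached bin/logs
```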
- Dataset decode errors: `bin/run.sh` automatically retries downloads and clears corrupted inputs via `common/clear_history_data.py` when it detects decoding issues.
- Distributed runs: Ensure `TF_CONFIG` is exported correctly; `tfconfig.py` generates a single-worker setup by default but can be extended to multi-worker if required.
- OSS/ODPS auth failures: Verify the environment variables listed above and confirm the network path (internal vs. external endpoint) matches your runtime; the loop after this list is a quick way to spot unset ones.
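A quick, purely illustrative check that the credentials from the table above are actually exported (extend the variable list to match your job):

```bash
# Report any required variable that is unset or empty.
for v in ODPS_ACCESS_KEY_ID ODPS_ACCESS_KEY_SECRET ODPS_ENDPOINT ODPS_DEFAULT_PROJECT \
         UPLOAD_OSS_ACCESS_KEY_ID UPLOAD_OSS_ACCESS_KEY_SECRET; do
  [ -n "${!v}" ] || echo "missing: $v"
done
```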
Feel free to adapt the configs, add new architectures under `models/`, or wrap different feature stores while keeping this scaffold as the backbone of your UCSD MSDS submission.