This is the README for the project *Alpha Go Everywhere: Machine Learning and International Stock Returns* (SSRN link), accepted by *The Review of Asset Pricing Studies*. It gives an overview of the project structure and explains how to use and contribute to the codebase.
The project is organized as follows (key scripts highlighted):
- ❗️ `Rank_Norm.py`: Rank-normalize the data, as in GKX's paper.
- 📂 `Load_Data.py`: Functions for loading and preprocessing data.
- ⚙️ `SetUp.py`: Variable definitions.
- 🛠️ `Local{US}_Factor{GapQ}.py`: Create Local{US} factor{GapQ}.
- 🔗 `Merge_Factor+GapQ.py`: Merge US factors, US gaps, and local factors.
- 🌐 `International_Pool.py`: Integrate all standardized market data into one dataset.
- 🤖 `ML{NN}_Market.py`: Train various ML{NN} models for each market.
- 🗽 `ML{NN}_Market_USmodel.py`: Predict international markets using the USA model (no further training).
- 🚀 `ML{NN}_Market_Enhanced.py`: Train enhanced market-specific models with USA factors, gaps, and local features.
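GKX-style rank normalization ranks each characteristic cross-sectionally within each period and maps the ranks into the interval [-1, 1]. The following is a minimal pandas sketch of that idea, not the code in `Rank_Norm.py`. It assumes a long-format panel with a `date` column and characteristic columns; the column names `size` and `bm` are hypothetical. The sketch sets missing values to 0, the midpoint of the interval. The actual script may treat ties and missing values differently.

```python
import numpy as np
import pandas as pd


def rank_normalize(df: pd.DataFrame, date_col: str, char_cols: list) -> pd.DataFrame:
    """Rank each characteristic within each date and rescale the ranks to [-1, 1].

    Assumption of this sketch: missing values become 0 (the middle of the interval).
    """
    out = df.copy()
    grouped = out.groupby(date_col)[char_cols]
    ranks = grouped.rank(method="average")  # ranks 1..n within each date; NaN stays NaN
    counts = grouped.transform("count")     # number of non-missing values per date
    # Map rank 1 -> -1 and rank n -> +1.
    scaled = 2.0 * (ranks - 1.0) / (counts - 1.0) - 1.0
    # With fewer than two observations there is nothing to rank, so use the midpoint.
    scaled = scaled.where(counts > 1, 0.0)
    out[char_cols] = scaled.fillna(0.0)
    return out


if __name__ == "__main__":
    # Hypothetical toy panel: two months, two characteristics.
    panel = pd.DataFrame({
        "date": ["2000-01", "2000-01", "2000-01", "2000-02", "2000-02"],
        "size": [10.0, 30.0, 20.0, 5.0, np.nan],
        "bm": [1.0, 2.0, np.nan, 3.0, 4.0],
    })
    result = rank_normalize(panel, "date", ["size", "bm"])
    print(result["size"].tolist())  # [-1.0, 1.0, 0.0, 0.0, 0.0]
    print(result["bm"].tolist())    # [-1.0, 1.0, 0.0, -1.0, 1.0]
```

Ranking before model fitting limits the influence of outliers and makes characteristics comparable across markets with different scales. That comparability matters when the data are later pooled internationally.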
To use the project, follow these steps in order:
- Run `Rank_Norm.py` to rank-normalize the predictors (as in GKX's paper).
- Run `Local{US}_Factor{GapQ}.py` to create Local{US} factor{GapQ}.
- Run `Merge_Factor+GapQ.py` to merge US factors, gaps, and local factors.
- Run `International_Pool.py` to integrate all standardized market data into one international dataset.
- Run `ML{NN}_Market.py` to train ML models for each market.
- Run `ML{NN}_Market_USmodel.py` to predict international markets using the USA model.
- Run `ML{NN}_Market_Enhanced.py` to train enhanced models with additional features.
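The US-model step fits a model once on US data and then applies it to other markets without further training. The sketch below shows that evaluation pattern under simplifying assumptions and is not the project's code:

- An ordinary least squares fit stands in for the project's ML/NN models.
- Each market is a DataFrame in a dict keyed by a hypothetical market code.
- The project's train/validation/test splits over time are not reproduced.

Performance is scored with the out-of-sample R² used by GKX, whose denominator is the sum of squared returns without demeaning.

```python
import numpy as np
import pandas as pd


def r2_oos(y_true, y_pred) -> float:
    """GKX-style out-of-sample R^2: 1 - SSE / sum of squared returns (no demeaning)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Stand-in model: least-squares coefficients, intercept stored at position 0."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_ols(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply fitted coefficients to a feature matrix."""
    return coef[0] + X @ coef[1:]


def evaluate_us_model(data: dict, features: list, target: str, us_key: str = "USA") -> dict:
    """Fit once on the US sample, then score every other market without refitting."""
    us = data[us_key]
    coef = fit_ols(us[features].to_numpy(), us[target].to_numpy())
    scores = {}
    for market, df in data.items():
        if market == us_key:
            continue  # the US sample was used for fitting
        preds = predict_ols(coef, df[features].to_numpy())
        scores[market] = r2_oos(df[target], preds)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def make_market(n: int) -> pd.DataFrame:
        # Synthetic data: returns depend linearly on two features in [-1, 1] plus noise.
        X = rng.uniform(-1, 1, size=(n, 2))
        ret = 0.02 * X[:, 0] - 0.01 * X[:, 1] + rng.normal(0, 0.01, size=n)
        return pd.DataFrame({"f1": X[:, 0], "f2": X[:, 1], "ret": ret})

    # Hypothetical market codes; the US model is never refit on JPN or GBR.
    markets = {"USA": make_market(2000), "JPN": make_market(500), "GBR": make_market(500)}
    print(evaluate_us_model(markets, ["f1", "f2"], "ret"))
```

Because the denominator is not demeaned, a model that always predicts zero scores exactly 0. This is why the GKX metric is a demanding benchmark for return forecasts.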
The data come from the following sources:
- US data from CRSP
- China data from CSMAR
- Data for other markets from Datastream
Running the reproducibility checks may require the following environment and packages:
- Hardware
  - NVIDIA A100 GPU (40 GB)
  - AMD EPYC 7713 64-Core @ 1.80 GHz (128 cores)
  - 1.0 TB RAM
- Operating system
  - Ubuntu 20.04.4 LTS
- Software
  - 🐍 Python 3.8.18
  - 🔥 PyTorch 2.0.1+cu117
  - 📊 numpy 1.22.3
  - 📑 pandas 2.0.3
  - 📈 scikit-learn 1.3.0
  - 📊 matplotlib 3.7.2
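To compare a local setup against the versions listed above, a small helper script can report what is installed. The script below is hypothetical and not part of the repository. Two details are worth noting: scikit-learn is imported as `sklearn`, and PyTorch's version string carries a local build tag such as `+cu117`, which the comparison ignores.

```python
import importlib
import platform

# Package versions as listed in this README (import name -> listed version).
LISTED = {
    "torch": "2.0.1",
    "numpy": "1.22.3",
    "pandas": "2.0.3",
    "sklearn": "1.3.0",
    "matplotlib": "3.7.2",
}


def check_versions(listed: dict = LISTED) -> dict:
    """Return {package: (installed_version_or_None, listed_version)}."""
    report = {}
    for name, want in listed.items():
        try:
            module = importlib.import_module(name)
            have = getattr(module, "__version__", None)
        except ImportError:
            have = None  # package not installed
        report[name] = (have, want)
    return report


def matches(installed, listed: str) -> bool:
    """Compare versions while ignoring local build tags such as '+cu117'."""
    return installed is not None and str(installed).split("+")[0] == listed


if __name__ == "__main__":
    print("Python", platform.python_version(), "(README lists 3.8.18)")
    for name, (have, want) in check_versions().items():
        status = "OK" if matches(have, want) else "differs"
        print(f"{name}: installed={have}, listed={want} -> {status}")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed")
```

A mismatch does not necessarily break the code. However, differences in library versions or GPU kernels can change numerical results slightly, so matching the listed versions is the safest route to reproducing them.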