- config/: Contains YAML configuration files.
- src/: Contains all Python source code.
  - train.py: Main entry point of the project.
  - config.py: Defines Pydantic models for configuration.
  - dataset.py: PyTorch dataset and data loader related code.
  - model.py: Model architectures such as ESM2Effect.
  - trainer.py: Trainer class that encapsulates training and ReST logic.
  - utils.py: Utility functions such as data processing and plotting.
- requirements.txt: Project dependencies.
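
As a rough illustration of what the Pydantic models in src/config.py might look like, here is a minimal sketch. Only the field names data_path, wt_seq, lr and batch_size appear in this README; the class name TrainConfig, the default values, the validation constraints and the placeholder sequence "MKT" are assumptions made for the example, not the project's actual code.

```python
from pydantic import BaseModel, Field, ValidationError


class TrainConfig(BaseModel):
    """Hypothetical schema; only the four field names come from the README."""

    data_path: str  # path to the CSV file, e.g. data.csv
    wt_seq: str  # presumably the wild-type sequence (assumption)
    lr: float = Field(default=1e-4, gt=0)  # assumed default, must be positive
    batch_size: int = Field(default=32, gt=0)  # assumed default, must be positive


# Stand-in for the dict parsed from config/base_config.yaml.
raw = {"data_path": "data.csv", "wt_seq": "MKT", "lr": 5e-5, "batch_size": 16}
cfg = TrainConfig(**raw)

# Invalid values are rejected when the config is built, before training starts.
try:
    TrainConfig(data_path="data.csv", wt_seq="MKT", batch_size=0)
    rejected = False
except ValidationError:
    rejected = True
```

Validating the configuration up front like this means a typo in the YAML file fails fast instead of surfacing partway through a training run.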
- Install dependencies:
    pip install -r requirements.txt
- Prepare data: Ensure your data.csv file path is correct and update data_path and wt_seq in config/base_config.yaml.
- Start training: Run from the protein_project root directory:
    python src/train.py --config_path config/base_config.yaml
- Override parameters (optional): You can override parameters from the configuration file directly on the command line:
    python src/train.py --config_path config/base_config.yaml --lr 5e-5 --batch_size 16
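
One way command-line overrides like these could be merged onto the values loaded from the YAML file is sketched below. This is an assumption about the mechanism, not the project's actual train.py: the helper names build_parser and apply_overrides are hypothetical, the base dict stands in for the parsed config/base_config.yaml, and its values (including the placeholder sequence "MKT") are made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the flags shown in the README."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--config_path", required=True)
    # Overrides default to None so "flag not given" differs from a real value.
    parser.add_argument("--lr", type=float, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config with every explicitly passed flag applied."""
    merged = dict(config)
    for key, value in vars(args).items():
        if key != "config_path" and value is not None:
            merged[key] = value
    return merged


# Stand-in for the values read from config/base_config.yaml (made up).
base = {"data_path": "data.csv", "wt_seq": "MKT", "lr": 1e-4, "batch_size": 32}

args = build_parser().parse_args(
    ["--config_path", "config/base_config.yaml", "--lr", "5e-5", "--batch_size", "16"]
)
merged = apply_overrides(base, args)
print(merged)
```

Keys that are not passed on the command line keep their file values, so the YAML file stays the single source of defaults.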
