AF3Score Pipeline

A pipeline for evaluating protein structure quality using AF3Score.

Environment Setup

1. Create and Activate Conda Environment

conda create -n af3score python=3.11
conda activate af3score
conda install gxx_linux-64 gxx_impl_linux-64 gcc_linux-64 gcc_impl_linux-64=13.2.0

2. Install HMMER (Required for MSA Generation)

mkdir ~/hmmer_build ~/hmmer
wget http://eddylab.org/software/hmmer/hmmer-3.4.tar.gz -P ~/hmmer_build
cd ~/hmmer_build
tar -zxf hmmer-3.4.tar.gz
cd hmmer-3.4
./configure --prefix="$HOME/hmmer"
make -j8
make install

Add HMMER to your PATH:

export PATH="$HOME/hmmer/bin:$PATH"

Verify installation:

hmmsearch -h
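
The same check can be scripted for all HMMER programs at once. This is a minimal sketch; the list of program names is an assumption about which HMMER tools the pipeline calls, so adjust it as needed.

```python
import shutil

# Assumed set of HMMER programs to look for; not taken from the AF3Score source.
HMMER_TOOLS = ("jackhmmer", "hmmsearch", "hmmbuild", "nhmmer", "hmmalign")


def find_tools(names=HMMER_TOOLS):
    """Map each program name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

Any program reported as NOT FOUND usually means the PATH export above has not been applied in the current shell.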

3. Install AF3Score and Dependencies

git clone https://github.com/Mingchenchen/AF3Score.git
cd AF3Score/

# Download required databases
bash fetch_databases.sh <DB_DIR>  # Replace <DB_DIR> with your database directory

# Install Python dependencies
pip install -r dev-requirements.txt
pip install --no-deps -e .
build_data

# Install additional dependencies
conda install -c conda-forge biopython h5py pandas

Usage Pipeline

1. Extract Chains and Generate CIF Files

python 1_extract_chains.py

Input: PDB files in ./pdb directory
Output:

Individual chain CIF files in ./complex_chain_cifs/
Sequence information in complex_chain_sequences.csv
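
To illustrate what per-chain sequence extraction involves, the sketch below reads fixed-column ATOM records and builds one sequence per chain from the CA atoms. It is not the implementation in 1_extract_chains.py; the function name and the handling of non-standard residues (mapped to X) are assumptions made for the example.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} from the ATOM records of the first model.

    One residue is counted per CA atom. Residue names outside the 20
    standard amino acids map to 'X' (an assumption for this sketch).
    """
    sequences = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])  # residue number plus insertion code
        if residue_key in seen:
            continue  # alternate location of a residue already counted
        seen.add(residue_key)
        letter = THREE_TO_ONE.get(line[17:20].strip(), "X")
        sequences[chain_id] = sequences.get(chain_id, "") + letter
    return sequences
```

A typical call would be chain_sequences(open("pdb/your_protein.pdb")), where the file name is a placeholder.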

2. Convert PDB to JAX Arrays

python 2_pdb2jax.py

Input: PDB files in ./pdb directory
Output: H5 files in ./complex_h5/

3. Generate Configuration Files

python 3_generate_json.py

Input: complex_chain_sequences.csv
Output: JSON configuration files in ./complex_json_files/
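
For orientation, the sketch below builds a minimal input in the upstream AlphaFold3 JSON layout. It assumes the files written by 3_generate_json.py follow that layout; the actual files may carry additional fields (for example MSA or template entries), and the helper name and seed value are hypothetical.

```python
import json


def make_af3_input(name, chains, seed=1):
    """Build a minimal AlphaFold3-style input dict.

    chains is an iterable of (chain_id, sequence) pairs. The seed default
    is an arbitrary choice for this sketch.
    """
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {"protein": {"id": chain_id, "sequence": sequence}}
            for chain_id, sequence in chains
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    example = make_af3_input("your_protein", [("A", "GA"), ("B", "M")])
    print(json.dumps(example, indent=2))
```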

4. Run AlphaFold3 Scoring

AlphaFold3 scoring can be run in two modes: single file mode or batch mode.

Single File Mode

Use this mode when you want to process a single protein structure:

python run_af3score.py \
  --db_dir=/path/to/alphafold_databases \
  --model_dir=/path/to/alphafold3_model_parameters \
  --json_path=/path/to/complex_json_files/your_protein.json \
  --path=/path/to/complex_h5/your_protein.h5 \
  --output_dir=/path/to/score_results/ \
  --run_data_pipeline=false \
  --run_inference=true \
  --init_guess=true \
  --num_samples=1

Batch Mode

Use this mode to process multiple protein structures at once:

python run_af3score.py \
  --db_dir=/path/to/alphafold_databases \
  --model_dir=/path/to/alphafold3_model_parameters \
  --batch_json_dir=/path/to/complex_json_files/ \
  --batch_h5_dir=/path/to/complex_h5/ \
  --output_dir=/path/to/score_results/ \
  --run_data_pipeline=false \
  --run_inference=true \
  --init_guess=true \
  --num_samples=1
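
The two invocations above differ only in their input flags. When launching runs from a script, a small wrapper can assemble either command and reject mixed arguments. This is a sketch: build_command is a hypothetical helper, and only the flags shown above are used.

```python
import shlex


def build_command(db_dir, model_dir, output_dir, *, json_path=None, h5_path=None,
                  batch_json_dir=None, batch_h5_dir=None, num_samples=1):
    """Return the argument list for one run_af3score.py invocation."""
    single = (json_path, h5_path)
    batch = (batch_json_dir, batch_h5_dir)
    if any(single) and any(batch):
        raise ValueError("single-file and batch flags cannot be combined")
    if all(single):
        inputs = [f"--json_path={json_path}", f"--path={h5_path}"]
    elif all(batch):
        inputs = [f"--batch_json_dir={batch_json_dir}",
                  f"--batch_h5_dir={batch_h5_dir}"]
    else:
        raise ValueError("pass both single-file paths or both batch directories")
    return [
        "python", "run_af3score.py",
        f"--db_dir={db_dir}",
        f"--model_dir={model_dir}",
        *inputs,
        f"--output_dir={output_dir}",
        "--run_data_pipeline=false",
        "--run_inference=true",
        "--init_guess=true",
        f"--num_samples={num_samples}",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "/path/to/alphafold_databases",
        "/path/to/alphafold3_model_parameters",
        "/path/to/score_results/",
        batch_json_dir="/path/to/complex_json_files/",
        batch_h5_dir="/path/to/complex_h5/",
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```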

Important Notes:

In batch mode, each JSON file must have an H5 file with the same base name (e.g., protein1.json and protein1.h5).
The two modes are mutually exclusive: do not specify both --json_path and --batch_json_dir in the same run.
Batch mode is more efficient for multiple structures because the chemical component dictionary (CCD) is loaded only once.
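
Before starting a batch run, it can help to confirm that the filenames really do pair up. The sketch below compares base names in the two directories; unmatched_inputs is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def unmatched_inputs(json_dir, h5_dir):
    """Return (JSON base names without an H5 file, H5 base names without a JSON file)."""
    json_stems = {p.stem for p in Path(json_dir).glob("*.json")}
    h5_stems = {p.stem for p in Path(h5_dir).glob("*.h5")}
    return sorted(json_stems - h5_stems), sorted(h5_stems - json_stems)


if __name__ == "__main__":
    missing_h5, missing_json = unmatched_inputs("complex_json_files", "complex_h5")
    for stem in missing_h5:
        print(f"{stem}.json has no matching H5 file")
    for stem in missing_json:
        print(f"{stem}.h5 has no matching JSON file")
```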

Computational Efficiency

Our method consists of two main stages: data preprocessing and model inference. The preprocessing stage runs on CPU and includes two steps: structure processing and coordinate conversion. These CPU-based steps are computationally efficient, taking less than 0.3 seconds combined even for proteins with 1024 residues. For the model inference stage, which runs on GPU (NVIDIA GeForce RTX 4090), the computational time scales with protein sequence length, ranging from 20 seconds for proteins with 256 residues to 60 seconds for those with 1024 residues.
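
For rough planning, the two reported timings can be turned into a per-structure estimate. The sketch below interpolates linearly between them, which is an assumption: the text only gives the two endpoints, the true scaling may not be linear, and timings on other GPUs will differ.

```python
# Reported inference timings on an RTX 4090: (residues, seconds).
SHORT = (256, 20.0)
LONG = (1024, 60.0)


def estimate_inference_seconds(n_residues):
    """Linearly interpolate inference time between the two reported points."""
    (n0, t0), (n1, t1) = SHORT, LONG
    if not n0 <= n_residues <= n1:
        raise ValueError(f"no reported timing outside {n0}-{n1} residues")
    return t0 + (n_residues - n0) * (t1 - t0) / (n1 - n0)


if __name__ == "__main__":
    lengths = [256, 640, 1024]  # example lengths, not measured data
    total = sum(estimate_inference_seconds(n) for n in lengths)
    print(f"estimated GPU time: {total:.0f} s")
```

Under this assumption a 640-residue protein would take about 40 seconds, halfway between the two reported values.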

5. Extract Scoring Metrics

python 4_extract_iptm-ipae-pae-interaction.py

Output Metrics

The pipeline generates the following scoring metrics:

Metric	Description
AF3Score_monomer_ca_plddt	pLDDT value for monomer CA atoms
AF3Score_monomer_pae	PAE value for monomers
AF3Score_monomer_ptm	PTM value for monomers
AF3Score_complex_ca_plddt	pLDDT value for complex CA atoms
AF3Score_complex_pae	PAE value for complexes
AF3Score_complex_ptm	PTM value for complexes
AF3Score_complex_iptm	iPTM value for complexes
AF3Score_pae_interaction	PAE value for interaction chains
AF3Score_ipae	PAE value for interfaces
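
Once the metrics are in tabular form, candidates can be ranked by any column. The sketch below assumes the metrics are available as CSV with the column names above plus a name column; the file layout, the name column, and the example values are assumptions for illustration, not output of the pipeline.

```python
import csv
import io


def read_metrics(handle):
    """Read metric rows from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


def rank_by(rows, metric, descending=True):
    """Sort rows by a numeric metric column.

    Use descending=True for scores where higher is better (pLDDT, PTM, iPTM)
    and descending=False for PAE-type metrics, where lower is better.
    """
    return sorted(rows, key=lambda row: float(row[metric]), reverse=descending)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = io.StringIO(
        "name,AF3Score_complex_iptm,AF3Score_ipae\n"
        "design_a,0.42,14.1\n"
        "design_b,0.81,5.3\n"
    )
    for row in rank_by(read_metrics(example), "AF3Score_complex_iptm"):
        print(row["name"], row["AF3Score_complex_iptm"])
```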

Reference

For more information about AlphaFold3, see the AlphaFold3 GitHub repository.
