ProjectAbDesign

About

This project turns the workflow that generates the Antibody-Antigen (AbAg) database, part of the antibody design framework developed by Aguilar Rangel et al. (cite), into an automated pipeline. The AbAg database is a collection of antigen-like and CDR-like regions found in all non-redundant general protein structures reported in the PDB.

Getting started

Prerequisites

Install Conda with Python 3.11
Install Master, a rapid structural similarity search program
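
Before running the pipeline, it can help to confirm that the prerequisites are reachable from the shell. The sketch below is not part of the repository; it assumes the Master tools are installed under the executable names master and createPDS and that conda is on PATH. Adjust the names if your installation differs.

```python
import shutil

# Executable names are assumptions; change them to match your installation.
REQUIRED_TOOLS = ["conda", "master", "createPDS"]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```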

Installation

Clone the repository

git clone https://github.com/FahsaiNak/ProjectAbDesign.git

Create environment from environment.yml
```
conda env create -f environment.yml
```

Implementation

Using Snakemake

Update the paths in the configuration file found in the run directory.
Update somePDB.csv in the Datasets directory with your complete list of PDB structures of interest.
Run the Snakefile on a single core with the following command: snakemake -c1

Methods

Step 1: CDR fragment collection

Navigate to SAbDab
Click on Downloads on the left-hand side of the page
Click on Download an archived zip file to download structures
Once the zip file is downloaded, extract the chothia zip file and save it into the Datasets directory in the local repository
Change directory to the run directory
On the command line run:
```
bash CDR_fragment_database.sh
```

Step 2: Target protein collection and curation

Ensure you have enough free space for the PDB90 files of interest.
Add a CSV list of the target PDB proteins to download to the Datasets directory. The list must follow the same format as the somePDB.csv file in the Datasets directory.
Once the list has been added, modify the PDB90.sh script in the run directory. It currently contains python ../src/PDB90.py --output_folder "../Datasets/all_PDB" --csv_file "../Datasets/somePDB.csv"; replace "../Datasets/somePDB.csv" with the path to your own PDB CSV file.
Move to the run directory and run the bash script with:
```
bash PDB90.sh
```
The script may take a long time to run, depending on the number of PDB files of interest, because each file is downloaded, uncompressed and cleaned.
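
To check a CSV list before starting a long download, the IDs can be extracted and previewed first. The sketch below is an illustration only and does not reproduce PDB90.py. It assumes that every cell holding a four-character PDB ID (a digit followed by three alphanumeric characters) is a target, and it builds download links using the public RCSB file URL pattern. The function names are hypothetical.

```python
import csv
import io
import re

# Classic four-character PDB ID: one digit followed by three alphanumerics.
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def read_pdb_ids(handle):
    """Collect unique PDB IDs from a CSV handle, preserving first-seen order."""
    ids = []
    seen = set()
    for row in csv.reader(handle):
        for cell in row:
            token = cell.strip().upper()
            if PDB_ID.match(token) and token not in seen:
                seen.add(token)
                ids.append(token)
    return ids


def download_url(pdb_id):
    """Build the RCSB download link for a gzipped PDB file (assumed source)."""
    return f"https://files.rcsb.org/download/{pdb_id}.pdb.gz"


if __name__ == "__main__":
    # In-memory stand-in for a file such as somePDB.csv.
    sample = io.StringIO("1abc,4HHB\nnot_an_id\n4hhb\n")
    for pdb_id in read_pdb_ids(sample):
        print(pdb_id, download_url(pdb_id))
```

Because the sketch scans every cell, a numeric column that happens to contain four-character values would be picked up as well; restrict the scan to one column if your CSV has extra fields.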

Step 3: CDR-like region identification

Remain in the run directory.
Run Prep_queries_AbAg.sh and Prep_targets_AbAg.sh to convert the fragmented CDRs (queries) and the cleaned PDB90 proteins (targets) from PDB format to PDS format, which the Master program can read.
```
./Prep_queries_AbAg.sh
./Prep_targets_AbAg.sh
```
Run Ablike.sh to search for antigen(CDR)-like regions in every target protein (PDB90 proteins).
```
./Ablike.sh
```
Run get_matchInfo.sh to collect antigen(CDR)-like structure information from the Master match files.
```
./get_matchInfo.sh
```
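
The four scripts of this step depend on each other's output, so they need to run in the order listed, and a failure should stop the chain. A minimal sketch of that ordering follows, assuming the scripts are invoked with bash from the run directory; the helper name run_in_order and its runner parameter are hypothetical and exist only to make the sketch testable.

```python
import subprocess

# Script names are taken from the steps above, in the order they are listed.
STEP3_SCRIPTS = [
    "Prep_queries_AbAg.sh",
    "Prep_targets_AbAg.sh",
    "Ablike.sh",
    "get_matchInfo.sh",
]


def run_in_order(scripts, run_dir="run", runner=subprocess.run):
    """Run each script with bash from `run_dir`, stopping at the first failure.

    With the default runner, check=True makes a non-zero exit status raise
    subprocess.CalledProcessError, so later scripts are not started.
    """
    completed = []
    for script in scripts:
        runner(["bash", script], cwd=run_dir, check=True)
        completed.append(script)
    return completed


# Example call from the repository root (not executed here):
# run_in_order(STEP3_SCRIPTS, run_dir="run")
```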

Step 4: Antigen-like region identification and AbAg database generation

Remain in the run directory.
Run Aglike.sh to search for the residues that contact or interact with each antigen(CDR)-like region within the same protein. The antibody-antigen-like information from every PDB90 protein is then combined into a compressed file named AbAg.pkl.
```
./Aglike.sh
```
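
To illustrate what searching for contacting residues can look like, the sketch below uses a simple distance criterion. It is not the criterion used by Aglike.sh: it assumes each residue is represented by a single coordinate (for example its C-alpha atom) and uses a hypothetical cutoff of 8.0 Å.

```python
import math


def contacting_residues(region, others, cutoff=8.0):
    """Return sorted IDs from `others` lying within `cutoff` of any region residue.

    `region` and `others` map residue IDs to (x, y, z) coordinates.
    The single-point representation and the cutoff value are assumptions.
    """
    contacts = set()
    for other_id, other_xyz in others.items():
        if other_id in region:
            continue  # a residue is not a contact of its own region
        for xyz in region.values():
            if math.dist(other_xyz, xyz) <= cutoff:
                contacts.add(other_id)
                break
    return sorted(contacts)


if __name__ == "__main__":
    region = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0)}
    others = {40: (0.0, 6.0, 0.0), 41: (0.0, 20.0, 0.0), 42: (9.0, 0.0, 0.0)}
    print(contacting_residues(region, others))  # prints [40, 42]
```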

Step 5: AbAg transformation

Move to the src directory and run process_AbAg.py with the following command (the third argument is optional):

python process_AbAg.py --abag_filename 'AbAg.pkl' --output_filename 'regions_for_vis.csv' --subset_csv_filename 'PDB_IDs_to_filter.csv'
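
The command above takes two required arguments and one optional one. As a sketch of how such an interface can be declared with argparse, the parser below mirrors the three flag names; it is an assumption about the interface, not a copy of process_AbAg.py, and the help texts are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the three flags shown in the command above."""
    parser = argparse.ArgumentParser(
        description="Sketch of a process_AbAg.py-style command line."
    )
    parser.add_argument("--abag_filename", required=True,
                        help="input AbAg pickle file, e.g. AbAg.pkl")
    parser.add_argument("--output_filename", required=True,
                        help="CSV file to write, e.g. regions_for_vis.csv")
    # Optional: when omitted, the value is None and no subsetting is requested.
    parser.add_argument("--subset_csv_filename", default=None,
                        help="CSV of PDB IDs used to filter the output")
    return parser


# Parsing an explicit argument list, without the optional third flag:
args = build_parser().parse_args(
    ["--abag_filename", "AbAg.pkl", "--output_filename", "regions_for_vis.csv"]
)
print(args.subset_csv_filename)  # prints None
```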

Structure visualization

If desired, use the structure_visualization.ipynb notebook and the CSV file generated by process_AbAg.py to visualize a protein of interest with its Ab-like and Ag-like regions highlighted.
Upload the notebook to Google Colab along with regions_for_vis.csv and vis_test_bad.csv from the test/Datasets/test_pkl_and_regions_for_vis directory, then set the variables pdb_id and AbAg_pair_index.
Run each cell.
