This project transforms a workflow to generate the Antibody-Antigen (AbAg) database in the antibody design framework developed by Aguilar Rangel et al. cite into an automated pipeline. AbAg database is a collection of antigen-like and CDR-like regions in all non-redundant general protein structures reported in the PDB database.

-
Clone the repository
git clone https://github.com/FahsaiNak/ProjectAbDesign.git
-
Create environment from environment.yml
conda env create -f environment.yml
-
Update the paths in your configuration file found in the run directory.
-
Update somePDB.csv in the datasets directory with your complete list of PDB structures of interest
-
To run the snakefile use the following command:
snakemake -c1
- Navigate to SAbDab
- Click on Downloads on the left hand side of the page
- Click on Download an archived zip file to download structures
- Once the zip file is downloaded, extract the chothia zip file and save it into the Datasets directory in the local repository
- Change directory to the run directory
- On the command line run:
bash CDR_fragment_database.sh
- Ensure you have enough free space for the PDB90 files of interest.
- Add a CSV list of target PDB proteins to download in the Datasets directory. The list most have a similar format to the somePDB.csv file in the Datasets directory.
- Once the sequence list has been added, modify the PDB90.sh script in the run folder. It currently has python ../src/PDB90.py --output_folder "../Datasets/all_PDB" --csv_file "../Datasets/somePDB.csv" written, change "../Datasets/somePDB.csv" with the location of the downloaded PDB csv file.
- Move to the run directory and run the bash script with:
bash PDB90.sh
- The script might take a long time to run depending on the number of PDB files of interest that will be downloaded, uncompressed and cleaned.
- Remain in run directory
- Run Prep_queries_AbAg.sh and Prep_targets_AbAg.sh to convert fragmented-CDRs (queries) and cleaned-PDB90 proteins (targets) in PDB to PDS that is readable for Master program.
./Prep_queries_AbAg.sh ./Prep_targets_AbAg.sh
- Run Ablike.sh search for antigen(CDR)-like regions in every target proteins (PDB90 proteins)
./Ablike.sh
- Run get_matchInfo.sh to collect antigen(CDR)-like structure information from the Master match files.
./get_matchInfo.sh
- Remain in run directory
- Run Aglike.sh to search for contacting/interacting residues of each antigen(CDR)-like region in the same protein. Then, all the antibody-antigen-like information from each PDB90 proteins is combined into a compressed file named AbAb.pkl
./Aglike.sh
- Move to src directory, and run python process_AbAg.py with the following command (third argument optional):
python process_AbAg.py --abag_filename 'AbAg.pkl' --output_filename 'regions_for_vis.csv' --subset_csv_filename 'PDB_IDs_to_filter.csv'
- If desired, use the structure_visualization.ipynb notebook and the csv file generated by process_AgAg.py to visualize a protein of interest with Ab-like and Ag-like regions highlighted.
- Upload the notebook to Google Colab along with regions_for_vis.csv and vis_test_bad.csv from the test/Datasets/test_pkl_and_regions_for_vis and set the variables pdb_id and AbAg_pair_index.
- Run each cell.