- SSH into your Wynton account and then into a dev node. If you do not have a Wynton account and need to request one, see this link. If you're unfamiliar with Wynton, there is very good documentation; you can start here. Read at least the first 4 tabs under the "Get Started" header before continuing.
- Clone the GitHub repo to a Wynton working directory: `git clone https://github.com/kroganlab/af3.template.git`
- Move into `af3.template`; this will be your project working directory: `cd af3.template`
- The first time you run this on Wynton, make sure you have the R packages you need (only necessary for some of the post-processing after AlphaFold completes): `bash installPackages.sh`
- Make a new `AlphaFoldJobList.csv` file (match the format of `topten.jobTable.txt`)
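As a rough sketch, a job table row appears to be an index followed by the two protein names in the pair; this layout is inferred from the remaining-runs one-liner later in this document, and `BAIT1`/`PREY1` etc. are made-up names, so defer to `topten.jobTable.txt` for the authoritative format:

```csv
0,BAIT1,PREY1
1,BAIT1,PREY2
2,BAIT1,PREY3
```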
- Make a new `masterFasta.fasta` file (match the format of `topten_preys.fa`)
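For reference, FASTA format alternates a `>` header line naming the sequence with the amino-acid sequence itself; the sequence names should presumably match the names used in `AlphaFoldJobList.csv`. The names and sequences below are invented, so match yours against `topten_preys.fa`:

```
>PREY1
MSTNPKPQRKTKRNTNRRPQDVKFPGG
>PREY2
MKVLWAALLVTFLAGCQAKVEQAVETE
```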
- Edit the submission script `af.jobs.sh`: most importantly the number of tasks, but also new file names or job names if desired
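As a rough sketch, the lines you would typically touch in a standard SGE array-job script look like the following; the directive values here are placeholders, and the real `af.jobs.sh` will have its own settings:

```sh
#$ -N af3_batch          # job name (optional to change)
#$ -t 1-10               # task array: set the range to the number of rows in AlphaFoldJobList.csv
#$ -l h_rt=02:00:00      # per-task runtime limit; extend this for large complexes
```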
- Submit the job with `qsub af.jobs.sh`
- View queue and job status with `qstat`
- Once the job is finished, check your output folder. Each PPI should have its own subdirectory. Within the output directory, the key output files are `{PPI_name}_summaryScores.csv`, `{PPI_name}.msa.png` and `{PPI_name}.pae.png`
- Sometimes AF jobs fail to complete, often due to timeout. To check for incomplete runs, you can use the following bash one-liner. Just `cd` into the output directory and run:

  ```bash
  for dir in *; do if [[ ! -e "./$dir/${dir}_model.cif" ]]; then echo "$dir"; fi; done
  ```

  This checks for the existence of the top-scoring model and prints the name of each directory where it isn't found (remove the `!` to print the names of completed runs instead).
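The check above can be exercised against a throwaway directory tree; the directory and protein names here are invented, and a real output directory will of course differ:

```shell
# Build a mock output tree: one "completed" run, one "incomplete" run.
mkdir -p demo_output/protA__protB demo_output/protC__protD
touch demo_output/protA__protB/protA__protB_model.cif   # simulate a completed run
cd demo_output
# Print directories missing their top-scoring model file.
for dir in *; do if [[ ! -e "./$dir/${dir}_model.cif" ]]; then echo "$dir"; fi; done
cd ..
```

Only `protC__protD` is printed, since it lacks a `_model.cif` file.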
- You can capture these incomplete runs and convert them to a new `AlphaFoldJobList_remainingRuns.csv` for a fresh submission using the following one-liner (this assumes you are running it from the parent directory of `outputDir`, i.e. the layout is `./outputDir/protein1__protein2/outputFiles`):

  ```bash
  find ./outputDir -mindepth 1 -maxdepth 1 -type d '!' -exec test -e "{}/ranking_scores.csv" ';' -print | cut -d '/' -f3 | awk 'BEGIN{FS="__"; OFS=","} {print NR-1, toupper($1), toupper($2)}' > ./AlphaFoldJobList_remainingRuns.csv
  ```

  Be sure to edit the `af.jobs.sh` script and extend the job runtime beyond two hours to avoid another timeout!
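The remaining-runs one-liner can be sanity-checked against a mock tree; the directory names below are invented. Note that `-mindepth 1` keeps `outputDir` itself out of the results, and setting `FS` in a `BEGIN` block ensures the first line is split on `__` correctly:

```shell
# Mock tree: one finished run (has ranking_scores.csv), one unfinished.
mkdir -p mockdir/outputDir/bait1__prey1 mockdir/outputDir/bait1__prey2
touch mockdir/outputDir/bait1__prey1/ranking_scores.csv   # simulate a finished run
cd mockdir
# List subdirectories missing ranking_scores.csv and rewrite them as job-table rows.
find ./outputDir -mindepth 1 -maxdepth 1 -type d '!' -exec test -e "{}/ranking_scores.csv" ';' -print \
  | cut -d '/' -f3 \
  | awk 'BEGIN{FS="__"; OFS=","} {print NR-1, toupper($1), toupper($2)}' \
  > ./AlphaFoldJobList_remainingRuns.csv
cat ./AlphaFoldJobList_remainingRuns.csv   # -> 0,BAIT1,PREY2
cd ..
```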
- A handy one-liner for collating all `summaryScores.csv` files into a single file with only a single header:

  ```bash
  awk 'BEGIN{FS=","} NR == 1 || $1 != "model" { print }' output/*/*_summaryScores.csv > AllSummaryScores.csv
  ```
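The collation one-liner keeps the very first line (`NR == 1`) and then drops every later line whose first field is `model`, i.e. the repeated headers. A quick check on two fabricated score files (real `summaryScores.csv` files will have different columns; only the `model` header field is taken from the one-liner itself):

```shell
# Two mock per-PPI score files, each with its own header line.
mkdir -p output/a__b output/c__d
printf 'model,score\nm0,0.9\n' > output/a__b/a__b_summaryScores.csv
printf 'model,score\nm0,0.4\n' > output/c__d/c__d_summaryScores.csv
# Collate: keep the first header, skip repeated headers, keep all data rows.
awk 'BEGIN{FS=","} NR == 1 || $1 != "model" { print }' output/*/*_summaryScores.csv > AllSummaryScores.csv
cat AllSummaryScores.csv   # one header line followed by the two data rows
```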