Graph Attention - SARS RBD

Analysis code for "Graph attention with energy features improves the generalizability of identifying functional sequences at a protein interface"

Note: The manuscript treats libraries LY010 and LY011 described in this repository as a single library, LY010. Within this repository, LY010 and Cas2 refer to the sequences in this library that have mutations in the second cassette, while LY011 and Cas3 refer to the sequences in this library that have mutations in the third cassette.

LY010 & LY011 Data Processing

Sequencing Coverage Analysis:

sequence_search_csv.py

Compares two lists of sequences (a CSV of merged sequences and an Excel sheet of the oligos that were ordered) and returns only the merged sequences that are present in the list of ordered oligos
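The comparison described above amounts to a set-membership filter. The following is a minimal sketch rather than the script itself: the function name `filter_to_ordered` and the example sequences are made up for illustration, and it assumes exact matches after normalising case and whitespace. File loading (CSV and Excel) is left out.

```python
def filter_to_ordered(merged_seqs, ordered_oligos):
    """Return the merged sequences that exactly match an ordered oligo."""
    # Normalise case and surrounding whitespace so that trivial formatting
    # differences between the two files do not cause missed matches.
    ordered = {s.strip().upper() for s in ordered_oligos}
    return [s for s in merged_seqs if s.strip().upper() in ordered]


# Made-up example data, not sequences from the library.
merged = ["ACGTAC", "acgtac", "TTTTTT", "GGGCCC"]
ordered = ["ACGTAC", "GGGCCC", "AAAAAA"]
kept = filter_to_ordered(merged, ordered)
print(kept)
```

Building a set from the ordered oligos keeps each lookup constant-time, which matters when the merged file holds many reads.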

sequence_search_fastq.py

Compares two lists of sequences (a FASTQ of merged sequences and an Excel sheet of the oligos that were ordered) and returns only the merged sequences that are present in the list of ordered oligos
Same as sequence_search_csv.py, but for FASTQ input

Variant Filtering:

get_mutations.py

Determines the list of amino acid mutations corresponding to each DNA sequence in a CSV file
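One way to derive amino acid mutations from a DNA sequence is to translate both the variant and the WT in frame and compare them residue by residue. This is a sketch under stated assumptions, not the script's actual logic: the names `translate`, `get_mutations`, and `first_residue` are hypothetical, the sequences are assumed to be in frame and the same length, and the short example fragment is invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def get_mutations(dna, wt_dna, first_residue=1):
    """List mutations such as 'N501Y' between a variant and the WT."""
    pairs = zip(translate(wt_dna), translate(dna))
    return [f"{w}{first_residue + i}{v}" for i, (w, v) in enumerate(pairs) if w != v]


# Invented four-codon fragment numbered from residue 498 (Q, P, T, N).
wt_fragment = "CAGCCGACCAAC"
variant = "CGTCCGACCTAT"
print(get_mutations(variant, wt_fragment, first_residue=498))
```

The `first_residue` offset is what ties codon index to residue numbering, so it has to match the start of whatever region was sequenced.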

wuhan_mut_naming.py

Takes in mutation assignments and renames mutations so that they correspond to the true WT sequence
- the initial WT sequence used for mutation assignments was incorrect at some positions (498, 501, and 505)
Also prints a list of the unique mutations found in the data file
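The renaming step can be sketched as swapping the WT letter of each mutation label for the true residue at the affected positions. This is an illustrative sketch only: `rename_mutation`, `rename_set`, and `unique_mutations` are hypothetical names, the labels are assumed to follow the `<WT><position><new>` pattern, and the leading letters in the example input are made up, since the README does not say which residues the initial reference used. The true residues shown (Q498, N501, Y505) are those of the Wuhan-Hu-1 spike.

```python
import re

# True WT residues at the positions the README says were mis-assigned.
TRUE_WT = {498: "Q", 501: "N", 505: "Y"}
MUTATION_RE = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")


def rename_mutation(mutation):
    """Rewrite the WT letter of a label like 'R498R' to the true WT residue."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    _, pos, new = match.groups()
    true_wt = TRUE_WT.get(int(pos))
    # Positions outside TRUE_WT are returned unchanged.
    return mutation if true_wt is None else f"{true_wt}{pos}{new}"


def rename_set(mutations):
    return [rename_mutation(m) for m in mutations]


def unique_mutations(mutation_sets):
    """Unique mutation labels across all sets, ordered by position."""
    unique = {m for muts in mutation_sets for m in muts}
    return sorted(unique, key=lambda m: (int(m[1:-1]), m))


# Made-up input labels for illustration.
renamed = [rename_set(["K417N", "R498R", "Y501Y"]), rename_set(["R498R", "H505H"])]
print(renamed)
print(unique_mutations(renamed))
```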

variant_filter.py

The first part of the code is a direct copy of wuhan_mut_naming.py
- generates the list of expected mutations that should be seen in the sequencing data
- the input sheet should be manually edited to ensure that it contains a line for the WT sequence
The second part of the code takes in data generated by the SpikeRBDStabilization code
- "LY010_LY011_10Jul24_SPK.xlsx" is the probabilities worksheet from "LY010_LY011_17Jun24_Probabilities.xlsx"
  - "LY010_LY011_17Jun24_Probabilities.xlsx" is a single workbook containing all of the output data from the SpikeRBDStabilization code
- first checks that all sequences have WT mutations at positions 417, 477, and 484, and a mutation at positions 498, 501, and 505
- then removes WT mutations from all mutation sets
- finally cross-checks the mutation sets from the sequencing data against the expected mutation sets, and removes any lines corresponding to unexpected mutation sets
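The three filtering steps in the second part can be sketched as follows. This reflects one reading of the description and is not taken from the script: it assumes a "WT mutation" is a label whose two letters match (for example `K417K`), that positions 417, 477, and 484 must carry such a label, and that positions 498, 501, and 505 must carry a non-WT label. All function names and the example mutation sets are hypothetical.

```python
WT_POSITIONS = (417, 477, 484)       # assumed to require a WT label
MUTATED_POSITIONS = (498, 501, 505)  # assumed to require a non-WT label


def position(mutation):
    return int(mutation[1:-1])


def is_wt(mutation):
    # A "WT mutation" is read here as a label like 'K417K'.
    return mutation[0] == mutation[-1]


def passes_background_check(mutations):
    by_pos = {position(m): m for m in mutations}
    wt_ok = all(p in by_pos and is_wt(by_pos[p]) for p in WT_POSITIONS)
    mut_ok = all(p in by_pos and not is_wt(by_pos[p]) for p in MUTATED_POSITIONS)
    return wt_ok and mut_ok


def filter_variants(rows, expected_sets):
    """Keep rows that pass the check and match an expected mutation set."""
    expected = {frozenset(s) for s in expected_sets}
    kept = []
    for mutations in rows:
        if not passes_background_check(mutations):
            continue
        stripped = frozenset(m for m in mutations if not is_wt(m))
        if stripped in expected:
            kept.append(sorted(stripped, key=position))
    return kept


# Invented example rows and expected sets.
background = ["K417K", "S477S", "E484E", "Q498R", "N501Y", "Y505H"]
rows = [
    background + ["G446S"],                      # expected set: kept
    ["K417N"] + background[1:] + ["G446S"],      # fails the 417 check
    background + ["L452R"],                      # not in the expected sets
]
expected_sets = [["Q498R", "N501Y", "Y505H", "G446S"]]
kept = filter_variants(rows, expected_sets)
print(kept)
```

Comparing `frozenset` objects makes the cross-check independent of the order in which mutations are listed on each line.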

Resequencing Data Processing:

Flash

Sequence read files from the limited re-sequencing data are merged with Flash

mutations_and_counts.py

Takes in a merged FASTQ file and assigns mutations based on a given WT sequence
Outputs a file with the list of mutation sets and the corresponding read counts
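Counting reads per mutation set can be sketched with a `Counter` keyed on the assigned mutations. This is a minimal sketch with several assumptions: FASTQ records are taken to be exactly four lines each, the helper names are hypothetical, and the toy `assign` function compares nucleotides against a made-up four-base WT purely as a stand-in for the real mutation assignment.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4-line FASTQ records)."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip().upper()


def count_mutation_sets(handle, assign_mutations):
    """Count reads per mutation set; assign_mutations maps a read to its mutations."""
    counts = Counter()
    for seq in read_fastq_sequences(handle):
        counts[tuple(assign_mutations(seq))] += 1
    return counts


# Toy stand-in for mutation assignment against an invented WT.
TOY_WT = "ACGT"


def assign(seq):
    return [f"{w}{i + 1}{v}" for i, (w, v) in enumerate(zip(TOY_WT, seq)) if w != v]


fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nACTT\n+\nIIII\n@r3\nACTT\n+\nIIII\n"
counts = count_mutation_sets(io.StringIO(fastq_text), assign)
for mutation_set, n_reads in counts.most_common():
    print(mutation_set, n_reads)
```

The empty tuple `()` corresponds to reads identical to the WT, so WT read counts fall out of the same table.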

variant_filter_resequencing.py

Largely the same as variant_filter.py
Checks that all mutation sets contain the expected mutations, then removes mutation sets that were not encoded in the oligo pool

count_comparison.py

Combines the processed data from the original sequencing run with the processed data from the re-sequencing run
Corrects the total counts in the 1 nM column for WT contamination
Performs the calculations needed to classify each variant as "like-WT", "worse than WT", or "non-binder"

WhiteheadGroup/Graph-Attention-SARS-RBD
