
demolnau/FigWasps

Identification of fig wasp olfactory receptor proteins

  1. Gather hymenopteran sequences from NCBI and the literature. These are saved in hymenopteran_OR_protein_sequences/ and then combined as hym_OR_prot.fasta in the sequences/ folder.
cd sequences/
find hymenopteran_OR_protein_sequences/ -maxdepth 1 | grep "_OR_prot.fasta" | while read fn; do cat "$fn" >> hym_OR_prot.fasta; done
  1. Filter to make sure there are no illegal characters.
python ../scripts/allowed_letters.py hym_OR_prot.fasta filtered_hym.fasta
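The character filter can be sketched as follows. This is a minimal illustration, not the repository's `allowed_letters.py`: the names `read_fasta` and `keep_legal` are hypothetical, and the sketch assumes a sequence is legal only if every residue is one of the 20 standard amino acid letters.

```python
# Hypothetical sketch; the real scripts/allowed_letters.py may behave differently.
# Assumption: only the 20 standard amino acid letters count as legal.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_legal(records, allowed=ALLOWED):
    """Keep records whose sequence is non-empty and uses only allowed letters."""
    return [(h, s) for h, s in records if s and set(s.upper()) <= allowed]


# Made-up records: or2 contains '*', 'X' and 'B', so it is dropped.
example = [">or1", "MKTA", "YLV", ">or2", "MK*XB", ">or3", "mkta"]
legal = keep_legal(read_fasta(example))
```

Dropping the whole record, rather than stripping the offending characters, keeps every surviving sequence identical to its source entry.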
  1. Filter the hymenopteran sequences by size. A complete OR protein sequence with one 7tm_6 domain should be roughly 350 aa to 500 aa long.
python ../scripts/filter_reads_by_size.py filtered_hym.fasta filtered_by_size.fasta
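A minimal sketch of the size filter, assuming records are already parsed into (header, sequence) pairs. The function name `filter_by_size` is hypothetical, and treating both bounds as inclusive is an assumption; the repository's `filter_reads_by_size.py` may differ.

```python
# Hypothetical sketch; not the repository's filter_reads_by_size.py.
# The 350 and 500 aa bounds come from the text; inclusive bounds are an assumption.
MIN_LEN, MAX_LEN = 350, 500


def filter_by_size(records, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep (header, sequence) pairs whose length lies within the bounds."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Made-up records: a fragment, a plausible single-domain OR, and an over-long entry.
records = [("fragment", "M" * 120), ("single_domain", "M" * 410), ("too_long", "M" * 900)]
kept = filter_by_size(records)
```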

  1. Download the 7tm_6 Pfam family from the Pfam website. It contains 10148 protein sequences.
    http://pfam.xfam.org/family/7tm_6#tabview=tab1

  2. Remove duplicates from the 10148 sequences.

python ../scripts/remove_duplicates.py Pfam_7tm_6.fasta no_dup_pfam_7tm_6.fasta
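Duplicate removal can be sketched as below. This assumes that two records are duplicates when their sequences are identical (ignoring case) and that the first occurrence is kept; the repository's `remove_duplicates.py` might instead compare headers, so treat `remove_duplicates` here as a hypothetical stand-in.

```python
# Hypothetical sketch; not the repository's remove_duplicates.py.
# Assumption: duplicates are records with identical sequences, compared case-insensitively.
def remove_duplicates(records):
    """Return records with repeated sequences removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Made-up records: "b" repeats the sequence of "a" in lower case.
records = [("a", "MKTAYLV"), ("b", "mktaylv"), ("c", "MKTAYLW")]
unique = remove_duplicates(records)
```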
  1. Remove sequences that might contain additional 7tm_6 domains. The easiest way to do this is to select by size again, keeping sequences between 350 aa and 500 aa.
python ../scripts/filter_reads_by_size.py no_dup_pfam_7tm_6.fasta pfam_7tm_6_size_filtered_350aa.fasta
  1. Now that we have examples of OR protein sequences with one 7tm_6 motif in them, we can make a profile using HMMBUILD. After creating the profile, we can identify hymenopteran sequences that contain good examples of the 7tm_6 domain using HMMSEARCH. This step was run on an HPC cluster.
cd ../hmmsearch/
sbatch combined_hmmsearch_cmds.sh
  1. Navigating the HMMSEARCH results: we want only the lines that report exactly one 7tm_6 motif, so we select rows where N = 1. Run this command while still in the hmmsearch/ folder.
tail -n+15 hmmsearch_hym_pfam_7tm_6.output | tr -s ' ' | sed 's/ /\t/g' | awk -F '\t' '{if ($9 == 1) print $0}' > hym_7tm_6_N1.out
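The selection above can be approximated in Python as a rough sketch. It assumes the standard per-sequence hit table layout (E-value, score, bias, E-value, score, bias, exp, N, Sequence, Description) and a 14-line header, matching `tail -n+15`. Because the report rows start with spaces, the shell pipeline produces an empty first field, so awk's `$9` corresponds to the eighth whitespace-separated token here. The function name `single_domain_hits` is hypothetical, and unlike the shell command it returns sequence names rather than whole lines.

```python
# Hypothetical sketch that approximates the tail/tr/sed/awk pipeline.
HEADER_LINES = 14  # mirrors `tail -n+15`, which starts printing at line 15


def single_domain_hits(lines, header_lines=HEADER_LINES):
    """Return sequence names from rows whose N column equals 1.

    `lines` must be a list of report lines (it is sliced to skip the header).
    """
    names = []
    for line in lines[header_lines:]:
        tokens = line.split()
        # Assumed columns: E-value score bias E-value score bias exp N Sequence ...
        if len(tokens) >= 9 and tokens[7] == "1":
            names.append(tokens[8])
    return names


# Made-up report: 14 filler header lines followed by three hit rows.
report = ["# header"] * 14 + [
    "    1.2e-50  170.1   0.1    1.5e-50  169.8   0.1    1.0  1  OR_a  example",
    "    3.4e-40  140.2   0.3    2.0e-20   70.1   0.2    2.1  2  OR_b  example",
    "    5.0e-30  105.7   0.0    6.1e-30  105.4   0.0    1.1  1  OR_c  example",
]
hits = single_domain_hits(report)
```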
  1. Now that we can identify which hymenopteran OR sequences contain one 7tm_6 motif, we can match them against the size-filtered file and pull out the FASTA sequences of interest.
cd ../
python scripts/match_hmmsearch_outputs.py sequences/filtered_by_size.fasta hmmsearch/hym_7tm_6_N1.out sequences/exonerate_input.fasta
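The matching step can be sketched as follows. It assumes that the sequence name reported by HMMSEARCH equals the first whitespace-delimited word of the FASTA header; `select_hits` is a hypothetical name, and the repository's `match_hmmsearch_outputs.py` may match records differently.

```python
# Hypothetical sketch; not the repository's match_hmmsearch_outputs.py.
# Assumption: the hit name equals the first word of the FASTA header.
def select_hits(records, hit_ids):
    """Keep (header, sequence) pairs whose first header word is in hit_ids."""
    wanted = set(hit_ids)
    selected = []
    for header, seq in records:
        words = header.split()
        if words and words[0] in wanted:
            selected.append((header, seq))
    return selected


# Made-up records and hit names.
records = [
    ("OR_a odorant receptor [example]", "MKTAYLV"),
    ("OR_b odorant receptor [example]", "MKTAYLW"),
    ("OR_c odorant receptor [example]", "MKTAYLY"),
]
exonerate_input = select_hits(records, ["OR_a", "OR_c"])
```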
  1. Run Exonerate to identify OR sequences in the genome of interest.
mkdir exonerate/
cd exonerate/
sbatch exonerate_obtusifolia_cmd.sh
  1. Double check file endings

  2. Use InsectOR to get consensus sequences from the Exonerate output, the genome of interest, and the Exonerate input FASTA sequences.

http://caps.ncbs.res.in/cgi-bin/gws_ors/load_form.py?module_name=gws_search&module_desc=Search%20for%20OR%20genes%20in%20a%20genome

About

Work done in Dr. Nason's lab on the coevolution of figs and fig wasps
