
Too much information for Phyluce / SPAdes? #372

@jb23590

Dear Phyluce Team,

I am hoping there is a solution to an issue we have come across. This may be an easy fix (I hope so), but we believe we may be giving either Phyluce or SPAdes too much data for it to produce accurate UCE counts. We have 95 samples sequenced at a target depth of 40M PE reads and enriched with the UCE baits from this paper:
https://www.sciencedirect.com/science/article/pii/S1055790320302165?casa_token=9MG8aNwNNm4AAAAA:OUgk-MFQ3hhlznIn2NyzdbKOXmrylU_HYU5ThgxhzejTNFEcapjF0TBNPYLsEDjp_cVXGQCud0X6

Raw read counts averaged 25M per sample (min 8M, max 56M). After SPAdes, the number of assembled contigs averaged 288 thousand (min 48 thousand, max 1M). These are roughly as we expected. However, when running phyluce_assembly_match_contigs_to_probes and phyluce_assembly_get_match_counts, UCE counts are much lower than anticipated: UCE contigs average 780 per sample (min 474, max 1190). The number of UCE loci removed for matching multiple contigs averaged 389 (min 127, max 1259).

I also ran phyluce_assembly_match_contigs_to_probes with --min-identity and --min-coverage both set to 70, as follows:

phyluce_assembly_match_contigs_to_probes \
    --contigs path-to-spades/contigs \
    --probes Cowman_etal_APPENDIX_C-hexa-v2-final-probes.fasta \
    --output Plate1ProbeMatches70 \
    --min-identity 70 \
    --min-coverage 70

I thought this would reduce the number of removed loci, but the UCE counts are the same as with the default values. We wonder whether downsampling the 40M PE reads before running SPAdes would also help, but we have not tried this yet. At the moment we are a little stuck, as we had expected upwards of 1500 UCE contigs per sample.
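
For reference, the downsampling idea could be sketched roughly as below. This is only an illustration, not something we have run on our data and not part of Phyluce or SPAdes: the function names, the seed, and the target pair count are all hypothetical, and it assumes standard 4-line FASTQ records with R1 and R2 in the same order. It keeps the sampled pairs in memory, so a dedicated tool may be a better fit for very large targets.

```python
import gzip
import random
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_pairs(r1_path, r2_path):
    """Yield (rec1, rec2) tuples; each record is a list of 4 FASTQ lines.

    Assumes 4-line records and identical read order in both files.
    """
    with open_text(r1_path) as r1, open_text(r2_path) as r2:
        while True:
            rec1 = list(islice(r1, 4))
            rec2 = list(islice(r2, 4))
            if not rec1 or not rec2:
                break
            yield rec1, rec2


def downsample_pairs(r1_path, r2_path, out1_path, out2_path, n_pairs, seed=42):
    """Reservoir-sample up to n_pairs read pairs, keeping R1 and R2 in sync.

    Returns the number of pairs written. All names here are hypothetical.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, pair in enumerate(read_pairs(r1_path, r2_path)):
        if i < n_pairs:
            # Fill the reservoir with the first n_pairs pairs.
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n_pairs / (i + 1).
            j = rng.randint(0, i)
            if j < n_pairs:
                reservoir[j] = pair
    with open_text(out1_path, "wt") as o1, open_text(out2_path, "wt") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)
```

A call such as downsample_pairs("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sub_R1.fastq.gz", "sub_R2.fastq.gz", n_pairs=5_000_000) would then produce the input for a trial SPAdes run; the file names and the 5M target are placeholders, not recommended values.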

Thanks in advance,
Jason
