Overrepresented sequences remain after adapter trimming #39

@judy-m2

Hello -

I am working with RNA-seq data and I am trying to remove the adapter sequences from my reads. My raw data looks something like this:

[Image: Raw data]

When I run the recommended adapter settings [ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:30:10:2:True],
the "Adapter Content" tab of the FastQC report no longer gives a warning, but all the overrepresented sequences are still there.
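For readers comparing the settings in this post, the colon-separated fields of an ILLUMINACLIP step can be split out by name. The sketch below is a minimal illustration: the field names follow the Trimmomatic manual, while `describe_illuminaclip` is a hypothetical helper and not part of Trimmomatic. It assumes the adapter path itself contains no colon.

```shell
# Split a Trimmomatic ILLUMINACLIP step into its named fields.
# describe_illuminaclip is a hypothetical helper for illustration only.
# Assumes the adapter FASTA path contains no ':' characters.
describe_illuminaclip() {
    printf '%s\n' "$1" | {
        # The first field is the literal step name and is not printed.
        IFS=: read -r step fasta seed palindrome simple min_len keep_both
        printf 'adapter_fasta=%s\n' "$fasta"
        printf 'seed_mismatches=%s\n' "$seed"
        printf 'palindrome_clip_threshold=%s\n' "$palindrome"
        printf 'simple_clip_threshold=%s\n' "$simple"
        printf 'min_adapter_length=%s\n' "$min_len"
        printf 'keep_both_reads=%s\n' "$keep_both"
    }
}

# Single quotes keep $EBROOTTRIMMOMATIC unexpanded, as written in the post.
describe_illuminaclip 'ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:30:10:2:True'
describe_illuminaclip 'ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:40:15:1:True'
```

As the manual describes them, the two clip thresholds are minimum alignment scores, so raising them should require a stronger adapter match before a read is clipped.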

I tried to adjust the settings of the adapter trimming step, and got some better results, but I still have adapter content in the overrepresented sequences.

I ran Trimmomatic like this:

java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.39.jar PE \
    RawReads/GMCF-1049-DMD-1_S1_L001_R1_001.fastq.gz \
    RawReads/GMCF-1049-DMD-1_S1_L001_R2_001.fastq.gz \
    -trimlog DMD1-logfile.log \
    -baseout trimmedReads_v2/DMD_1.fq \
    ILLUMINACLIP:/$EBROOTTRIMMOMATIC/adapters/TruSeq3-PE.fa:2:40:15:1:True \
    LEADING:3 TRAILING:3 MINLEN:36 HEADCROP:10

and my overrepresented sequences still look like this:
[Image: Screen Shot 2022-09-20 at 3 14 18 PM]

Now, I know that with RNA-seq data you're supposed to get overrepresented sequences, because those come from highly expressed genes. However, my concern is that the overrepresented sequences are still being identified as adapters. Is this a problem? Should I change the settings of the adapter trimming step again to allow for a higher threshold, or do I run the risk of cutting sequences that I want to keep?
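One way to gauge how much adapter actually remains is to count the trimmed reads that still contain an adapter fragment. The sketch below is a rough check under stated assumptions: `count_adapter_reads` is a hypothetical helper, the input is a standard 4-line-per-record FASTQ file, and matching is exact substring only, so partial or mismatched remnants are missed. The fragment `AGATCGGAAGAGC` is the Illumina universal adapter prefix, used here as an illustrative choice; the sequence FastQC reports for a given sample may differ.

```shell
# Count reads in a FASTQ file that still contain a given adapter fragment.
# count_adapter_reads is a hypothetical helper for illustration only.
# Exact substring matching: partial or mismatched remnants are not counted.
count_adapter_reads() {
    adapter=$1
    fastq=$2
    # Decompress gzipped input, otherwise read the file as-is. In 4-line
    # FASTQ records the sequence is the 2nd line of each record, which is
    # what the awk filter below selects.
    case "$fastq" in
        *.gz) gzip -cd "$fastq" ;;
        *)    cat "$fastq" ;;
    esac | awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }'
}

# Example call (output file name assumed from the -baseout value in the
# command above; adjust to the files Trimmomatic actually wrote):
# count_adapter_reads AGATCGGAAGAGC trimmedReads_v2/DMD_1_1P.fq
```

Running the same count on the raw and trimmed files gives a before/after comparison that does not depend on how FastQC labels the overrepresented sequences.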

Any advice would be helpful. Thanks.
