Question about dealing with 10x data on trimmomatic #47

@mat10d

I have a 10x experiment where R1 contains the cell barcode and UMI, and R2 contains the cDNA sequence. I want to quality-trim R2 with a sliding window, cutting once the average quality within a 10 bp window drops below 28: SLIDINGWINDOW:10:28

My initial approach was to run trimmomatic in single-end mode on R2 only, and then use cutadapt to filter out any reads of length 0 before mapping:

```
java -jar trimmomatic-0.39.jar SE -threads 16 -phred33 ${FASTQ_base}/FASTQ/L458_898_S2_L001_R2_001.fastq.gz ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz SLIDINGWINDOW:10:28

cutadapt -j 0 --minimum-length :1 -o ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R1_001.fastq.gz -p ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R2_001.fastq.gz ${FASTQ_base}/FASTQ/L458_898_S2_L001_R1_001.fastq.gz ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz
```

However, I notice that trimmomatic removes a small subset of reads outright (Input Reads: 153435707 Surviving: 150653678 (98.19%) Dropped: 2782029 (1.81%)). The trimmed R2 file therefore no longer has the same number of reads as the untouched R1 file, which makes the quick paired filtering with cutadapt impossible.

Should I be using a paired-end approach instead? Or do you have another suggestion for how to tackle this? I don't want to do any read trimming on R1, as it only contains the barcode and UMI. Thanks so much,

Matteo
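
One possible workaround for the mismatch described above, offered only as a minimal sketch and not as a method endorsed by the Trimmomatic or cutadapt authors: subset the untouched R1 file to the read IDs that survived in the trimmed R2 file, so the two files pair up again. The function name `sync_r1_to_r2` and its variables are hypothetical. The sketch assumes standard 4-line FASTQ records, that R1 and R2 headers share the same ID before the first space, that `gzip -cdf` passes uncompressed input through unchanged (GNU gzip behaviour), and that enough memory is available to hold all surviving IDs in an awk array, which may be substantial for ~150 million reads.

```shell
# Hypothetical helper (not part of Trimmomatic or cutadapt).
# Usage: sync_r1_to_r2 R1.fastq[.gz] trimmed_R2.fastq[.gz] > R1_synced.fastq
# Assumes 4-line FASTQ records and matching read IDs before the first space.
sync_r1_to_r2() {
    sync_r1="$1"                     # original, untrimmed R1
    sync_r2="$2"                     # single-end trimmed R2
    sync_ids=$(mktemp) || return 1   # temporary list of surviving read IDs

    # Collect the ID (header text up to the first space) of every R2 record.
    # Line position (NR % 4) is used instead of matching "@", because
    # quality strings may also start with "@".
    gzip -cdf "$sync_r2" |
        awk 'NR % 4 == 1 { split($0, h, " "); print h[1] }' > "$sync_ids"

    # Print only the R1 records whose ID appears in that list,
    # preserving the original R1 order.
    gzip -cdf "$sync_r1" |
        awk -v ids="$sync_ids" '
            BEGIN { while ((getline line < ids) > 0) keep[line] = 1 }
            NR % 4 == 1 { split($0, h, " "); emit = (h[1] in keep) }
            emit
        '

    rm -f "$sync_ids"
}
```

The output could then be compressed (for example by piping through `gzip`) and passed together with the trimmed R2 file to a paired-end tool. This relies on the trimmed R2 file keeping its reads in the original input order, which is worth verifying on a small subset before running it on the full data.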
