Question about dealing with 10x data on trimmomatic

I have a 10x experiment where R1 contains the cell barcode and UMI, and R2 contains the cDNA sequence. I want to quality-trim R2 with a sliding window, cutting the read once the average quality within a 10 bp window drops below 28: `SLIDINGWINDOW:10:28`

My initial approach was to run Trimmomatic in single-end mode on R2 only, and then use cutadapt in paired mode to filter out any pairs where R2 has length 0 before mapping:

```
# Step 1: quality-trim R2 only (single-end mode)
java -jar trimmomatic-0.39.jar SE -threads 16 -phred33 \
    ${FASTQ_base}/FASTQ/L458_898_S2_L001_R2_001.fastq.gz \
    ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz \
    SLIDINGWINDOW:10:28

# Step 2: pair the untouched R1 with the trimmed R2 and drop pairs whose R2 is empty
cutadapt -j 0 --minimum-length :1 \
    -o ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R1_001.fastq.gz \
    -p ${FASTQ_base}/FASTQ_qa_28_tm/L458_898_S2_L001_R2_001.fastq.gz \
    ${FASTQ_base}/FASTQ/L458_898_S2_L001_R1_001.fastq.gz \
    ${FASTQ_base}/FASTQ_qa_28_tm/trimmed/trimmed_L458_898_S2_L001_R2_001.fastq.gz
```

However, I noticed that Trimmomatic itself removes a small subset of reads (Input Reads: 153435707 Surviving: 150653678 (98.19%) Dropped: 2782029 (1.81%)). The trimmed R2 file therefore no longer contains the same reads as the original R1 file, which makes the quick paired filtering step with cutadapt impossible.
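To make the mismatch concrete, here is a minimal sketch of one possible way to bring R1 back into sync with the trimmed R2. It is not a Trimmomatic or cutadapt feature: `resync_r1` is a hypothetical helper name, and the sketch assumes that both files list reads in the same order and that the read ID is the first whitespace-delimited token of each header line. It streams both files in lockstep, so it does not need to hold all read IDs in memory.

```shell
# Hypothetical helper (not part of Trimmomatic or cutadapt).
# Keeps only the R1 records whose read ID still exists in the trimmed R2.
# Assumptions: gzipped 4-line FASTQ, same read order in both files,
# read ID = first whitespace-delimited token of the header line.
# Usage: resync_r1 R1.fastq.gz trimmed_R2.fastq.gz synced_R1.fastq.gz
resync_r1() {
    gzip -dc "$1" | R2_TRIMMED="$2" awk '
        # Read the next surviving R2 record and return its read ID ("" at end of file).
        function next_r2_id(   line, i, parts) {
            if ((cmd | getline line) <= 0) return ""
            split(line, parts, " ")
            for (i = 0; i < 3; i++) cmd | getline line   # skip sequence, "+", quality
            return parts[1]
        }
        BEGIN {
            # The shell started by awk expands R2_TRIMMED from the environment.
            cmd = "gzip -dc \"$R2_TRIMMED\""
            want = next_r2_id()
        }
        # On each R1 header line, decide whether this record has a surviving mate.
        NR % 4 == 1 {
            keep = (want != "" && $1 == want)
            if (keep) want = next_r2_id()
        }
        # Print all four lines of the records that are kept.
        keep
    ' | gzip -c > "$3"
}
```

Called as `resync_r1 R1.fastq.gz trimmed_R2.fastq.gz synced_R1.fastq.gz`, this should produce an R1 file with the same number of reads as the trimmed R2, which a paired-mode tool could then accept. If the two files are not in the same order, the lockstep comparison silently drops reads, so it is worth checking the read counts afterwards.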

Should I be using a paired-end approach instead? Or do you have another suggestion for how to tackle this? I don't want to trim R1 at all, as it only contains the barcode and UMI. Thanks so much,

Matteo
