New Kallisto-0.48.0, usage of -x BDWTA, outputting too many barcodes #77

Hi,
I am using kallisto 0.48.0 together with bustools 0.41.0 to demultiplex my BD Rhapsody WTA data and obtain gene count tables. This is my initial kallisto bus command:
**kallisto bus --index ./mus_musculus/transcriptome.idx -o /${f} --technology=BDWTA --threads=16 --fr-stranded ${f}_R1.fastq ${f}_R2.fastq -g /mus_musculus/Mus_musculus.GRCm38.96.gtf**
Example result:
[index] k-mer length: 31
[index] number of targets: 118,489
[index] number of k-mers: 100,614,952
[index] number of equivalence classes: 433,624
[quant] will process sample 1: control_R1.fastq
                               control_R2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 289,230,676 reads, 224,235,182 reads pseudoaligned
From there I sorted my .bus file and tried to generate a count table:
**bustools sort -o sorted.bus output.bus**
**bustools count --genecounts -g /mus_musculus/transcripts_to_genes.txt -t transcripts.txt -e matrix.ec -o counts sorted.bus**
This gave me an odd matrix with dimensions: 13 18494348
I then decided to correct the .bus file with bustools correct. I didn't see any whitelist for BDWTA data, so I generated my own whitelist for each sample, then corrected and sorted the .bus file:
**bustools whitelist -o control_whitelist output.bus**
Example results:
Read in 102086448 BUS records, wrote 232194 barcodes to whitelist with threshold 61
**bustools correct -o corr_control.bus --whitelist control_whitelist output.bus**
Example results:
Found 232194 barcodes in the whitelist
Processed 224235182 BUS records
In whitelist = 176801187
Corrected    = 5916173
Uncorrected  = 41517822
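As a sanity check on the numbers above, the three categories add up to the number of processed records, and roughly 18.5% of records were left uncorrected. Below is a minimal sketch of that arithmetic; it uses only the values printed by bustools correct, and the variable names and the `pct` helper are hypothetical, introduced just for illustration:

```shell
#!/bin/sh
# Values copied from the bustools correct output above.
processed=224235182
in_whitelist=176801187
corrected=5916173
uncorrected=41517822

# The three categories should account for every processed record.
total=$((in_whitelist + corrected + uncorrected))
echo "sum of categories: $total (processed: $processed)"

# pct is a hypothetical helper: percentage of n out of d, one decimal place.
pct() { awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f", 100 * n / d }'; }

echo "in whitelist: $(pct "$in_whitelist" "$processed")%"
echo "corrected:    $(pct "$corrected" "$processed")%"
echo "uncorrected:  $(pct "$uncorrected" "$processed")%"
```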
Then I sorted the .bus file
**bustools sort -o sorted_corr_control.bus corr_control.bus**
and ran bustools count:
**bustools count --genecounts -g /mus_musculus/transcripts_to_genes.txt -t transcripts.txt -e matrix.ec -o control_counts sorted_corr_control.bus**
I now have a matrix with more reasonable dimensions: 16632  9838 (with 9838 barcodes detected), but I am expecting to see ~2500 unique barcodes per sample. Across my 4 samples I am actually seeing anywhere from ~2,500 to ~10,000 barcodes per sample. Do I have a mistake in how I am generating the whitelist? Is there already a built-in whitelist for the BDWTA data?
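For anyone who wants to check the barcode count independently of the matrix dimensions, here is a minimal sketch. It assumes that bustools count with `-o control_counts` wrote a `control_counts.barcodes.txt` file with one barcode per line; the `count_barcodes` helper is hypothetical:

```shell
#!/bin/sh
# Assumption: bustools count -o control_counts produced
# control_counts.barcodes.txt, with one barcode per line.
barcodes_file="control_counts.barcodes.txt"

# count_barcodes is a hypothetical helper: counts non-empty lines in a file.
count_barcodes() {
    grep -c . "$1"
}

# Only report a count if the file is actually present.
if [ -f "$barcodes_file" ]; then
    echo "barcodes detected: $(count_barcodes "$barcodes_file")"
fi
```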
Thank you for your time!

