Hi, I am pretty new to kallisto/bustools, so thank you in advance for your help. I followed the directions from this issue/Google Colab notebook: Issue #75
I ran kallisto bus against the human reference downloaded from https://github.com/pachterlab/kallisto-transcriptome-indices/releases. The log file looks something like this:
[index] k-mer length: 31
[index] number of targets: 188,753
[index] number of k-mers: 109,544,288
[index] number of equivalence classes: 760,757
[quant] will process sample 1: R1_mod/12BH02_S96_R1_001_mod.fastq.gz
                               output_fastq/12BH02_S96_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 4,927,842 reads, 2,397,115 reads pseudoaligned
Here is a run_info.json example from one of the samples of our run:
"n_targets": 0,
"n_bootstraps": 0,
"n_processed": 4927842,
"n_pseudoaligned": 2360469,
"n_unique": 1085578,
"p_pseudoaligned": 47.9,
"p_unique": 22.0,
"kallisto_version": "0.46.2",
"index_version": 0,
"start_time": "Tue Jul 12 13:25:09 2022",
"call": "kallisto/build/src/kallisto bus -i sci-RNA-seq3/reference/transcriptome.idx -x SciRnaSeq -t 2 -o bus_output/12BH02_S96 R1_mod/12BH02_S96_R1_001_mod.fastq.gz output_fastq/12BH02_S96_R2_001.fastq.gz"
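For reference, the percentages can be recomputed from the counts above. This is only a quick sanity check in Python, with the numbers copied by hand from the log and from run_info.json; the variable names and the `pct` helper are mine, not anything kallisto provides.

```python
# Counts copied by hand from the log and run_info.json shown above.
n_processed = 4_927_842         # "n_processed" in run_info.json, same as the log
log_pseudoaligned = 2_397_115   # from the "[quant] processed ..." log line
info_pseudoaligned = 2_360_469  # "n_pseudoaligned" in run_info.json
info_unique = 1_085_578         # "n_unique" in run_info.json


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Recompute the percentages reported in run_info.json from its own counts.
print("p_pseudoaligned (run_info):", pct(info_pseudoaligned, n_processed))
print("p_unique (run_info):       ", pct(info_unique, n_processed))

# The same rate using the count printed in the log instead.
print("p_pseudoaligned (log):     ", pct(log_pseudoaligned, n_processed))

# Number of reads by which the two pseudoaligned counts disagree.
print("difference in reads:       ", log_pseudoaligned - info_pseudoaligned)
```

So the percentages in run_info.json are consistent with its own counts (47.9 and 22.0), while the log count gives 48.6%, a gap of 36,646 reads.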
My first question: why do some values in run_info.json not match the log? n_targets is 0 although the log reports 188,753 targets, and n_pseudoaligned is 2,360,469 although the log reports 2,397,115 reads pseudoaligned. My second question: why might we be getting such a low pseudoalignment rate? We even tried building a new index with kb ref using the attributes below, and p_pseudoaligned was still only 55%:
kb ref -i $REFERENCE_DIR/kbref/include_attribute/h_index.idx \
-g $REFERENCE_DIR/kbref/include_attribute/h_t2g.txt \
-f1 $REFERENCE_DIR/kbref/include_attribute/cdna.fa \
-f2 $REFERENCE_DIR/kbref/include_attribute/intron.fa \
-c1 $REFERENCE_DIR/kbref/include_attribute/cdna_t2c.txt \
-c2 $REFERENCE_DIR/kbref/include_attribute/intron_t2c.txt \
--workflow lamanno \
$REFERENCE_DIR/ensembl_107/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz $REFERENCE_DIR/ensembl_107/Homo_sapiens.GRCh38.107.gtf.gz \
--include-attribute gene_biotype:protein_coding \
--include-attribute gene_biotype:lincRNA \
--include-attribute gene_biotype:antisense \
--include-attribute gene_biotype:IG_LV_gene \
--include-attribute gene_biotype:IG_V_gene \
--include-attribute gene_biotype:IG_V_pseudogene \
--include-attribute gene_biotype:IG_D_gene \
--include-attribute gene_biotype:IG_J_gene \
--include-attribute gene_biotype:IG_J_pseudogene \
--include-attribute gene_biotype:IG_C_gene \
--include-attribute gene_biotype:IG_C_pseudogene \
--include-attribute gene_biotype:TR_V_gene \
--include-attribute gene_biotype:TR_V_pseudogene \
--include-attribute gene_biotype:TR_D_gene \
--include-attribute gene_biotype:TR_J_gene \
--include-attribute gene_biotype:TR_J_pseudogene \
--include-attribute gene_biotype:TR_C_gene
Thanks again for your help!