Bams with the same name cause no differences detected in pairwise #12

@hbeale

If the manifest lists bam files that have the same name in different directories (e.g. SRR12801019/Aligned.sortedByCoord.out.bam), the corresponding bed files created by splicedice bam_to_junc_bed will all be named _junction_beds/Aligned.sortedByCoord.out.junc.bed and will overwrite each other.

example of failure

Bam manifest used as input to splicedice bam_to_junc_bed

A19     /mnt/output/star_2.7.11b_2024.12.13/SRR12801019/Aligned.sortedByCoord.out.bam   HEK     control
A20     /mnt/output/star_2.7.11b_2024.12.13/SRR12801020/Aligned.sortedByCoord.out.bam   HEK     SUGP1_knockdown
A23     /mnt/output/star_2.7.11b_2024.12.13/SRR12801023/Aligned.sortedByCoord.out.bam   HEK     control
A24     /mnt/output/star_2.7.11b_2024.12.13/SRR12801024/Aligned.sortedByCoord.out.bam   HEK     SUGP1_knockdown

Bed manifest generated by splicedice bam_to_junc_bed; note that all bed files have the same path

A19     /mnt/output/splicedice_2024.12.18_02.10.36/_junction_beds/Aligned.sortedByCoord.out.junc.bed    HEK     control
A20     /mnt/output/splicedice_2024.12.18_02.10.36/_junction_beds/Aligned.sortedByCoord.out.junc.bed    HEK     SUGP1_knockdown
A23     /mnt/output/splicedice_2024.12.18_02.10.36/_junction_beds/Aligned.sortedByCoord.out.junc.bed    HEK     control
A24     /mnt/output/splicedice_2024.12.18_02.10.36/_junction_beds/Aligned.sortedByCoord.out.junc.bed    HEK     SUGP1_knockdown
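A collision like this can be caught before running the pipeline. The following is a minimal sketch, not splicedice code: it assumes the bed name is the bam basename with `.bam` replaced by `.junc.bed` (a rule inferred from the manifests in this issue), and the function names `bed_name_for` and `find_collisions` are hypothetical.

```python
import os
from collections import defaultdict


def bed_name_for(bam_path):
    """Assumed naming rule (inferred from this issue): basename with .bam -> .junc.bed."""
    base = os.path.basename(bam_path)
    stem = base[: -len(".bam")] if base.endswith(".bam") else base
    return stem + ".junc.bed"


def find_collisions(bam_paths):
    """Return {bed_name: [bam_paths]} for every bed name shared by more than one bam."""
    by_name = defaultdict(list)
    for path in bam_paths:
        by_name[bed_name_for(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    bams = [
        "/mnt/output/star_2.7.11b_2024.12.13/SRR12801019/Aligned.sortedByCoord.out.bam",
        "/mnt/output/star_2.7.11b_2024.12.13/SRR12801020/Aligned.sortedByCoord.out.bam",
    ]
    # Both bams map to the same bed name, so they are reported as a collision.
    for name, paths in find_collisions(bams).items():
        print(name, "<-", len(paths), "bam files")
```

With the failing manifest, every bam maps to Aligned.sortedByCoord.out.junc.bed, so all of them are reported under that single name.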

looking for differences in the pairwise.tsv output from the end of the pipeline by counting the distinct rows in columns 2-7; a count of 1 means every row is identical

cat pairwise.tsv | grep -v clusterID | cut -f2-7 | sort | uniq | wc -l

1

example of success

create links to bam files containing a unique name and update the manifest

Bam manifest used as input to splicedice bam_to_junc_bed

A19     /mnt/output/star_2.7.11b_2024.12.13/SRR12801019/SRR12801019_Aligned.sortedByCoord.out.bam       HEK     control
A20     /mnt/output/star_2.7.11b_2024.12.13/SRR12801020/SRR12801020_Aligned.sortedByCoord.out.bam       HEK     SUGP1_knockdown
A23     /mnt/output/star_2.7.11b_2024.12.13/SRR12801023/SRR12801023_Aligned.sortedByCoord.out.bam       HEK     control
A24     /mnt/output/star_2.7.11b_2024.12.13/SRR12801024/SRR12801024_Aligned.sortedByCoord.out.bam       HEK     SUGP1_knockdown
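The linking step could be scripted roughly as follows. This is a sketch under stated assumptions: the manifest is tab-separated with the bam path in the second column, the parent directory name (e.g. SRR12801019) is used as the unique prefix, and the helper name `link_with_unique_names` is hypothetical. It also links a `.bai` index if one sits next to the bam, in case a downstream step expects it; whether splicedice needs the index is not established here.

```python
import csv
import os


def link_with_unique_names(manifest_in, manifest_out):
    """Symlink each bam as <parent_dir>_<basename> and write a manifest pointing at the links."""
    with open(manifest_in, newline="") as fin, open(manifest_out, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")  # assumption: tab-separated manifest
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in reader:
            if not row:
                continue  # skip blank lines
            bam = row[1]  # assumption: bam path is the second column
            parent = os.path.dirname(bam)
            prefix = os.path.basename(parent)
            link = os.path.join(parent, f"{prefix}_{os.path.basename(bam)}")
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(bam), link)
            # Link the index too when present (assumed convention: <bam>.bai).
            if os.path.exists(bam + ".bai") and not os.path.lexists(link + ".bai"):
                os.symlink(os.path.abspath(bam + ".bai"), link + ".bai")
            row[1] = link
            writer.writerow(row)
```

Calling `link_with_unique_names("bam_manifest.tsv", "bam_manifest_unique.tsv")` (file names are placeholders) leaves the original bams untouched and only adds links beside them.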

Bed manifest generated by splicedice bam_to_junc_bed; note that all bed files are unique

A19     /mnt/output/splicedice_2024.12.19_23.40.49/_junction_beds/SRR12801019_Aligned.sortedByCoord.out.junc.bed       HEK      control
A20     /mnt/output/splicedice_2024.12.19_23.40.49/_junction_beds/SRR12801020_Aligned.sortedByCoord.out.junc.bed       HEK      SUGP1_knockdown
A23     /mnt/output/splicedice_2024.12.19_23.40.49/_junction_beds/SRR12801023_Aligned.sortedByCoord.out.junc.bed       HEK      control
A24     /mnt/output/splicedice_2024.12.19_23.40.49/_junction_beds/SRR12801024_Aligned.sortedByCoord.out.junc.bed       HEK      SUGP1_knockdown

looking for differences in the pairwise.tsv output from the end of the pipeline by counting the distinct rows in columns 2-7; with unique bed files the rows now differ

cat pairwise.tsv | grep -v clusterID | cut -f2-7 | sort | uniq | wc -l

41222

potential solutions

  • we could add the requirement to have unique bam file names to the documentation
  • bed output files could be named using the first column of the manifest (e.g. A19) instead of a name derived from the bam file name
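The second option might look something like the sketch below. It is illustrative only and not taken from the splicedice source: the function `plan_bed_paths` is hypothetical, and it assumes the first manifest column is the sample ID and that the `_junction_beds` directory and `.junc.bed` suffix seen in this issue are kept.

```python
import os


def plan_bed_paths(manifest_rows, out_dir):
    """Map each sample ID (manifest column 1) to a bed path, rejecting duplicate IDs."""
    bed_dir = os.path.join(out_dir, "_junction_beds")
    planned = {}
    for row in manifest_rows:
        sample_id = row[0]
        if sample_id in planned:
            # Duplicate IDs would reintroduce the overwrite problem, so fail early.
            raise ValueError(f"duplicate sample ID in manifest: {sample_id}")
        planned[sample_id] = os.path.join(bed_dir, f"{sample_id}.junc.bed")
    return planned


if __name__ == "__main__":
    rows = [
        ["A19", "/mnt/output/star_2.7.11b_2024.12.13/SRR12801019/Aligned.sortedByCoord.out.bam", "HEK", "control"],
        ["A20", "/mnt/output/star_2.7.11b_2024.12.13/SRR12801020/Aligned.sortedByCoord.out.bam", "HEK", "SUGP1_knockdown"],
    ]
    for sample_id, bed in plan_bed_paths(rows, "/mnt/output/splicedice_run").items():
        print(sample_id, bed)
```

Naming by sample ID makes the output independent of how the bams are laid out on disk, at the cost of requiring IDs that are unique and safe to use as file names.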
