One estimated species changes with example datasets when running forward and reverse files together #44

@samuell

Description

I tried running abundance on the example datasets, as demonstrated in .gitlab-ci.yml.

How to reproduce

$ ./emu abundance --output-dir manual_test_results --db emu_database --type sr example/short_read_f.fq
$ ./emu abundance --output-dir manual_test_results --db emu_database --type sr example/short_read_f.fq example/short_read_r.f

This produces two abundance files:

$ l -1
short_read_f_rel-abundance.tsv
short_read_f-short_read_r_rel-abundance.tsv

The contents of these files are as follows:

$ cat short_read_f_rel-abundance.tsv | column -t
tax_id               abundance  species         genus    family          order              class       phylum   clade       superkingdom   subspecies  species   subgroup  species  group
1290                 1.0        Staphylococcus  hominis  Staphylococcus  Staphylococcaceae  Bacillales  Bacilli  Firmicutes  Terrabacteria  group       Bacteria                     
unmapped             0.0                                                                                                                                                             
mapped_unclassified  0.0
$ cat short_read_f-short_read_r_rel-abundance.tsv | column -t
tax_id               abundance           species         genus     family          order               class             phylum               clade           superkingdom   subspecies  species   subgroup  species  group
1280                 0.3333333333333333  Staphylococcus  aureus    Staphylococcus  Staphylococcaceae   Bacillales        Bacilli              Firmicutes      Terrabacteria  group       Bacteria                     
28901                0.3333333333333333  Salmonella      enterica  Salmonella      Enterobacteriaceae  Enterobacterales  Gammaproteobacteria  Proteobacteria  Bacteria                                                
1355                 0.3333333333333333  Enterococcus    columbae  Enterococcus    Enterococcaceae     Lactobacillales   Bacilli              Firmicutes      Terrabacteria  group       Bacteria                     
unmapped             0.0                                                                                                                                                                                              
mapped_unclassified  0.0

Actual output

As you can see, when running abundance on both the forward and reverse files (*_f.fq and *_r.fq), we no longer find "Staphylococcus hominis", but instead "Staphylococcus aureus".
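The difference between the two runs can also be checked programmatically. Below is a minimal Python sketch, assuming the `*_rel-abundance.tsv` files are tab-separated with the `tax_id`, `abundance` and `species` columns shown above. The helper names (`detected_taxa`, `compare_runs`, `compare_files`) are hypothetical and not part of emu.

```python
import csv
import io
from pathlib import Path


def detected_taxa(tsv_text):
    """Return {tax_id: species} for every row with a non-zero abundance.

    Assumes a tab-separated table with 'tax_id', 'abundance' and 'species'
    columns, as in the output shown in this issue.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    taxa = {}
    for row in reader:
        # Rows such as 'unmapped' have abundance 0.0 and are skipped.
        if float(row.get("abundance") or 0) > 0:
            taxa[row["tax_id"]] = row.get("species")
    return taxa


def compare_runs(single_text, paired_text):
    """Return (lost, gained): taxa present in only one of the two runs."""
    single = detected_taxa(single_text)
    paired = detected_taxa(paired_text)
    lost = {t: single[t] for t in single.keys() - paired.keys()}
    gained = {t: paired[t] for t in paired.keys() - single.keys()}
    return lost, gained


def compare_files(single_path, paired_path):
    """Convenience wrapper (hypothetical) that reads both files from disk."""
    return compare_runs(Path(single_path).read_text(),
                        Path(paired_path).read_text())
```

Calling `compare_files("short_read_f_rel-abundance.tsv", "short_read_f-short_read_r_rel-abundance.tsv")` from inside `manual_test_results` should then list tax_id 1290 as lost in the paired run and 1280, 28901 and 1355 as gained, matching the tables above.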

Expected output

I would not have expected "Staphylococcus hominis", which was the only finding for the forward file in the first command, to change into "Staphylococcus aureus" when the reverse reads file is added.

Any comments on this? Is this to be expected?
