XM data corruption #29

@KirillKryukov

The XM compressor still has a data corruption issue: compressing a particular input and then decompressing it produces output that differs from the original file.

Test data size: 30,244 bytes
Test data link: http://kirill.med.u-tokai.ac.jp/data/temp/xm-repro-4-input.zip

Commands to reproduce:

Compress:
jsa.xm.compress --hashSize=11 --context=15 --limit=200 --threshold=0.15 --chance=20 --real=archive.xm original.fasta

Decompress:
jsa.xm.compress --hashSize=11 --context=15 --limit=200 --threshold=0.15 --chance=20 --decode=archive.xm --output=decompressed.fasta

Compare:
cmp original.fasta decompressed.fasta

Produces: original.fasta decompressed.fasta differ: byte 27512, line 274
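
For convenience, the three steps above could be wrapped in a small script. This is only a sketch: the names `XM_OPTS`, `roundtrip_ok` and `run_xm_roundtrip` are made up for illustration, and it assumes `jsa.xm.compress` and `cmp` are on the PATH.

```shell
#!/bin/sh
# Options copied from the compress/decompress commands in this report.
XM_OPTS="--hashSize=11 --context=15 --limit=200 --threshold=0.15 --chance=20"

# roundtrip_ok ORIGINAL DECOMPRESSED
# Runs cmp on the two files and returns its exit status:
# 0 if identical, 1 if they differ (cmp prints the first difference).
roundtrip_ok() {
    cmp "$1" "$2"
}

# run_xm_roundtrip INPUT
# Hypothetical wrapper: compresses INPUT to archive.xm, decodes it to
# decompressed.fasta, then compares the result with INPUT.
# Returns 2 if either jsa.xm.compress call fails.
run_xm_roundtrip() {
    input=$1
    # XM_OPTS is intentionally unquoted so it splits into separate options.
    jsa.xm.compress $XM_OPTS --real=archive.xm "$input" || return 2
    jsa.xm.compress $XM_OPTS --decode=archive.xm --output=decompressed.fasta || return 2
    roundtrip_ok "$input" decompressed.fasta
}
```

After sourcing the script, `run_xm_roundtrip original.fasta && echo "round trip OK"` would print the message only when the decompressed file matches the original.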

The decompressed file has the correct size, but its sequence data is corrupted. The problem was found during testing for the Sequence Compression Benchmark.
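
Since both files have the same size, the extent of the corruption could be measured by counting the differing bytes. A minimal sketch, assuming a POSIX `cmp`; the function name `count_diff_bytes` is made up for illustration:

```shell
#!/bin/sh
# count_diff_bytes FILE1 FILE2
# Prints the number of byte positions at which the two files differ.
# cmp -l writes one line per differing byte; wc -l counts those lines,
# and tr strips the padding some wc implementations add.
count_diff_bytes() {
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}
```

For example, `count_diff_bytes original.fasta decompressed.fasta` prints the count for the files in this report, and `cmp -l original.fasta decompressed.fasta | head` lists the first few differing offsets with their byte values.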

Let me know if you need any additional information or help.
