ParseError with single-line files stored as io.BytesIO #370

@william-watson-swri

Version
pymzml: 2.5.10
Python: 3.11.7

Description
I'm receiving mzML files as bytes, wrapping these in io.BytesIO, and then passing that to pymzml.run.Reader:

reader = pymzml.run.Reader(io.BytesIO(mzml_bytes))

This sometimes raises the following exception:

ParseError: no element found: line 1, column 0

Why
Some of the mzML files I'm using have no line breaks, i.e. the whole document is on a single line, and the _guess_encoding function breaks on these. Looking at the pymzml source, io.BytesIO objects end up in _guess_encoding, where this line is the culprit:

match = regex_patterns.FILE_ENCODING_PATTERN.search(mzml_file.readline())

If the file has no line breaks, the .readline() call consumes the entire BytesIO and leaves the stream position at the end, so the later XML parsing sees no data and fails.
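The behaviour can be reproduced without pymzml. The following is a minimal sketch using only the standard library; the `single_line` document is a made-up stand-in for a real mzML file, and `xml.etree.ElementTree` is used here only to illustrate the failure, not because it is necessarily the parser pymzml uses.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical single-line document standing in for a real mzML file.
single_line = b'<?xml version="1.0" encoding="utf-8"?><mzML><run/></mzML>'

buf = io.BytesIO(single_line)

# With no b"\n" in the data, readline() returns the whole buffer.
first_line = buf.readline()

# The stream position is now at the end, so nothing is left to read.
remaining = buf.read()

# Parsing what is left fails because the parser sees empty input.
try:
    ET.parse(io.BytesIO(remaining))
    error = None
except ET.ParseError as exc:
    error = str(exc)

print(first_line == single_line, remaining, error)
```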

Workaround/fix
I'm currently inserting a line break after the XML declaration at the start of the data before passing it to pymzml:

data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)
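As a rough illustration of why this workaround helps, the sketch below wraps the same substitution in a helper and checks what `readline()` then returns. The helper name `add_break_after_declaration` and the sample bytes are made up for this example. Note that input without an XML declaration is left unchanged, so it would presumably still hit the original problem.

```python
import io
import re


def add_break_after_declaration(mzml_bytes: bytes) -> bytes:
    """Insert a newline after the first XML declaration, if there is one."""
    return re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)


# Hypothetical single-line document standing in for a real mzML file.
sample = b'<?xml version="1.0" encoding="utf-8"?><mzML><run/></mzML>'
patched = add_break_after_declaration(sample)

buf = io.BytesIO(patched)
declaration_line = buf.readline()  # now stops right after the declaration
rest = buf.read()                  # the document body is still available

print(declaration_line, rest)
```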

I believe this could also be fixed by just adding mzml_file.seek(0) after the offending line.
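A minimal sketch of the seek(0) idea, under stated assumptions: `guess_encoding_and_rewind` and `ENCODING_PATTERN` are hypothetical stand-ins written for this example, not pymzml's actual `_guess_encoding` or `regex_patterns.FILE_ENCODING_PATTERN`, whose exact definitions are not shown here. The point is only that rewinding after `readline()` leaves the full document available to the parser.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pymzml's FILE_ENCODING_PATTERN (an assumption).
ENCODING_PATTERN = re.compile(rb'encoding="(?P<encoding>[A-Za-z0-9\-]+)"')


def guess_encoding_and_rewind(mzml_file, default="utf-8"):
    """Read the first line to find the encoding, then rewind the stream."""
    match = ENCODING_PATTERN.search(mzml_file.readline())
    mzml_file.seek(0)  # the proposed fix: restore the position for later parsing
    if match:
        return match.group("encoding").decode("ascii")
    return default


# Hypothetical single-line document standing in for a real mzML file.
buf = io.BytesIO(b'<?xml version="1.0" encoding="utf-8"?><mzML><run/></mzML>')

encoding = guess_encoding_and_rewind(buf)
root = ET.parse(buf).getroot()  # succeeds because the stream was rewound

print(encoding, root.tag)
```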
