TypeError in ported_string when parsing headers as email.header.Header objects #148

@AndreasLF

Describe the bug
The ported_string utility function raises a TypeError when it receives an instance of email.header.Header.

In certain edge cases (specifically involving "dirty" Outlook conversions or specific email library configurations), the parser extracts an email.header.Header object instead of a string. Because ported_string does not check for this object type, the parser crashes when it attempts to sanitize these complex headers.

To Reproduce

  1. Create a minimal Python script that simulates the edge case by creating a Header object manually.
  2. Pass this object to ported_string.
from email.header import Header
from mailparser.utils import ported_string

# This simulates the return value of p.get() in complex parsing scenarios
h = Header("Test Filename.pdf", charset="utf-8")

# This causes a TypeError because ported_string expects str or bytes
ported_string(h)

This causes the following TypeError because ported_string expects str or bytes:
TypeError: decoding to str: need a bytes-like object, Header found

Expected behavior
ported_string should detect that the input is an email.header.Header object and convert it to a string (using str(obj) or six.text_type(obj)) before returning.
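A minimal sketch of the kind of guard described above. The function name to_text and its fallback behaviour (empty string for None, ignoring undecodable bytes) are assumptions made for illustration; this is not the actual body of ported_string.

```python
from email.header import Header


def to_text(raw_data, encoding="utf-8"):
    """Hypothetical stand-in for ported_string, not mailparser's real code."""
    if raw_data is None:
        return ""
    if isinstance(raw_data, Header):
        # str() on a Header joins its chunks into a single text string
        return str(raw_data)
    if isinstance(raw_data, str):
        return raw_data
    # bytes-like input: decode, dropping bytes invalid in the given encoding
    return str(raw_data, encoding, errors="ignore")


h = Header("Test Filename.pdf", charset="utf-8")
result = to_text(h)
print(result)
```

Checking for Header before attempting to decode means the bytes path is only reached for bytes-like input, which is where the TypeError currently originates.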

Raw mail
I cannot provide the full original .eml file due to GDPR restrictions. Additionally, creating a synthetic file that forces the Python standard email library to return a Header object is non-deterministic, as it often depends on specific system locales and Python versions.

Relevant Header Context: The crash happens specifically when parsing the Content-Disposition header (see traceback). In my environment, the header contained mixed encoding/dirty bytes (likely from an Outlook conversion) similar to: Content-Disposition: attachment; filename="Report \r\n\t\x96\x96\x96 Final.pdf"

While mailparser usually receives a string here, the traceback confirms that in this specific case, the email library returned an email.header.Header object, which mailparser then failed to handle.
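Although I could not build a reliable synthetic .eml file, the in-memory sketch below may serve as a starting point for reproduction. It assumes the default compat32 policy of the standard email package, which appears to wrap header values containing raw non-ASCII bytes in a Header object when they are fetched. The header bytes are simplified from the example above (no folding), and whether mailparser reaches the same code path is not verified here.

```python
import email
from email.header import Header

# A raw message whose Content-Disposition header contains undeclared 0x96 bytes
raw = (
    b'Content-Disposition: attachment; filename="Report \x96\x96\x96 Final.pdf"\r\n'
    b"\r\n"
    b"body\r\n"
)

msg = email.message_from_bytes(raw)  # uses the compat32 policy by default
value = msg.get("content-disposition")

is_header = isinstance(value, Header)
as_text = str(value)  # converting the Header to text does not raise

print(type(value).__name__, is_header)
```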

Looking at the docstring for decode_header from the Python standard email library, it seems that the library explicitly supports Header objects in this flow:

header may be a string that may or may not contain RFC2047 encoded words,
or it may be a Header object.

This indicates that mailparser (specifically ported_string) needs to handle Header objects defensively, as they are a valid state within the email library ecosystem.
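For illustration, the snippet below passes a Header object straight to decode_header and rebuilds the text from the returned (chunk, charset) pairs. It is only a sketch of the behaviour described in the docstring; the "ascii" fallback and errors="replace" are assumptions chosen for this example.

```python
from email.header import Header, decode_header

h = Header("Test Filename.pdf", charset="utf-8")

# decode_header accepts a Header object as well as a plain string
parts = decode_header(h)

# Rebuild a text string from the decoded chunks
text = "".join(
    chunk.decode(charset or "ascii", errors="replace")
    if isinstance(chunk, bytes)
    else chunk
    for chunk, charset in parts
)
print(text)
```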

Environment:

  • OS: MacOS 15.7.3 (Python 3.12.8)
  • Docker: no
  • mail-parser version 4.1.4

Traceback

return mailparser.parse_from_bytes(inner_email_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.virtualenvs/project/lib/python3.11/site-packages/mailparser/core.py", line 113, in parse_from_bytes
    return MailParser.from_bytes(bt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.virtualenvs/venv/lib/python3.11/site-packages/mailparser/core.py", line 236, in from_bytes
    return cls(message)
           ^^^^^^^^^^^^
  File "/Users/user/.virtualenvs/venv/lib/python3.11/site-packages/mailparser/core.py", line 132, in __init__
    self.parse()
  File "/Users/user/.virtualenvs/venv/lib/python3.11/site-packages/mailparser/core.py", line 395, in parse
    content_disposition = ported_string(p.get("content-disposition"))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.virtualenvs/venv/lib/python3.11/site-packages/mailparser/utils.py", line 88, in wrapper
    return normalize("NFC", func(*args, **kwargs))
                            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.virtualenvs/venv/lib/python3.11/site-packages/mailparser/utils.py", line 122, in ported_string
    return six.text_type(raw_data, encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: decoding to str: need a bytes-like object, Header found
