Docx mime type incorrectly guessed #117

@AlfonsoUceda

Hi!

I've been experiencing an error with Marcel where a docx file (a real one) gets detected as application/zip instead of application/vnd.openxmlformats-officedocument.wordprocessingml.document, whereas the file --mime-type command detects the right MIME type.

I've been researching why this happens, and it seems the matchers defined here https://github.com/rails/marcel/blob/main/lib/marcel/tables.rb#L2416 expect those strings to appear in a particular order in the file.
I've compared two docx files, one that Marcel detects correctly and one that it doesn't. It seems that the identifiers aren't in the expected order in the second one. E.g. the [Content_Types].xml entry is almost at the end of the file.

The following snippet works, but I guess the gem only reads the first bytes for performance reasons, so it doesn't check for inclusion across the whole file.

content = File.binread('PATH_DOCX') # read the whole file as raw bytes, no open handle left behind
content.include?('[Content_Types].xml') # => true
content.include?('word/') # => true
content.include?('_rels/.rels') # => true
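To see where each identifier actually sits in the file, not just whether it is present, a small helper along these lines might help. This is only a sketch: marker_offsets is a hypothetical helper (not part of Marcel), and the sample string is a synthetic stand-in for a docx whose [Content_Types].xml entry comes last, not a real archive.

```ruby
# Hypothetical helper (not part of Marcel): report the byte offset of each
# marker string inside a binary blob, or nil when the marker is absent.
def marker_offsets(bytes, markers)
  data = bytes.b # binary copy, so the results are byte offsets
  markers.to_h { |marker| [marker, data.index(marker.b)] }
end

MARKERS = ['[Content_Types].xml', 'word/', '_rels/.rels'].freeze

# Synthetic stand-in for a docx where '[Content_Types].xml' is the last entry.
sample = "PK\x03\x04word/document.xmlPK\x03\x04_rels/.relsPK\x03\x04[Content_Types].xml".b

offsets = marker_offsets(sample, MARKERS)
offsets.each { |marker, offset| puts "#{marker}: #{offset.inspect}" }

# For a real file: marker_offsets(File.binread('PATH_DOCX'), MARKERS)
```

Running this against both the working and the failing docx should show whether the identifiers appear in a different order, or much further into the file, in the one that is detected as application/zip.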

Cheers
