Implement hints for filetype detection #1

@PeterKraus

This issue is a follow-up to marda-alliance/metadata_extractors_schema#45.

In marda-alliance/metadata_extractors_schema#48, we have implemented the associated_file_extensions slot in the FileType schema, to specify some metadata that can be used to match files to FileTypes.

However, further hints could be included, such as common MIME types or magic bytes (file signatures). This idea needs some planning work.

See also the comment quoted here:

This is my use case. If someone uploads an arbitrary file to my ELN and I have a whole registry of tools to process it, the ELN still needs to figure out which tool to use. Identifying the FileType would give you the connection. Otherwise, I need to rely on the source (e.g. the user) to tell me the type.

To apply a tool, the ELN needs to figure out the FileType one way or another. This is why you ask for a FileType identifier, right? Maybe it is difficult, but if you agree that it is a valid use case, why wait for the next MaRDA WG to figure it out? I am not sure how additional information would reduce the usefulness.

Let's say we are not using the registry to identify FileTypes. The tools in the registry still need to somehow tell what their intended input FileType is. And it ought to be more specific than JSON, HDF5, CSV, etc. Why not describe the FileType by characteristics that would help to identify a file's type?

Originally posted by @markus1978 in marda-alliance/metadata_extractors_schema#9 (comment)
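
To make the discussion concrete, here is a minimal sketch of how extension, MIME type, and magic-byte hints might be combined to match a file against a registry. Only `associated_file_extensions` comes from the schema; the `mime_types` and `magic_bytes` fields, the `FileTypeHints` class, the scoring weights, and the example registry entries are all assumptions made for illustration, not a proposed design.

```python
import mimetypes
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class FileTypeHints:
    """Hypothetical hint record; only associated_file_extensions exists in the schema."""
    id: str
    associated_file_extensions: List[str] = field(default_factory=list)
    mime_types: List[str] = field(default_factory=list)     # assumed slot name
    magic_bytes: List[bytes] = field(default_factory=list)  # assumed slot name


def score(path: Path, hints: FileTypeHints) -> int:
    """Count the hints that agree with the file; a signature match counts double."""
    points = 0
    # Extension hint: compare case-insensitively, with or without a leading dot.
    suffix = path.suffix.lower().lstrip(".")
    known = {ext.lower().lstrip(".") for ext in hints.associated_file_extensions}
    if suffix and suffix in known:
        points += 1
    # MIME hint: the standard library guesses from the file name only.
    guessed, _ = mimetypes.guess_type(path.name)
    if guessed is not None and guessed in hints.mime_types:
        points += 1
    # Magic-byte hint: read just enough of the file to test every signature.
    signatures = [sig for sig in hints.magic_bytes if sig]
    if signatures:
        with path.open("rb") as handle:
            head = handle.read(max(len(sig) for sig in signatures))
        if any(head.startswith(sig) for sig in signatures):
            points += 2
    return points


def detect_file_type(path: Path, registry: List[FileTypeHints]) -> Optional[FileTypeHints]:
    """Return the best-scoring entry, or None if no hint matches at all."""
    best = max(registry, key=lambda hints: score(path, hints), default=None)
    if best is None or score(path, best) == 0:
        return None
    return best


# Illustrative registry; the ids are made up for this sketch.
REGISTRY = [
    FileTypeHints(
        id="example-hdf5",
        associated_file_extensions=["h5", "hdf5"],
        mime_types=["application/x-hdf5"],
        magic_bytes=[b"\x89HDF\r\n\x1a\n"],  # HDF5 format signature
    ),
    FileTypeHints(
        id="example-json",
        associated_file_extensions=["json"],
        mime_types=["application/json"],
    ),
]
```

With this sketch, a file whose content starts with the HDF5 signature is matched to `example-hdf5` even if its extension is unhelpful. Hints at this level only identify the container format, so distinguishing FileTypes that share a container (as the quoted comment asks for) would still need more specific characteristics.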
