feat(reporting): report meta-data information about chunks.#557
@e3krisztian implemented the changes we talked about and introduced a test.
    end_offset: int
    size: int
    is_encrypted: bool
    metadata: dict = attr.ib(factory=dict)
Just wondering whether we want to validate the metadata dict: do we want to enforce that each key is a string and each value is of a certain type, or are we OK with anything, even nested metadata?
Maybe it would be great to somehow have a "namespace" or at least some convention on metadata variable naming?
What if we want to push data from multiple headers, or permissions, etc?
I'm ok with enforcing a convention on metadata variable naming. Having a namespace would be too complicated since we can't foresee the metadata field names used by handlers.
I would enforce that metadata is a dict without nested data, keys must be strings and values must be base types.
I would convey information about files created (timestamps, permissions, owner) with something different since it involves way more complex structures.
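To make the proposed rule concrete, here is a minimal sketch of such a check. It is only an illustration of the convention discussed above: the function name `validate_flat_metadata` is hypothetical, and the set of "base types" is an assumption, since the thread does not enumerate them.

```python
# Assumed set of "base types"; the discussion does not list them explicitly.
BASE_TYPES = (str, int, float, bool, bytes)


def validate_flat_metadata(metadata):
    """Raise TypeError unless metadata is a flat dict of str -> base type."""
    if not isinstance(metadata, dict):
        raise TypeError(f"metadata must be a dict, got {type(metadata).__name__}")
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        # Nested dicts and lists are rejected here because they are not base types.
        if not isinstance(value, BASE_TYPES):
            raise TypeError(
                f"metadata[{key!r}] has unsupported type {type(value).__name__}"
            )
    return metadata
```

Since attrs validators are called with `(instance, attribute, value)`, a small wrapper such as `lambda _inst, _attr, value: validate_flat_metadata(value)` could be passed as `validator=` to `attr.ib` if we decide to enforce this on the chunk class.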
unblob/handlers/archive/sevenzip.py
    end_offset = first_db_header + header.next_header_size
    return ValidChunk(start_offset=start_offset, end_offset=end_offset)
    return ValidChunk(
    start_offset=start_offset, end_offset=end_offset, metadata=header
do we want to pass all attributes from the header as metadata?
This point came up when discussing with @e3krisztian yesterday. I think it's better to only pass the most relevant header attributes rather than the whole instance.
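One way this could look, as a rough sketch: pick a short list of attribute names and copy only those into the metadata dict. The helper `select_metadata` is hypothetical, the `SimpleNamespace` object merely stands in for a parsed header, and every field name other than `next_header_size` is made up for illustration.

```python
from types import SimpleNamespace


def select_metadata(header, field_names):
    """Copy only the named attributes of a parsed header into a flat dict."""
    return {name: getattr(header, name) for name in field_names}


# Stand-in for a parsed header; only next_header_size appears in the diff,
# the other attributes are hypothetical.
header = SimpleNamespace(
    next_header_offset=32,
    next_header_size=96,
    next_header_crc=0xDEADBEEF,
    internal_padding=b"\x00" * 4,
)

# Report the fields deemed relevant instead of the whole header instance.
metadata = select_metadata(header, ("next_header_offset", "next_header_size"))
```

This keeps the reported dict flat and small, and a missing attribute fails loudly with `AttributeError` rather than being silently dropped.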
Allow handlers to provide a dict value as part of a ValidChunk metadata attribute. That dictionary can contain any metadata the handler considers relevant, but we advise handler writers to report parsed information such as header values.
This metadata dict is later reported as part of our ChunkReports and is available in the JSON report file if the user requested one.
The idea is to expose metadata to further analysis steps through the unblob report. For example, a binary analysis toolkit would read the load address and architecture from a uImage chunk to analyze the file extracted from that chunk with the right settings.
A note on the 'as_dict' implementation.
The initial idea was to implement it in dissect.cstruct (see fox-it/dissect.cstruct#29), but due to expected changes in that project's API I chose to implement it in unblob so we're not dependent on another project.
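As an illustration of the uImage use case, a downstream tool might consume the report roughly as follows. This is a sketch under loud assumptions: the JSON layout, the `handler_name` key, and the `load_address` and `arch` metadata keys are all hypothetical and may not match what unblob actually emits.

```python
import json

# Hypothetical report fragment; the real unblob JSON layout may differ.
REPORT_JSON = """
[
  {"handler_name": "uimage", "start_offset": 0, "end_offset": 4096,
   "metadata": {"load_address": 2147483648, "arch": "arm"}},
  {"handler_name": "sevenzip", "start_offset": 4096, "end_offset": 8192,
   "metadata": {}}
]
"""


def analysis_settings(chunk_reports, handler_name):
    """Yield (load_address, arch) for each chunk reported by the given handler."""
    for chunk in chunk_reports:
        if chunk.get("handler_name") != handler_name:
            continue
        metadata = chunk.get("metadata", {})
        # Skip chunks whose handler did not report both values.
        if "load_address" in metadata and "arch" in metadata:
            yield metadata["load_address"], metadata["arch"]


settings = list(analysis_settings(json.loads(REPORT_JSON), "uimage"))
```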
Related to #16 and initial discussion in #16 (comment)
You can observe the changes like this: