Conversation
Add functionality to search for the specific index tables.
This is for AsdfSnapshot, which includes the offset inside the block, so that the table is responsible for returning the table data and for its own cleanup.
Force-pushed from 2ba0059 to 08f4761 (Compare)
Codecov Report: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #48 +/- ##
==========================================
+ Coverage 71.44% 72.89% +1.44%
==========================================
Files 23 23
Lines 1387 1487 +100
==========================================
+ Hits 991 1084 +93
- Misses 396 403 +7
Intermediate flushing is added, along with logic to find the table entries and combine them. I can think of one limitation, though: once the table has flushed all its contents to disk, duplicate data can be written. Maybe it is a good idea to have an additional asdf tool to remove this kind of duplication from the file.
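The deduplication idea could be sketched roughly like this; the entry tuples and their field order below are hypothetical stand-ins, a real tool would parse c_asdf.table_entry records from the file:

```python
# Hypothetical sketch of an asdf deduplication pass: after intermediate
# flushes, the same (stream_idx, offset, size) range can be written more
# than once, so keep only the first occurrence of each range.
def dedupe_entries(entries):
    """entries: iterable of (stream_idx, offset, size, data_offset) tuples."""
    seen = set()
    result = []
    for stream_idx, offset, size, data_offset in entries:
        key = (stream_idx, offset, size)
        if key not in seen:
            seen.add(key)
            result.append((stream_idx, offset, size, data_offset))
    return result

entries = [(0, 0, 512, 100), (0, 0, 512, 900), (1, 0, 512, 612)]
print(dedupe_entries(entries))  # keeps only the first copy of the duplicated range
```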
    @dataclass
    class ReadEntry:
There was a difference between the types used to insert into the table. So, to make sure the correct intent gets communicated, I thought of creating ReadEntry, as it looked more confusing when I used table_entry.file_size for data_offset.
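As a sketch, ReadEntry might look something like the following; only data_offset is mentioned explicitly in the discussion, the other field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ReadEntry:
    # Hypothetical fields; data_offset replaces the ambiguous reuse of
    # table_entry.file_size when reading entries back from a snapshot.
    stream_idx: int
    offset: int
    size: int
    data_offset: int

entry = ReadEntry(stream_idx=0, offset=0, size=512, data_offset=64)
print(entry.data_offset)  # 64
```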
dissect/evidence/asdf/asdf.py
Outdated
    @@ -285,24 +418,13 @@ def _write_meta(self) -> None:

    def _write_table(self) -> None:
dissect/evidence/asdf/asdf.py
Outdated
    return self._table.values()
    def write(self, table_offset: int = -1) -> bytes:
        """Creates a table to be writen to the fileheader"""
This docstring doesn't seem to be accurate.
dissect/evidence/asdf/asdf.py
Outdated
    def values(self) -> ValuesView[list[T]]:
        return self._table.values()

    def write(self, table_offset: int = -1) -> bytes:
Why not give this function a file-like object directly to write to?
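The suggestion would turn the bytes-returning API into something like the following sketch; the class layout and the serialization format are assumptions, not the actual implementation:

```python
import io
import struct

# Hypothetical fixed-size entry serialization standing in for c_asdf.table_entry.
ENTRY = struct.Struct("<QQ")

class Table:
    def __init__(self):
        self._entries = []  # hypothetical: list of (offset, size) tuples

    def write(self, fh) -> int:
        """Serialize the entries directly into a file-like object and
        return the offset the table was written at."""
        table_offset = fh.tell()
        for offset, size in self._entries:
            fh.write(ENTRY.pack(offset, size))
        return table_offset

table = Table()
table._entries = [(0, 512), (512, 1024)]
buf = io.BytesIO()
print(table.write(buf))        # 0
print(len(buf.getvalue()))     # 32 (two 16-byte entries)
```

Writing to a file-like object avoids building the whole table in memory and lets the caller decide where in the file it lands.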
dissect/evidence/asdf/asdf.py
Outdated
    footer = c_asdf.footer(
        magic=FOOTER_MAGIC,
-       table_offset=self._table_offset,
+       table_offset=self._table.prev_table_offset,
dissect/evidence/asdf/asdf.py
Outdated
    FOOTER_MAGIC = b"FT\xa5\xdf"
    SPARSE_BYTES = b"\xa5\xdf"

    DEFAULT_NR_OF_ENTRIES = 4 * 1024 * 1024 // len(c_asdf.table_entry)
Nr of entries of what? Maybe DEFAULT_TABLE_SIZE.
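For illustration, the constant really expresses a fixed 4 MiB table budget; the entry count falls out of the entry size. The 32-byte entry size below is purely an assumption, the real value comes from len(c_asdf.table_entry):

```python
# Hypothetical: a 4 MiB table budget divided by an assumed 32-byte entry size.
TABLE_BUDGET = 4 * 1024 * 1024
ENTRY_SIZE = 32  # stand-in for len(c_asdf.table_entry)
DEFAULT_NR_OF_ENTRIES = TABLE_BUDGET // ENTRY_SIZE
print(DEFAULT_NR_OF_ENTRIES)  # 131072
```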
dissect/evidence/asdf/asdf.py
Outdated
    def streams(self) -> Iterator[AsdfStream]:
        """Iterate over all streams in the file."""
-       for i in sorted(self.table.keys()):
+       for i in sorted(self.table._table.keys()):
    T = TypeVar("T", ReadEntry, c_asdf.table_entry)


    class Table(Generic[T]):
I don't really understand how any of this code works, what it does and what the purpose is of the added structures.
To try and explain my reasoning a bit for Table.
What I wanted to do was to create a single point where the table entries live for both ASDFWriter and ASDFSnapshot, as both did similar things when reading or writing data.
As for the added structures, c_asdf.table_index was created for the purpose of locating previously flushed tables.
struct table_index {
    uint64 prev_table;  // Offset of the previous table, 0xFFFFFFFF_FFFFFFFF denotes the last table
    uint64 size;        // Amount of bytes of the table
    uint64 indexes[4];  // Which table entries are inside
};

While testing with the worst-case, and unrealistic, maximum of 1 table entry, there were some issues with the tests because the lookup offsets no longer matched the test data. This gave birth to Table.lookup, which looks inside the previous tables and retrieves the offsets from there.
To speed up that process I added the indexes field to table_index, which stores the stream indexes contained inside the table as a 256-bit bitmap. We can reuse these indexes to search only for the specific stream indexes required by ASDFStream, although this might be thinking too far ahead.
Of course I could have made table_index.indexes dynamic by using a buffer, but I thought a consistent size of the structure would be more beneficial.
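The bitmap packing described above can be sketched as follows; pack_indexes and unpack_indexes are hypothetical helper names mirroring the quoted list comprehension, not functions from the codebase:

```python
OFFSET_MASK = 0xFFFFFFFFFFFFFFFF  # all-ones uint64

def pack_indexes(stream_indexes):
    """Pack stream indexes (0-255) as set bits into four uint64 words,
    matching the layout of table_index.indexes[4]."""
    bitmap = sum(1 << idx for idx in stream_indexes)
    return [(bitmap >> (word * 64)) & OFFSET_MASK for word in range(256 // 64)]

def unpack_indexes(words):
    """Recover the stream indexes from the four packed words."""
    bitmap = sum(word << (i * 64) for i, word in enumerate(words))
    return [idx for idx in range(256) if bitmap & (1 << idx)]

words = pack_indexes({0, 1, 70})
print(words)                  # [3, 64, 0, 0]
print(unpack_indexes(words))  # [0, 1, 70]
```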
When looking up data, we need to know what other tables are available, which is why I added _table_offsets: it keeps track of any flushed tables indicated by c_asdf.table_index.
And I should have probably added such an explanation to Table in the first place.
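The _table_offsets bookkeeping amounts to walking the prev_table chain. A minimal sketch, assuming a reduced table_index header of just prev_table and size (the real c_asdf.table_index also carries the indexes bitmap):

```python
import io
import struct

OFFSET_MASK = 0xFFFFFFFFFFFFFFFF  # marks the last table in the chain

# Hypothetical reduced header: uint64 prev_table, uint64 size.
HEADER = struct.Struct("<QQ")

def collect_table_offsets(fh, first_offset):
    """Follow prev_table pointers and return every flushed table offset,
    newest first."""
    offsets = []
    offset = first_offset
    while offset != OFFSET_MASK:
        fh.seek(offset)
        prev_table, size = HEADER.unpack(fh.read(HEADER.size))
        offsets.append(offset)
        offset = prev_table
    return offsets

buf = io.BytesIO()
buf.write(HEADER.pack(OFFSET_MASK, 0))  # oldest table at offset 0, ends chain
buf.write(HEADER.pack(0, 0))            # newer table at offset 16, points back
print(collect_table_offsets(buf, 16))   # [16, 0]
```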
I added documentation to the classes and such.
    // A structure to keep track of previously flushed tables
    struct table_index {
        uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
Suggested change:
-   uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
+   uint64 prev_table; // Offset of the previous table, 0xFFFFFFFF_FFFFFFF denotes the last table
    struct table_index {
        uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
        uint64 size;       // Amount of bytes of the table
        uint64 indexes[4]; // Which stream indexes are available inside the table
Expand the docstring on how this is stored. Based on this I assume there's only 4 stream indexes available, each stored as a uint64.
    indexes = sum(1 << key for key in self._table)
    return [(indexes >> (x * 64)) & OFFSET_MASK for x in range(256 // 64)]

    def lookup(self, idx: int, fh: BinaryIO) -> list[int]:
    table_offset = self.fh.tell()
    table_index = c_asdf.table_index(self.fh)
    table_offsets.append((table_offset, table_index))
    if table_index.prev_table == OFFSET_MASK:
This constant is confusingly named.
    class Table(Generic[T]):
        """A single point for the table entries to get collected for reading and writing."""
Having one class be responsible for both reading and writing of the table feels a little awkward. New dataclasses are introduced and both APIs end up just slightly awkward.