
Asdf Intermediate table entry flushing #48

Open
Miauwkeru wants to merge 13 commits into main from asdf-intermediate-data-flushing

Conversation

@Miauwkeru
Contributor

@Miauwkeru Miauwkeru commented Jan 21, 2026

No description provided.

Add functionality to search for specific index tables
This is for AsdfSnapshot, which includes the offset inside the block,
so the table is required for returning the table data and cleaning up
after itself
@Miauwkeru Miauwkeru force-pushed the asdf-intermediate-data-flushing branch from 2ba0059 to 08f4761 Compare January 28, 2026 10:41
@codecov

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 94.52055% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.89%. Comparing base (72b05e7) to head (e7f8a2f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
dissect/evidence/asdf/asdf.py 94.48% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   71.44%   72.89%   +1.44%     
==========================================
  Files          23       23              
  Lines        1387     1487     +100     
==========================================
+ Hits          991     1084      +93     
- Misses        396      403       +7     
Flag Coverage Δ
unittests 72.89% <94.52%> (+1.44%) ⬆️

Flags with carried forward coverage won't be shown.

@Miauwkeru
Contributor Author

Intermediate flushing is added, along with logic to find the table entries and combine it with _table_fit.
The table_index struct contains information about the previous table and the indexes that were flushed to the current table. This allows a faster lookup when we are only interested in one specific index.

I can think of one limitation, though. Once the table has flushed all its contents to disk, duplicate data can be written. Maybe it is a good idea to have an additional asdf tool to remove this kind of duplication from the file.

@Miauwkeru Miauwkeru marked this pull request as ready for review January 28, 2026 10:46
@Miauwkeru Miauwkeru requested a review from Schamper January 28, 2026 10:46


@dataclass
class ReadEntry:
Member

What's this?

Contributor Author

There was a difference between the types used to insert into the table. So, to make sure the correct intent gets communicated, I created ReadEntry, as it looked more confusing when I used table_entry.file_size for data_offset.
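For illustration, a minimal sketch of the idea described here: a read-side dataclass whose field names state intent, instead of overloading on-disk c_asdf.table_entry fields (where e.g. file_size would double as data_offset). The field names below are hypothetical, not the actual PR definitions.

```python
from dataclasses import dataclass


# Hypothetical read-side view of a table entry. Each field name says what
# the value means when reading, rather than reusing on-disk field names.
@dataclass
class ReadEntry:
    offset: int       # offset of the block within the stream
    size: int         # size of the block in bytes
    data_offset: int  # where the block's data lives in the file


entry = ReadEntry(offset=0x1000, size=512, data_offset=0x2000)
```

A caller can then write `entry.data_offset` and mean exactly that, instead of repurposing an unrelated field.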

@@ -285,24 +418,13 @@ def _write_meta(self) -> None:

def _write_table(self) -> None:
Member

Maybe rename to flush.

return self._table.values()

def write(self, table_offset: int = -1) -> bytes:
"""Creates a table to be written to the fileheader"""
Member

This docstring doesn't seem to be accurate.

def values(self) -> ValuesView[list[T]]:
return self._table.values()

def write(self, table_offset: int = -1) -> bytes:
Member

Why not give this function a file-like object directly to write to?
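A sketch of what the reviewer suggests: have the method serialize straight into a file-like object instead of building and returning a bytes blob. The entry serialization below is a stand-in; the real code would write c_asdf structs.

```python
from io import BytesIO
from typing import BinaryIO


def write_table(entries: list[bytes], fh: BinaryIO) -> int:
    """Write all serialized table entries to fh; return the bytes written."""
    written = 0
    for entry in entries:
        written += fh.write(entry)
    return written


buf = BytesIO()
n = write_table([b"entry-1", b"entry-2"], buf)
```

This avoids materializing the whole table in memory and lets the caller decide where the data goes.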

footer = c_asdf.footer(
magic=FOOTER_MAGIC,
table_offset=self._table_offset,
table_offset=self._table.prev_table_offset,
Member

Maybe last_table_offset.

FOOTER_MAGIC = b"FT\xa5\xdf"
SPARSE_BYTES = b"\xa5\xdf"

DEFAULT_NR_OF_ENTRIES = 4 * 1024 * 1024 // len(c_asdf.table_entry)
Member

Nr of entries of what? Maybe DEFAULT_TABLE_SIZE.

def streams(self) -> Iterator[AsdfStream]:
"""Iterate over all streams in the file."""
for i in sorted(self.table.keys()):
for i in sorted(self.table._table.keys()):
Member

Surely this can be nicer.
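One way this could be made nicer: let Table itself be iterable over sorted stream indexes, so callers never reach into the private `_table` dict. A toy sketch, not the PR's actual Table class:

```python
# Toy Table that maps stream indexes to lists of entries and hides its
# internal dict behind __iter__, so streams() can just write `for i in table`.
class Table:
    def __init__(self) -> None:
        self._table: dict[int, list] = {}

    def add(self, idx: int, entry: object) -> None:
        self._table.setdefault(idx, []).append(entry)

    def __iter__(self):
        # Yield stream indexes in sorted order.
        return iter(sorted(self._table))


table = Table()
for idx in (2, 0, 7):
    table.add(idx, object())
```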

T = TypeVar("T", ReadEntry, c_asdf.table_entry)


class Table(Generic[T]):
Member

I don't really understand how any of this code works, what it does and what the purpose is of the added structures.

Contributor Author

To try and explain my reasoning a bit for Table.

What I wanted to do was to create a single point where the table entries live for both ASDFWriter and ASDFSnapshot as both did similar things when either reading or writing data.

As for the added structures, c_asdf.table_index was created for the purpose of locating the previous flushed tables.

struct table_index {
    uint64      prev_table;     // Offset of the previous table FFFFFFFFF denotes last table
    uint64      size;           // Amount of bytes of the table
    uint64      indexes[4];     // Which table entries are inside
};

While testing the worst-case (and unrealistic) maximum of 1 table entry, there were some issues with the tests because the lookup offsets no longer matched the test data. This gave birth to Table.lookup, which looks inside the previous tables and retrieves the offsets from there.

To speed up that process I added indexes to table_index, which stores the stream indexes contained inside the table as a 256-bit bitmap. We can reuse these indexes to search only for the specific stream indexes required by ASDFStream. Although this might be thinking too far ahead.

Of course I could have made table_index.indexes dynamic by using a buffer, but I thought a consistent size of the structure would be more beneficial.

When looking up data, we need to know what other tables are available, which is why I added _table_offsets: to keep track of any flushed tables indicated by c_asdf.table_index.

And I should have probably added such an explanation to Table in the first place.
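The bitmap encoding described above can be sketched as follows, mirroring the `sum(1 << key for key in self._table)` snippet quoted later in this review. `OFFSET_MASK` is assumed here to be the all-ones uint64 mask; `has_index` is a hypothetical helper for the lookup side.

```python
OFFSET_MASK = 0xFFFFFFFFFFFFFFFF  # assumed: mask for one uint64 word


def pack_indexes(stream_indexes: set[int]) -> list[int]:
    # Set one bit per stream index, then split the 256-bit bitmap into
    # four uint64 words, matching table_index.indexes[4] on disk.
    bitmap = sum(1 << idx for idx in stream_indexes)
    return [(bitmap >> (word * 64)) & OFFSET_MASK for word in range(256 // 64)]


def has_index(words: list[int], idx: int) -> bool:
    # Check whether a stream index is present in a packed bitmap.
    return bool(words[idx // 64] & (1 << (idx % 64)))


words = pack_indexes({0, 3, 70})
```

With this encoding, deciding whether a flushed table can contain a given stream index is a single bit test, without parsing the table's entries.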

@Miauwkeru
Contributor Author

I added documentation to the classes and such.

@Miauwkeru Miauwkeru requested a review from Schamper February 19, 2026 15:05

// A structure to keep track of previously flushed tables
struct table_index {
uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
Member

Suggested change
uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
uint64 prev_table; // Offset of the previous table, 0xFFFFFFFF_FFFFFFF denotes the last table

struct table_index {
uint64 prev_table; // Offset of the previous table 0xFFFFFFFF_FFFFFFF denotes last table
uint64 size; // Amount of bytes of the table
uint64 indexes[4]; // Which stream indexes are available inside the table
Member

Expand the docstring on how this is stored. Based on this I assume there's only 4 stream indexes available, each stored as a uint64.

indexes = sum(1 << key for key in self._table)
return [(indexes >> (x * 64)) & OFFSET_MASK for x in range(256 // 64)]

def lookup(self, idx: int, fh: BinaryIO) -> list[int]:
Member

This is unused?

table_offset = self.fh.tell()
table_index = c_asdf.table_index(self.fh)
table_offsets.append((table_offset, table_index))
if table_index.prev_table == OFFSET_MASK:
Member

This constant is confusingly named.
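For context, the loop quoted above walks the chain of flushed tables via prev_table. A toy sketch of that traversal, assuming a simplified 16-byte header of two little-endian uint64s (prev_table, size) and an all-ones sentinel; the real c_asdf.table_index also carries the indexes bitmap.

```python
import struct
from io import BytesIO

SENTINEL = 0xFFFFFFFFFFFFFFFF  # assumed "no previous table" marker

# Build a toy file with two chained headers:
# the first table at offset 0 (no predecessor), the second at offset 16.
buf = BytesIO()
buf.write(struct.pack("<QQ", SENTINEL, 100))  # table 1: prev=SENTINEL, size=100
buf.write(struct.pack("<QQ", 0, 200))         # table 2: prev=0, size=200


def walk_tables(fh, last_offset: int) -> list[tuple[int, int]]:
    # Follow prev_table pointers from the newest table back to the oldest,
    # collecting (offset, size) pairs along the way.
    offsets = []
    offset = last_offset
    while True:
        fh.seek(offset)
        prev_table, size = struct.unpack("<QQ", fh.read(16))
        offsets.append((offset, size))
        if prev_table == SENTINEL:
            return offsets
        offset = prev_table


chain = walk_tables(buf, 16)
```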



class Table(Generic[T]):
"""A single point for the table entries to get collected for reading and writing."""
Member

Having one class be responsible for both reading and writing of the table feels a little awkward. Awkward new dataclasses are introduced and both APIs end up just slightly awkward.
