
TeofilC commented Dec 29, 2025

The eventlog format consists of a stream of blocks. Each block starts with an EventBlock event, which records the capability the block belongs to, the block's size, and its end time. Note that the difference between the end time of the previous block and the start of the current one gives the time spent writing the eventlog to disk (neat!). The events within each block are sorted by timestamp, since each block is exclusively owned by a single capability.
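A small sketch of the gap computation described above. `BlockHeader` and `flush_gaps` are hypothetical names for illustration, not part of the ghc-events API. The sketch assumes the relevant gap is between consecutive blocks of the *same* capability (each capability flushes its own buffer) and that timestamps are in nanoseconds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHeader:
    """Hypothetical, simplified view of an EventBlock header (not the ghc-events API)."""
    cap: int    # capability that owns the block
    start: int  # timestamp of the EventBlock event itself (ns, assumed)
    end: int    # end time recorded in the EventBlock payload (ns, assumed)


def flush_gaps(headers):
    """For each capability, return the gaps between the end of one block and the
    start of that capability's next block. Under the assumption stated above,
    each gap approximates time spent writing that buffer out to disk."""
    by_cap = defaultdict(list)
    for h in headers:
        by_cap[h.cap].append(h)
    gaps = {}
    for cap, blocks in by_cap.items():
        blocks.sort(key=lambda b: b.start)  # order this capability's blocks in time
        gaps[cap] = [nxt.start - prev.end for prev, nxt in zip(blocks, blocks[1:])]
    return gaps


# Blocks as they might appear in file order, interleaved across capabilities.
headers = [
    BlockHeader(cap=0, start=100, end=180),
    BlockHeader(cap=1, start=110, end=170),
    BlockHeader(cap=0, start=200, end=260),
    BlockHeader(cap=1, start=195, end=250),
]
print(flush_gaps(headers))  # {0: [20], 1: [25]}
```

Grouping by capability first matters because blocks from different capabilities interleave in the file, so "the previous block" in file order may belong to another capability.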

Currently, ghc-events simply discards these events. When we write an eventlog back to disk, we group all events from each capability together and wrap them in an artificial EventBlock. This produces a valid eventlog, but we lose a great deal of information, along with some useful properties.

Blocks are guaranteed to be no larger than GHC's eventlog buffer size, and when using `--eventlog-flush-interval` they are also bounded in time. If we hold one block from each capability in memory, we can sort events efficiently without loading the entire eventlog. In effect, we are merging one sorted stream per capability.
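A minimal sketch of that per-capability merge, using Python's `heapq.merge`. The `Event` record and `merge_capabilities` function are hypothetical, and the sketch assumes each capability's events have already been separated into their own stream (for example, by concatenating that capability's blocks in file order, which keeps them sorted). The merge holds only one pending event per input stream, which is what keeps memory bounded.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical minimal event: timestamp (ns), owning capability, label."""
    time: int
    cap: int
    name: str


def merge_capabilities(streams: Iterable[Iterable[Event]]) -> Iterator[Event]:
    """Lazily k-way merge per-capability streams, each already sorted by time,
    into a single time-ordered stream. Ties on equal timestamps are resolved
    in favour of the earlier input stream."""
    return heapq.merge(*streams, key=lambda e: e.time)


# Illustrative per-capability streams (labels are arbitrary strings).
cap0 = [Event(10, 0, "RunThread"), Event(40, 0, "StopThread")]
cap1 = [Event(15, 1, "RunThread"), Event(30, 1, "StopThread")]

for e in merge_capabilities([cap0, cap1]):
    print(e.time, e.cap, e.name)
```

A heap-based merge costs O(log k) per event for k capabilities, so the total work is O(n log k) rather than the O(n log n) of sorting the whole eventlog at once.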

Unfortunately, this change on its own is not enough to round-trip eventlogs faithfully, but it is a step in that direction: we still need to preserve the order in which blocks appear in the eventlog.

Resolves #99

TeofilC force-pushed the wip/preserve-blocks branch 5 times, most recently from 0a2956b to f07b3b8 (December 29, 2025 18:35)
TeofilC changed the title from "wip: preserve EventBlock" to "Preserver EventBlock events" (Dec 29, 2025)
TeofilC force-pushed the wip/preserve-blocks branch from f07b3b8 to c5e55e1 (December 29, 2025 18:49)
TeofilC changed the title from "Preserver EventBlock events" to "Preserve EventBlock events" (Dec 29, 2025)
TeofilC marked this pull request as ready for review (December 29, 2025 18:49)
TeofilC force-pushed the wip/preserve-blocks branch from c5e55e1 to 3af1d9b (December 30, 2025 11:12)


Development

Successfully merging this pull request may close these issues.

Handling EventBlocks and roundtripping

2 participants