jonathan's notes #1

schacon · 2026-03-16T13:47:44Z

schacon
Mar 16, 2026
Maintainer

Jonathan Tan wrote this in Discord:

Here are my thoughts on gmeta. I'm only writing based on what I read in https://schacon.github.io/gmeta/ and https://github.com/schacon/gmeta - in particular, I didn't look at the source code.

My summary of `gmeta`

This is a system of attaching metadata to certain "targets". The metadata is stored locally in a SQLite database and can be serialized into Git objects, which can be pushed, fetched, and "materialized" (combining fetched metadata with local metadata).

Local storage

Each metadata item consists of a "target" (commit, change ID, branch, path, or "project" representing a global value), a "key" (arbitrary string with limits on bytes allowed), and a "value". The value can be of type "string" (from what I can see, there are no disallowed bytes), "list", or tombstone.

Each modification is also written to a log, including its timestamp and the email address of the user who made the modification.

"List" is actually a set of (timestamp, string) pairs, but the UX mostly treats this as a list of strings ordered by timestamp (for example, when writing a list, the second element is written with the timestamp incremented by 1 and so on). The "lists" can be pushed to, popped from, and cleared, but more complicated manipulations like inserting an item in an arbitrary position don't seem possible without rewriting the timestamps of existing items or forging the timestamp of the newly added item.

Tombstone doesn't seem strictly necessary except possibly to make gmeta get --with-authorship (get the current value and provide information about the last modification) not need to check the log to see if a missing entry is due to a deletion.

Serialization

Whenever serialization happens, a commit is written to refs/meta/local. The commit's trees, as recursively seen by a command like git ls-tree -r, will contain:

for a "string", 100644 blob <blob-id> <target-path>/<key-path>/__value
for a "list", multiple 100644 blob <blob-id> <target-path>/<key-path>/__list/<timestamp>-<short-hash-of-contents> entries
for a tombstone, 100644 blob 6f31... <blob-id> <target-path>/<key-path>/__deleted pointing to a blob with JSON-stored timestamp and email of deleter

It is unclear to me what the parent of this commit is, if any. It is written:

We don't need the entire log to be serialized in the exchange format, only the most recent values.

This leads me to think that whenever serialization happens, a single commit with no parent is written. But subsequently, there is talk of a "three-way merge" and "fast-forward materialization", which are only possible if there is a commit for every modification, and if each commit has a parent representing the immediately previous modification. So I'm not sure.

Exchange

The refs/meta/local commit can be pushed and fetched as usual. A gmeta materialize <remote> command (which assumes that the metadata was fetched into refs/meta/remotes/<remote>) is included; it will combine the contents of that ref with the local contents. For "string" values, last timestamp wins (where the timestamp is stored doesn't seem to be described in the spec). For "list", all the entries from both sides are combined and deduplicated - this works because they are stored as <timestamp>-<short-hash-of-contents>.

Automated retargeting of metadata when a commit is rewritten?

The spec doesn't seem to describe this.

Comparison with `git notes`

git notes is more restricted in its targets (only supports any Git object; does not support arbitrary strings like change IDs) and has no concept of gmeta's "key" (it maps targets to values directly). It also supports
only the equivalent of gmeta's "string" value. Its main on-disk format is a commit with a flat tree containing entries whose filename is the target object ID in hexadecimal and whose blob ID is the value.

However, it is more integrated with other Git commands. git log can automatically display associated notes, and git can be configured to automatically rewrite notes when a commit is rewritten (from the documentation, currently rewriting through only amend and rebase seems to be supported). There is no automatic fetching/pushing of notes whenever its associated commit is fetched/pushed, though.

Possible hybrid solutions

It might be worth discussing hybrid solutions, especially if we plan to upstream gmeta. One option is to reuse the git notes format, but by convention treat the value as a JSON file (or Git trailers, and so on). This makes it cumbersome to write large payloads to one specific key, though (not only do we have to base64 encode or similar, we always have to rewrite the whole file).

Probably a better option is to greatly expand the target part of git notes into a target+key format. Since we are adding backwards incompatible entries anyway, we could add nested trees, much like in the gmeta serialization format. Existing notes would remain as-is (treated as metadata with a commit "target" and an empty "key"). The existing mechanism that automatically rewrites commits could be expanded.

Both options could also be combined. Small human-readable payloads like "agent model" and "agent provider" could go in the empty-string key part (all combined into one blob), which git log can be configured to automatically display, and large payloads like a transcript could go into a named key into its own blob, and git log will not automatically display them.

As a migration path, we could even have both refs/meta/local and refs/notes/commits (where git notes stores its data) operating at the same time. refs/meta/local could store whatever we want (whether the original proposal or one of my proposed hybrid solutions) and in refs/notes/commits, we note the original object ID of any commit that we have metadata for. Then, Git can track rewrites, and whenever the user performs an operation on metadata, we can automatically update the existing notes according to what's in refs/notes/commits before proceeding with the operation.

Conclusion

If we're interested in better integration with Git and/or upstreaming this, I think we should consider something more like the hybrid solutions proposed above, or at least explain why we can't do them (e.g. Git objects are too slow, which is why we're using a SQLite database, or we really need the "list" data type which is cumbersome to represent with Git objects, so we might as well design it from scratch). Even if not, it might be worth investigating if operating directly with the serialized form (instead of having a SQLite database) is possible, as that is simpler (only one data representation).

Also, we should clarify whether the serialized commit has a parent or not.

schacon · 2026-03-16T13:53:06Z

schacon
Mar 16, 2026
Maintainer Author

Tombstone doesn't seem strictly necessary except possibly to make gmeta get --with-authorship (get the current value and provide information about the last modification) not need to check the log to see if a missing entry is due to a deletion.

So, I wanted to have two possibilities. One is to explicitly say that a value is deleted. As in, if it exists in the sqlite db, it should explicitly be removed. One could see that a value was removed in data pulled, but I would also like to have the idea of being able to do graftable rewrites of these histories, where someone could, for example, reset this every year or whatever if it's getting large and it wouldn't remove anything, only add from that point on. However, perhaps we can just use a deletion from a head you have to one you fetch as an explicit removal and have a different methodology for ref fetches that all of a sudden don't share a parent.

0 replies

schacon · 2026-03-16T13:55:03Z

schacon
Mar 16, 2026
Maintainer Author

This leads me to think that whenever serialization happens, a single commit with no parent is written. But subsequently, there is talk of a "three-way merge" and "fast-forward materialization", which are only possible if there is a commit for every modification, and if each commit has a parent representing the immediately previous modification. So I'm not sure.

No, I certainly write a parent as the last thing that was serialized or materialized for this data. This is indeed how I do 3-way determination of conflict or fast detection of new values.

0 replies

schacon · 2026-03-16T13:55:51Z

schacon
Mar 16, 2026
Maintainer Author

which are only possible if there is a commit for every modification,

I only assumed writing state to refs when you're about to push. Not every modification would create a tree/commit

0 replies

schacon · 2026-03-16T13:57:17Z

schacon
Mar 16, 2026
Maintainer Author

For "string" values, last timestamp wins (where the timestamp is stored doesn't seem to be described in the spec).

I may have misdocumented this. I thought the spec said that the remote value wins. However, if we did need to look at a timestamp, there is a timestamp in the first commit that introduces the new key.

0 replies

schacon · 2026-03-16T14:00:25Z

schacon
Mar 16, 2026
Maintainer Author

Automated retargeting of metadata when a commit is rewritten?

I think this would need to be the domain of something that uses this spec/library (so, GitButler). I don't think it would neccesarily be automatic - there are some things that you would not want to retarget (CI attestations, gpg signatures, etc), and some things you would (patch id? signoff? branch id?) and some things we may want to add to new commits automatically (previous version, etc).

0 replies

schacon · 2026-03-16T14:08:33Z

schacon
Mar 16, 2026
Maintainer Author

However, it is more integrated with other Git commands. git log can automatically display associated notes, and git can be configured to automatically rewrite notes when a commit is rewritten (from the documentation, currently rewriting through only amend and rebase seems to be supported). There is no automatic fetching/pushing of notes whenever its associated commit is fetched/pushed, though.

So most of the point of this spec is to try to figure out a more flexible and scalable system for transmitting metadata than notes.

I started with considerations of the problem set for vcs metadata that Rodrigo outlined in his JJ talk

Namely a system that can accommodate wider use cases than notes, for example:

granularity
- commit, changeset, blob, path, byte?
mutability
- mutable - locking
- appendable - comments
- immutable - attestations, provenance
historical relevance
- localize - single point in time,
- propagated - cumulative/mergeable as changes get made
exchange (push/pull)
- local - only relevant in the current copy
- exchanged - relevant to everyone
- partially exchanged - only within the org?

git notes is: per commit, immutable, localized and exchanged. It is so limited in granularity specifically that it's almost impossible to do anything interesting with that we want.

I was trying to design a primitive set of instructions and storage formats that would let GitButler implement the gmeta concept in a way that could accommodate most if not all of those use combinations.

0 replies

schacon · 2026-03-16T14:13:56Z

schacon
Mar 16, 2026
Maintainer Author

Even if not, it might be worth investigating if operating directly with the serialized form (instead of having a SQLite database) is possible, as that is simpler (only one data representation).

I didn't want to do this (although it would be possible) both because IO would be slower (especially value writes, since we would have to write a new tree and commit on every mutation) and also I would like to be able to have a data source that could locally combine multiple meta sources (ie, an internal company metadata ref and a public one). If we're operating directly out of the refs, we would have to check all of them whenever we do any key read, which would be slow and cumbersome I think.

0 replies

schacon · 2026-03-16T14:18:45Z

schacon
Mar 16, 2026
Maintainer Author

Possible hybrid solutions

I'm not particularly interested in hybrid solutions with notes because of a few reasons. One is that notes is very rarely used in my opinion and there aren't a lot of fans. I just don't think we need to do something that improves upon it or migrates data explicitly. People could easily use both at the same time if they had existing systems.

Since this is built for our own porcelains (but stuff), we don't get almost any advantage from trying to make this backwards compatible in some way. For example, it's unimportant that git commit can be taught to rewrite notes, since people cannot use git commit in our ecosystem anyhow. We'll have to build compatibility into but commit, but that's the whole point.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

jonathan's notes #1

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 8 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

jonathan's notes #1

Uh oh!

Uh oh!

schacon Mar 16, 2026 Maintainer

My summary of gmeta

Local storage

Serialization

Exchange

Automated retargeting of metadata when a commit is rewritten?

Comparison with git notes

Possible hybrid solutions

Conclusion

Replies: 8 comments

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

Uh oh!

schacon Mar 16, 2026 Maintainer Author

schacon
Mar 16, 2026
Maintainer

My summary of `gmeta`

Comparison with `git notes`

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author

schacon
Mar 16, 2026
Maintainer Author