|
| 1 | ++++ |
| 2 | +title = "git-data" |
| 3 | +subtitle = "Design Specification" |
| 4 | +version = "0.1.0" |
| 5 | +date = 2026-03-19 |
| 6 | +status = "Draft" |
| 7 | ++++ |
| 8 | + |
| 9 | +# git-data |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +git-data is a workspace of three crates that provide structured data primitives over Git refs. |
| 14 | + |
| 15 | +Git has refs, commits, trees, and blobs. |
| 16 | +It has no built-in concept of structured annotations on objects, versioned records outside of branches, or bidirectional relationships between refs. |
| 17 | +These three patterns recur in any system that uses Git as a data store. |
| 18 | + |
| 19 | +git-data provides them as independent, composable libraries. |
| 20 | + |
| 21 | + |
| 22 | +## Crates |
| 23 | + |
| 24 | +### git-metadata |
| 25 | + |
| 26 | +Structured annotations on existing Git objects. |
| 27 | + |
| 28 | +A metadata entry is keyed by the OID of the object it annotates. |
| 29 | +The annotated object exists independently; the metadata describes it. |
| 30 | +This extends Git's notes (which map OIDs to blobs) to map OIDs to trees, allowing multiple tools to attach named entries under the same OID without conflict. |
| 31 | + |
| 32 | +**Ref structure:** |
| 33 | + |
| 34 | +```text |
| 35 | +refs/metadata/<namespace> → commit → tree |
| 36 | + <oid-prefix>/ |
| 37 | + <oid-suffix>/ |
| 38 | + <entry-name> # blob: arbitrary content |
| 39 | +``` |
| 40 | + |
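| 41 | +The two-level split of the OID (prefix directory, then suffix directory) keeps any single tree from growing pathologically large. It mirrors the layout Git uses for loose objects, where the first two hex characters of the OID name a directory and the remaining characters name the file. |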
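As a rough illustration of the fanout described above, the path for one entry could be derived as in the sketch below. The function name `fanout_path` is hypothetical, and the two-character prefix length is an assumption borrowed from Git's loose-object layout; the specification does not fix it.

```rust
/// Build the `<oid-prefix>/<oid-suffix>/<entry-name>` path for one entry.
///
/// ASSUMPTION: the prefix is the first two hex characters of the OID, as in
/// Git's loose-object layout. The specification does not state the length.
fn fanout_path(oid: &str, entry: &str) -> Option<String> {
    const PREFIX_LEN: usize = 2;
    // Reject anything that is too short or is not plain hex.
    if oid.len() <= PREFIX_LEN || !oid.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Safe to split by byte index: every character is ASCII at this point.
    let (prefix, suffix) = oid.split_at(PREFIX_LEN);
    Some(format!("{}/{}/{}", prefix, suffix, entry))
}
```

With this sketch, an entry named `status` on the empty blob would land under `e6/9de29bb2…/status`, and a malformed OID yields `None` rather than a path.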
| 42 | + |
| 43 | +**Operations:** |
| 44 | + |
| 45 | +- `attach(oid, path, content)` — write a blob at `<oid>/<path>` in the metadata tree. |
| 46 | +- `read(oid, path)` — read a single entry. |
| 47 | +- `read_all(oid)` — list all entries for an object. |
| 48 | +- `remove(oid, path)` — delete an entry. |
| 49 | + |
| 50 | +Every write is a new commit on the metadata ref. |
| 51 | +The commit history is the audit log of all annotation changes. |
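The four operations and the "every write is a commit" rule can be modeled in memory as follows. This is a sketch only: `MetadataRef` is a hypothetical stand-in for one metadata ref, it stores whole-tree snapshots instead of Git objects, and it ignores the OID fanout for brevity.

```rust
use std::collections::BTreeMap;

/// A tree snapshot: (oid, path) -> blob content.
type Tree = BTreeMap<(String, String), Vec<u8>>;

/// Hypothetical in-memory stand-in for one metadata ref.
#[derive(Default)]
struct MetadataRef {
    /// Each write appends (message, snapshot); the last element is the tip.
    commits: Vec<(String, Tree)>,
}

impl MetadataRef {
    /// Snapshot at the tip, or an empty tree if nothing was written yet.
    fn tip(&self) -> Tree {
        self.commits.last().map(|(_, t)| t.clone()).unwrap_or_default()
    }

    /// Write one entry and record the change as a new commit.
    fn attach(&mut self, oid: &str, path: &str, content: &[u8]) {
        let mut tree = self.tip();
        tree.insert((oid.to_string(), path.to_string()), content.to_vec());
        self.commits.push((format!("attach {}/{}", oid, path), tree));
    }

    /// Read a single entry from the tip.
    fn read(&self, oid: &str, path: &str) -> Option<Vec<u8>> {
        let tree = self.tip();
        tree.get(&(oid.to_string(), path.to_string())).cloned()
    }

    /// List the entry paths attached to one object.
    fn read_all(&self, oid: &str) -> Vec<String> {
        let tree = self.tip();
        tree.keys()
            .filter(|(o, _)| o.as_str() == oid)
            .map(|(_, p)| p.clone())
            .collect()
    }

    /// Delete an entry; commits only if the entry existed.
    fn remove(&mut self, oid: &str, path: &str) -> bool {
        let mut tree = self.tip();
        let existed = tree.remove(&(oid.to_string(), path.to_string())).is_some();
        if existed {
            self.commits.push((format!("remove {}/{}", oid, path), tree));
        }
        existed
    }
}
```

Walking `commits` from last to first plays the role of the audit log: each element records what changed and the full state after the change.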
| 52 | + |
| 53 | +**Concurrency:** |
| 54 | + |
| 55 | +Two writers annotating different OIDs touch disjoint tree paths. |
| 56 | +A three-way tree merge resolves these automatically. |
| 57 | +Two writers annotating the same OID at different paths also merge cleanly. |
| 58 | +Conflict occurs only when two writers modify the same entry on the same OID simultaneously — the correct resolution is rejection and retry. |
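The merge rule above can be sketched at path granularity. The model below flattens each tree to a map from full path to blob content, which is an assumption made for illustration; `merge_trees` is a hypothetical name, and a real implementation would merge Git tree objects.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A flattened tree: full path -> blob content.
type Tree = BTreeMap<String, String>;

/// Three-way merge by path. Returns the merged tree, or the list of paths
/// that both sides changed in different ways (the reject-and-retry case).
fn merge_trees(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, Vec<String>> {
    let paths: BTreeSet<&String> = base
        .keys()
        .chain(ours.keys())
        .chain(theirs.keys())
        .collect();
    let mut merged = Tree::new();
    let mut conflicts = Vec::new();
    for path in paths {
        let (b, o, t) = (base.get(path), ours.get(path), theirs.get(path));
        let winner = if o == t {
            o // both sides agree (including both deleting the entry)
        } else if o == b {
            t // only theirs changed this path
        } else if t == b {
            o // only ours changed this path
        } else {
            conflicts.push(path.clone()); // both changed it, differently
            continue;
        };
        // `None` means the path was deleted, so it is left out of the result.
        if let Some(content) = winner {
            merged.insert(path.clone(), content.clone());
        }
    }
    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

Writers touching different OIDs, or different entries under one OID, always take one of the first three branches. Only a same-entry race reaches the conflict branch.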
| 59 | + |
| 60 | + |
| 61 | +### git-ledger |
| 62 | + |
| 63 | +Versioned records stored as refs. |
| 64 | + |
| 65 | +A ledger entry is a standalone ref with its own lifecycle. |
| 66 | +It is not metadata on any object — it is an independent record with a sequential ID, commit history as an audit log, and tree-structured state. |
| 67 | + |
| 68 | +**Ref structure:** |
| 69 | + |
| 70 | +```text |
| 71 | +refs/<namespace>/<id> → commit → tree |
| 72 | + <field> # blob: field value |
| 73 | + <field> |
| 74 | + <subdir>/ |
| 75 | + <field> |
| 76 | +``` |
| 77 | + |
| 78 | +Each record is its own ref. |
| 79 | +Two writers modifying different records never conflict. |
| 80 | + |
| 81 | +**ID assignment:** |
| 82 | + |
| 83 | +Sequential IDs are assigned by scanning `refs/<namespace>/` to find the highest existing ID and incrementing. |
| 84 | +The ref creation itself is the compare-and-swap: if another writer created the same ID, the push fails and the creator rescans and retries. |
| 85 | + |
| 86 | +No counter ref is required. |
| 87 | +The source of truth for "what IDs exist" is the refs themselves. |
| 88 | + |
| 89 | +At large scale (thousands of records), scanning all refs to find the max becomes expensive. |
| 90 | +An optional counter ref can serve as an acceleration structure. Like any other derived index, it is a performance optimization, not a correctness requirement. |
| 91 | +If the counter is lost or stale, a rescan rebuilds it. |
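The scan-then-create loop can be sketched against an in-memory ref store. Everything here is illustrative: `RefStore`, `create_ref`, `next_id`, and `create_record` are hypothetical names, and starting IDs at 1 is an assumption the specification does not make explicit.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory ref store: ref name -> commit id.
#[derive(Default)]
struct RefStore {
    refs: BTreeMap<String, String>,
}

impl RefStore {
    /// Create-only update: fails if the ref already exists. This plays the
    /// role of the compare-and-swap described in the text.
    fn create_ref(&mut self, name: &str, commit: &str) -> bool {
        if self.refs.contains_key(name) {
            return false;
        }
        self.refs.insert(name.to_string(), commit.to_string());
        true
    }

    /// Scan `refs/<namespace>/` and return the highest numeric ID plus one.
    /// ASSUMPTION: IDs start at 1 when the namespace is empty.
    fn next_id(&self, namespace: &str) -> u64 {
        let prefix = format!("refs/{}/", namespace);
        self.refs
            .keys()
            .filter_map(|name| name.strip_prefix(prefix.as_str()))
            // Non-numeric suffixes (for example scoped refs) are skipped.
            .filter_map(|rest| rest.parse::<u64>().ok())
            .max()
            .map_or(1, |max| max + 1)
    }
}

/// Scan, attempt creation, and rescan if another writer took the ID.
fn create_record(store: &mut RefStore, namespace: &str, commit: &str) -> u64 {
    loop {
        let id = store.next_id(namespace);
        if store.create_ref(&format!("refs/{}/{}", namespace, id), commit) {
            return id;
        }
        // Lost the race for this ID: loop to rescan and try the next one.
    }
}
```

In this single-threaded model the loop never actually retries. The retry path matters when the scan and the creation are separated by a concurrent writer, which can be simulated by calling `next_id`, creating that ref by hand, and then observing that `create_ref` for the stale ID returns `false`.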
| 92 | + |
| 93 | +**Operations:** |
| 94 | + |
| 95 | +- `create(namespace, fields)` — scan for the next available ID, create a new ref with an initial commit containing the given tree. Retry on conflict. |
| 96 | +- `read(namespace, id)` — read the current tree at a record's ref. |
| 97 | +- `update(namespace, id, mutations)` — commit a new tree to the record's ref. The previous state is preserved in history. |
| 98 | +- `list(namespace)` — prefix scan over `refs/<namespace>/` to enumerate records. |
| 99 | +- `history(namespace, id)` — walk the commit chain on a record's ref. |
| 100 | + |
| 101 | +**Namespace scoping:** |
| 102 | + |
| 103 | +Within a namespace, scopes partition records into independent groups, each with its own ref subtree: |
| 104 | + |
| 105 | +```text |
| 106 | +refs/<namespace>/<scope>/<id> |
| 107 | +``` |
| 108 | + |
| 109 | +Scopes are fully independent: no cross-scope contention on ID assignment or record writes. |
| 110 | + |
| 111 | + |
| 112 | +### git-links |
| 113 | + |
| 114 | +Bidirectional relationships between refs. |
| 115 | + |
| 116 | +A link connects two keys. |
| 117 | +It does not belong to either of them. |
| 118 | +Both directions are written in a single commit to a single ref, guaranteeing consistency without multi-ref atomicity. |
| 119 | + |
| 120 | +**Ref structure:** |
| 121 | + |
| 122 | +```text |
| 123 | +refs/<namespace> → commit → tree |
| 124 | + <key-a>/ |
| 125 | + <key-b> # blob: empty or optional metadata |
| 126 | + <key-b>/ |
| 127 | + <key-a> # blob: empty or optional metadata |
| 128 | +``` |
| 129 | + |
| 130 | +Keys are opaque path segments. |
| 131 | +The library does not interpret them. |
| 132 | +Consumers assign meaning. |
| 133 | + |
| 134 | +When metadata is absent, the tree entry points to the empty blob (`e69de29...`). |
| 135 | +Every metadata-free link shares this single object. |
| 136 | + |
| 137 | +**Operations:** |
| 138 | + |
| 139 | +- `link(a, b, metadata?)` — write both directions in one commit. |
| 140 | +- `unlink(a, b)` — remove both directions in one commit. |
| 141 | +- `linked(key)` — list all keys linked to this key (single tree read). |
| 142 | +- `is_linked(a, b)` — check existence (single tree entry lookup). |
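The operations listed above can be modeled with a map from each key to the set of keys linked to it. This is a sketch under stated assumptions: `InMemoryLinks` is a hypothetical stand-in for the tree behind one links ref, it omits the optional metadata blob, and it counts commits instead of creating them.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for the tree behind one links ref.
#[derive(Default)]
struct InMemoryLinks {
    /// key -> keys linked to it (the entries under `<key>/`).
    tree: BTreeMap<String, BTreeSet<String>>,
    /// Number of commits made so far.
    commits: usize,
}

impl InMemoryLinks {
    /// Write both directions, then count a single commit.
    fn link(&mut self, a: &str, b: &str) {
        self.tree.entry(a.to_string()).or_default().insert(b.to_string());
        self.tree.entry(b.to_string()).or_default().insert(a.to_string());
        self.commits += 1;
    }

    /// Remove both directions, then count a single commit.
    fn unlink(&mut self, a: &str, b: &str) {
        self.remove_direction(a, b);
        self.remove_direction(b, a);
        self.commits += 1;
    }

    /// Drop `from -> to`, pruning the subtree if it becomes empty.
    fn remove_direction(&mut self, from: &str, to: &str) {
        let now_empty = match self.tree.get_mut(from) {
            Some(targets) => {
                targets.remove(to);
                targets.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.tree.remove(from);
        }
    }

    /// All keys linked to `key`, in sorted order (an artifact of this model).
    fn linked(&self, key: &str) -> Vec<String> {
        self.tree
            .get(key)
            .map(|targets| targets.iter().cloned().collect())
            .unwrap_or_default()
    }

    /// Single lookup: is `b` listed under `a`?
    fn is_linked(&self, a: &str, b: &str) -> bool {
        self.tree.get(a).map_or(false, |targets| targets.contains(b))
    }
}
```

Because `link` and `unlink` update both directions before the commit counter advances, no observable state ever contains only one half of a link, which is the consistency property the single-ref design aims for.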
| 143 | + |
| 144 | +**Concurrency:** |
| 145 | + |
| 146 | +Two writers linking disjoint key pairs touch disjoint tree paths. |
| 147 | +A three-way tree merge resolves these automatically. |
| 148 | +Conflict occurs only when two writers modify the same link simultaneously. |
| 149 | + |
| 150 | +**Ref ownership:** |
| 151 | + |
| 152 | +The `namespace` is caller-provided. |
| 153 | +The library owns no ref namespace. |
| 154 | +A consumer passes `"refs/links"` or `"refs/my-tool/links"` — the library does not care. |
| 155 | + |
| 156 | +**Example: forge issue linking.** |
| 157 | + |
| 158 | +Forge uses `git-links` with `refs/forge/links` as the namespace. |
| 159 | +Keys are type/ID strings that forge constructs; the library stores them verbatim. |
| 160 | + |
| 161 | +Linking issue 42 to review 7 and commit `abc123`: |
| 162 | + |
| 163 | +```rust |
| 164 | +let links = LinkStore::new(&repo, "refs/forge/links"); |
| 165 | + |
| 166 | +links.link("issue/42", "review/7", None, &sig)?; |
| 167 | +links.link("issue/42", "commit/abc123", None, &sig)?; |
| 168 | +``` |
| 169 | + |
| 170 | +This produces: |
| 171 | + |
| 172 | +```text |
| 173 | +refs/forge/links → commit → tree |
| 174 | + issue/42/ |
| 175 | + review/7 # empty blob |
| 176 | + commit/abc123 # empty blob |
| 177 | + review/7/ |
| 178 | + issue/42 # empty blob |
| 179 | + commit/abc123/ |
| 180 | + issue/42 # empty blob |
| 181 | +``` |
| 182 | + |
| 183 | +Querying everything linked to issue 42: |
| 184 | + |
| 185 | +```rust |
| 186 | +let related = links.linked("issue/42")?; |
| 187 | +// → ["review/7", "commit/abc123"] |
| 188 | +``` |
| 189 | + |
| 190 | +Querying the reverse — all issues referencing commit `abc123`: |
| 191 | + |
| 192 | +```rust |
| 193 | +let related = links.linked("commit/abc123")?; |
| 194 | +// → ["issue/42"] |
| 195 | +``` |
| 196 | + |
| 197 | +Both directions are tree reads. |
| 198 | +Forge parses the key strings to recover type and ID. |
| 199 | +The library never does. |
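On the consumer side, recovering type and ID from a key might look like the sketch below. `parse_key` is a hypothetical helper, not part of `git-links`, and splitting at the first slash is an assumption about how forge formats its keys.

```rust
/// Hypothetical consumer-side helper: split a "type/id" key at the first
/// slash. Returns `None` when either half is missing.
fn parse_key(key: &str) -> Option<(&str, &str)> {
    let (kind, id) = key.split_once('/')?;
    if kind.is_empty() || id.is_empty() {
        None
    } else {
        Some((kind, id))
    }
}
```

Keeping this logic outside the library is what lets keys stay opaque: a different consumer could use another separator or no structure at all.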
| 200 | + |
| 201 | + |
| 202 | +## Layering |
| 203 | + |
| 204 | +```text |
| 205 | +git (objects, refs, transport) |
| 206 | +├── git-metadata (annotations on objects) |
| 207 | +├── git-ledger (versioned records as refs) |
| 208 | +└── git-links (bidirectional relationships) |
| 209 | +``` |
| 210 | + |
| 211 | +The three crates are independent. |
| 212 | +None depends on another. |
| 213 | +A consumer may use any combination. |
| 214 | + |
| 215 | +The shared machinery (ref → commit → tree reads and writes, tree merging, commit signing) is inlined in each crate, or extracted to a shared internal crate if duplication warrants it. |
| 216 | +This is a code organization decision, not an architectural one. |
| 217 | + |
| 218 | + |
| 219 | +## What git-data Is Not |
| 220 | + |
| 221 | +git-data is not a framework. |
| 222 | +It imposes no schema, no workflow, no naming convention beyond ref structure. |
| 223 | + |
| 224 | +git-data does not run hooks or enforce policy. |
| 225 | +Consumers (forge, kiln, other tools) own domain logic. |
| 226 | + |
| 227 | +git-data does not handle transport. |
| 228 | +Push, fetch, and ref advertisement filtering are the consumer's responsibility. |
| 229 | + |
| 230 | +git-data does not handle merge strategy selection. |
| 231 | +It provides the primitives (tree reads, tree writes, atomic commits) that make auto-merge possible. |
| 232 | +The consumer decides when and how to merge. |
| 233 | + |
| 234 | + |
| 235 | +## Workspace Layout |
| 236 | + |
| 237 | +```text |
| 238 | +git-data/ |
| 239 | +├── Cargo.toml # workspace root |
| 240 | +├── crates/ |
| 241 | +│ ├── git-metadata/ |
| 242 | +│ │ ├── Cargo.toml |
| 243 | +│ │ └── src/ |
| 244 | +│ ├── git-ledger/ |
| 245 | +│ │ ├── Cargo.toml |
| 246 | +│ │ └── src/ |
| 247 | +│ └── git-links/ |
| 248 | +│ ├── Cargo.toml |
| 249 | +│ └── src/ |
| 250 | +``` |
| 251 | + |
| 252 | +Each crate publishes independently to crates.io. |
| 253 | +The workspace shares test infrastructure, CI, and release tooling. |
| 254 | + |
| 255 | + |
| 256 | +## CLI |
| 257 | + |
| 258 | +git-metadata ships a CLI as `git-metadata` (invoked as `git metadata`). |
| 259 | +It is the only crate with a CLI at this time. |
| 260 | + |
| 261 | +git-ledger and git-links are library-only. |
| 262 | +They may gain CLIs if direct human use outside of a consumer tool proves valuable. |
| 263 | +This is unlikely — the operations are meaningful only in the context of a specific schema, which the consumer defines. |