Merged (20 commits)
1 change: 1 addition & 0 deletions Documentation/Makefile
@@ -123,6 +123,7 @@ TECH_DOCS += technical/bundle-uri
TECH_DOCS += technical/commit-graph
TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
24 changes: 24 additions & 0 deletions Documentation/RelNotes/2.52.0.adoc
@@ -55,6 +55,8 @@ UI, Workflows & Features
(e.g. blame.ignorerevsfile) can be marked as optional by prefixing
":(optional)" before its value.

* Show 'P'ipe command in "git add -p".


Performance, Internal Implementation, Development Support etc.
--------------------------------------------------------------
@@ -133,6 +135,9 @@ Performance, Internal Implementation, Development Support etc.

* The beginning of SHA1-SHA256 interoperability work.

* The build procedure for a few credential helpers (in contrib/) has
been updated.


Fixes since v2.51
-----------------
@@ -352,6 +357,25 @@ including security updates, are included in this release.
corrected.
(merge c0bec06cfe jk/diff-no-index-with-pathspec-fix later to maint).

* The "--short" option of "git status", which is meant for human
consumption, and the "-z" option, which asks for NUL-delimited
output, did not mix well and colored some but not all elements.
The command has been updated to color all elements consistently in
such a case.
(merge 50927f4f68 jk/status-z-short-fix later to maint).

* Unicode width table update.
(merge 330a54099e tb/unicode-width-table-17 later to maint).

* GPG signing test set-up has been broken for a year, which has been
corrected.
(merge 516bf45749 jc/t1016-setup-fix later to maint).

* Recent OpenSSH creates the Unix domain socket it uses to communicate
with ssh-agent under $HOME instead of /tmp, which caused our test to
fail due to an overly long pathname in our test environment; this has
been worked around by using "ssh-agent -T".
(merge b7fb2194b9 ps/t7528-ssh-agent-uds-workaround later to maint).

* Other code cleanup, docfix, build fix, etc.
(merge 823d537fa7 kh/doc-git-log-markup-fix later to maint).
(merge cf7efa4f33 rj/t6137-cygwin-fix later to maint).
1 change: 1 addition & 0 deletions Documentation/git-add.adoc
@@ -349,6 +349,7 @@ patch::
s - split the current hunk into smaller hunks
e - manually edit the current hunk
p - print the current hunk
P - print the current hunk using the pager
? - print help
+
After deciding the fate for all hunks, if there is any hunk
29 changes: 19 additions & 10 deletions Documentation/technical/commit-graph.adoc
@@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
Values 1-4 satisfy the requirements of parse_commit_gently().

There are two definitions of generation number:

1. Corrected committer dates (generation number v2)
2. Topological levels (generation number v1)

@@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
we enable fast writes of new commit data without rewriting the entire commit
history -- at least, most of the time.

## File Layout
File Layout
~~~~~~~~~~~

A commit-graph chain uses multiple files, and we use a fixed naming convention
to organize these files. Each commit-graph file has a name
@@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".

For example, if the `commit-graph-chain` file contains the lines

```
----
{hash0}
{hash1}
{hash2}
```
----

then the commit-graph chain looks like the following diagram:

@@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
`{hash0}` and `{hash1}`.
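As a concrete illustration of this layout, the chain can be inspected in a scratch repository (a hedged sketch: the `cg-demo` path and identity settings are placeholders, and the exact file names depend on your Git version):

```shell
# Create a throwaway repo with two commits and write a split commit-graph.
git init --quiet cg-demo
git -C cg-demo -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "first"
git -C cg-demo -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "second"
git -C cg-demo commit-graph write --reachable --split

# commit-graph-chain lists one {hash} line per graph-{hash}.graph layer.
ls cg-demo/.git/objects/info/commit-graphs/
cat cg-demo/.git/objects/info/commit-graphs/commit-graph-chain
```

Running the same `write --reachable --split` command again after new commits either adds a new layer or merges layers, as described below.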

## Merging commit-graph files
Merging commit-graph files
~~~~~~~~~~~~~~~~~~~~~~~~~~

If we only added a new commit-graph file on every write, we would run into a
linear search problem through many commit-graph files. Instead, we use a merge
@@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
file.

....
+---------------------+
| |
| (new commits) |
@@ -250,21 +254,23 @@ file.
| |
| |
+-----------------------+
....

During this process, the commits to write are combined and sorted, and we write the
contents to a temporary file, all while holding a `commit-graph-chain.lock`
lock-file. When the file is flushed, we rename it to `graph-{hash3}`
according to the computed `{hash3}`. Finally, we write the new chain data to
`commit-graph-chain.lock`:

```
----
{hash3}
{hash0}
```
----

We then close the lock-file.

## Merge Strategy
Merge Strategy
~~~~~~~~~~~~~~

When writing a set of commits that do not exist in the commit-graph stack of
height N, we default to creating a new file at level N + 1. We then decide to
@@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
number of commits) could be extracted into config settings for full
flexibility.
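Current `git commit-graph write` already exposes these two knobs as command-line options rather than config settings (a sketch; the `cg-tune` repo set-up is a placeholder, and the values shown are the ones discussed above, passed explicitly):

```shell
git init --quiet cg-tune
git -C cg-tune -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "tip"

# Merge layers whenever an existing layer is not at least 2x the size of
# the incoming commit set, and cap merged layers at 64000 commits.
git -C cg-tune commit-graph write --reachable --split \
    --size-multiple=2 --max-commits=64000
```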

## Handling Mixed Generation Number Chains
Handling Mixed Generation Number Chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With the introduction of generation number v2 and generation data chunk, the
following scenario is possible:
@@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
rewriting split commit-graph as a single file (`--split=replace`) creates a
single layer with corrected commit dates.

## Deleting graph-{hash} files
Deleting graph-\{hash\} files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a new tip file is written, some `graph-{hash}` files may no longer
be part of a chain. It is important to remove these files from disk, eventually.
@@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
defaults to zero, but can be changed using command-line arguments or a config
setting.
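On the command line this expiry window corresponds to the `--expire-time` option of `git commit-graph write` (a hedged sketch; the `cg-expire` repo is a placeholder, and `--expire-time` accepts an approxidate such as `now`):

```shell
git init --quiet cg-expire
git -C cg-expire -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "tip"

# Write a split commit-graph, deleting unreferenced graph-{hash} files
# whose modification time is older than the given expiry point.
git -C cg-expire commit-graph write --reachable --split --expire-time=now
```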

## Chains across multiple object directories
Chains across multiple object directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a repo with alternates, we look for the `commit-graph-chain` file starting
in the local object directory and then in each alternate. The first file that
64 changes: 32 additions & 32 deletions Documentation/technical/large-object-promisors.adoc
@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:

https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/

0) Non goals
------------
Non goals
---------

- We will not discuss those client side improvements here, as they
would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
even more to host content with larger blobs or more large blobs
than currently.

I) Issues with the current situation
------------------------------------
I Issues with the current situation
-----------------------------------

- Some statistics made on GitLab repos have shown that more than 75%
of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
complaining that these tools require significant effort to set up,
learn and use correctly.

II) Main features of the "Large Object Promisors" solution
----------------------------------------------------------
II Main features of the "Large Object Promisors" solution
---------------------------------------------------------

The main features below should give a rough overview of how the
solution may work. Details about needed elements can be found in
Expand All @@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
other objects.

Note 1
++++++
^^^^^^

To clarify, a LOP is a normal promisor remote, except that:

@@ -178,21 +178,21 @@ To clarify, a LOP is a normal promisor remote, except that:
itself.

Note 2
++++++
^^^^^^

Git already makes it possible for a main remote to also be a promisor
remote storing both regular objects and large blobs for a client that
clones from it with a filter on blob size. But here we explicitly want
to avoid that.
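For contrast, the existing single-remote mechanism referred to here is a blob-size filter applied at clone time. A minimal local sketch (the `srv` and `cli` paths are placeholders; the serving side must allow filters):

```shell
# A "server" repo that permits partial-clone filters.
git init --quiet srv
git -C srv -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "init"
git -C srv config uploadpack.allowFilter true

# --no-local forces the normal fetch machinery so the filter is honored;
# blobs over 1 MiB stay on "srv", which becomes a promisor remote.
git clone --quiet --no-local --filter=blob:limit=1m srv cli
git -C cli config remote.origin.promisor
```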

Rationale
+++++++++
^^^^^^^^^

LOPs aim to be good at handling large blobs while main remotes are
already good at handling other objects.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Git already has support for multiple promisor remotes, see
link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
underlying object storage appear like a remote to Git.

Note
++++
^^^^

A LOP can be a promisor remote accessed using a remote helper by
both some clients and the main remote.

Rationale
+++++++++
^^^^^^^^^

This looks like the simplest way to create LOPs that can cheaply
handle many large blobs.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Remote helpers are quite easy to write as shell scripts, but it might
be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
storage for large files handled by Git LFS.

Rationale
+++++++++
^^^^^^^^^

This would simplify the server side if it wants to both use a LOP and
act as a Git LFS server.
Expand All @@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
LOP all its blobs with a size over a configurable threshold.

Rationale
+++++++++
^^^^^^^^^

This makes it easy to set things up and to clean things up. For
example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
to regularly make sure the large blobs are moved to the LOP.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Using something based on `git repack --filter=...` to separate the
blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs from being fetched, and
also perhaps pushed, into it.

Rationale
+++++++++
^^^^^^^^^

A main remote containing many oversize blobs would defeat the purpose
of LOPs.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

The way to offload to a LOP discussed in 4) above can be used to
regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
fetch those blobs from the LOP to be able to serve the client.

Note
++++
^^^^

For fetches instead of clones, a protocol negotiation might not always
happen, see the "What about fetches?" FAQ entry below for details.

Rationale
+++++++++
^^^^^^^^^

Security, configurability and efficiency of setting things up.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

A "promisor-remote" protocol v2 capability looks like a good way to
implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
but might not need anymore, to the LOP.

Note
++++
^^^^

It might depend on the context if it should be OK or not for clients
to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
implementing this feature.

Rationale
+++++++++
^^^^^^^^^

On the client, the easiest way to deal with unneeded large blobs is to
offload them.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

This is very similar to what 4) above is about, except on the client
side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
a LOP, it is likely, and can easily be confirmed, that the LOP still
has them, so that they can just be removed from the client.

III) Benefits of using LOPs
---------------------------
III Benefits of using LOPs
--------------------------

Many benefits are related to the issues discussed in "I) Issues with
the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:

- Reduced storage needs on the client side.

IV) FAQ
-------
IV FAQ
------

What about using multiple LOPs on the server and client side?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
on a promisor remote.

Regular fetch
+++++++++++++
^^^^^^^^^^^^^

In a regular fetch, the client will contact the main remote and a
protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
using, or not using, the same LOP(s) as last time.

"Backfill" or "lazy" fetch
++++++++++++++++++++++++++
^^^^^^^^^^^^^^^^^^^^^^^^^^

When there is a backfill fetch, the client doesn't necessarily contact
the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
token when performing a protocol negotiation with the main remote (see
section II.6 above).

V) Future improvements
----------------------
V Future improvements
---------------------

It is expected that at the beginning using LOPs will be mostly worth
it either in a corporate context where the Git version that clients