[pull] master from git:master#56
Merged
pull[bot] merged 28 commits intoturkdevops:masterfrom May 31, 2025
Merged
Conversation
Ensure that logic added in 5f11669 (name-hash: don't add directories to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. As per the original single-threaded change above: sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a 32 bit system "git multi-pack-index --repack --batch-size=120M"
failed with
fatal: size_t overflow: 6038786 * 1289
The calculation to estimated size of the objects in the pack referenced
by the multi-pack-index uses st_mult() to multiply the pack size by the
number of referenced objects before dividing by the total number of
objects in the pack. As size_t is 32 bits on 32 bit systems this
calculation easily overflows. Fix this by using 64bit arithmetic instead.
Also fix a potential overflow when caluculating the total size of the
objects referenced by the multipack index with a batch size larger
than SIZE_MAX / 2. In that case
total_size += estimated_size
can overflow as both total_size and estimated_size can be greater that
SIZE_MAX / 2. This is addressed by using saturating arithmetic for the
addition. Although estimated_size is of type uint64_t by the time we
reach this sum it is bounded by the batch size which is of type size_t
and so casting estimated_size to size_t does not truncate the value.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a 64 bit system the calculation
p->pack_size * pack_info[i].referenced_objects
could overflow. If a pack file contains 2^28 objects with an average
compressed size of 1KB then the pack size will be 2^38B. If all of the
objects are referenced by the multi-pack index the sum above will
overflow. Avoid this by using shifted integer arithmetic and changing
the order of the calculation so that the pack size is divided by the
total number of objects in the pack before multiplying by the number of
objects referenced by the multi-pack index. Using a shift of 14 bits
should give reasonable accuracy while avoiding overflow for pack sizes
less that 1PB.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
nth_midxed_pack_int_id() returns the index of the pack file in the multi pack index's list of packfiles that the specified object. The index is returned as a uint32_t. Storing this in an int will make the index negative if the most significant bit is set. Fix this by using uint32_t as the rest of the code does. This is unlikely to be a practical problem as it requires the multipack index to reference 2^31 packfiles. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify what happens when an object exists in more than one pack, but not in the preferred pack. "git multi-pack-index repack" relies on ties for objects that are not in the preferred pack being resolved in favor of the newest pack that contains a copy of the object. If ties were resolved in favor of the oldest pack as the current documentation suggests the multi-pack index would not reference any of the objects in the pack created by "git multi-pack-index repack". Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no test covering what commit 01aff0a (apply: correctly reverse patch's pre- and post-image mode bits, 2023-12-26) addressed. Prior to that commit, git apply was erroneously unaware of a file's expected mode while reverse-patching a file whose mode was not changing. Add the missing test coverage to assure that git apply is aware of the expected mode of a file being patched when the patch does not indicate that the file's mode is changing. This is achieved by arranging a file mode so that it doesn't agree with patch being applied, and checking git apply's output for the warning it's supposed to raise in this situation. Test in both reverse and normal (forward) directions. Signed-off-by: Mark Mentovai <mark@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 01aff0a (apply: correctly reverse patch's pre- and post-image mode bits, 2023-12-26) revised reverse_patches() to maintain the desired property that when only one of patch::old_mode and patch::new_mode is set, the mode will be carried in old_mode. That property is generally correct, with one notable exception: when creating a file, only new_mode will be set. Since reversing a deletion results in a creation, new_mode must be set in that case. Omitting handling for this case means that reversing a patch that removes an executable file will not result in the executable permission being set on the re-created file. Existing test coverage for file modes focuses only on mode changes of existing files. Swap old_mode and new_mode in reverse_patches() for what's represented in the patch as a file deletion, as it is transformed into a file creation under reversal. This causes git apply --reverse to set the executable permission properly when re-creating a deleted executable file. Add tests ensuring that git apply sets file modes correctly on file creation, both in the forward and reverse directions. Signed-off-by: Mark Mentovai <mark@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Function 'escapeRefName' introduced in 51a7e6d has never been used. Despite being dead code, changes in Perl 5.41.4 exposed precedence warning within its logic, which then caused test failures in t9402 by logging the warnings to stderr while parsing the code. The affected tests are t9402.30, t9402.31, t9402.32 and t9402.34. Remove this unused function to simplify the codebase and stop the warnings and test failures. Its corresponding unescapeRefName function, which remains in use, has had its comments updated. Reported-by: Jitka Plesnikova <jplesnik@redhat.com> Signed-off-by: Ondřej Pohořelský <opohorel@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also quote `#` in line with the modern formatting convention. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention it in parentheses since we are in a configuration context. Refer to the default as such, not as “the” character. Also don’t mention `#` again; just say “comment character”. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split these out so that they are easier to search for.[1] [1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/ Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document this option by copying the bullet list from git-stripspace(1).
A bullet list is cleaner when there are this many points to consider.
We also get a more standardized description of the multiple-blank-lines
behavior. Compare the repeating (git-notes(1)):
empty lines other than a single line between paragraphs
With (git-stripspace(1)):
multiple consecutive empty lines
And:
leading [...] whitespace
With:
empty lines from the beginning
Leading whitespace in the form of spaces (indentation) are not removed.
However, empty lines at the start of the message are removed.
Note that we drop the mentions of comment line handling because they are
wrong; this option does not control how lines which can be recognized as
comment lines are handled. Only interactivity controls that:
• Comment lines are stripped after editing interactively
• Lines which could be recognized as comment lines are left alone when
the message is given non-interactively
So it is misleading to document the comment line behavior on
this option.
Further, the text is wrong:
Lines starting with `#` will be stripped out in non-editor cases
like `-m`, [...]
Comment lines are still indirectly discussed on other options. We will
deal with them in the next commit.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cleaning up whitespace in metadata is typical porcelain behavior and
this default does not need to be pointed out.[1] Only speak up when
the default `--stripspace` is not used.
Also remove all misleading mentions of comment lines in the process;
see the previous commit.
Also remove the period that trails the parenthetical here.
† 1: See `-F` in git-commit(1) which has nothing to say about whitespace
cleanup. The cleanup discussion is on `--cleanup`.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clearly state when which of the regular and negated form of the option take effect.[1] Also mention the subtle behavior that occurs when you mix options like `-m` and `-C`, including a note that it might be fixed in the future. The topic was brought up on v8 of the `--separator` series.[2][3] [1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/ [2]: https://lore.kernel.org/git/xmqq4jp326oj.fsf@gitster.g/ † 3: v11 was the version that landed Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike `remove --stdin`, this option cannot be combined with object names given via the command line. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4653801 (notes remove: --stdin reads from the standard input, 2011-05-18) added `--stdin` for the `remove` subcommand, documenting it in the “Options” section. But `copy --stdin` was added before that, in 160baa0 (notes: implement 'git notes copy --stdin', 2010-03-12). Treat this option equally between the two subcommands: • remove: mention `--stdin` on the subcommand as well, like for `copy` • copy: mention it as well under the option documentation Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitcli(7) recommends the *stuck form*. `--ref` is the only one which does not use it. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding a packfile to an object database we perform four syscalls:
- Three calls to access(3p) are done to check for auxiliary data
structures.
- One call to stat(3p) is done to check for the ".pack" itself.
One curious bit is that we perform the access(3p) calls before checking
for the packfile itself, but if the packfile doesn't exist we discard
all results. The access(3p) calls are thus essentially wasted, so one
may be triggered to reorder those calls so that we can short-circuit the
other syscalls in case the packfile does not exist.
The order in which we look up files is quite important though to help
avoid races:
- When installing a packfile we move auxiliary data structures into
place before we install the ".idx" file.
- When deleting a packfile we first delete the ".idx" and ".pack"
files before deleting auxiliary data structures.
As such, to avoid any races with concurrently created or deleted packs
we need to make sure that we _first_ read auxiliary data structures
before we read the corresponding ".idx" or ".pack" file. Otherwise it
may easily happen that we return a populated but misclassified pack.
Add a comment to `add_packed_git()` to make future readers aware of this
ordering requirement.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.
A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.
But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.
This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.
$ git init repo
$ cd repo/
$ git config set maintenance.auto false
$ for i in $(seq 1000); do printf "%d-original" $i >file-$i; done
$ git add .
$ git commit -moriginal
$ git repack -dl
$ for i in $(seq 1000); do printf "%d-modified" $i >file-$i; done
$ git commit -a -mmodified
$ git repack -dl
$ git multi-pack-index write
$ mv .git/objects/pack/multi-pack-index .
$ git repack -Adl
$ mv multi-pack-index .git/objects/pack/
Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
74.67 0.024693 1 18038 18031 access
25.33 0.008378 1 6045 6017 newfstatat
------ ----------- ----------- --------- --------- ----------------
100.00 0.033071 1 24083 24048 total
Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.
With this change in place we can now see a significantly reduced number
of syscalls:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
73.58 0.000323 5 60 28 newfstatat
26.42 0.000116 5 23 16 access
------ ----------- ----------- --------- --------- ----------------
100.00 0.000439 5 83 44 total
Furthermore, this change also results in a speedup:
Benchmark 1: git log --patch (revision = HEAD~)
Time (mean ± σ): 50.4 ms ± 2.5 ms [User: 22.0 ms, System: 24.4 ms]
Range (min … max): 45.4 ms … 54.9 ms 53 runs
Benchmark 2: git log --patch (revision = HEAD)
Time (mean ± σ): 12.7 ms ± 0.4 ms [User: 11.1 ms, System: 1.6 ms]
Range (min … max): 12.4 ms … 15.0 ms 191 runs
Summary
git log --patch (revision = HEAD) ran
3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)
In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since f93b2a0 (reftable/basics: introduce `REFTABLE_UNUSED` annotation, 2025-02-18), the reftable library was migrated to use an internal version of `UNUSED`, which unconditionally sets a GNU __attribute__ to avoid warnings function parameters that are not being used. Make the definition conditional to prevent breaking the build with non GNU compilers. Reported-by: "Randall S. Becker" <rsbecker@nexbridge.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build fix. * cb/reftable-unused-portability-fix: reftable: make REFTABLE_UNUSED C99 compatible
Integer overflow fix around code paths for "git multi-pack-index repack".. * pw/midx-repack-overflow-fix: midx docs: clarify tie breaking midx: avoid negative array index midx repack: avoid potential integer overflow on 64 bit systems midx repack: avoid integer overflow on 32 bit systems
Avoid adding directory path to a sparse-index tree entries to the name-hash, since they would bloat the hashtable without anybody querying for them. This was done already for a single threaded part of the code, but now the multi-threaded code also does the same. * am/sparse-index-name-hash-fix: name-hash: don't add sparse directories in threaded lazy init
Recent versions of Perl started warning against "! A =~ /pattern/" which does not negate the result of the matching. As it turns out that the problematic function is not even called, it was removed. * op/cvsserver-perl-warning: cvsserver: remove unused escapeRefName function
"git apply --index/--cached" when applying a deletion patch in reverse failed to give the mode bits of the path "removed" by the patch to the file it creates, which has been corrected. * mm/apply-reverse-mode-of-deleted-path: apply: set file mode when --reverse creates a deleted file t4129: test that git apply warns for unexpected mode changes
"git notes --help" documentation updates. * kh/notes-doc-fixes: doc: notes: use stuck form throughout doc: notes: treat --stdin equally between copy/remove doc: notes: point out copy --stdin use with argv doc: notes: clearly state that --stripspace is the default doc: notes: remove stripspace discussion from other options doc: notes: rework --[no-]stripspace doc: notes: split out options with negated forms doc: config: mention core.commentChar on commit.cleanup doc: stripspace: mention where the default comes from
When a stale .midx file refers to .pack files that no longer exist, we ended up checking for these non-existent files repeatedly, which has been optimized by memoizing the non-existence. * ps/midx-negative-packfile-cache: midx: stop repeatedly looking up nonexistent packfiles packfile: explain ordering of how we look up auxiliary pack files
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )