[pull] master from git:master #81

Merged
pull[bot] merged 11 commits into turkdevops:master from git:master
Jul 24, 2025

@pull pull bot commented Jul 24, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.3)

chriscool and others added 11 commits July 9, 2025 16:08
A recent commit, d9cb0e6 (fast-export, fast-import: add support for
signed-commits, 2025-03-10), added support for signed commits to
fast-export and fast-import.

When a signed commit is processed, fast-export can output either
"gpgsig sha1" or "gpgsig sha256" depending on whether the signed
commit uses the SHA-1 or SHA-256 Git object format.

However, this implementation has a number of limitations:

  - the output format was not properly described in the documentation,
  - the output format is not very informative as it doesn't even say
    if the signature is an OpenPGP, an SSH, or an X509 signature,
  - the implementation doesn't support having both one signature on
    the SHA-1 object and one on the SHA-256 object.

Let's improve on these limitations by improving fast-export and
fast-import so that:

  - all the signatures are exported,
  - at most one signature on the SHA-1 object and one on the SHA-256
    are imported,
  - if there is more than one signature on the SHA-1 object or on
    the SHA-256 object, fast-import emits a warning for each
    additional signature,
  - the output format is "gpgsig <git-hash-algo> <signature-format>",
    where <git-hash-algo> is the Git object format as before, and
    <signature-format> is the signature type ("openpgp", "x509",
    "ssh" or "unknown"),
  - the output is properly documented.

About the output format:

  - <git-hash-algo> tells which representation of the commit
    was signed (the SHA-1 or the SHA-256 version) which helps with
    both signature verification and interoperability between repos
    with different hash functions,

  - <signature-format> helps tools that process the fast-export
    stream, so they don't have to parse the ASCII armor to identify
    the signature type.

It could be even better to be able to import more than one signature
on the SHA-1 object and on the SHA-256 object, but other parts of
Git don't handle that well for now, so this is left for future
improvements.
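
As a rough illustration of the header format described above, the
following C sketch parses a "gpgsig <git-hash-algo> <signature-format>"
line. The struct and function names are hypothetical and this is not
how fast-import itself parses the stream; it also assumes the line has
already been stripped of its trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the two fields of a "gpgsig" header line. */
struct gpgsig_header {
	const char *hash_algo;  /* "sha1" or "sha256" */
	const char *sig_format; /* "openpgp", "x509", "ssh" or "unknown" */
};

/*
 * Parse "gpgsig <git-hash-algo> <signature-format>" (no trailing newline).
 * Returns 0 on success and -1 if the line does not have that shape.
 */
static int parse_gpgsig_header(const char *line, struct gpgsig_header *out)
{
	static const char *const algos[] = { "sha1", "sha256" };
	static const char *const formats[] = { "openpgp", "x509", "ssh", "unknown" };
	const char *prefix = "gpgsig ";
	const char *sp;
	size_t i, len;

	if (strncmp(line, prefix, strlen(prefix)))
		return -1;
	line += strlen(prefix);

	/* The hash algorithm runs up to the next space. */
	sp = strchr(line, ' ');
	if (!sp)
		return -1;
	len = (size_t)(sp - line);
	for (i = 0; i < 2; i++)
		if (strlen(algos[i]) == len && !strncmp(line, algos[i], len))
			break;
	if (i == 2)
		return -1;
	out->hash_algo = algos[i];

	/* Everything after that space must be a known signature format. */
	for (i = 0; i < 4; i++)
		if (!strcmp(sp + 1, formats[i]))
			break;
	if (i == 4)
		return -1;
	out->sig_format = formats[i];
	return 0;
}
```

A tool reading the stream could use the two fields to decide which
verifier to call without looking at the ASCII armor.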

Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Compiling Git fails on Amazon Linux 2 when using GCC 7.3.1 with the
following compiler error:

    In file included from compat/posix.h:449:0,
                     from git-compat-util.h:26,
                     from daemon.c:3:
    compat/../sane-ctype.h:29:60: error: expected expression before ']' token
     #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
                                                                ^
    compat/../sane-ctype.h:29:72: error: expected ')' before '!=' token
     #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
                                                                            ^
    compat/../sane-ctype.h:29:60: error: expected expression before ']' token
     #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
                                                                ^
    ... lots of similar lines ...

    compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant
     #define toupper(x) sane_case((unsigned char)(x), 0)
                                                      ^
    /usr/include/ctype.h:142:12: error: expected identifier or '(' before 'int'
     extern int isascii (int __c) __THROW;
                ^
    compat/../sane-ctype.h:30:26: error: expected ')' before '&' token
     #define isascii(x) (((x) & ~0x7f) == 0)
                              ^
    compat/../sane-ctype.h:30:35: error: expected ')' before '==' token
     #define isascii(x) (((x) & ~0x7f) == 0)
                                       ^
    In file included from /usr/include/features.h:423:0,
                     from /usr/include/unistd.h:25,
                     from compat/posix.h:90,
                     from git-compat-util.h:26,
                     from daemon.c:3:
    compat/../sane-ctype.h:44:30: error: expected declaration specifiers or '...' before '(' token
     #define tolower(x) sane_case((unsigned char)(x), 0x20)
                                  ^
    compat/../sane-ctype.h:44:50: error: expected declaration specifiers or '...' before numeric constant
     #define tolower(x) sane_case((unsigned char)(x), 0x20)
                                                      ^
    compat/../sane-ctype.h:45:30: error: expected declaration specifiers or '...' before '(' token
     #define toupper(x) sane_case((unsigned char)(x), 0)
                                  ^
    compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant
     #define toupper(x) sane_case((unsigned char)(x), 0)
                                                      ^

This error bisects back to 75a044f (git-compat-util.h: split out
POSIX-emulating bits, 2025-02-18), where lots of bits got split out of
"git-compat-util.h" into a new "compat/posix.h" header.

The compiler error isn't immediately obvious, doubly so because the
actual errors are ~3x as long as the above snippet. But what happens
here is that we transitively include <ctype.h> after we have included
our own "sane-ctype.h" header. Consequently, the function declarations
that exist in <ctype.h> for isascii(3p) et al will be mangled by our
macros of the same name. The result is of course completely broken.

It's unclear why this issue only happens on Amazon Linux 2. My guess is
that it's either specific to the compiler version or specific to the
glibc version. We don't explicitly include <ctype.h> anywhere, but it's
being transitively included. So chances are that later versions of the
toolchain reorganized their headers so that <ctype.h> is not included
transitively anymore.

Fix the issue by explicitly including <ctype.h> in "sane-ctype.h". This
ensures that the header guards will be activated and that any subsequent
include of the same header will become a no-op. With this we can then
safely override the function declarations with our own macros.
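
The include-first pattern can be sketched in a few lines of C. This is
a simplified stand-in rather than the actual "sane-ctype.h": the real
header uses a lookup table (sane_ctype[]), whereas the sane_case()
below uses plain range checks, which is an assumption made to keep the
example self-contained.

```c
/* Include the system header first so that its include guard is set. */
#include <ctype.h>

/* Drop any macro versions the C library may provide. */
#undef isascii
#undef tolower
#undef toupper

/* Simplified stand-in for the table-driven helper (an assumption). */
static inline int sane_case(int x, int high)
{
	if ((x >= 'A' && x <= 'Z') || (x >= 'a' && x <= 'z'))
		x = (x & ~0x20) | high;
	return x;
}

#define isascii(x) (((x) & ~0x7f) == 0)
#define tolower(x) sane_case((unsigned char)(x), 0x20)
#define toupper(x) sane_case((unsigned char)(x), 0)

/*
 * A later include, direct or transitive, is skipped by the header
 * guard, so the declarations in <ctype.h> never see the macros above.
 */
#include <ctype.h>
```

Without the first include, the second one would expand the macros
inside the system's own function declarations, which is the failure
shown in the compiler output above.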

Reported-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>


In bloom.h, murmur3_seeded_v2() is exported so that tests can exercise
the murmur3 hash. To make it clear that this function is exposed solely
for testing purposes, add a new helper function test_murmur3_seeded()
instead of exporting murmur3_seeded_v2() directly.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git's code style requires that functions operating on a struct S
should be named in the form S_verb. However, the functions operating
on struct bloom_key do not follow this convention. Therefore,
fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill()
and bloom_key_clear(), respectively.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Previously, we stored bloom keys in a flat array and marked a commit
as TREESAME if any key reported "definitely not changed".

To support multiple pathspec items, we now require that for each
pathspec item, there exists a bloom key reporting "definitely not
changed".

This "for every" condition makes a flat array insufficient, so we
introduce a new structure to group keys by a single pathspec item.
`struct bloom_keyvec` is introduced to replace `struct bloom_key *`
and `bloom_key_nr`. And because we want to support multiple pathspec
items, we added a bloom_keyvec * and a bloom_keyvec_nr field to
`struct rev_info` to represent an array of bloom_keyvecs. This commit
still optimizes only one pathspec item, thus bloom_keyvec_nr can only
be 0 or 1.

New bloom_keyvec_* functions are added to create and destroy a keyvec.
bloom_filter_contains_vec() is added to check if all keys in a keyvec
are contained in a bloom filter.
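
The "all keys in a keyvec" check can be illustrated with a toy model.
Everything below (the toy_* names, the 64-bit filter, the two bit
positions per key) is an assumption made for the sketch; Git's real
struct bloom_filter, struct bloom_key and struct bloom_keyvec carry
more state and use murmur3-derived hashes.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the real structures. */
struct toy_filter {
	uint64_t bits;
};

struct toy_key {
	unsigned char bit[2]; /* two bit positions, each in 0..63 */
};

struct toy_keyvec {
	size_t nr;
	struct toy_key key[4];
};

static uint64_t toy_key_mask(const struct toy_key *k)
{
	return (UINT64_C(1) << k->bit[0]) | (UINT64_C(1) << k->bit[1]);
}

static void toy_filter_add(struct toy_filter *f, const struct toy_key *k)
{
	f->bits |= toy_key_mask(k);
}

/* 1 means "maybe present", 0 means "definitely absent". */
static int toy_filter_contains(const struct toy_filter *f,
			       const struct toy_key *k)
{
	uint64_t mask = toy_key_mask(k);
	return (f->bits & mask) == mask;
}

/* A keyvec matches only if every one of its keys may be present. */
static int toy_filter_contains_vec(const struct toy_filter *f,
				   const struct toy_keyvec *v)
{
	size_t i;

	for (i = 0; i < v->nr; i++)
		if (!toy_filter_contains(f, &v->key[i]))
			return 0;
	return 1;
}
```

A single missing key is enough to report "definitely not changed" for
the whole keyvec, which is why the keys for one pathspec item are
grouped together.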

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

When preparing to use bloom filters in a revision walk, Git populates a
bloom_keyvec with an array of bloom keys for the components of a path.
Before we create the ability to map multiple pathspecs to multiple
bloom_keyvecs, extract the conversion from a pathspec to a bloom_keyvec
into its own helper method. This simplifies the state that persists in
prepare_to_use_bloom_filter() as well as makes the future change much
simpler.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

To enable optimizing multiple pathspec items in revision traversal,
return 0 from forbid_bloom_filters() if all pathspec items are literal.
Add for loops to initialize and check each pathspec item's bloom_keyvec
when optimization is possible.

Add new test cases in t/t4216-log-bloom.sh to ensure that
 - results are consistent between the multi-pathspec bloom filter
   optimization and the case without bloom filter optimization,
 - bloom filters are not used if any pathspec item is not literal.
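
The two rules above can be sketched roughly as follows. The sketch_*
names and the representation of the filter and keys as 64-bit masks
are hypothetical simplifications; the real forbid_bloom_filters()
checks more conditions than the literal test shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each key is a mask of bits that must be set. */
struct sketch_keyvec {
	size_t nr;
	const uint64_t *key_mask;
};

struct sketch_pathspec_item {
	const char *match;
	int is_literal; /* 0 for items with wildcards or other magic */
};

/* Bloom filters are usable only if every pathspec item is literal. */
static int sketch_forbid_bloom_filters(const struct sketch_pathspec_item *items,
				       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!items[i].is_literal)
			return 1;
	return 0;
}

/*
 * Returns 1 if the commit may have changed at least one pathspec item,
 * and 0 if every item has some key that is definitely absent.
 */
static int sketch_maybe_changed(uint64_t filter,
				const struct sketch_keyvec *vecs, size_t nr)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		int all_present = 1;

		for (j = 0; j < vecs[i].nr; j++) {
			uint64_t m = vecs[i].key_mask[j];
			if ((filter & m) != m) {
				all_present = 0;
				break;
			}
		}
		if (all_present)
			return 1;
	}
	return 0;
}
```

A result of 0 lets the walk skip the tree diff for that commit, while
1 only means the diff still has to be computed.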

With these optimizations, we get some improvements for multi-pathspec runs
of 'git log'. First, in the Git repository we see these modest results:

Benchmark 1: old
 Time (mean ± σ):      73.1 ms ±   2.9 ms
 Range (min … max):    69.9 ms …  84.5 ms    42 runs

Benchmark 2: new
 Time (mean ± σ):      55.1 ms ±   2.9 ms
 Range (min … max):    51.1 ms …  61.2 ms    52 runs

Summary
 'new' ran
   1.33 ± 0.09 times faster than 'old'

But in a larger repo, such as the LLVM project repo below, we get even
better results:

Benchmark 1: old
 Time (mean ± σ):      1.974 s ±  0.006 s
 Range (min … max):    1.960 s …  1.983 s    10 runs

Benchmark 2: new
 Time (mean ± σ):     262.9 ms ±   2.4 ms
 Range (min … max):   257.7 ms … 266.2 ms    11 runs

Summary
 'new' ran
   7.51 ± 0.07 times faster than 'old'

Signed-off-by: Derrick Stolee <stolee@gmail.com>
[ly: rename convert_pathspec_to_filter() to convert_pathspec_to_bloom_keyvec()]
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Lift the limitation to use changed-path filter in "git log" so that
it can be used for a pathspec with multiple literal paths.

* ly/changed-paths-traversal:
  bloom: optimize multiple pathspec items in revision
  revision: make helper for pathspec to bloom keyvec
  bloom: replace struct bloom_key * with struct bloom_keyvec
  bloom: rename function operates on bloom_key
  bloom: add test helper to return murmur3 hash

Our <sane-ctype.h> header file relied on the system-supplied
<ctype.h> header not being included later, which would clash with
our macro definitions, but "amazon linux" broke this assumption.  Fix
this by preemptively including <ctype.h> near the beginning of
<sane-ctype.h> ourselves.

* ps/sane-ctype-workaround:
  sane-ctype: fix compiler error on Amazon Linux 2

Clean up the way signatures on commit objects are exported to
and imported from a fast-import stream.

* cc/fast-import-export-signature-names:
  fast-(import|export): improve on commit signature output format
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Jul 24, 2025
@pull pull bot added the ⤵️ pull label Jul 24, 2025
@pull pull bot merged commit 97e14d9 into turkdevops:master Jul 24, 2025
2 of 3 checks passed