From b5b3ddbe5c56c7ded95e7c47c985dc6d61f73ea0 Mon Sep 17 00:00:00 2001 From: Christian Couder Date: Wed, 9 Jul 2025 16:12:53 +0200 Subject: [PATCH 1/8] fast-(import|export): improve on commit signature output format A recent commit, d9cb0e6ff8 (fast-export, fast-import: add support for signed-commits, 2025-03-10), added support for signed commits to fast-export and fast-import. When a signed commit is processed, fast-export can output either "gpgsig sha1" or "gpgsig sha256" depending on whether the signed commit uses the SHA-1 or SHA-256 Git object format. However, this implementation has a number of limitations: - the output format was not properly described in the documentation, - the output format is not very informative as it doesn't even say if the signature is an OpenPGP, an SSH, or an X509 signature, - the implementation doesn't support having both one signature on the SHA-1 object and one on the SHA-256 object. Let's improve on these limitations by improving fast-export and fast-import so that: - all the signatures are exported, - at most one signature on the SHA-1 object and one on the SHA-256 are imported, - if there is more than one signature on the SHA-1 object or on the SHA-256 object, fast-import emits a warning for each additional signature, - the output format is "gpgsig ", where is the Git object format as before, and is the signature type ("openpgp", "x509", "ssh" or "unknown"), - the output is properly documented. About the output format: - allows to know which representation of the commit was signed (the SHA-1 or the SHA-256 version) which helps with both signature verification and interoperability between repos with different hash functions, - helps tools that process the fast-export stream, so they don't have to parse the ASCII armor to identify the signature type. It could be even better to be able to import more than one signature on the SHA-1 object and on the SHA-256 object, but other parts of Git don't handle that well for now, so this is left for future improvements. Helped-by: brian m. carlson Helped-by: Elijah Newren Signed-off-by: Christian Couder Signed-off-by: Junio C Hamano --- Documentation/git-fast-export.adoc | 17 +++++ Documentation/git-fast-import.adoc | 38 ++++++++-- builtin/fast-export.c | 62 ++++++++++++---- builtin/fast-import.c | 113 +++++++++++++++++++++++------ gpg-interface.c | 12 +++ gpg-interface.h | 12 +++ t/t9350-fast-export.sh | 102 +++++++++++++++++++++++++- 7 files changed, 312 insertions(+), 44 deletions(-) diff --git a/Documentation/git-fast-export.adoc b/Documentation/git-fast-export.adoc index 43bbb4f63ca6bf..297b57bb2efdc2 100644 --- a/Documentation/git-fast-export.adoc +++ b/Documentation/git-fast-export.adoc @@ -50,6 +50,23 @@ resulting tag will have an invalid signature. is the same as how earlier versions of this command without this option behaved. + +When exported, a signature starts with: ++ +gpgsig ++ +where is the Git object hash so either "sha1" or +"sha256", and is the signature type, so "openpgp", +"x509", "ssh" or "unknown". ++ +For example, an OpenPGP signature on a SHA-1 commit starts with +`gpgsig sha1 openpgp`, while an SSH signature on a SHA-256 commit +starts with `gpgsig sha256 ssh`. ++ +While all the signatures of a commit are exported, an importer may +choose to accept only some of them. For example +linkgit:git-fast-import[1] currently stores at most one signature per +Git hash algorithm in each commit. ++ NOTE: This is highly experimental and the format of the data stream may change in the future without compatibility guarantees. diff --git a/Documentation/git-fast-import.adoc b/Documentation/git-fast-import.adoc index 250d8666521781..d2327842003879 100644 --- a/Documentation/git-fast-import.adoc +++ b/Documentation/git-fast-import.adoc @@ -445,7 +445,7 @@ one). original-oid? ('author' (SP )? SP LT GT SP LF)? 'committer' (SP )? SP LT GT SP LF - ('gpgsig' SP LF data)? + ('gpgsig' SP SP LF data)? ('encoding' SP LF)? data ('from' SP LF)? @@ -518,13 +518,39 @@ their syntax. ^^^^^^^^ The optional `gpgsig` command is used to include a PGP/GPG signature -that signs the commit data. +or other cryptographic signature that signs the commit data. -Here specifies which hashing algorithm is used for this -signature, either `sha1` or `sha256`. +.... + 'gpgsig' SP SP LF data +.... + +The `gpgsig` command takes two arguments: + +* `` specifies which Git object format this signature + applies to, either `sha1` or `sha256`. This allows to know which + representation of the commit was signed (the SHA-1 or the SHA-256 + version) which helps with both signature verification and + interoperability between repos with different hash functions. + +* `` specifies the type of signature, such as + `openpgp`, `x509`, `ssh`, or `unknown`. This is a convenience for + tools that process the stream, so they don't have to parse the ASCII + armor to identify the signature type. + +A commit may have at most one signature for the SHA-1 object format +(stored in the "gpgsig" header) and one for the SHA-256 object format +(stored in the "gpgsig-sha256" header). + +See below for a detailed description of the `data` command which +contains the raw signature data. + +Signatures are not yet checked in the current implementation +though. (Already setting the `extensions.compatObjectFormat` +configuration option might help with verifying both SHA-1 and SHA-256 +object format signatures when it will be implemented.) -NOTE: This is highly experimental and the format of the data stream may -change in the future without compatibility guarantees. +NOTE: This is highly experimental and the format of the `gpgsig` +command may change in the future without compatibility guarantees. `encoding` ^^^^^^^^^^ diff --git a/builtin/fast-export.c b/builtin/fast-export.c index fcf6b00d5fe10a..7b4e6a6e41279a 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -29,6 +29,7 @@ #include "quote.h" #include "remote.h" #include "blob.h" +#include "gpg-interface.h" static const char *const fast_export_usage[] = { N_("git fast-export []"), @@ -652,6 +653,38 @@ static const char *find_commit_multiline_header(const char *msg, return strbuf_detach(&val, NULL); } +static void print_signature(const char *signature, const char *object_hash) +{ + if (!signature) + return; + + printf("gpgsig %s %s\ndata %u\n%s\n", + object_hash, + get_signature_format(signature), + (unsigned)strlen(signature), + signature); +} + +static const char *append_signatures_for_header(struct string_list *signatures, + const char *pos, + const char *header, + const char *object_hash) +{ + const char *signature; + const char *start = pos; + const char *end = pos; + + while ((signature = find_commit_multiline_header(start + 1, + header, + &end))) { + string_list_append(signatures, signature)->util = (void *)object_hash; + free((char *)signature); + start = end; + } + + return end; +} + static void handle_commit(struct commit *commit, struct rev_info *rev, struct string_list *paths_of_changed_objects) { @@ -660,7 +693,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, const char *author, *author_end, *committer, *committer_end; const char *encoding = NULL; size_t encoding_len; - const char *signature_alg = NULL, *signature = NULL; + struct string_list signatures = STRING_LIST_INIT_DUP; const char *message; char *reencoded = NULL; struct commit_list *p; @@ -700,10 +733,11 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, } if (*commit_buffer_cursor == '\n') { - if ((signature = find_commit_multiline_header(commit_buffer_cursor + 1, "gpgsig", &commit_buffer_cursor))) - signature_alg = "sha1"; - else if ((signature = find_commit_multiline_header(commit_buffer_cursor + 1, "gpgsig-sha256", &commit_buffer_cursor))) - signature_alg = "sha256"; + const char *after_sha1 = append_signatures_for_header(&signatures, commit_buffer_cursor, + "gpgsig", "sha1"); + const char *after_sha256 = append_signatures_for_header(&signatures, commit_buffer_cursor, + "gpgsig-sha256", "sha256"); + commit_buffer_cursor = (after_sha1 > after_sha256) ? after_sha1 : after_sha256; } message = strstr(commit_buffer_cursor, "\n\n"); @@ -769,30 +803,30 @@ static void handle_commit(struct commit *commit, struct rev_info *rev, printf("%.*s\n%.*s\n", (int)(author_end - author), author, (int)(committer_end - committer), committer); - if (signature) { + if (signatures.nr) { switch (signed_commit_mode) { case SIGN_ABORT: die("encountered signed commit %s; use " "--signed-commits= to handle it", oid_to_hex(&commit->object.oid)); case SIGN_WARN_VERBATIM: - warning("exporting signed commit %s", - oid_to_hex(&commit->object.oid)); + warning("exporting %"PRIuMAX" signature(s) for commit %s", + (uintmax_t)signatures.nr, oid_to_hex(&commit->object.oid)); /* fallthru */ case SIGN_VERBATIM: - printf("gpgsig %s\ndata %u\n%s", - signature_alg, - (unsigned)strlen(signature), - signature); + for (size_t i = 0; i < signatures.nr; i++) { + struct string_list_item *item = &signatures.items[i]; + print_signature(item->string, item->util); + } break; case SIGN_WARN_STRIP: - warning("stripping signature from commit %s", + warning("stripping signature(s) from commit %s", oid_to_hex(&commit->object.oid)); /* fallthru */ case SIGN_STRIP: break; } - free((char *)signature); + string_list_clear(&signatures, 0); } if (!reencoded && encoding) printf("encoding %.*s\n", (int)encoding_len, encoding); diff --git a/builtin/fast-import.c b/builtin/fast-import.c index b2839c5f439b7b..b51a17b95a85d0 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -29,6 +29,7 @@ #include "commit-reach.h" #include "khash.h" #include "date.h" +#include "gpg-interface.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1<hash_algo is freed */ + char *space = strchr(args, ' '); + + if (!space) + die("Expected gpgsig format: 'gpgsig ', " + "got 'gpgsig %s'", args); + *space = '\0'; + + sig->hash_algo = args; + sig->sig_format = space + 1; + + /* Validate hash algorithm */ + if (strcmp(sig->hash_algo, "sha1") && + strcmp(sig->hash_algo, "sha256")) + die("Unknown git hash algorithm in gpgsig: '%s'", sig->hash_algo); + + /* Validate signature format */ + if (!valid_signature_format(sig->sig_format)) + die("Invalid signature format in gpgsig: '%s'", sig->sig_format); + if (!strcmp(sig->sig_format, "unknown")) + warning("'unknown' signature format in gpgsig"); + + /* Read signature data */ + read_next_command(); + parse_data(&sig->data, 0, NULL); +} + +static void add_gpgsig_to_commit(struct strbuf *commit_data, + const char *header, + struct signature_data *sig) +{ + struct string_list siglines = STRING_LIST_INIT_NODUP; + + if (!sig->hash_algo) + return; + + strbuf_addstr(commit_data, header); + string_list_split_in_place(&siglines, sig->data.buf, "\n", -1); + strbuf_add_separated_string_list(commit_data, "\n ", &siglines); + strbuf_addch(commit_data, '\n'); + string_list_clear(&siglines, 1); + strbuf_release(&sig->data); + free(sig->hash_algo); +} + +static void store_signature(struct signature_data *stored_sig, + struct signature_data *new_sig, + const char *hash_type) +{ + if (stored_sig->hash_algo) { + warning("multiple %s signatures found, " + "ignoring additional signature", + hash_type); + strbuf_release(&new_sig->data); + free(new_sig->hash_algo); + } else { + *stored_sig = *new_sig; + } +} + static void parse_new_commit(const char *arg) { - static struct strbuf sig = STRBUF_INIT; static struct strbuf msg = STRBUF_INIT; - struct string_list siglines = STRING_LIST_INIT_NODUP; + struct signature_data sig_sha1 = { NULL, NULL, STRBUF_INIT }; + struct signature_data sig_sha256 = { NULL, NULL, STRBUF_INIT }; struct branch *b; char *author = NULL; char *committer = NULL; - char *sig_alg = NULL; char *encoding = NULL; struct hash_list *merge_list = NULL; unsigned int merge_count; @@ -2750,13 +2818,23 @@ static void parse_new_commit(const char *arg) } if (!committer) die("Expected committer but didn't get one"); - if (skip_prefix(command_buf.buf, "gpgsig ", &v)) { - sig_alg = xstrdup(v); - read_next_command(); - parse_data(&sig, 0, NULL); + + /* Process signatures (up to 2: one "sha1" and one "sha256") */ + while (skip_prefix(command_buf.buf, "gpgsig ", &v)) { + struct signature_data sig = { NULL, NULL, STRBUF_INIT }; + + parse_one_signature(&sig, v); + + if (!strcmp(sig.hash_algo, "sha1")) + store_signature(&sig_sha1, &sig, "SHA-1"); + else if (!strcmp(sig.hash_algo, "sha256")) + store_signature(&sig_sha256, &sig, "SHA-256"); + else + BUG("parse_one_signature() returned unknown hash algo"); + read_next_command(); - } else - strbuf_setlen(&sig, 0); + } + if (skip_prefix(command_buf.buf, "encoding ", &v)) { encoding = xstrdup(v); read_next_command(); @@ -2830,23 +2908,14 @@ static void parse_new_commit(const char *arg) strbuf_addf(&new_data, "encoding %s\n", encoding); - if (sig_alg) { - if (!strcmp(sig_alg, "sha1")) - strbuf_addstr(&new_data, "gpgsig "); - else if (!strcmp(sig_alg, "sha256")) - strbuf_addstr(&new_data, "gpgsig-sha256 "); - else - die("Expected gpgsig algorithm sha1 or sha256, got %s", sig_alg); - string_list_split_in_place(&siglines, sig.buf, "\n", -1); - strbuf_add_separated_string_list(&new_data, "\n ", &siglines); - strbuf_addch(&new_data, '\n'); - } + + add_gpgsig_to_commit(&new_data, "gpgsig ", &sig_sha1); + add_gpgsig_to_commit(&new_data, "gpgsig-sha256 ", &sig_sha256); + strbuf_addch(&new_data, '\n'); strbuf_addbuf(&new_data, &msg); - string_list_clear(&siglines, 1); free(author); free(committer); - free(sig_alg); free(encoding); if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) diff --git a/gpg-interface.c b/gpg-interface.c index 0896458de5a988..6f2d87475fc5a0 100644 --- a/gpg-interface.c +++ b/gpg-interface.c @@ -144,6 +144,18 @@ static struct gpg_format *get_format_by_sig(const char *sig) return NULL; } +const char *get_signature_format(const char *buf) +{ + struct gpg_format *format = get_format_by_sig(buf); + return format ? format->name : "unknown"; +} + +int valid_signature_format(const char *format) +{ + return (!!get_format_by_name(format) || + !strcmp(format, "unknown")); +} + void signature_check_clear(struct signature_check *sigc) { FREE_AND_NULL(sigc->payload); diff --git a/gpg-interface.h b/gpg-interface.h index e09f12e8d04d92..60ddf8bbfa3833 100644 --- a/gpg-interface.h +++ b/gpg-interface.h @@ -47,6 +47,18 @@ struct signature_check { void signature_check_clear(struct signature_check *sigc); +/* + * Return the format of the signature (like "openpgp", "x509", "ssh" + * or "unknown"). + */ +const char *get_signature_format(const char *buf); + +/* + * Is the signature format valid (like "openpgp", "x509", "ssh" or + * "unknown") + */ +int valid_signature_format(const char *format); + /* * Look at a GPG signed tag object. If such a signature exists, store it in * signature and the signed content in payload. Return 1 if a signature was diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 76619765fcbc98..46700dbc401734 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -314,7 +314,7 @@ test_expect_success GPG 'signed-commits=abort' ' test_expect_success GPG 'signed-commits=verbatim' ' git fast-export --signed-commits=verbatim --reencode=no commit-signing >output && - grep "^gpgsig sha" output && + test_grep -E "^gpgsig $GIT_DEFAULT_HASH openpgp" output && grep "encoding ISO-8859-1" output && ( cd new && @@ -328,7 +328,7 @@ test_expect_success GPG 'signed-commits=verbatim' ' test_expect_success GPG 'signed-commits=warn-verbatim' ' git fast-export --signed-commits=warn-verbatim --reencode=no commit-signing >output 2>err && - grep "^gpgsig sha" output && + test_grep -E "^gpgsig $GIT_DEFAULT_HASH openpgp" output && grep "encoding ISO-8859-1" output && test -s err && ( @@ -369,6 +369,62 @@ test_expect_success GPG 'signed-commits=warn-strip' ' ' +test_expect_success GPGSM 'setup X.509 signed commit' ' + + git checkout -b x509-signing main && + test_config gpg.format x509 && + test_config user.signingkey $GIT_COMMITTER_EMAIL && + echo "X.509 content" >file && + git add file && + git commit -S -m "X.509 signed commit" && + X509_COMMIT=$(git rev-parse HEAD) && + git checkout main + +' + +test_expect_success GPGSM 'round-trip X.509 signed commit' ' + + git fast-export --signed-commits=verbatim x509-signing >output && + test_grep -E "^gpgsig $GIT_DEFAULT_HASH x509" output && + ( + cd new && + git fast-import && + git cat-file commit refs/heads/x509-signing >actual && + grep "^gpgsig" actual && + IMPORTED=$(git rev-parse refs/heads/x509-signing) && + test $X509_COMMIT = $IMPORTED + ) file && + git add file && + git commit -S -m "SSH signed commit" && + SSH_COMMIT=$(git rev-parse HEAD) && + git checkout main + +' + +test_expect_success GPGSSH 'round-trip SSH signed commit' ' + + git fast-export --signed-commits=verbatim ssh-signing >output && + test_grep -E "^gpgsig $GIT_DEFAULT_HASH ssh" output && + ( + cd new && + git fast-import && + git cat-file commit refs/heads/ssh-signing >actual && + grep "^gpgsig" actual && + IMPORTED=$(git rev-parse refs/heads/ssh-signing) && + test $SSH_COMMIT = $IMPORTED + ) explicit-sha256/B && + git -C explicit-sha256 add B && + test_tick && + git -C explicit-sha256 commit -S -m "signed" B && + SHA256_B=$(git -C explicit-sha256 rev-parse dual-signed) && + + # Create the corresponding SHA-1 commit + SHA1_B=$(git -C explicit-sha256 rev-parse --output-object-format=sha1 dual-signed) && + + # Check that the resulting SHA-1 commit has both signatures + echo $SHA1_B | git -C explicit-sha256 cat-file --batch >out && + test_grep -E "^gpgsig " out && + test_grep -E "^gpgsig-sha256 " out +' + +test_expect_success GPG 'export and import of doubly signed commit' ' + git -C explicit-sha256 fast-export --signed-commits=verbatim dual-signed >output && + test_grep -E "^gpgsig sha1 openpgp" output && + test_grep -E "^gpgsig sha256 openpgp" output && + + ( + cd new && + git fast-import && + git cat-file commit refs/heads/dual-signed >actual && + test_grep -E "^gpgsig " actual && + test_grep -E "^gpgsig-sha256 " actual && + IMPORTED=$(git rev-parse refs/heads/dual-signed) && + if test "$GIT_DEFAULT_HASH" = "sha1" + then + test $SHA1_B = $IMPORTED + else + test $SHA256_B = $IMPORTED + fi + ) Date: Thu, 10 Jul 2025 08:46:27 +0200 Subject: [PATCH 2/8] sane-ctype: fix compiler error on Amazon Linux 2 Compiling Git fails on Amazon Linux 2 when using GCC 7.3.1 with the following compiler error: In file included from compat/posix.h:449:0, from git-compat-util.h:26, from daemon.c:3: compat/../sane-ctype.h:29:60: error: expected expression before ']' token #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) ^ compat/../sane-ctype.h:29:72: error: expected ')' before '!=' token #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) ^ compat/../sane-ctype.h:29:60: error: expected expression before ']' token #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) ^ ... lots of similar lines ... compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant #define toupper(x) sane_case((unsigned char)(x), 0) ^ /usr/include/ctype.h:142:12: error: expected identifier or '(' before 'int' extern int isascii (int __c) __THROW; ^ compat/../sane-ctype.h:30:26: error: expected ')' before '&' token #define isascii(x) (((x) & ~0x7f) == 0) ^ compat/../sane-ctype.h:30:35: error: expected ')' before '==' token #define isascii(x) (((x) & ~0x7f) == 0) ^ In file included from /usr/include/features.h:423:0, from /usr/include/unistd.h:25, from compat/posix.h:90, from git-compat-util.h:26, from daemon.c:3: compat/../sane-ctype.h:44:30: error: expected declaration specifiers or '...' before '(' token #define tolower(x) sane_case((unsigned char)(x), 0x20) ^ compat/../sane-ctype.h:44:50: error: expected declaration specifiers or '...' before numeric constant #define tolower(x) sane_case((unsigned char)(x), 0x20) ^ compat/../sane-ctype.h:45:30: error: expected declaration specifiers or '...' before '(' token #define toupper(x) sane_case((unsigned char)(x), 0) ^ compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant #define toupper(x) sane_case((unsigned char)(x), 0) ^ This error bisect back to 75a044f748 (git-compat-util.h: split out POSIX-emulating bits, 2025-02-18), where lots of bits got split out of "git-compat-util.h" into a new "compat/posix.h" header. The compiler error isn't immediately obvious, doubly so because the actual errors are ~3x as long as the above snippet. But what happens here is that we transitively include after we have included our own "sane-ctype.h" header. Consequently, the function declarations that exist in for isascii(3p) et al will be mangled by our macros of the same type. The result is of course completely broken. It's unclear why this issue only happens on Amazon Linux 2. My guess is that it's either specific to the compiler version or specific to the glibc version. We don't explicitly include anywhere, but it's being transitively included. So chances are that later versions of the toolchain reorganized their headers so that is not included transitively anymore. Fix the issue by explicitly including in "sane-ctype.h". This ensures that the header guards will be activated and that any subsequent include of the same header will become a no-op. With this we can then safely override the function declarations with our own macros. Reported-by: Stan Hu Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- sane-ctype.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sane-ctype.h b/sane-ctype.h index cbea1b299b79a5..4f476c4381680b 100644 --- a/sane-ctype.h +++ b/sane-ctype.h @@ -1,6 +1,15 @@ #ifndef SANE_CTYPE_H #define SANE_CTYPE_H +/* + * Explicitly include so that its header guards kick in from here on. + * This ensures that the file won't get included after "sane-ctype.h", as that + * would otherwise lead to a compiler error because the function declarations + * for `int isascii(int c)` et al would be mangled by our macros with the same + * name. + */ +#include + /* Sane ctype - no locale, and works with signed chars */ #undef isascii #undef isspace From 4ca70179020b6a33bb5334302e7c79faf7eeaf52 Mon Sep 17 00:00:00 2001 From: Lidong Yan Date: Sat, 12 Jul 2025 17:35:13 +0800 Subject: [PATCH 3/8] bloom: add test helper to return murmur3 hash In bloom.h, murmur3_seeded_v2() is exported for the use of test murmur3 hash. To clarify that murmur3_seeded_v2() is exported solely for testing purposes, a new helper function test_murmur3_seeded() was added instead of exporting murmur3_seeded_v2() directly. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- bloom.c | 13 ++++++++++++- bloom.h | 12 +++--------- t/helper/test-bloom.c | 4 ++-- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/bloom.c b/bloom.c index 0c8d2cebf9c4ed..946c5e8c985bbc 100644 --- a/bloom.c +++ b/bloom.c @@ -107,7 +107,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g, * Not considered to be cryptographically secure. * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm */ -uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len) +static uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len) { const uint32_t c1 = 0xcc9e2d51; const uint32_t c2 = 0x1b873593; @@ -540,3 +540,14 @@ int bloom_filter_contains(const struct bloom_filter *filter, return 1; } + +uint32_t test_bloom_murmur3_seeded(uint32_t seed, const char *data, size_t len, + int version) +{ + assert(version == 1 || version == 2); + + if (version == 2) + return murmur3_seeded_v2(seed, data, len); + else + return murmur3_seeded_v1(seed, data, len); +} diff --git a/bloom.h b/bloom.h index 6e46489a20129a..a9ded1822faebb 100644 --- a/bloom.h +++ b/bloom.h @@ -78,15 +78,6 @@ int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, uint32_t graph_pos); -/* - * Calculate the murmur3 32-bit hash value for the given data - * using the given seed. - * Produces a uniformly distributed hash value. - * Not considered to be cryptographically secure. - * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm - */ -uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len); - void fill_bloom_key(const char *data, size_t len, struct bloom_key *key, @@ -137,4 +128,7 @@ int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, const struct bloom_filter_settings *settings); +uint32_t test_bloom_murmur3_seeded(uint32_t seed, const char *data, size_t len, + int version); + #endif diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 9aa2c5a5926d47..6a24b6e0a6de9b 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -61,13 +61,13 @@ int cmd__bloom(int argc, const char **argv) uint32_t hashed; if (argc < 3) usage(bloom_usage); - hashed = murmur3_seeded_v2(0, argv[2], strlen(argv[2])); + hashed = test_bloom_murmur3_seeded(0, argv[2], strlen(argv[2]), 2); printf("Murmur3 Hash with seed=0:0x%08x\n", hashed); } if (!strcmp(argv[1], "get_murmur3_seven_highbit")) { uint32_t hashed; - hashed = murmur3_seeded_v2(0, "\x99\xaa\xbb\xcc\xdd\xee\xff", 7); + hashed = test_bloom_murmur3_seeded(0, "\x99\xaa\xbb\xcc\xdd\xee\xff", 7, 2); printf("Murmur3 Hash with seed=0:0x%08x\n", hashed); } From b187353ed2b92745a903d321eaafac342a5df8d4 Mon Sep 17 00:00:00 2001 From: Lidong Yan Date: Sat, 12 Jul 2025 17:35:14 +0800 Subject: [PATCH 4/8] bloom: rename function operates on bloom_key git code style requires that functions operating on a struct S should be named in the form S_verb. However, the functions operating on struct bloom_key do not follow this convention. Therefore, fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill() and bloom_key_clear(), respectively. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- blame.c | 2 +- bloom.c | 10 ++++------ bloom.h | 6 ++---- line-log.c | 5 +++-- revision.c | 8 ++++---- t/helper/test-bloom.c | 4 ++-- 6 files changed, 16 insertions(+), 19 deletions(-) diff --git a/blame.c b/blame.c index 57daa45e8996e8..811c6d8f9f6cc5 100644 --- a/blame.c +++ b/blame.c @@ -1310,7 +1310,7 @@ static void add_bloom_key(struct blame_bloom_data *bd, } bd->keys[bd->nr] = xmalloc(sizeof(struct bloom_key)); - fill_bloom_key(path, strlen(path), bd->keys[bd->nr], bd->settings); + bloom_key_fill(bd->keys[bd->nr], path, strlen(path), bd->settings); bd->nr++; } diff --git a/bloom.c b/bloom.c index 946c5e8c985bbc..5523d198c8d382 100644 --- a/bloom.c +++ b/bloom.c @@ -221,9 +221,7 @@ static uint32_t murmur3_seeded_v1(uint32_t seed, const char *data, size_t len) return seed; } -void fill_bloom_key(const char *data, - size_t len, - struct bloom_key *key, +void bloom_key_fill(struct bloom_key *key, const char *data, size_t len, const struct bloom_filter_settings *settings) { int i; @@ -243,7 +241,7 @@ void fill_bloom_key(const char *data, key->hashes[i] = hash0 + i * hash1; } -void clear_bloom_key(struct bloom_key *key) +void bloom_key_clear(struct bloom_key *key) { FREE_AND_NULL(key->hashes); } @@ -500,9 +498,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, settings); + bloom_key_fill(&key, e->path, strlen(e->path), settings); add_key_to_filter(&key, filter, settings); - clear_bloom_key(&key); + bloom_key_clear(&key); } cleanup: diff --git a/bloom.h b/bloom.h index a9ded1822faebb..603bc1f90f5dcc 100644 --- a/bloom.h +++ b/bloom.h @@ -78,11 +78,9 @@ int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, uint32_t graph_pos); -void fill_bloom_key(const char *data, - size_t len, - struct bloom_key *key, +void bloom_key_fill(struct bloom_key *key, const char *data, size_t len, const struct bloom_filter_settings *settings); -void clear_bloom_key(struct bloom_key *key); +void bloom_key_clear(struct bloom_key *key); void add_key_to_filter(const struct bloom_key *key, struct bloom_filter *filter, diff --git a/line-log.c b/line-log.c index 628e3fe3ae18dc..07f2154e84b843 100644 --- a/line-log.c +++ b/line-log.c @@ -1172,12 +1172,13 @@ static int bloom_filter_check(struct rev_info *rev, return 0; while (!result && range) { - fill_bloom_key(range->path, strlen(range->path), &key, rev->bloom_filter_settings); + bloom_key_fill(&key, range->path, strlen(range->path), + rev->bloom_filter_settings); if (bloom_filter_contains(filter, &key, rev->bloom_filter_settings)) result = 1; - clear_bloom_key(&key); + bloom_key_clear(&key); range = range->next; } diff --git a/revision.c b/revision.c index 2c36a9c179efb5..a9902d807e0af8 100644 --- a/revision.c +++ b/revision.c @@ -739,15 +739,15 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) revs->bloom_keys_nr = path_component_nr; ALLOC_ARRAY(revs->bloom_keys, revs->bloom_keys_nr); - fill_bloom_key(path, len, &revs->bloom_keys[0], + bloom_key_fill(&revs->bloom_keys[0], path, len, revs->bloom_filter_settings); path_component_nr = 1; p = path + len - 1; while (p > path) { if (*p == '/') - fill_bloom_key(path, p - path, - &revs->bloom_keys[path_component_nr++], + bloom_key_fill(&revs->bloom_keys[path_component_nr++], + path, p - path, revs->bloom_filter_settings); p--; } @@ -3230,7 +3230,7 @@ void release_revisions(struct rev_info *revs) oidset_clear(&revs->missing_commits); for (int i = 0; i < revs->bloom_keys_nr; i++) - clear_bloom_key(&revs->bloom_keys[i]); + bloom_key_clear(&revs->bloom_keys[i]); FREE_AND_NULL(revs->bloom_keys); revs->bloom_keys_nr = 0; } diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 6a24b6e0a6de9b..3283544bd33db6 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -12,13 +12,13 @@ static struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; static void add_string_to_filter(const char *data, struct bloom_filter *filter) { struct bloom_key key; - fill_bloom_key(data, strlen(data), &key, &settings); + bloom_key_fill(&key, data, strlen(data), &settings); printf("Hashes:"); for (size_t i = 0; i < settings.num_hashes; i++) printf("0x%08x|", key.hashes[i]); printf("\n"); add_key_to_filter(&key, filter, &settings); - clear_bloom_key(&key); + bloom_key_clear(&key); } static void print_bloom_filter(struct bloom_filter *filter) { From 90d5518a7dd53ccc7d967a3a066d688da1d7e214 Mon Sep 17 00:00:00 2001 From: Lidong Yan Date: Sat, 12 Jul 2025 17:35:15 +0800 Subject: [PATCH 5/8] bloom: replace struct bloom_key * with struct bloom_keyvec Previously, we stored bloom keys in a flat array and marked a commit as NOT TREESAME if any key reported "definitely not changed". To support multiple pathspec items, we now require that for each pathspec item, there exists a bloom key reporting "definitely not changed". This "for every" condition makes a flat array insufficient, so we introduce a new structure to group keys by a single pathspec item. `struct bloom_keyvec` is introduced to replace `struct bloom_key *` and `bloom_key_nr`. And because we want to support multiple pathspec items, we added a bloom_keyvec * and a bloom_keyvec_nr field to `struct rev_info` to represent an array of bloom_keyvecs. This commit still optimize only one pathspec item, thus bloom_keyvec_nr can only be 0 or 1. New bloom_keyvec_* functions are added to create and destroy a keyvec. bloom_filter_contains_vec() is added to check if all key in keyvec is contained in a bloom filter. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- bloom.c | 61 +++++++++++++++++++++++++++++++++++++++++++ bloom.h | 38 +++++++++++++++++++++++++++ revision.c | 76 +++++++++++++++++++++--------------------------------- revision.h | 6 ++--- 4 files changed, 132 insertions(+), 49 deletions(-) diff --git a/bloom.c b/bloom.c index 5523d198c8d382..b86015f6d1babb 100644 --- a/bloom.c +++ b/bloom.c @@ -278,6 +278,55 @@ void deinit_bloom_filters(void) deep_clear_bloom_filter_slab(&bloom_filters, free_one_bloom_filter); } +struct bloom_keyvec *bloom_keyvec_new(const char *path, size_t len, + const struct bloom_filter_settings *settings) +{ + struct bloom_keyvec *vec; + const char *p; + size_t sz; + size_t nr = 1; + + p = path; + while (*p) { + /* + * At this point, the path is normalized to use Unix-style + * path separators. This is required due to how the + * changed-path Bloom filters store the paths. + */ + if (*p == '/') + nr++; + p++; + } + + sz = sizeof(struct bloom_keyvec); + sz += nr * sizeof(struct bloom_key); + vec = (struct bloom_keyvec *)xcalloc(1, sz); + if (!vec) + return NULL; + vec->count = nr; + + bloom_key_fill(&vec->key[0], path, len, settings); + nr = 1; + p = path + len - 1; + while (p > path) { + if (*p == '/') { + bloom_key_fill(&vec->key[nr++], path, p - path, settings); + } + p--; + } + assert(nr == vec->count); + return vec; +} + +void bloom_keyvec_free(struct bloom_keyvec *vec) +{ + if (!vec) + return; + for (size_t nr = 0; nr < vec->count; nr++) + bloom_key_clear(&vec->key[nr]); + free(vec); +} + static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, const struct hashmap_entry *eptr, const struct hashmap_entry *entry_or_key, @@ -539,6 +588,18 @@ int bloom_filter_contains(const struct bloom_filter *filter, return 1; } +int bloom_filter_contains_vec(const struct bloom_filter *filter, + const struct bloom_keyvec *vec, + const struct bloom_filter_settings *settings) +{ + int ret = 1; + + for (size_t nr = 0; ret > 0 && nr < vec->count; nr++) + ret = bloom_filter_contains(filter, &vec->key[nr], settings); + + return ret; +} + uint32_t test_bloom_murmur3_seeded(uint32_t seed, const char *data, size_t len, int version) { diff --git a/bloom.h b/bloom.h index 603bc1f90f5dcc..92ab2100d390e7 100644 --- a/bloom.h +++ b/bloom.h @@ -74,6 +74,16 @@ struct bloom_key { uint32_t *hashes; }; +/* + * A bloom_keyvec is a vector of bloom_keys, which + * can be used to store multiple keys for a single + * pathspec item. + */ +struct bloom_keyvec { + size_t count; + struct bloom_key key[FLEX_ARRAY]; +}; + int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, uint32_t graph_pos); @@ -82,6 +92,23 @@ void bloom_key_fill(struct bloom_key *key, const char *data, size_t len, const struct bloom_filter_settings *settings); void bloom_key_clear(struct bloom_key *key); +/* + * bloom_keyvec_new - Allocate and populate a bloom_keyvec with keys for the + * given path. + * + * This function splits the input path by '/' and generates a bloom key for each + * prefix, in reverse order of specificity. For example, given the input + * "a/b/c", it will generate bloom keys for: + * - "a/b/c" + * - "a/b" + * - "a" + * + * The resulting keys are stored in a newly allocated bloom_keyvec. + */ +struct bloom_keyvec *bloom_keyvec_new(const char *path, size_t len, + const struct bloom_filter_settings *settings); +void bloom_keyvec_free(struct bloom_keyvec *vec); + void add_key_to_filter(const struct bloom_key *key, struct bloom_filter *filter, const struct bloom_filter_settings *settings); @@ -126,6 +153,17 @@ int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, const struct bloom_filter_settings *settings); +/* + * bloom_filter_contains_vec - Check if all keys in a key vector are in the + * Bloom filter. + * + * Returns 1 if **all** keys in the vector are present in the filter, + * 0 if **any** key is not present. + */ +int bloom_filter_contains_vec(const struct bloom_filter *filter, + const struct bloom_keyvec *v, + const struct bloom_filter_settings *settings); + uint32_t test_bloom_murmur3_seeded(uint32_t seed, const char *data, size_t len, int version); diff --git a/revision.c b/revision.c index a9902d807e0af8..d106fa417338ef 100644 --- a/revision.c +++ b/revision.c @@ -685,13 +685,14 @@ static int forbid_bloom_filters(struct pathspec *spec) return 0; } +static void release_revisions_bloom_keyvecs(struct rev_info *revs); + static void prepare_to_use_bloom_filter(struct rev_info *revs) { struct pathspec_item *pi; char *path_alloc = NULL; - const char *path, *p; + const char *path; size_t len; - int path_component_nr = 1; if (!revs->commits) return; @@ -708,6 +709,8 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) if (!revs->pruning.pathspec.nr) return; + revs->bloom_keyvecs_nr = 1; + CALLOC_ARRAY(revs->bloom_keyvecs, 1); pi = &revs->pruning.pathspec.items[0]; /* remove single trailing slash from path, if needed */ @@ -718,53 +721,30 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) path = pi->match; len = strlen(path); - if (!len) { - revs->bloom_filter_settings = NULL; - free(path_alloc); - return; - } - - p = path; - while (*p) { - /* - * At this point, the path is normalized to use Unix-style - * path separators. This is required due to how the - * changed-path Bloom filters store the paths. - */ - if (*p == '/') - path_component_nr++; - p++; - } - - revs->bloom_keys_nr = path_component_nr; - ALLOC_ARRAY(revs->bloom_keys, revs->bloom_keys_nr); + if (!len) + goto fail; - bloom_key_fill(&revs->bloom_keys[0], path, len, - revs->bloom_filter_settings); - path_component_nr = 1; - - p = path + len - 1; - while (p > path) { - if (*p == '/') - bloom_key_fill(&revs->bloom_keys[path_component_nr++], - path, p - path, - revs->bloom_filter_settings); - p--; - } + revs->bloom_keyvecs[0] = + bloom_keyvec_new(path, len, revs->bloom_filter_settings); if (trace2_is_enabled() && !bloom_filter_atexit_registered) { atexit(trace2_bloom_filter_statistics_atexit); bloom_filter_atexit_registered = 1; } + return; + +fail: + revs->bloom_filter_settings = NULL; free(path_alloc); + release_revisions_bloom_keyvecs(revs); } static int check_maybe_different_in_bloom_filter(struct rev_info *revs, struct commit *commit) { struct bloom_filter *filter; - int result = 1, j; + int result = 0; if (!revs->repo->objects->commit_graph) return -1; @@ -779,10 +759,10 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, return -1; } - for (j = 0; result && j < revs->bloom_keys_nr; j++) { - result = bloom_filter_contains(filter, - &revs->bloom_keys[j], - revs->bloom_filter_settings); + for (size_t nr = 0; !result && nr < revs->bloom_keyvecs_nr; nr++) { + result = bloom_filter_contains_vec(filter, + revs->bloom_keyvecs[nr], + revs->bloom_filter_settings); } if (result) @@ -823,7 +803,7 @@ static int rev_compare_tree(struct rev_info *revs, return REV_TREE_SAME; } - if (revs->bloom_keys_nr && !nth_parent) { + if (revs->bloom_keyvecs_nr && !nth_parent) { bloom_ret = check_maybe_different_in_bloom_filter(revs, commit); if (bloom_ret == 0) @@ -850,7 +830,7 @@ static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit, if (!t1) return 0; - if (!nth_parent && revs->bloom_keys_nr) { + if (!nth_parent && revs->bloom_keyvecs_nr) { bloom_ret = check_maybe_different_in_bloom_filter(revs, commit); if (!bloom_ret) return 1; @@ -3200,6 +3180,14 @@ static void release_revisions_mailmap(struct string_list *mailmap) static void release_revisions_topo_walk_info(struct topo_walk_info *info); +static void release_revisions_bloom_keyvecs(struct rev_info *revs) +{ + for (size_t nr = 0; nr < revs->bloom_keyvecs_nr; nr++) + bloom_keyvec_free(revs->bloom_keyvecs[nr]); + FREE_AND_NULL(revs->bloom_keyvecs); + revs->bloom_keyvecs_nr = 0; +} + static void free_void_commit_list(void *list) { free_commit_list(list); @@ -3228,11 +3216,7 @@ void release_revisions(struct rev_info *revs) clear_decoration(&revs->treesame, free); line_log_free(revs); oidset_clear(&revs->missing_commits); - - for (int i = 0; i < revs->bloom_keys_nr; i++) - bloom_key_clear(&revs->bloom_keys[i]); - FREE_AND_NULL(revs->bloom_keys); - revs->bloom_keys_nr = 0; + release_revisions_bloom_keyvecs(revs); } static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child) diff --git a/revision.h b/revision.h index 6d369cdad6a473..ac843f58d0cf81 100644 --- a/revision.h +++ b/revision.h @@ -62,7 +62,7 @@ struct repository; struct rev_info; struct string_list; struct saved_parents; -struct bloom_key; +struct bloom_keyvec; struct bloom_filter_settings; struct option; struct parse_opt_ctx_t; @@ -360,8 +360,8 @@ struct rev_info { /* Commit graph bloom filter fields */ /* The bloom filter key(s) for the pathspec */ - struct bloom_key *bloom_keys; - int bloom_keys_nr; + struct bloom_keyvec **bloom_keyvecs; + int bloom_keyvecs_nr; /* * The bloom filter settings used to generate the key. From 937153dece3c2b1e04b0e071298745abd57cd347 Mon Sep 17 00:00:00 2001 From: Lidong Yan Date: Sat, 12 Jul 2025 17:35:16 +0800 Subject: [PATCH 6/8] revision: make helper for pathspec to bloom keyvec When preparing to use bloom filters in a revision walk, Git populates a boom_keyvec with an array of bloom keys for the components of a path. Before we create the ability to map multiple pathspecs to multiple bloom_keyvecs, extract the conversion from a pathspec to a bloom_keyvec into its own helper method. This simplifies the state that persists in prepare_to_use_bloom_filter() as well as makes the future change much simpler. Signed-off-by: Derrick Stolee Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- revision.c | 45 +++++++++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/revision.c b/revision.c index d106fa417338ef..ed849b31c704b7 100644 --- a/revision.c +++ b/revision.c @@ -687,13 +687,37 @@ static int forbid_bloom_filters(struct pathspec *spec) static void release_revisions_bloom_keyvecs(struct rev_info *revs); -static void prepare_to_use_bloom_filter(struct rev_info *revs) +static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out, + const struct pathspec_item *pi, + const struct bloom_filter_settings *settings) { - struct pathspec_item *pi; char *path_alloc = NULL; const char *path; size_t len; + int res = 0; + + /* remove single trailing slash from path, if needed */ + if (pi->len > 0 && pi->match[pi->len - 1] == '/') { + path_alloc = xmemdupz(pi->match, pi->len - 1); + path = path_alloc; + } else + path = pi->match; + + len = strlen(path); + if (!len) { + res = -1; + goto cleanup; + } + *out = bloom_keyvec_new(path, len, settings); + +cleanup: + free(path_alloc); + return res; +} + +static void prepare_to_use_bloom_filter(struct rev_info *revs) +{ if (!revs->commits) return; @@ -711,22 +735,12 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) revs->bloom_keyvecs_nr = 1; CALLOC_ARRAY(revs->bloom_keyvecs, 1); - pi = &revs->pruning.pathspec.items[0]; - /* remove single trailing slash from path, if needed */ - if (pi->len > 0 && pi->match[pi->len - 1] == '/') { - path_alloc = xmemdupz(pi->match, pi->len - 1); - path = path_alloc; - } else - path = pi->match; - - len = strlen(path); - if (!len) + if (convert_pathspec_to_bloom_keyvec(&revs->bloom_keyvecs[0], + &revs->pruning.pathspec.items[0], + revs->bloom_filter_settings)) goto fail; - revs->bloom_keyvecs[0] = - bloom_keyvec_new(path, len, revs->bloom_filter_settings); - if (trace2_is_enabled() && !bloom_filter_atexit_registered) { atexit(trace2_bloom_filter_statistics_atexit); bloom_filter_atexit_registered = 1; @@ -736,7 +750,6 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) fail: revs->bloom_filter_settings = NULL; - free(path_alloc); release_revisions_bloom_keyvecs(revs); } From 2a6ce090f27016d68ee6952809d98fe88ce53522 Mon Sep 17 00:00:00 2001 From: Lidong Yan Date: Tue, 15 Jul 2025 10:56:22 +0800 Subject: [PATCH 7/8] bloom: optimize multiple pathspec items in revision MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible. Add new test cases in t/t4216-log-bloom.sh to ensure - consistent results between the optimization for multiple pathspec items using bloom filter and the case without bloom filter optimization. - does not use bloom filter if any pathspec item is not literal. With these optimizations, we get some improvements for multi-pathspec runs of 'git log'. First, in the Git repository we see these modest results: Benchmark 1: old Time (mean ± σ): 73.1 ms ± 2.9 ms Range (min … max): 69.9 ms … 84.5 ms 42 runs Benchmark 2: new Time (mean ± σ): 55.1 ms ± 2.9 ms Range (min … max): 51.1 ms … 61.2 ms 52 runs Summary 'new' ran 1.33 ± 0.09 times faster than 'old' But in a larger repo, such as the LLVM project repo below, we get even better results: Benchmark 1: old Time (mean ± σ): 1.974 s ± 0.006 s Range (min … max): 1.960 s … 1.983 s 10 runs Benchmark 2: new Time (mean ± σ): 262.9 ms ± 2.4 ms Range (min … max): 257.7 ms … 266.2 ms 11 runs Summary 'new' ran 7.51 ± 0.07 times faster than 'old' Signed-off-by: Derrick Stolee [ly: rename convert_pathspec_to_filter() to convert_pathspec_to_bloom_keyvec()] Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- revision.c | 21 +++++++++++---------- t/t4216-log-bloom.sh | 23 ++++++++++++++--------- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/revision.c b/revision.c index ed849b31c704b7..572331a42fcee5 100644 --- a/revision.c +++ b/revision.c @@ -675,12 +675,11 @@ static int forbid_bloom_filters(struct pathspec *spec) { if (spec->has_wildcard) return 1; - if (spec->nr > 1) - return 1; if (spec->magic & ~PATHSPEC_LITERAL) return 1; - if (spec->nr && (spec->items[0].magic & ~PATHSPEC_LITERAL)) - return 1; + for (size_t nr = 0; nr < spec->nr; nr++) + if (spec->items[nr].magic & ~PATHSPEC_LITERAL) + return 1; return 0; } @@ -733,13 +732,15 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) if (!revs->pruning.pathspec.nr) return; - revs->bloom_keyvecs_nr = 1; - CALLOC_ARRAY(revs->bloom_keyvecs, 1); + revs->bloom_keyvecs_nr = revs->pruning.pathspec.nr; + CALLOC_ARRAY(revs->bloom_keyvecs, revs->bloom_keyvecs_nr); - if (convert_pathspec_to_bloom_keyvec(&revs->bloom_keyvecs[0], - &revs->pruning.pathspec.items[0], - revs->bloom_filter_settings)) - goto fail; + for (int i = 0; i < revs->pruning.pathspec.nr; i++) { + if (convert_pathspec_to_bloom_keyvec(&revs->bloom_keyvecs[i], + &revs->pruning.pathspec.items[i], + revs->bloom_filter_settings)) + goto fail; + } if (trace2_is_enabled() && !bloom_filter_atexit_registered) { atexit(trace2_bloom_filter_statistics_atexit); diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 8910d53cac1146..639868ac562f9e 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -66,8 +66,9 @@ sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { rm -f "$TRASH_DIRECTORY/trace.perf" && - git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && - GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom + eval git -c core.commitGraph=false log --pretty="format:%s" "$1" >log_wo_bloom && + eval "GIT_TRACE2_PERF=\"$TRASH_DIRECTORY/trace.perf\"" \ + git -c core.commitGraph=true log --pretty="format:%s" "$1" >log_w_bloom } test_bloom_filters_used () { @@ -138,10 +139,6 @@ test_expect_success 'git log with --walk-reflogs does not use Bloom filters' ' test_bloom_filters_not_used "--walk-reflogs -- A" ' -test_expect_success 'git log -- multiple path specs does not use Bloom filters' ' - test_bloom_filters_not_used "-- file4 A/file1" -' - test_expect_success 'git log -- "." pathspec at root does not use Bloom filters' ' test_bloom_filters_not_used "-- ." ' @@ -151,9 +148,17 @@ test_expect_success 'git log with wildcard that resolves to a single path uses B test_bloom_filters_used "-- *renamed" ' -test_expect_success 'git log with wildcard that resolves to a multiple paths does not uses Bloom filters' ' - test_bloom_filters_not_used "-- *" && - test_bloom_filters_not_used "-- file*" +test_expect_success 'git log with multiple literal paths uses Bloom filter' ' + test_bloom_filters_used "-- file4 A/file1" && + test_bloom_filters_used "-- *" && + test_bloom_filters_used "-- file*" +' + +test_expect_success 'git log with path contains a wildcard does not use Bloom filter' ' + test_bloom_filters_not_used "-- file\*" && + test_bloom_filters_not_used "-- A/\* file4" && + test_bloom_filters_not_used "-- file4 A/\*" && + test_bloom_filters_not_used "-- * A/\*" ' test_expect_success 'setup - add commit-graph to the chain without Bloom filters' ' From 97e14d99f6def189b0f786ac6207b792ca3197b1 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Wed, 23 Jul 2025 15:45:01 -0700 Subject: [PATCH 8/8] The thirteenth batch Signed-off-by: Junio C Hamano --- Documentation/RelNotes/2.51.0.adoc | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/Documentation/RelNotes/2.51.0.adoc b/Documentation/RelNotes/2.51.0.adoc index 33ae8f9b88cb53..ed968c5342db2c 100644 --- a/Documentation/RelNotes/2.51.0.adoc +++ b/Documentation/RelNotes/2.51.0.adoc @@ -47,6 +47,12 @@ UI, Workflows & Features service names (like smtp) in addition to the numeric port numbers (like 25). + * Lift the limitation to use changed-path filter in "git log" so that + it can be used for a pathspec with multiple literal paths. + + * Clean up the way how signature on commit objects are exported to + and imported from fast-import stream. + Performance, Internal Implementation, Development Support etc. -------------------------------------------------------------- @@ -195,6 +201,13 @@ including security updates, are included in this release. expansion. (merge 7d275cd5c0 jb/gpg-program-variable-is-a-pathname later to maint). + * Our header file relied on that the system-supplied + header is not later included, which would override our + macro definitions, but "amazon linux" broke this assumption. Fix + this by preemptively including near the beginning of + ourselves. + (merge 9d3b33125f ps/sane-ctype-workaround later to maint). + * Other code cleanup, docfix, build fix, etc. (merge b257adb571 lo/my-first-ow-doc-update later to maint). (merge 8b34b6a220 ly/sequencer-update-squash-is-fixup-only later to maint).