From 9481877de342425ea1d3cd10f7a212e99d4e2670 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Tue, 29 Apr 2025 15:47:43 +0000 Subject: [PATCH 1/5] hashmap: ensure hashmaps are reusable after hashmap_clear() In the series merged at bf0a430f70b5 (Merge branch 'en/strmap', 2020-11-21), strmap was built on top of hashmap and hashmap was extended in a few ways to support strmap and be more generally useful. One of the extensions was that hashmap_partial_clear() was introduced to allow reuse of the hashmap without freeing the table. Peff believed that it also made sense to introduce a hashmap_clear() which freed everything while allowing reuse. I added hashmap_clear(), but in doing so, overlooked the fact that for a hashmap to be reusable, it needs a defined cmpfn and data (the HASHMAP_INIT macro requires these fields as parameters, for example). So, if we want the hashmap to be reusable, we shouldn't zero out those fields. We probably also shouldn't zero out do_count_items. (We could zero out grow_at and shrink_at, but whether we zero those or not is irrelevant as they'll be automatically updated whenever a new entry is inserted.) Since clearing is associated with freeing map->table, and the only thing required for consistency after freeing map->table is zeroing tablesize and private_size, let's only zero those fields out. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- hashmap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hashmap.c b/hashmap.c index 5009471800e8c6..b99cd0300a8b4f 100644 --- a/hashmap.c +++ b/hashmap.c @@ -205,8 +205,9 @@ void hashmap_clear_(struct hashmap *map, ssize_t entry_offset) return; if (entry_offset >= 0) /* called by hashmap_clear_and_free */ free_individual_entries(map, entry_offset); - free(map->table); - memset(map, 0, sizeof(*map)); + FREE_AND_NULL(map->table); + map->tablesize = 0; + map->private_size = 0; } struct hashmap_entry *hashmap_get(const struct hashmap *map, From 104add8368617f80ee356ea48497364ed39a7b7a Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Tue, 29 Apr 2025 11:37:58 +0000 Subject: [PATCH 2/5] diff: check range before dereferencing an array element Before accessing an array element at a given index, it should be verified that the index is within the desired bounds, not afterwards, otherwise it may not make sense to even access the array element in the first place. This is the point of CodeQL's `cpp/offset-use-before-range-check` rule. This CodeQL rule unfortunately is also triggered by the `fill_es_indent_data()` code, even though the condition `off < len - 1` does not even need to guarantee that the offset is in bounds (`s` points to a NUL-terminated string, for which `s[off] == '\r'` would fail before running out of bounds). Let's work around this rare false positive to help us use an otherwise mostly useful tool is a worthy thing to do. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index c89c15d98e0e29..18ba3060460868 100644 --- a/diff.c +++ b/diff.c @@ -892,7 +892,7 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es) /* skip any \v \f \r at start of indentation */ while (s[off] == '\f' || s[off] == '\v' || - (s[off] == '\r' && off < len - 1)) + (off < len - 1 && s[off] == '\r')) off++; /* calculate the visual width of indentation */ From 8583c9dcbc7d362250c0310e4cee771ec5003327 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Wed, 30 Apr 2025 14:44:57 +0200 Subject: [PATCH 3/5] builtin/mv: bail out when trying to move child and its parent We have a known issue in git-mv(1) where moving both a child and any of its parents causes an assert to trigger because the child cannot be found anymore in the index. We have added a test for this in commit 0fcd473fdd3 (t7001: add failure test which triggers assertion, 2024-10-22) without addressing the issue, which is why the test itself is marked as `test_expect_failure`. The behaviour of that test relies on a call to assert(3p) though, which may or may not be compiled into the resulting binary depending on whether or not we pass `-DNDEBUG`. When these asserts are compiled into Git this may cause our CI to hang on Windows though, because asserts may cause a modal window to be shown. While we could work around the issue by converting this into a call to `BUG()`, let's rather address the root cause of the issue by bailing out in case we see that both a child and any of its parents are being moved in the same command. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/mv.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++-- t/t7001-mv.sh | 24 ++++++++++++++++---- 2 files changed, 79 insertions(+), 6 deletions(-) diff --git a/builtin/mv.c b/builtin/mv.c index 55a7d471dca012..c17e14cee6e9b2 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -37,6 +37,13 @@ enum update_mode { INDEX = (1 << 2), SPARSE = (1 << 3), SKIP_WORKTREE_DIR = (1 << 4), + /* + * A file gets moved implicitly via a move of one of its parent + * directories. This flag causes us to skip the check that we don't try + * to move a file and any of its parent directories at the same point + * in time. + */ + MOVE_VIA_PARENT_DIR = (1 << 5), }; #define DUP_BASENAME 1 @@ -181,6 +188,21 @@ static void remove_empty_src_dirs(const char **src_dir, size_t src_dir_nr) strbuf_release(&a_src_dir); } +struct pathmap_entry { + struct hashmap_entry ent; + const char *path; +}; + +static int pathmap_cmp(const void *cmp_data UNUSED, + const struct hashmap_entry *a, + const struct hashmap_entry *b, + const void *key UNUSED) +{ + const struct pathmap_entry *e1 = container_of(a, struct pathmap_entry, ent); + const struct pathmap_entry *e2 = container_of(b, struct pathmap_entry, ent); + return fspathcmp(e1->path, e2->path); +} + int cmd_mv(int argc, const char **argv, const char *prefix, @@ -211,6 +233,8 @@ int cmd_mv(int argc, struct cache_entry *ce; struct string_list only_match_skip_worktree = STRING_LIST_INIT_DUP; struct string_list dirty_paths = STRING_LIST_INIT_DUP; + struct hashmap moved_dirs = HASHMAP_INIT(pathmap_cmp, NULL); + struct strbuf pathbuf = STRBUF_INIT; int ret; git_config(git_default_config, NULL); @@ -329,6 +353,7 @@ int cmd_mv(int argc, dir_check: if (S_ISDIR(st.st_mode)) { + struct pathmap_entry *entry; char *dst_with_slash; size_t dst_with_slash_len; int j, n; @@ -346,6 +371,11 @@ int cmd_mv(int argc, goto act_on_entry; } + entry = xmalloc(sizeof(*entry)); + entry->path = src; + hashmap_entry_init(&entry->ent, fspathhash(src)); + hashmap_add(&moved_dirs, &entry->ent); + /* last - first >= 1 */ modes[i] |= WORKING_DIRECTORY; @@ -366,8 +396,7 @@ int cmd_mv(int argc, strvec_push(&sources, path); strvec_push(&destinations, prefixed_path); - memset(modes + argc + j, 0, sizeof(enum update_mode)); - modes[argc + j] |= ce_skip_worktree(ce) ? SPARSE : INDEX; + modes[argc + j] = MOVE_VIA_PARENT_DIR | (ce_skip_worktree(ce) ? SPARSE : INDEX); submodule_gitfiles[argc + j] = NULL; free(prefixed_path); @@ -463,6 +492,32 @@ int cmd_mv(int argc, } } + for (i = 0; i < argc; i++) { + const char *slash_pos; + + if (modes[i] & MOVE_VIA_PARENT_DIR) + continue; + + strbuf_reset(&pathbuf); + strbuf_addstr(&pathbuf, sources.v[i]); + + slash_pos = strrchr(pathbuf.buf, '/'); + while (slash_pos > pathbuf.buf) { + struct pathmap_entry needle; + + strbuf_setlen(&pathbuf, slash_pos - pathbuf.buf); + + needle.path = pathbuf.buf; + hashmap_entry_init(&needle.ent, fspathhash(pathbuf.buf)); + + if (hashmap_get_entry(&moved_dirs, &needle, ent, NULL)) + die(_("cannot move both '%s' and its parent directory '%s'"), + sources.v[i], pathbuf.buf); + + slash_pos = strrchr(pathbuf.buf, '/'); + } + } + if (only_match_skip_worktree.nr) { advise_on_updating_sparse_paths(&only_match_skip_worktree); if (!ignore_errors) { @@ -587,6 +642,8 @@ int cmd_mv(int argc, strvec_clear(&dest_paths); strvec_clear(&destinations); strvec_clear(&submodule_gitfiles_to_free); + hashmap_clear_and_free(&moved_dirs, struct pathmap_entry, ent); + strbuf_release(&pathbuf); free(submodule_gitfiles); free(modes); return ret; diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh index 25334b506228f0..920479e925620a 100755 --- a/t/t7001-mv.sh +++ b/t/t7001-mv.sh @@ -550,16 +550,32 @@ test_expect_success 'moving nested submodules' ' git status ' -test_expect_failure 'nonsense mv triggers assertion failure and partially updated index' ' +test_expect_success 'moving file and its parent directory at the same time fails' ' test_when_finished git reset --hard HEAD && git reset --hard HEAD && mkdir -p a && mkdir -p b && >a/a.txt && git add a/a.txt && - test_must_fail git mv a/a.txt a b && - git status --porcelain >actual && - grep "^A[ ]*a/a.txt$" actual + cat >expect <<-EOF && + fatal: cannot move both ${SQ}a/a.txt${SQ} and its parent directory ${SQ}a${SQ} + EOF + test_must_fail git mv a/a.txt a b 2>err && + test_cmp expect err +' + +test_expect_success 'moving nested directory and its parent directory at the same time fails' ' + test_when_finished git reset --hard HEAD && + git reset --hard HEAD && + mkdir -p a/b/c && + >a/b/c/file.txt && + git add a && + mkdir target && + cat >expect <<-EOF && + fatal: cannot move both ${SQ}a/b/c${SQ} and its parent directory ${SQ}a${SQ} + EOF + test_must_fail git mv a/b/c a target 2>err && + test_cmp expect err ' test_done From 974f0d46645604ac45b8a5ce0b90e2b2a56ca764 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Wed, 30 Apr 2025 14:44:58 +0200 Subject: [PATCH 4/5] builtin/mv: convert assert(3p) into `BUG()` The use of asserts is discouraged in our codebase because they lead to different behaviour depending on how Git is built. When being unsure enough whether a condition always holds so that one adds the assert, then the assert should probably trigger regardless of how Git is being built. Drop the call to assert(3p) in git-mv(1) and instead use `BUG()`. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/mv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/builtin/mv.c b/builtin/mv.c index c17e14cee6e9b2..4b4e1fce2e28bf 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -560,7 +560,8 @@ int cmd_mv(int argc, continue; pos = index_name_pos(the_repository->index, src, strlen(src)); - assert(pos >= 0); + if (pos < 0) + BUG("could not find source in index: '%s'", src); if (!(mode & SPARSE) && !lstat(src, &st)) sparse_and_dirty = ie_modified(the_repository->index, the_repository->index->cache[pos], From 1ee85f0e215f22b0878d0ad4b2445d12bbb63887 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Thu, 8 May 2025 11:12:02 -0700 Subject: [PATCH 5/5] The twelfth batch Signed-off-by: Junio C Hamano --- Documentation/RelNotes/2.50.0.adoc | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/Documentation/RelNotes/2.50.0.adoc b/Documentation/RelNotes/2.50.0.adoc index b1083c5193a54d..07759cf98b3cca 100644 --- a/Documentation/RelNotes/2.50.0.adoc +++ b/Documentation/RelNotes/2.50.0.adoc @@ -251,6 +251,15 @@ Fixes since v2.49 * Update to arm64 Windows port. (merge 436a42215e js/windows-arm64 later to maint). + * hashmap API clean-up to ensure hashmap_clear() leaves a cleared map + in a reusable state. + (merge 9481877de3 en/hashmap-clear-fix later to maint). + + * "git mv a a/b dst" would ask to move the directory 'a' itself, as + well as its contents, in a single destination directory, which is + a contradicting request that is impossible to satisfy. This case is + now detected and the command errors out. + (merge 974f0d4664 ps/mv-contradiction-fix later to maint). * Other code cleanup, docfix, build fix, etc. (merge 227c4f33a0 ja/doc-block-delimiter-markup-fix later to maint). @@ -277,3 +286,4 @@ Fixes since v2.49 (merge 25292c301d lo/remove-log-reencode-from-rev-info later to maint). (merge 1aa50636fd jk/p5332-testfix later to maint). (merge 42cf4ac552 ps/ci-resurrect-p4-on-github later to maint). + (merge 104add8368 js/diff-codeql-false-positive-workaround later to maint).