From 88ce8dfe2924f2a0c75bd3ec5c52fae02f73ba19 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sat, 10 May 2025 14:33:14 +0200 Subject: [PATCH 01/52] git-daemon doc: update mark-up of synopsis option descriptions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To unify mark-up used in our documentation to a newer convention, started by 22293895 (doc: apply synopsis simplification on git-clone and git-init, 2024-09-24), update the documentation of 'git daemon' to * use [synopsis], not [verse] in the SYNOPSIS section * enclose `--option=` in backquotes Also, split '--[no-]option' into '--option' and '--no-option' to make it easier to grep for them. Signed-off-by: Junio C Hamano Helped-by: Jean-Noël Avila Signed-off-by: Junio C Hamano --- Documentation/git-daemon.adoc | 181 +++++++++++++++++----------------- 1 file changed, 91 insertions(+), 90 deletions(-) diff --git a/Documentation/git-daemon.adoc b/Documentation/git-daemon.adoc index ede7b935d64947..99389f038894c4 100644 --- a/Documentation/git-daemon.adoc +++ b/Documentation/git-daemon.adoc @@ -7,21 +7,21 @@ git-daemon - A really simple server for Git repositories SYNOPSIS -------- -[verse] -'git daemon' [--verbose] [--syslog] [--export-all] - [--timeout=] [--init-timeout=] [--max-connections=] - [--strict-paths] [--base-path=] [--base-path-relaxed] - [--user-path | --user-path=] - [--interpolated-path=] - [--reuseaddr] [--detach] [--pid-file=] - [--enable=] [--disable=] - [--allow-override=] [--forbid-override=] - [--access-hook=] [--[no-]informative-errors] - [--inetd | - [--listen=] [--port=] - [--user= [--group=]]] - [--log-destination=(stderr|syslog|none)] - [...] +[synopsis] +git daemon [--verbose] [--syslog] [--export-all] + [--timeout=] [--init-timeout=] [--max-connections=] + [--strict-paths] [--base-path=] [--base-path-relaxed] + [--user-path | --user-path=] + [--interpolated-path=] + [--reuseaddr] [--detach] [--pid-file=] + [--enable=] [--disable=] + [--allow-override=] [--forbid-override=] + [--access-hook=] [--[no-]informative-errors] + [--inetd | + [--listen=] [--port=] + [--user= [--group=]]] + [--log-destination=(stderr|syslog|none)] + [...] DESCRIPTION ----------- @@ -32,111 +32,111 @@ that service if it is enabled. It verifies that the directory has the magic file "git-daemon-export-ok", and it will refuse to export any Git directory that hasn't explicitly been marked for export this way (unless the `--export-all` parameter is specified). If you -pass some directory paths as 'git daemon' arguments, the offers are limited to +pass some directory paths as `git daemon` arguments, the offers are limited to repositories within those directories. By default, only `upload-pack` service is enabled, which serves -'git fetch-pack' and 'git ls-remote' clients, which are invoked -from 'git fetch', 'git pull', and 'git clone'. +`git fetch-pack` and `git ls-remote` clients, which are invoked +from `git fetch`, `git pull`, and `git clone`. This is ideally suited for read-only updates, i.e., pulling from Git repositories. -An `upload-archive` also exists to serve 'git archive'. +An `upload-archive` also exists to serve `git archive`. OPTIONS ------- ---strict-paths:: +`--strict-paths`:: Match paths exactly (i.e. don't allow "/foo/repo" when the real path is "/foo/repo.git" or "/foo/repo/.git") and don't do user-relative paths. - 'git daemon' will refuse to start when this option is enabled and no + `git daemon` will refuse to start when this option is enabled and no directory arguments are provided. ---base-path=:: +`--base-path=`:: Remap all the path requests as relative to the given path. - This is sort of "Git root" - if you run 'git daemon' with - '--base-path=/srv/git' on example.com, then if you later try to pull - 'git://example.com/hello.git', 'git daemon' will interpret the path - as `/srv/git/hello.git`. - ---base-path-relaxed:: - If --base-path is enabled and repo lookup fails, with this option - 'git daemon' will attempt to lookup without prefixing the base path. - This is useful for switching to --base-path usage, while still + This is sort of "Git root" - if you run `git daemon` with + `--base-path=/srv/git` on `example.com`, then if you later try + to pull from `git://example.com/hello.git`, `git daemon` will + interpret the path as `/srv/git/hello.git`. + +`--base-path-relaxed`:: + If `--base-path` is enabled and repo lookup fails, with this option + `git daemon` will attempt to lookup without prefixing the base path. + This is useful for switching to `--base-path` usage, while still allowing the old paths. ---interpolated-path=:: +`--interpolated-path=`:: To support virtual hosting, an interpolated path template can be used to dynamically construct alternate paths. The template - supports %H for the target hostname as supplied by the client but - converted to all lowercase, %CH for the canonical hostname, - %IP for the server's IP address, %P for the port number, - and %D for the absolute path of the named repository. + supports `%H` for the target hostname as supplied by the client but + converted to all lowercase, `%CH` for the canonical hostname, + `%IP` for the server's IP address, `%P` for the port number, + and `%D` for the absolute path of the named repository. After interpolation, the path is validated against the directory list. ---export-all:: +`--export-all`:: Allow pulling from all directories that look like Git repositories (have the 'objects' and 'refs' subdirectories), even if they - do not have the 'git-daemon-export-ok' file. + do not have the `git-daemon-export-ok` file. ---inetd:: - Have the server run as an inetd service. Implies --syslog (may be - overridden with `--log-destination=`). - Incompatible with --detach, --port, --listen, --user and --group - options. +`--inetd`:: + Have the server run as an inetd service. Implies `--syslog` (may + be overridden with `--log-destination=`). + Incompatible with `--detach`, `--port`, `--listen`, `--user` and + `--group` options. ---listen=:: +`--listen=`:: Listen on a specific IP address or hostname. IP addresses can be either an IPv4 address or an IPv6 address if supported. If IPv6 - is not supported, then --listen= is also not supported and - --listen must be given an IPv4 address. + is not supported, then `--listen=` is also not supported + and `--listen` must be given an IPv4 address. Can be given more than once. Incompatible with `--inetd` option. ---port=:: +`--port=`:: Listen on an alternative port. Incompatible with `--inetd` option. ---init-timeout=:: +`--init-timeout=`:: Timeout (in seconds) between the moment the connection is established and the client request is received (typically a rather low value, since that should be basically immediate). ---timeout=:: +`--timeout=`:: Timeout (in seconds) for specific client sub-requests. This includes the time it takes for the server to process the sub-request and the time spent waiting for the next client's request. ---max-connections=:: +`--max-connections=`:: Maximum number of concurrent clients, defaults to 32. Set it to zero for no limit. ---syslog:: +`--syslog`:: Short for `--log-destination=syslog`. ---log-destination=:: +`--log-destination=`:: Send log messages to the specified destination. - Note that this option does not imply --verbose, + Note that this option does not imply `--verbose`, thus by default only error conditions will be logged. - The must be one of: + The __ must be one of: + -- -stderr:: +`stderr`:: Write to standard error. Note that if `--detach` is specified, the process disconnects from the real standard error, making this destination effectively equivalent to `none`. -syslog:: +`syslog`:: Write to syslog, using the `git-daemon` identifier. -none:: +`none`:: Disable all logging. -- + The default destination is `syslog` if `--inetd` or `--detach` is specified, otherwise `stderr`. ---user-path:: ---user-path=:: +`--user-path`:: +`--user-path=`:: Allow {tilde}user notation to be used in requests. When specified with no parameter, a request to git://host/{tilde}alice/foo is taken as a request to access @@ -145,23 +145,23 @@ otherwise `stderr`. taken as a request to access `/foo` repository in the home directory of user `alice`. ---verbose:: +`--verbose`:: Log details about the incoming connections and requested files. ---reuseaddr:: - Use SO_REUSEADDR when binding the listening socket. +`--reuseaddr`:: + Use `SO_REUSEADDR` when binding the listening socket. This allows the server to restart without waiting for old connections to time out. ---detach:: - Detach from the shell. Implies --syslog. +`--detach`:: + Detach from the shell. Implies `--syslog`. ---pid-file=:: - Save the process id in 'file'. Ignored when the daemon +`--pid-file=`:: + Save the process id in __. Ignored when the daemon is run under `--inetd`. ---user=:: ---group=:: +`--user=`:: +`--group=`:: Change daemon's uid and gid before entering the service loop. When only `--user` is given without `--group`, the primary group ID for the user is used. The values of @@ -170,43 +170,44 @@ otherwise `stderr`. + Giving these options is an error when used with `--inetd`; use the facility of inet daemon to achieve the same before spawning -'git daemon' if needed. +`git daemon` if needed. + Like many programs that switch user id, the daemon does not reset -environment variables such as `$HOME` when it runs git programs, +environment variables such as `HOME` when it runs git programs, e.g. `upload-pack` and `receive-pack`. When using this option, you may also want to set and export `HOME` to point at the home -directory of `` before starting the daemon, and make sure any -Git configuration files in that directory are readable by ``. +directory of __ before starting the daemon, and make sure any +Git configuration files in that directory are readable by __. ---enable=:: ---disable=:: +`--enable=`:: +`--disable=`:: Enable/disable the service site-wide per default. Note that a service disabled site-wide can still be enabled per repository if it is marked overridable and the repository enables the service with a configuration item. ---allow-override=:: ---forbid-override=:: +`--allow-override=`:: +`--forbid-override=`:: Allow/forbid overriding the site-wide default with per repository configuration. By default, all the services may be overridden. ---[no-]informative-errors:: +`--informative-errors`:: +`--no-informative-errors`:: When informative errors are turned on, git-daemon will report more verbose errors to the client, differentiating conditions like "no such repository" from "repository not exported". This is more convenient for clients, but may leak information about the existence of unexported repositories. When informative errors are not enabled, all errors report "access denied" to the - client. The default is --no-informative-errors. + client. The default is `--no-informative-errors`. ---access-hook=:: +`--access-hook=`:: Every time a client connects, first run an external command specified by the with service name (e.g. "upload-pack"), - path to the repository, hostname (%H), canonical hostname - (%CH), IP address (%IP), and TCP port (%P) as its command-line + path to the repository, hostname (`%H`), canonical hostname + (`%CH`), IP address (`%IP`), and TCP port (`%P`) as its command-line arguments. The external command can decide to decline the service by exiting with a non-zero status (or to allow it by exiting with a zero status). It can also look at the $REMOTE_ADDR @@ -217,7 +218,7 @@ The external command can optionally write a single line to its standard output to be sent to the requestor as an error message when it declines the service. -:: +__:: The remaining arguments provide a list of directories. If any directories are specified, then the `git-daemon` process will serve a requested directory only if it is contained in one of @@ -229,24 +230,24 @@ SERVICES These services can be globally enabled/disabled using the command-line options of this command. If finer-grained -control is desired (e.g. to allow 'git archive' to be run +control is desired (e.g. to allow `git archive` to be run against only in a few selected repositories the daemon serves), the per-repository configuration file can be used to enable or disable them. upload-pack:: - This serves 'git fetch-pack' and 'git ls-remote' + This serves `git fetch-pack` and `git ls-remote` clients. It is enabled by default, but a repository can disable it by setting `daemon.uploadpack` configuration item to `false`. upload-archive:: - This serves 'git archive --remote'. It is disabled by + This serves `git archive --remote`. It is disabled by default, but a repository can enable it by setting `daemon.uploadarch` configuration item to `true`. receive-pack:: - This serves 'git send-pack' clients, allowing anonymous + This serves `git send-pack` clients, allowing anonymous push. It is disabled by default, as there is _no_ authentication in the protocol (in other words, anybody can push anything into the repository, including removal @@ -300,7 +301,7 @@ default repository could be made as well. 'git daemon' as regular daemon for virtual hosts:: - To set up 'git daemon' as a regular, non-inetd service that + To set up `git daemon` as a regular, non-inetd service that handles repositories for multiple virtual hosts based on their IP addresses, start the daemon like this: + @@ -317,7 +318,7 @@ Repositories can still be accessed by hostname though, assuming they correspond to these IP addresses. selectively enable/disable services per repository:: - To enable 'git archive --remote' and disable 'git fetch' against + To enable `git archive --remote` and disable `git fetch` against a repository, have the following in the configuration file in the repository (that is the file 'config' next to `HEAD`, 'refs' and 'objects'). @@ -331,8 +332,8 @@ selectively enable/disable services per repository:: ENVIRONMENT ----------- -'git daemon' will set REMOTE_ADDR to the IP address of the client -that connected to it, if the IP address is available. REMOTE_ADDR will +`git daemon` will set `REMOTE_ADDR` to the IP address of the client +that connected to it, if the IP address is available. `REMOTE_ADDR` will be available in the environment of hooks called when services are performed. From 914c549ac161c3393dd760be5af4d290620a27e8 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sat, 10 May 2025 14:33:15 +0200 Subject: [PATCH 02/52] git-{var,write-tree} docs: update mark-up of synopsis option descriptions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To unify mark-up used in our documentation to a newer convention, started by 22293895 (doc: apply synopsis simplification on git-clone and git-init, 2024-09-24), update the documentation for 'git var' and 'git write-tree' to * use [synopsis], not [verse] in the SYNOPSIS section * enclose `--option=` in backquotes Signed-off-by: Junio C Hamano Helped-by: Jean-Noël Avila Signed-off-by: Junio C Hamano --- Documentation/git-var.adoc | 6 +++--- Documentation/git-write-tree.adoc | 18 +++++++++--------- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/Documentation/git-var.adoc b/Documentation/git-var.adoc index 0680568dfda732..909963b1c28593 100644 --- a/Documentation/git-var.adoc +++ b/Documentation/git-var.adoc @@ -8,8 +8,8 @@ git-var - Show a Git logical variable SYNOPSIS -------- -[verse] -'git var' (-l | ) +[synopsis] +git var (-l | ) DESCRIPTION ----------- @@ -18,7 +18,7 @@ no value. OPTIONS ------- --l:: +`-l`:: Display the logical variables. In addition, all the variables of the Git configuration file .git/config are listed as well. (However, the configuration variables listing functionality diff --git a/Documentation/git-write-tree.adoc b/Documentation/git-write-tree.adoc index f22041a9dc3965..4c7100ea1e3aba 100644 --- a/Documentation/git-write-tree.adoc +++ b/Documentation/git-write-tree.adoc @@ -8,8 +8,8 @@ git-write-tree - Create a tree object from the current index SYNOPSIS -------- -[verse] -'git write-tree' [--missing-ok] [--prefix=/] +[synopsis] +git write-tree [--missing-ok] [--prefix=/] DESCRIPTION ----------- @@ -18,23 +18,23 @@ tree object is printed to standard output. The index must be in a fully merged state. -Conceptually, 'git write-tree' sync()s the current index contents +Conceptually, `git write-tree` sync()s the current index contents into a set of tree files. In order to have that match what is actually in your directory right -now, you need to have done a 'git update-index' phase before you did the -'git write-tree'. +now, you need to have done a `git update-index` phase before you did the +`git write-tree`. OPTIONS ------- ---missing-ok:: - Normally 'git write-tree' ensures that the objects referenced by the +`--missing-ok`:: + Normally `git write-tree` ensures that the objects referenced by the directory exist in the object database. This option disables this check. ---prefix=/:: +`--prefix=/`:: Writes a tree object that represents a subdirectory - ``. This can be used to write the tree object + __. This can be used to write the tree object for a subproject that is in the named subdirectory. GIT From 7e7f47a48853a520db606bcd2269bb17cba09744 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sat, 10 May 2025 14:33:16 +0200 Subject: [PATCH 03/52] git-verify-* doc: update mark-up of synopsis option descriptions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To unify mark-up used in our documentation to a newer convention, started by 22293895 (doc: apply synopsis simplification on git-clone and git-init, 2024-09-24), update the documentation pages for 'git verify-commit', 'git verify-tag', and 'git verify-pack' to * use [synopsis], not [verse] in the SYNOPSIS section * enclose `--option=` in backquotes * do not describe non-option arguments in the OPTIONS section Signed-off-by: Junio C Hamano Helped-by: Jean-Noël Avila Signed-off-by: Junio C Hamano --- Documentation/git-verify-commit.adoc | 16 +++++++--------- Documentation/git-verify-pack.adoc | 28 ++++++++++++---------------- Documentation/git-verify-tag.adoc | 16 +++++++--------- 3 files changed, 26 insertions(+), 34 deletions(-) diff --git a/Documentation/git-verify-commit.adoc b/Documentation/git-verify-commit.adoc index aee4c40eac4666..ff5b8b97ef37dd 100644 --- a/Documentation/git-verify-commit.adoc +++ b/Documentation/git-verify-commit.adoc @@ -7,26 +7,24 @@ git-verify-commit - Check the GPG signature of commits SYNOPSIS -------- -[verse] -'git verify-commit' [-v | --verbose] [--raw] ... +[synopsis] +git verify-commit [-v | --verbose] [--raw] ... DESCRIPTION ----------- -Validates the GPG signature created by 'git commit -S'. +Validates the GPG signature created by `git commit -S` +on the commit objects given on the command line. OPTIONS ------- ---raw:: +`--raw`:: Print the raw gpg status output to standard error instead of the normal human-readable output. --v:: ---verbose:: +`-v`:: +`--verbose`:: Print the contents of the commit object before validating it. -...:: - SHA-1 identifiers of Git commit objects. - GIT --- Part of the linkgit:git[1] suite diff --git a/Documentation/git-verify-pack.adoc b/Documentation/git-verify-pack.adoc index d7e886918aa7af..b0462d8db3935f 100644 --- a/Documentation/git-verify-pack.adoc +++ b/Documentation/git-verify-pack.adoc @@ -8,43 +8,39 @@ git-verify-pack - Validate packed Git archive files SYNOPSIS -------- -[verse] -'git verify-pack' [-v | --verbose] [-s | --stat-only] [--] .idx... +[synopsis] +git verify-pack [-v | --verbose] [-s | --stat-only] [--] .idx... DESCRIPTION ----------- -Reads given idx file for packed Git archive created with the -'git pack-objects' command and verifies the idx file and the -corresponding pack file. +Read each idx file for packed Git archive given on the command line, +and verify the idx file and the corresponding pack file. OPTIONS ------- -.idx ...:: - The idx files to verify. - --v:: ---verbose:: +`-v`:: +`--verbose`:: After verifying the pack, show the list of objects contained in the pack and a histogram of delta chain length. --s:: ---stat-only:: +`-s`:: +`--stat-only`:: Do not verify the pack contents; only show the histogram of delta chain length. With `--verbose`, the list of objects is also shown. -\--:: +`--`:: Do not interpret any more arguments as options. OUTPUT FORMAT ------------- -When specifying the -v option the format used is: +When specifying the `-v` option the format used is: - SHA-1 type size size-in-packfile offset-in-packfile + object-name type size size-in-packfile offset-in-packfile for objects that are not deltified in the pack, and - SHA-1 type size size-in-packfile offset-in-packfile depth base-SHA-1 + object-name type size size-in-packfile offset-in-packfile depth base-object-name for objects that are deltified. diff --git a/Documentation/git-verify-tag.adoc b/Documentation/git-verify-tag.adoc index 81d50ecc4c6879..b3721a86f49e31 100644 --- a/Documentation/git-verify-tag.adoc +++ b/Documentation/git-verify-tag.adoc @@ -7,26 +7,24 @@ git-verify-tag - Check the GPG signature of tags SYNOPSIS -------- -[verse] -'git verify-tag' [-v | --verbose] [--format=] [--raw] ... +[synopsis] +git verify-tag [-v | --verbose] [--format=] [--raw] ... DESCRIPTION ----------- -Validates the gpg signature created by 'git tag'. +Validates the gpg signature created by `git tag` in the tag +objects listed on the command line. OPTIONS ------- ---raw:: +`--raw`:: Print the raw gpg status output to standard error instead of the normal human-readable output. --v:: ---verbose:: +`-v`:: +`--verbose`:: Print the contents of the tag object before validating it. -...:: - SHA-1 identifiers of Git tag objects. - GIT --- Part of the linkgit:git[1] suite From 20e4e9ad0b72be0a4ccf9300f51c383c03beec97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jean-No=C3=ABl=20Avila?= Date: Sat, 10 May 2025 14:33:17 +0200 Subject: [PATCH 04/52] git-var doc: fix usage of $ENV_VAR vs ENV_VAR MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When refering to environment variables in the documentation, use the ENV_VARIABLE format instead of $ENV_VARIABLE. The latter is used in the documentation to refer to the actual value of the variable, not the name of the variable. Signed-off-by: Jean-Noël Avila Signed-off-by: Junio C Hamano --- Documentation/git-var.adoc | 40 ++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 21 deletions(-) diff --git a/Documentation/git-var.adoc b/Documentation/git-var.adoc index 909963b1c28593..b606c2d649979f 100644 --- a/Documentation/git-var.adoc +++ b/Documentation/git-var.adoc @@ -32,58 +32,56 @@ EXAMPLES VARIABLES --------- -GIT_AUTHOR_IDENT:: +`GIT_AUTHOR_IDENT`:: The author of a piece of code. -GIT_COMMITTER_IDENT:: +`GIT_COMMITTER_IDENT`:: The person who put a piece of code into Git. -GIT_EDITOR:: +`GIT_EDITOR`:: Text editor for use by Git commands. The value is meant to be interpreted by the shell when it is used. Examples: `~/bin/vi`, `$SOME_ENVIRONMENT_VARIABLE`, `"C:\Program Files\Vim\gvim.exe" - --nofork`. The order of preference is the `$GIT_EDITOR` - environment variable, then `core.editor` configuration, then - `$VISUAL`, then `$EDITOR`, and then the default chosen at compile + --nofork`. The order of preference is `$GIT_EDITOR`, then + `core.editor` configuration value, then `$VISUAL`, then + `$EDITOR`, and then the default chosen at compile time, which is usually 'vi'. ifdef::git-default-editor[] The build you are using chose '{git-default-editor}' as the default. endif::git-default-editor[] -GIT_SEQUENCE_EDITOR:: +`GIT_SEQUENCE_EDITOR`:: Text editor used to edit the 'todo' file while running `git rebase -i`. Like `GIT_EDITOR`, the value is meant to be interpreted by - the shell when it is used. The order of preference is the - `$GIT_SEQUENCE_EDITOR` environment variable, then - `sequence.editor` configuration, and then the value of `git var - GIT_EDITOR`. + the shell when it is used. The order of preference is + `$GIT_SEQUENCE_EDITOR`, then `sequence.editor` configuration value, + and then the value of `git var GIT_EDITOR`. -GIT_PAGER:: +`GIT_PAGER`:: Text viewer for use by Git commands (e.g., 'less'). The value is meant to be interpreted by the shell. The order of preference - is the `$GIT_PAGER` environment variable, then `core.pager` - configuration, then `$PAGER`, and then the default chosen at - compile time (usually 'less'). + is `$GIT_PAGER`, then the value of `core.pager` configuration, then + `$PAGER`, and then the default chosen at compile time (usually `less`). ifdef::git-default-pager[] The build you are using chose '{git-default-pager}' as the default. endif::git-default-pager[] -GIT_DEFAULT_BRANCH:: +`GIT_DEFAULT_BRANCH`:: The name of the first branch created in newly initialized repositories. -GIT_SHELL_PATH:: +`GIT_SHELL_PATH`:: The path of the binary providing the POSIX shell for commands which use the shell. -GIT_ATTR_SYSTEM:: +`GIT_ATTR_SYSTEM`:: The path to the system linkgit:gitattributes[5] file, if one is enabled. -GIT_ATTR_GLOBAL:: +`GIT_ATTR_GLOBAL`:: The path to the global (per-user) linkgit:gitattributes[5] file. -GIT_CONFIG_SYSTEM:: +`GIT_CONFIG_SYSTEM`:: The path to the system configuration file, if one is enabled. -GIT_CONFIG_GLOBAL:: +`GIT_CONFIG_GLOBAL`:: The path to the global (per-user) configuration files, if any. Most path values contain only one value. However, some can contain multiple From 784ceccb91b82dc8a2c69ddd6f1f5ccc2e2f96f2 Mon Sep 17 00:00:00 2001 From: shejialuo Date: Wed, 14 May 2025 23:50:26 +0800 Subject: [PATCH 05/52] packed-backend: fsck should warn when "packed-refs" file is empty We assume the "packed-refs" won't be empty and instead has at least one line in it (even when there are no refs packed, there is the file header line). Because there is no terminating LF in the empty file, we will report "packedRefEntryNotTerminated(ERROR)" to the user. However, the runtime code paths would accept an empty "packed-refs" file, for example, "create_snapshot" would simply return the "snapshot" without checking the content of "packed-refs". So, we should skip checking the content of "packed-refs" when it is empty during fsck. After 694b7a1999 (repack_without_ref(): write peeled refs in the rewritten file, 2013-04-22), we would always write a header into the "packed-refs" file. So, versions of Git that are not too ancient never write such an empty "packed-refs" file. As an empty file often indicates a sign of a filesystem-level issue, the way we want to resolve this inconsistency is not make everybody totally silent but notice and report the anomaly. Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report to the users that "packed-refs" is empty. Signed-off-by: shejialuo Signed-off-by: Junio C Hamano --- Documentation/fsck-msgids.adoc | 6 ++++++ fsck.h | 1 + refs/packed-backend.c | 9 +++++++++ t/t0602-reffiles-fsck.sh | 17 +++++++++++++++++ 4 files changed, 33 insertions(+) diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc index 9601fff22854b6..0ba4f9a27e4c73 100644 --- a/Documentation/fsck-msgids.adoc +++ b/Documentation/fsck-msgids.adoc @@ -59,6 +59,12 @@ `emptyName`:: (WARN) A path contains an empty name. +`emptyPackedRefsFile`:: + (INFO) "packed-refs" file is empty. Report to the + git@vger.kernel.org mailing list if you see this error. As only + very early versions of Git would create such an empty + "packed_refs" file, we might tighten this rule in the future. + `extraHeaderEntry`:: (IGNORE) Extra headers found after `tagger`. diff --git a/fsck.h b/fsck.h index b1deae61eed7b1..0c5869ac34e216 100644 --- a/fsck.h +++ b/fsck.h @@ -84,6 +84,7 @@ enum fsck_msg_type { FUNC(LARGE_PATHNAME, WARN) \ /* infos (reported as warnings, but ignored by default) */ \ FUNC(BAD_FILEMODE, INFO) \ + FUNC(EMPTY_PACKED_REFS_FILE, INFO) \ FUNC(GITMODULES_PARSE, INFO) \ FUNC(GITIGNORE_SYMLINK, INFO) \ FUNC(GITATTRIBUTES_SYMLINK, INFO) \ diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 3ad1ed0787aada..fb91833e76d9c9 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -2103,6 +2103,15 @@ static int packed_fsck(struct ref_store *ref_store, goto cleanup; } + if (!st.st_size) { + struct fsck_ref_report report = { 0 }; + report.path = "packed-refs"; + ret = fsck_report_ref(o, &report, + FSCK_MSG_EMPTY_PACKED_REFS_FILE, + "file is empty"); + goto cleanup; + } + if (strbuf_read(&packed_ref_content, fd, 0) < 0) { ret = error_errno(_("unable to read '%s'"), refs->path); goto cleanup; diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index 9d1dc2144c4b72..f671ac4d3aba1a 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -647,6 +647,23 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' ' ) ' +test_expect_success 'empty packed-refs should be reported' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit default && + + >.git/packed-refs && + git refs verify 2>err && + cat >expect <<-EOF && + warning: packed-refs: emptyPackedRefsFile: file is empty + EOF + rm .git/packed-refs && + test_cmp expect err + ) +' + test_expect_success 'packed-refs header should be checked' ' test_when_finished "rm -rf repo" && git init repo && From a0dee3f74b4f42076b7c23ca6d9aca61ed064e82 Mon Sep 17 00:00:00 2001 From: shejialuo Date: Wed, 14 May 2025 23:50:35 +0800 Subject: [PATCH 06/52] packed-backend: extract snapshot allocation in `load_contents` "load_contents" would choose which way to load the content of the "packed-refs". However, we cannot directly use this function when checking the consistency due to we don't want to open the file. And we also need to reuse the logic to avoid causing repetition. Let's create a new helper function "allocate_snapshot_buffer" to extract the snapshot allocation logic in "load_contents" and update the "load_contents" to align with the behavior. Suggested-by: Jeff King Suggested-by: Patrick Steinhardt Signed-off-by: shejialuo Signed-off-by: Junio C Hamano --- refs/packed-backend.c | 53 +++++++++++++++++++++++++------------------ 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index fb91833e76d9c9..1da44a3d6d789d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -517,6 +517,32 @@ static int refname_contains_nul(struct strbuf *refname) #define SMALL_FILE_SIZE (32*1024) +static int allocate_snapshot_buffer(struct snapshot *snapshot, int fd, struct stat *st) +{ + ssize_t bytes_read; + size_t size; + + size = xsize_t(st->st_size); + if (!size) + return 0; + + if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) { + snapshot->buf = xmalloc(size); + bytes_read = read_in_full(fd, snapshot->buf, size); + if (bytes_read < 0 || bytes_read != size) + die_errno("couldn't read %s", snapshot->refs->path); + snapshot->mmapped = 0; + } else { + snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + snapshot->mmapped = 1; + } + + snapshot->start = snapshot->buf; + snapshot->eof = snapshot->buf + size; + + return 1; +} + /* * Depending on `mmap_strategy`, either mmap or read the contents of * the `packed-refs` file into the snapshot. Return 1 if the file @@ -525,10 +551,9 @@ static int refname_contains_nul(struct strbuf *refname) */ static int load_contents(struct snapshot *snapshot) { - int fd; struct stat st; - size_t size; - ssize_t bytes_read; + int ret; + int fd; fd = open(snapshot->refs->path, O_RDONLY); if (fd < 0) { @@ -550,27 +575,11 @@ static int load_contents(struct snapshot *snapshot) if (fstat(fd, &st) < 0) die_errno("couldn't stat %s", snapshot->refs->path); - size = xsize_t(st.st_size); - - if (!size) { - close(fd); - return 0; - } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) { - snapshot->buf = xmalloc(size); - bytes_read = read_in_full(fd, snapshot->buf, size); - if (bytes_read < 0 || bytes_read != size) - die_errno("couldn't read %s", snapshot->refs->path); - snapshot->mmapped = 0; - } else { - snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); - snapshot->mmapped = 1; - } - close(fd); - snapshot->start = snapshot->buf; - snapshot->eof = snapshot->buf + size; + ret = allocate_snapshot_buffer(snapshot, fd, &st); - return 1; + close(fd); + return ret; } static const char *find_reference_location_1(struct snapshot *snapshot, From 86ddd588f24acf3960489dccb8aed82dc570796b Mon Sep 17 00:00:00 2001 From: shejialuo Date: Wed, 14 May 2025 23:50:42 +0800 Subject: [PATCH 07/52] packed-backend: mmap large "packed-refs" file during fsck During fsck, we use "strbuf_read" to read the content of "packed-refs" without using mmap mechanism. This is a bad practice which would consume more memory than using mmap mechanism. Besides, as all code paths in "packed-backend.c" use this way, we should make "fsck" align with the current codebase. As we have introduced the helper function "allocate_snapshot_buffer", we can simply use this function to use mmap mechanism. Suggested-by: Jeff King Suggested-by: Patrick Steinhardt Signed-off-by: shejialuo Signed-off-by: Junio C Hamano --- refs/packed-backend.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 1da44a3d6d789d..7fd73a0e6da3b5 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -2068,7 +2068,7 @@ static int packed_fsck(struct ref_store *ref_store, { struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ, "fsck"); - struct strbuf packed_ref_content = STRBUF_INIT; + struct snapshot snapshot = { 0 }; unsigned int sorted = 0; struct stat st; int ret = 0; @@ -2112,7 +2112,7 @@ static int packed_fsck(struct ref_store *ref_store, goto cleanup; } - if (!st.st_size) { + if (!allocate_snapshot_buffer(&snapshot, fd, &st)) { struct fsck_ref_report report = { 0 }; report.path = "packed-refs"; ret = fsck_report_ref(o, &report, @@ -2121,21 +2121,16 @@ static int packed_fsck(struct ref_store *ref_store, goto cleanup; } - if (strbuf_read(&packed_ref_content, fd, 0) < 0) { - ret = error_errno(_("unable to read '%s'"), refs->path); - goto cleanup; - } - - ret = packed_fsck_ref_content(o, ref_store, &sorted, packed_ref_content.buf, - packed_ref_content.buf + packed_ref_content.len); + ret = packed_fsck_ref_content(o, ref_store, &sorted, snapshot.start, + snapshot.eof); if (!ret && sorted) - ret = packed_fsck_ref_sorted(o, ref_store, packed_ref_content.buf, - packed_ref_content.buf + packed_ref_content.len); + ret = packed_fsck_ref_sorted(o, ref_store, snapshot.start, + snapshot.eof); cleanup: if (fd >= 0) close(fd); - strbuf_release(&packed_ref_content); + clear_snapshot_buffer(&snapshot); return ret; } From 131a8fa8151c95f309241ead33018f30f57ff57c Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:39 +0000 Subject: [PATCH 08/52] commit: simplify code The difference of two unsigned integers is defined to be unsigned, and therefore it is misleading to check whether it is greater than zero (instead, the more natural way would be to check whether the difference is zero or not). Let's instead avoid the subtraction altogether, and compare the two operands directly, which makes the code more obvious as a side effect. Pointed out by CodeQL's rule with the ID `cpp/unsigned-difference-expression-compared-zero`. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- builtin/commit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/commit.c b/builtin/commit.c index 66bd91fd523dd7..fba0dded64a718 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix, for (i = 0; i < the_repository->index->cache_nr; i++) if (ce_intent_to_add(the_repository->index->cache[i])) ita_nr++; - committable = the_repository->index->cache_nr - ita_nr > 0; + committable = the_repository->index->cache_nr > ita_nr; } else { /* * Unless the user did explicitly request a submodule From c607410ada02fce5ee2366b68543736176101295 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:40 +0000 Subject: [PATCH 09/52] fetch: carefully clear local variable's address after use As pointed out by CodeQL, it is a potentially dangerous practice to store local variables' addresses in non-local structs. Yet this is exactly what happens with the `acked_commits` attribute that is used in `cmd_fetch()`: The pointer to a local variable is assigned to it. Now, it is Git's convention that `cmd_*()` functions are essentially only returning just before exiting the process, therefore there is little danger that this attribute is used after the code flow returns from that function. However, code in `cmd_*()` function is often so useful that it gets lifted into a library function, at which point this issue could become a real problem. Let's make sure to clear the `acked_commits` attribute out after it was used, and before the function returns (at which point the address would go stale). Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- builtin/fetch.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/fetch.c b/builtin/fetch.c index cda6eaf1fd6edc..c1a1434c709625 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc, if (server_options.nr) gtransport->server_options = &server_options; result = transport_fetch_refs(gtransport, NULL); + gtransport->smart_options->acked_commits = NULL; oidset_iter_init(&acked_commits, &iter); while ((oid = oidset_iter_next(&iter))) From 7f3ed75ff551e2ca4f8eb0242784e7aacbb14fb3 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:41 +0000 Subject: [PATCH 10/52] commit-graph: avoid malloc'ing a local variable We do need a context to write the commit graph, but that context is only needed during the life time of `commit_graph_write()`, therefore it can easily be a stack variable. This also helps CodeQL recognize that it is safe to assign the address of other local variables to the context's fields. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- commit-graph.c | 141 ++++++++++++++++++++++++------------------------- 1 file changed, 69 insertions(+), 72 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 6394752b0b0868..9f0115dac9b528 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb, const struct commit_graph_opts *opts) { struct repository *r = the_repository; - struct write_commit_graph_context *ctx; + struct write_commit_graph_context ctx = { + .r = r, + .odb = odb, + .append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0, + .report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0, + .split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0, + .opts = opts, + .total_bloom_filter_data_size = 0, + .write_generation_data = (get_configured_generation_version(r) == 2), + .num_generation_data_overflows = 0, + }; uint32_t i; int res = 0; int replace = 0; @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb, return 0; } - CALLOC_ARRAY(ctx, 1); - ctx->r = r; - ctx->odb = odb; - ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; - ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; - ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->opts = opts; - ctx->total_bloom_filter_data_size = 0; - ctx->write_generation_data = (get_configured_generation_version(r) == 2); - ctx->num_generation_data_overflows = 0; - bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); @@ -2549,14 +2548,14 @@ int write_commit_graph(struct object_directory *odb, bloom_settings.num_hashes); bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; + ctx.bloom_settings = &bloom_settings; init_topo_level_slab(&topo_levels); - ctx->topo_levels = &topo_levels; + ctx.topo_levels = &topo_levels; - prepare_commit_graph(ctx->r); - if (ctx->r->objects->commit_graph) { - struct commit_graph *g = ctx->r->objects->commit_graph; + prepare_commit_graph(ctx.r); + if (ctx.r->objects->commit_graph) { + struct commit_graph *g = ctx.r->objects->commit_graph; while (g) { g->topo_levels = &topo_levels; @@ -2565,15 +2564,15 @@ int write_commit_graph(struct object_directory *odb, } if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) - ctx->changed_paths = 1; + ctx.changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { struct commit_graph *g; - g = ctx->r->objects->commit_graph; + g = ctx.r->objects->commit_graph; /* We have changed-paths already. Keep them in the next graph */ if (g && g->bloom_filter_settings) { - ctx->changed_paths = 1; + ctx.changed_paths = 1; /* don't propagate the hash_version unless unspecified */ if (bloom_settings.hash_version == -1) @@ -2586,116 +2585,114 @@ int write_commit_graph(struct object_directory *odb, bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1; - if (ctx->split) { - struct commit_graph *g = ctx->r->objects->commit_graph; + if (ctx.split) { + struct commit_graph *g = ctx.r->objects->commit_graph; while (g) { - ctx->num_commit_graphs_before++; + ctx.num_commit_graphs_before++; g = g->base_graph; } - if (ctx->num_commit_graphs_before) { - ALLOC_ARRAY(ctx->commit_graph_filenames_before, ctx->num_commit_graphs_before); - i = ctx->num_commit_graphs_before; - g = ctx->r->objects->commit_graph; + if (ctx.num_commit_graphs_before) { + ALLOC_ARRAY(ctx.commit_graph_filenames_before, ctx.num_commit_graphs_before); + i = ctx.num_commit_graphs_before; + g = ctx.r->objects->commit_graph; while (g) { - ctx->commit_graph_filenames_before[--i] = xstrdup(g->filename); + ctx.commit_graph_filenames_before[--i] = xstrdup(g->filename); g = g->base_graph; } } - if (ctx->opts) - replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx.opts) + replace = ctx.opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; } - ctx->approx_nr_objects = repo_approximate_object_count(the_repository); + ctx.approx_nr_objects = repo_approximate_object_count(the_repository); - if (ctx->append && ctx->r->objects->commit_graph) { - struct commit_graph *g = ctx->r->objects->commit_graph; + if (ctx.append && ctx.r->objects->commit_graph) { + struct commit_graph *g = ctx.r->objects->commit_graph; for (i = 0; i < g->num_commits; i++) { struct object_id oid; oidread(&oid, g->chunk_oid_lookup + st_mult(g->hash_len, i), the_repository->hash_algo); - oid_array_append(&ctx->oids, &oid); + oid_array_append(&ctx.oids, &oid); } } if (pack_indexes) { - ctx->order_by_pack = 1; - if ((res = fill_oids_from_packs(ctx, pack_indexes))) + ctx.order_by_pack = 1; + if ((res = fill_oids_from_packs(&ctx, pack_indexes))) goto cleanup; } if (commits) { - if ((res = fill_oids_from_commits(ctx, commits))) + if ((res = fill_oids_from_commits(&ctx, commits))) goto cleanup; } if (!pack_indexes && !commits) { - ctx->order_by_pack = 1; - fill_oids_from_all_packs(ctx); + ctx.order_by_pack = 1; + fill_oids_from_all_packs(&ctx); } - close_reachable(ctx); + close_reachable(&ctx); - copy_oids_to_commits(ctx); + copy_oids_to_commits(&ctx); - if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) { + if (ctx.commits.nr >= GRAPH_EDGE_LAST_MASK) { error(_("too many commits to write graph")); res = -1; goto cleanup; } - if (!ctx->commits.nr && !replace) + if (!ctx.commits.nr && !replace) goto cleanup; - if (ctx->split) { - split_graph_merge_strategy(ctx); + if (ctx.split) { + split_graph_merge_strategy(&ctx); if (!replace) - merge_commit_graphs(ctx); + merge_commit_graphs(&ctx); } else - ctx->num_commit_graphs_after = 1; + ctx.num_commit_graphs_after = 1; - ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); + ctx.trust_generation_numbers = validate_mixed_generation_chain(ctx.r->objects->commit_graph); - compute_topological_levels(ctx); - if (ctx->write_generation_data) - compute_generation_numbers(ctx); + compute_topological_levels(&ctx); + if (ctx.write_generation_data) + compute_generation_numbers(&ctx); - if (ctx->changed_paths) - compute_bloom_filters(ctx); + if (ctx.changed_paths) + compute_bloom_filters(&ctx); - res = write_commit_graph_file(ctx); + res = write_commit_graph_file(&ctx); - if (ctx->changed_paths) + if (ctx.changed_paths) deinit_bloom_filters(); - if (ctx->split) - mark_commit_graphs(ctx); + if (ctx.split) + mark_commit_graphs(&ctx); - expire_commit_graphs(ctx); + expire_commit_graphs(&ctx); cleanup: - free(ctx->graph_name); - free(ctx->base_graph_name); - free(ctx->commits.list); - oid_array_clear(&ctx->oids); + free(ctx.graph_name); + free(ctx.base_graph_name); + free(ctx.commits.list); + oid_array_clear(&ctx.oids); clear_topo_level_slab(&topo_levels); - for (i = 0; i < ctx->num_commit_graphs_before; i++) - free(ctx->commit_graph_filenames_before[i]); - free(ctx->commit_graph_filenames_before); + for (i = 0; i < ctx.num_commit_graphs_before; i++) + free(ctx.commit_graph_filenames_before[i]); + free(ctx.commit_graph_filenames_before); - for (i = 0; i < ctx->num_commit_graphs_after; i++) { - free(ctx->commit_graph_filenames_after[i]); - free(ctx->commit_graph_hash_after[i]); + for (i = 0; i < ctx.num_commit_graphs_after; i++) { + free(ctx.commit_graph_filenames_after[i]); + free(ctx.commit_graph_hash_after[i]); } - free(ctx->commit_graph_filenames_after); - free(ctx->commit_graph_hash_after); - - free(ctx); + free(ctx.commit_graph_filenames_after); + free(ctx.commit_graph_hash_after); return res; } From bf0468e2ba64ac358a61cb01a675b7c5919d64fd Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:42 +0000 Subject: [PATCH 11/52] upload-pack: rename `enum` to reflect the operation While 3145ea957d (upload-pack: introduce fetch server command, 2018-03-15) added support for the `fetch` command, from the server's point of view it is an upload, and hence the `enum` should really be called `upload_state` instead of `fetch_state`. Likewise, rename its values. This also helps unconfuse CodeQL which would otherwise be at sixes or sevens about having _two_ non-local definitions of the same `enum` with the same values. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- upload-pack.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/upload-pack.c b/upload-pack.c index 956da5b061a0e5..26f29b85b551c1 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -1780,16 +1780,16 @@ static void send_shallow_info(struct upload_pack_data *data) packet_delim(1); } -enum fetch_state { - FETCH_PROCESS_ARGS = 0, - FETCH_SEND_ACKS, - FETCH_SEND_PACK, - FETCH_DONE, +enum upload_state { + UPLOAD_PROCESS_ARGS = 0, + UPLOAD_SEND_ACKS, + UPLOAD_SEND_PACK, + UPLOAD_DONE, }; int upload_pack_v2(struct repository *r, struct packet_reader *request) { - enum fetch_state state = FETCH_PROCESS_ARGS; + enum upload_state state = UPLOAD_PROCESS_ARGS; struct upload_pack_data data; clear_object_flags(the_repository, ALL_FLAGS); @@ -1798,9 +1798,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) data.use_sideband = LARGE_PACKET_MAX; get_upload_pack_config(r, &data); - while (state != FETCH_DONE) { + while (state != UPLOAD_DONE) { switch (state) { - case FETCH_PROCESS_ARGS: + case UPLOAD_PROCESS_ARGS: process_args(request, &data); if (!data.want_obj.nr && !data.wait_for_done) { @@ -1811,27 +1811,27 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) * to just send 'have's without 'want's); guess * they didn't want anything. */ - state = FETCH_DONE; + state = UPLOAD_DONE; } else if (data.seen_haves) { /* * Request had 'have' lines, so lets ACK them. */ - state = FETCH_SEND_ACKS; + state = UPLOAD_SEND_ACKS; } else { /* * Request had 'want's but no 'have's so we can * immediately go to construct and send a pack. */ - state = FETCH_SEND_PACK; + state = UPLOAD_SEND_PACK; } break; - case FETCH_SEND_ACKS: + case UPLOAD_SEND_ACKS: if (process_haves_and_send_acks(&data)) - state = FETCH_SEND_PACK; + state = UPLOAD_SEND_PACK; else - state = FETCH_DONE; + state = UPLOAD_DONE; break; - case FETCH_SEND_PACK: + case UPLOAD_SEND_PACK: send_wanted_ref_info(&data); send_shallow_info(&data); @@ -1841,9 +1841,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) packet_writer_write(&data.writer, "packfile\n"); create_pack_file(&data, NULL); } - state = FETCH_DONE; + state = UPLOAD_DONE; break; - case FETCH_DONE: + case UPLOAD_DONE: continue; } } From 655268452cafd061c6c38541a719b6f5b9d528e3 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:43 +0000 Subject: [PATCH 12/52] has_dir_name(): make code more obvious One thing that might be non-obvious to readers (or to analyzers like CodeQL) is that the function essentially does nothing when the Git index is empty, and in particular that it does not look at the value of `len_eq_last` (which would be uninitialized at that point). Let's make this much easier to understand, by returning early if the Git index is empty, and by avoiding empty `else` blocks. This commit changes indentation and is hence best viewed using `--ignore-space-change`. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- read-cache.c | 55 +++++++++++++--------------------------------------- 1 file changed, 13 insertions(+), 42 deletions(-) diff --git a/read-cache.c b/read-cache.c index 73f83a7e7a113e..c0bb760ad473ef 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1117,48 +1117,19 @@ static int has_dir_name(struct index_state *istate, * * Compare the entry's full path with the last path in the index. */ - if (istate->cache_nr > 0) { - cmp_last = strcmp_offset(name, - istate->cache[istate->cache_nr - 1]->name, - &len_eq_last); - if (cmp_last > 0) { - if (name[len_eq_last] != '/') { - /* - * The entry sorts AFTER the last one in the - * index. - * - * If there were a conflict with "file", then our - * name would start with "file/" and the last index - * entry would start with "file" but not "file/". - * - * The next character after common prefix is - * not '/', so there can be no conflict. - */ - return retval; - } else { - /* - * The entry sorts AFTER the last one in the - * index, and the next character after common - * prefix is '/'. - * - * Either the last index entry is a file in - * conflict with this entry, or it has a name - * which sorts between this entry and the - * potential conflicting file. - * - * In both cases, we fall through to the loop - * below and let the regular search code handle it. - */ - } - } else if (cmp_last == 0) { - /* - * The entry exactly matches the last one in the - * index, but because of multiple stage and CE_REMOVE - * items, we fall through and let the regular search - * code handle it. - */ - } - } + if (!istate->cache_nr) + return 0; + + cmp_last = strcmp_offset(name, + istate->cache[istate->cache_nr - 1]->name, + &len_eq_last); + if (cmp_last > 0 && name[len_eq_last] != '/') + /* + * The entry sorts AFTER the last one in the + * index and their paths have no common prefix, + * so there cannot be a F/D conflict. + */ + return 0; for (;;) { size_t len; From 6c91162449cb0a2fe3c42a1caa232444afed9c7c Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:44 +0000 Subject: [PATCH 13/52] fetch: avoid unnecessary work when there is no current branch As pointed out by CodeQL, `branch_get()` may return `NULL`, in which case `branch_has_merge_config()` would return early, but we can even avoid enumerating the refs prefixes in that case, saving even more CPU cycles. Technically, we should enclose these two statements in an `if (branch) {...}` block, but the indentation is already quite deep, therefore I refrained from doing that. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- builtin/fetch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index c1a1434c709625..40a0e8d24434f2 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1728,7 +1728,7 @@ static int do_fetch(struct transport *transport, if (transport->remote->follow_remote_head != FOLLOW_REMOTE_NEVER) do_set_head = 1; } - if (branch_has_merge_config(branch) && + if (branch && branch_has_merge_config(branch) && !strcmp(branch->remote_name, transport->remote->name)) { int i; for (i = 0; i < branch->merge_nr; i++) { From 3d39bcd98ecce0fce77b00fd680bd245b2161ddf Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:45 +0000 Subject: [PATCH 14/52] Avoid redundant conditions While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to `if (i <= 0) ... else ...`, the latter is vastly easier to read because it avoids writing out a condition that is unnecessary. Let's drop such unnecessary conditions. Pointed out by CodeQL. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- help.c | 2 +- transport-helper.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/help.c b/help.c index 6ef90838f128af..21b778707a6a65 100644 --- a/help.c +++ b/help.c @@ -214,7 +214,7 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes) else if (cmp == 0) { ei++; free(cmds->names[ci++]); - } else if (cmp > 0) + } else ei++; } diff --git a/transport-helper.c b/transport-helper.c index 69391ee7d28e11..0789e5bca53282 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -1437,7 +1437,7 @@ static int udt_do_read(struct unidirectional_transfer *t) transfer_debug("%s EOF (with %i bytes in buffer)", t->src_name, (int)t->bufuse); t->state = SSTATE_FLUSHING; - } else if (bytes > 0) { + } else { t->bufuse += bytes; transfer_debug("Read %i bytes from %s (buffer now at %i)", (int)bytes, t->src_name, (int)t->bufuse); From fc451e6ea85310725532cbdbc280f8a56a7ec7df Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:46 +0000 Subject: [PATCH 15/52] trace2: avoid "futile conditional" CodeQL reports empty `if` blocks that only contain a comment as "futile conditional". The comment talks about potential plans to turn this into a warning, but that seems not to have been necessary. Replace the entire construct with a concise comment. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- trace2/tr2_tmr.c | 24 +++++------------------- 1 file changed, 5 insertions(+), 19 deletions(-) diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c index 51f564b07a4091..038181ad9be05b 100644 --- a/trace2/tr2_tmr.c +++ b/trace2/tr2_tmr.c @@ -102,25 +102,11 @@ void tr2_update_final_timers(void) struct tr2_timer *t_final = &final_timer_block.timer[tid]; struct tr2_timer *t = &ctx->timer_block.timer[tid]; - if (t->recursion_count) { - /* - * The current thread is exiting with - * timer[tid] still running. - * - * Technically, this is a bug, but I'm going - * to ignore it. - * - * I don't think it is worth calling die() - * for. I don't think it is worth killing the - * process for this bookkeeping error. We - * might want to call warning(), but I'm going - * to wait on that. - * - * The downside here is that total_ns won't - * include the current open interval (now - - * start_ns). I can live with that. - */ - } + /* + * `t->recursion_count` could technically be non-zero, which + * would constitute a bug. Reporting the bug would potentially + * cause an infinite recursion, though, so let's ignore it. + */ if (!t->interval_count) continue; /* this timer was not used by this thread */ From ee63d026b407118221aca455a9c4f03a08ecf648 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:47 +0000 Subject: [PATCH 16/52] commit-graph: avoid using stale stack addresses The code is a bit too hard to reason about to fully assess whether the `fill_commit_graph_info()` function is called at all after `write_commit_graph()` returns (and hence the stack variable `topo_levels` goes out of context). Let's simply make sure that the stack address is no longer used at that stage, thereby making the code quite a bit easier to reason about. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- commit-graph.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 9f0115dac9b528..d052c1bf15c513 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -2683,6 +2683,15 @@ int write_commit_graph(struct object_directory *odb, oid_array_clear(&ctx.oids); clear_topo_level_slab(&topo_levels); + if (ctx.r->objects->commit_graph) { + struct commit_graph *g = ctx.r->objects->commit_graph; + + while (g) { + g->topo_levels = NULL; + g = g->base_graph; + } + } + for (i = 0; i < ctx.num_commit_graphs_before; i++) free(ctx.commit_graph_filenames_before[i]); free(ctx.commit_graph_filenames_before); From d7cfbd4351bb304eefc09a8b1ba24fd40a9f36a0 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:48 +0000 Subject: [PATCH 17/52] bundle-uri: avoid using undefined output of `sscanf()` In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31) code was introduced that assumes that an `sscanf()` call leaves its output variables unchanged unless the return value indicates success. However, the POSIX documentation makes no such guarantee: https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html So let's make sure that the output variable `maxCreationToken` is always well-defined. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- bundle-uri.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 96d2ba726d9909..13a42f92387ea5 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r, */ if (!repo_config_get_value(r, "fetch.bundlecreationtoken", - &creationTokenStr) && - sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 && - bundles.items[0]->creationToken <= maxCreationToken) { - free(bundles.items); - return 0; + &creationTokenStr)) { + if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1) + maxCreationToken = 0; + if (bundles.items[0]->creationToken <= maxCreationToken) { + free(bundles.items); + return 0; + } } /* From 22488332393646cfa4263bcb24836f492876406e Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Thu, 15 May 2025 13:11:49 +0000 Subject: [PATCH 18/52] sequencer: stop pretending that an assignment is a condition In 3e81bccdf3 (sequencer: factor out todo command name parsing, 2019-06-27), a `return` statement was introduced that basically was a long sequence of conditions, combined with `&&`, except for the last condition which is not really a condition but an assignment. The point of this construct was to return 1 (i.e. `true`) from the function if all of those conditions held true, and also assign the `bol` pointer to the end of the parsed command. Some static analyzers are really unhappy about such constructs. And human readers are at least puzzled, if not confused, by seeing a single `=` inside a chain of conditions where they would have expected to see `==` instead and, based on experience, immediately suspect a typo. Let's help all of this by turning this into the more verbose, more readable form of an `if` construct that both assigns the pointer as well as returns 1 if all of the conditions hold true. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano --- sequencer.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sequencer.c b/sequencer.c index b5c4043757e948..e5e3bc6fa5ea5d 100644 --- a/sequencer.c +++ b/sequencer.c @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol) const char nick = todo_command_info[command].c; const char *p = *bol; - return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) && - (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) && - (*bol = p); + if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) && + (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) { + *bol = p; + return 1; + } + return 0; } static int check_label_or_ref_arg(enum todo_command command, const char *arg) From 56f1cd10f48a5f630633a0e65696917e6f70fdd9 Mon Sep 17 00:00:00 2001 From: Lidong Yan <502024330056@smail.nju.edu.cn> Date: Tue, 13 May 2025 02:49:10 +0000 Subject: [PATCH 19/52] mailinfo: fix pointential memory leak if `decode_header` failed In mailinfo.c:decode_header, if convert_to_utf8 failed, the strbuf stored in dec will leak. Simply add strbuf_release and free(dec) will solve this problem. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- mailinfo.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/mailinfo.c b/mailinfo.c index 7b001fa5dbd685..ee4597da6bef97 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -381,12 +381,12 @@ static int is_format_patch_separator(const char *line, int len) return !memcmp(SAMPLE + (cp - line), cp, strlen(SAMPLE) - (cp - line)); } -static struct strbuf *decode_q_segment(const struct strbuf *q_seg, int rfc2047) +static int decode_q_segment(struct strbuf *out, const struct strbuf *q_seg, + int rfc2047) { const char *in = q_seg->buf; int c; - struct strbuf *out = xmalloc(sizeof(struct strbuf)); - strbuf_init(out, q_seg->len); + strbuf_grow(out, q_seg->len); while ((c = *in++) != 0) { if (c == '=') { @@ -405,16 +405,15 @@ static struct strbuf *decode_q_segment(const struct strbuf *q_seg, int rfc2047) c = 0x20; strbuf_addch(out, c); } - return out; + return 0; } -static struct strbuf *decode_b_segment(const struct strbuf *b_seg) +static int decode_b_segment(struct strbuf *out, const struct strbuf *b_seg) { /* Decode in..ep, possibly in-place to ot */ int c, pos = 0, acc = 0; const char *in = b_seg->buf; - struct strbuf *out = xmalloc(sizeof(struct strbuf)); - strbuf_init(out, b_seg->len); + strbuf_grow(out, b_seg->len); while ((c = *in++) != 0) { if (c == '+') @@ -447,7 +446,7 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg) break; } } - return out; + return 0; } static int convert_to_utf8(struct mailinfo *mi, @@ -475,7 +474,7 @@ static int convert_to_utf8(struct mailinfo *mi, static void decode_header(struct mailinfo *mi, struct strbuf *it) { char *in, *ep, *cp; - struct strbuf outbuf = STRBUF_INIT, *dec; + struct strbuf outbuf = STRBUF_INIT, dec = STRBUF_INIT; struct strbuf charset_q = STRBUF_INIT, piecebuf = STRBUF_INIT; int found_error = 1; /* pessimism */ @@ -530,18 +529,19 @@ static void decode_header(struct mailinfo *mi, struct strbuf *it) default: goto release_return; case 'b': - dec = decode_b_segment(&piecebuf); + if ((found_error = decode_b_segment(&dec, &piecebuf))) + goto release_return; break; case 'q': - dec = decode_q_segment(&piecebuf, 1); + if ((found_error = decode_q_segment(&dec, &piecebuf, 1))) + goto release_return; break; } - if (convert_to_utf8(mi, dec, charset_q.buf)) + if (convert_to_utf8(mi, &dec, charset_q.buf)) goto release_return; - strbuf_addbuf(&outbuf, dec); - strbuf_release(dec); - free(dec); + strbuf_addbuf(&outbuf, &dec); + strbuf_release(&dec); in = ep + 2; } strbuf_addstr(&outbuf, in); @@ -552,6 +552,7 @@ static void decode_header(struct mailinfo *mi, struct strbuf *it) strbuf_release(&outbuf); strbuf_release(&charset_q); strbuf_release(&piecebuf); + strbuf_release(&dec); if (found_error) mi->input_error = -1; @@ -634,23 +635,22 @@ static int is_inbody_header(const struct mailinfo *mi, static void decode_transfer_encoding(struct mailinfo *mi, struct strbuf *line) { - struct strbuf *ret; + struct strbuf ret = STRBUF_INIT; switch (mi->transfer_encoding) { case TE_QP: - ret = decode_q_segment(line, 0); + decode_q_segment(&ret, line, 0); break; case TE_BASE64: - ret = decode_b_segment(line); + decode_b_segment(&ret, line); break; case TE_DONTCARE: default: return; } strbuf_reset(line); - strbuf_addbuf(line, ret); - strbuf_release(ret); - free(ret); + strbuf_addbuf(line, &ret); + strbuf_release(&ret); } static inline int patchbreak(const struct strbuf *line) From 044511f889b1989840339a322f84e50dfa3bf6e0 Mon Sep 17 00:00:00 2001 From: Lidong Yan <502024330056@smail.nju.edu.cn> Date: Wed, 14 May 2025 13:53:28 +0000 Subject: [PATCH 20/52] sequencer: fix memory leak if `todo_list_rearrange_squash()` failed In sequencer.c:todo_list_rearrange_squash, if it fails, memory allocated in `next`, `tail`, `subjects` and `subject2item` will leak. Jump to cleanup label before return could fix this leak problem. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- sequencer.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/sequencer.c b/sequencer.c index b5c4043757e948..5fb7b68a7abb08 100644 --- a/sequencer.c +++ b/sequencer.c @@ -6596,6 +6596,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list) char **subjects; struct commit_todo_item commit_todo; struct todo_item *items = NULL; + int ret = 0; init_commit_todo_item(&commit_todo); /* @@ -6626,8 +6627,8 @@ int todo_list_rearrange_squash(struct todo_list *todo_list) } if (is_fixup(item->command)) { - clear_commit_todo_item(&commit_todo); - return error(_("the script was already rearranged.")); + ret = error(_("the script was already rearranged.")); + goto cleanup; } repo_parse_commit(the_repository, item->commit); @@ -6729,6 +6730,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list) todo_list->items = items; } +cleanup: free(next); free(tail); for (i = 0; i < todo_list->nr; i++) @@ -6738,7 +6740,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list) clear_commit_todo_item(&commit_todo); - return 0; + return ret; } int sequencer_determine_whence(struct repository *r, enum commit_whence *whence) From beccbddb6802c0b56e34bb1d55cecceb093940f4 Mon Sep 17 00:00:00 2001 From: Lidong Yan <502024330056@smail.nju.edu.cn> Date: Fri, 9 May 2025 08:30:35 +0000 Subject: [PATCH 21/52] commit-graph: fix memory leak when `fill_oids_from_packs()` fails In commit-graph.c:fill_oids_from_packs, if open_pack_index failed, memory allocated and returned by add_packed_git will leak. Simply add close_pack and free(p) will solve this problem. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano --- commit-graph.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 6394752b0b0868..93d867770b05d2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1929,6 +1929,8 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx, } if (open_pack_index(p)) { ret = error(_("error opening index for %s"), packname.buf); + close_pack(p); + free(p); goto cleanup; } for_each_object_in_pack(p, add_packed_commits, ctx, From 53eeed0a81dbd486a84b3252f35642c4cc2e9488 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:28 -0400 Subject: [PATCH 22/52] object-file.h: fix typo in variable declaration This should be "compat", not "comapt". Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- object-file.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/object-file.h b/object-file.h index a85b2e5b494c8f..fd715663fb4f3b 100644 --- a/object-file.h +++ b/object-file.h @@ -180,7 +180,7 @@ enum { int write_object_file_flags(const void *buf, unsigned long len, enum object_type type, struct object_id *oid, - struct object_id *comapt_oid_in, unsigned flags); + struct object_id *compat_oid_in, unsigned flags); static inline int write_object_file(const void *buf, unsigned long len, enum object_type type, struct object_id *oid) { From f227fc7d43d9607edb286eaab0f7714a2f1e4659 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:35 -0400 Subject: [PATCH 23/52] cat-file: make --allow-unknown-type a noop The cat-file command has some minor support for handling objects with "unknown" types. I.e., strings that are not "blob", "commit", "tree", or "tag". In theory this could be used for debugging or experimenting with extensions to Git. But in practice this support is not very useful: 1. You can get the type and size of such objects, but nothing else. Not even the contents! 2. Only loose objects are supported, since packfiles use numeric ids for the types, rather than strings. 3. Likewise you cannot ever transfer objects between repositories, because they cannot be represented in the packfiles used for the on-the-wire protocol. The support for these unknown types complicates the object-parsing code, and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix infinite loop on broken zlib input, 2025-02-25). So let's drop it. The first step is to remove the user-facing parts, which are accessible only via cat-file. This is technically backwards-incompatible, but given the limitations listed above, these objects couldn't possibly be useful in any workflow. However, we can't just rip out the option entirely. That would hurt a caller who ran: git cat-file -t --allow-unknown-object and fed it normal, well-formed objects. There --allow-unknown-type was doing nothing, but we wouldn't want to start bailing with an error. So to protect any such callers, we'll retain --allow-unknown-type as a noop. The code change is fairly small (but we'll able to clean up more code in follow-on patches). The test updates drop any use of the option. We still retain tests that feed the broken objects to cat-file without --allow-unknown-type, as we should continue to confirm that those objects are rejected. Note that in one spot we can drop a layer of loop, re-indenting the body; viewing the diff with "-w" helps there. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- Documentation/git-cat-file.adoc | 6 +- builtin/cat-file.c | 18 +-- t/t1006-cat-file.sh | 211 ++++++++------------------------ 3 files changed, 56 insertions(+), 179 deletions(-) diff --git a/Documentation/git-cat-file.adoc b/Documentation/git-cat-file.adoc index fc4b92f10495d2..cde79ad242bb77 100644 --- a/Documentation/git-cat-file.adoc +++ b/Documentation/git-cat-file.adoc @@ -9,8 +9,7 @@ SYNOPSIS -------- [verse] 'git cat-file' -'git cat-file' (-e | -p) -'git cat-file' (-t | -s) [--allow-unknown-type] +'git cat-file' (-e | -p | -t | -s) 'git cat-file' (--textconv | --filters) [: | --path= ] 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects] @@ -202,9 +201,6 @@ flush:: only once, even if it is stored multiple times in the repository. ---allow-unknown-type:: - Allow `-s` or `-t` to query broken/corrupt objects of unknown type. - --follow-symlinks:: With `--batch` or `--batch-check`, follow symlinks inside the repository when requesting objects with extended SHA-1 diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 3914a2a3f61c61..4adc19aa294cec 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -100,8 +100,7 @@ static int stream_blob(const struct object_id *oid) return 0; } -static int cat_one_file(int opt, const char *exp_type, const char *obj_name, - int unknown_type) +static int cat_one_file(int opt, const char *exp_type, const char *obj_name) { int ret; struct object_id oid; @@ -121,9 +120,6 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, if (!path && opt_cw) get_oid_flags |= GET_OID_REQUIRE_PATH; - if (unknown_type) - flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE; - if (get_oid_with_context(the_repository, obj_name, get_oid_flags, &oid, &obj_context)) die("Not a valid object name %s", obj_name); @@ -1038,8 +1034,7 @@ int cmd_cat_file(int argc, const char * const builtin_catfile_usage[] = { N_("git cat-file "), - N_("git cat-file (-e | -p) "), - N_("git cat-file (-t | -s) [--allow-unknown-type] "), + N_("git cat-file (-e | -p | -t | -s) "), N_("git cat-file (--textconv | --filters)\n" " [: | --path= ]"), N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n" @@ -1057,8 +1052,8 @@ int cmd_cat_file(int argc, OPT_GROUP(N_("Emit [broken] object attributes")), OPT_CMDMODE('t', NULL, &opt, N_("show object type (one of 'blob', 'tree', 'commit', 'tag', ...)"), 't'), OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'), - OPT_BOOL(0, "allow-unknown-type", &unknown_type, - N_("allow -s and -t to work with broken/corrupt objects")), + OPT_HIDDEN_BOOL(0, "allow-unknown-type", &unknown_type, + N_("historical option -- no-op")), OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")), OPT_ALIAS(0, "mailmap", "use-mailmap"), /* Batch mode */ @@ -1209,10 +1204,7 @@ int cmd_cat_file(int argc, obj_name = argv[1]; } - if (unknown_type && opt != 't' && opt != 's') - die("git cat-file --allow-unknown-type: use with -s or -t"); - - ret = cat_one_file(opt, exp_type, obj_name, unknown_type); + ret = cat_one_file(opt, exp_type, obj_name); out: list_objects_filter_release(&batch.objects_filter); diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index ce8b27bf548fb7..d96d02ad7dc4e2 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -136,18 +136,6 @@ $content" test_cmp expect actual ' - test_expect_success "Type of $type is correct using --allow-unknown-type" ' - echo $type >expect && - git cat-file -t --allow-unknown-type $oid >actual && - test_cmp expect actual - ' - - test_expect_success "Size of $type is correct using --allow-unknown-type" ' - echo $size >expect && - git cat-file -s --allow-unknown-type $oid >actual && - test_cmp expect actual - ' - test -z "$content" || test_expect_success "Content of $type is correct" ' echo_without_newline "$content" >expect && @@ -677,95 +665,67 @@ test_expect_success 'setup bogus data' ' bogus_long_oid=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin) ' -for arg1 in '' --allow-unknown-type +for arg1 in -s -t -p do - for arg2 in -s -t -p - do - if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p" - then - continue - fi + test_expect_success "cat-file $arg1 error on bogus short OID" ' + cat >expect <<-\EOF && + fatal: invalid object type + EOF + test_must_fail git cat-file $arg1 $bogus_short_oid >out 2>actual && + test_must_be_empty out && + test_cmp expect actual + ' - test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" ' - cat >expect <<-\EOF && - fatal: invalid object type + test_expect_success "cat-file $arg1 error on bogus full OID" ' + if test "$arg1" = "-p" + then + cat >expect <<-EOF + error: header for $bogus_long_oid too long, exceeds 32 bytes + fatal: Not a valid object name $bogus_long_oid + EOF + else + cat >expect <<-EOF + error: header for $bogus_long_oid too long, exceeds 32 bytes + fatal: git cat-file: could not get object info EOF + fi && - if test "$arg1" = "--allow-unknown-type" - then - git cat-file $arg1 $arg2 $bogus_short_oid - else - test_must_fail git cat-file $arg1 $arg2 $bogus_short_oid >out 2>actual && - test_must_be_empty out && - test_cmp expect actual - fi - ' + test_must_fail git cat-file $arg1 $bogus_long_oid >out 2>actual && + test_must_be_empty out && + test_cmp expect actual + ' - test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" ' - if test "$arg2" = "-p" - then - cat >expect <<-EOF - error: header for $bogus_long_oid too long, exceeds 32 bytes - fatal: Not a valid object name $bogus_long_oid - EOF - else - cat >expect <<-EOF - error: header for $bogus_long_oid too long, exceeds 32 bytes - fatal: git cat-file: could not get object info - EOF - fi && - - if test "$arg1" = "--allow-unknown-type" - then - git cat-file $arg1 $arg2 $bogus_short_oid - else - test_must_fail git cat-file $arg1 $arg2 $bogus_long_oid >out 2>actual && - test_must_be_empty out && - test_cmp expect actual - fi - ' + test_expect_success "cat-file $arg1 error on missing short OID" ' + cat >expect.err <<-EOF && + fatal: Not a valid object name $(test_oid deadbeef_short) + EOF + test_must_fail git cat-file $arg1 $(test_oid deadbeef_short) >out 2>err.actual && + test_must_be_empty out && + test_cmp expect.err err.actual + ' - test_expect_success "cat-file $arg1 $arg2 error on missing short OID" ' - cat >expect.err <<-EOF && - fatal: Not a valid object name $(test_oid deadbeef_short) + test_expect_success "cat-file $arg1 error on missing full OID" ' + if test "$arg1" = "-p" + then + cat >expect.err <<-EOF + fatal: Not a valid object name $(test_oid deadbeef) EOF - test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual && - test_must_be_empty out && - test_cmp expect.err err.actual - ' - - test_expect_success "cat-file $arg1 $arg2 error on missing full OID" ' - if test "$arg2" = "-p" - then - cat >expect.err <<-EOF - fatal: Not a valid object name $(test_oid deadbeef) - EOF - else - cat >expect.err <<-\EOF - fatal: git cat-file: could not get object info - EOF - fi && - test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual && - test_must_be_empty out && - test_cmp expect.err err.actual - ' - done + else + cat >expect.err <<-\EOF + fatal: git cat-file: could not get object info + EOF + fi && + test_must_fail git cat-file $arg1 $(test_oid deadbeef) >out 2>err.actual && + test_must_be_empty out && + test_cmp expect.err err.actual + ' done -test_expect_success '-e is OK with a broken object without --allow-unknown-type' ' +test_expect_success '-e is OK with a broken object' ' git cat-file -e $bogus_short_oid ' -test_expect_success '-e can not be combined with --allow-unknown-type' ' - test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_oid -' - -test_expect_success '-p cannot print a broken object even with --allow-unknown-type' ' - test_must_fail git cat-file -p $bogus_short_oid && - test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_oid -' - test_expect_success ' does not work with objects of broken types' ' cat >err.expect <<-\EOF && fatal: invalid object type "bogus" @@ -788,60 +748,8 @@ test_expect_success 'broken types combined with --batch and --batch-check' ' test_cmp err.expect err.actual ' -test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' ' - test_expect_code 128 git cat-file --batch --allow-unknown-type expect <<-EOF && - $bogus_short_type - EOF - git cat-file -t --allow-unknown-type $bogus_short_oid >actual && - test_cmp expect actual && - - # Create it manually, as "git replace" will die on bogus - # types. - head=$(git rev-parse --verify HEAD) && - test_when_finished "test-tool ref-store main delete-refs 0 msg refs/replace/$bogus_short_oid" && - test-tool ref-store main update-ref msg "refs/replace/$bogus_short_oid" $head $ZERO_OID REF_SKIP_OID_VERIFICATION && - - cat >expect <<-EOF && - commit - EOF - git cat-file -t --allow-unknown-type $bogus_short_oid >actual && - test_cmp expect actual -' - -test_expect_success "Type of broken object is correct" ' - echo $bogus_short_type >expect && - git cat-file -t --allow-unknown-type $bogus_short_oid >actual && - test_cmp expect actual -' - -test_expect_success "Size of broken object is correct" ' - echo $bogus_short_size >expect && - git cat-file -s --allow-unknown-type $bogus_short_oid >actual && - test_cmp expect actual -' - -test_expect_success 'clean up broken object' ' - rm .git/objects/$(test_oid_to_path $bogus_short_oid) -' - -test_expect_success "Type of broken object is correct when type is large" ' - echo $bogus_long_type >expect && - git cat-file -t --allow-unknown-type $bogus_long_oid >actual && - test_cmp expect actual -' - -test_expect_success "Size of large broken object is correct when type is large" ' - echo $bogus_long_size >expect && - git cat-file -s --allow-unknown-type $bogus_long_oid >actual && - test_cmp expect actual -' - -test_expect_success 'clean up broken object' ' +test_expect_success 'clean up broken objects' ' + rm .git/objects/$(test_oid_to_path $bogus_short_oid) && rm .git/objects/$(test_oid_to_path $bogus_long_oid) ' @@ -903,25 +811,6 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' ' ) ' -test_expect_success 'truncated object with --allow-unknown-type' - <<\EOT - objtype='a really long type name that exceeds the 32-byte limit' && - blob=$(git hash-object -w --literally -t "$objtype" /dev/null) && - objpath=.git/objects/$(test_oid_to_path "$blob") && - - # We want to truncate the object far enough in that we don't hit the - # end while inflating the first 32 bytes (since we want to have to dig - # for the trailing NUL of the header). But we don't want to go too far, - # since our header isn't very big. And of course we are counting - # deflated zlib bytes in the on-disk file, so it's a bit of a guess. - # Empirically 50 seems to work. - mv "$objpath" obj.bak && - test_when_finished 'mv obj.bak "$objpath"' && - test_copy_bytes 50 "$objpath" && - - test_must_fail git cat-file --allow-unknown-type -t $blob 2>err && - test_grep "unable to unpack $blob header" err -EOT - test_expect_success 'object reading handles zlib dictionary' - <<\EOT echo 'content that will be recompressed' >file && blob=$(git hash-object -w file) && From ae24b032a04ccd1565cb1ce13317b56daa77ce7f Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:45 -0400 Subject: [PATCH 24/52] object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag Since cat-file dropped its "--allow-unknown-type" option in the previous commit, there are no more uses of the internal flag that implemented it. Let's drop it. That in turn lets us drop the strbuf parameter of unpack_loose_header(), which now is always NULL. And without that, we can drop all of the additional code to inflate larger headers into the strbuf. Arguably we could drop ULHR_TOO_LONG, as no callers really care about the distinction from ULHR_BAD. But it's easy enough to retain, and it does let us produce a slightly more specific message in one instance. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- object-file.c | 45 +++++++-------------------------------------- object-file.h | 10 ++-------- object-store.h | 2 -- streaming.c | 2 +- 4 files changed, 10 insertions(+), 49 deletions(-) diff --git a/object-file.c b/object-file.c index dc56a4766df4d1..1127e154f61da5 100644 --- a/object-file.c +++ b/object-file.c @@ -299,8 +299,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, - unsigned long bufsiz, - struct strbuf *header) + unsigned long bufsiz) { int status; @@ -325,32 +324,9 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, return ULHR_OK; /* - * We have a header longer than MAX_HEADER_LEN. The "header" - * here is only non-NULL when we run "cat-file - * --allow-unknown-type". + * We have a header longer than MAX_HEADER_LEN. */ - if (!header) - return ULHR_TOO_LONG; - - /* - * buffer[0..bufsiz] was not large enough. Copy the partial - * result out to header, and then append the result of further - * reading the stream. - */ - strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer); - - do { - stream->next_out = buffer; - stream->avail_out = bufsiz; - - obj_read_unlock(); - status = git_inflate(stream, 0); - obj_read_lock(); - strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer); - if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer)) - return 0; - } while (status == Z_OK); - return ULHR_BAD; + return ULHR_TOO_LONG; } static void *unpack_loose_rest(git_zstream *stream, @@ -476,10 +452,8 @@ int loose_object_info(struct repository *r, void *map; git_zstream stream; char hdr[MAX_HEADER_LEN]; - struct strbuf hdrbuf = STRBUF_INIT; unsigned long size_scratch; enum object_type type_scratch; - int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE; if (oi->delta_base_oid) oidclr(oi->delta_base_oid, the_repository->hash_algo); @@ -521,18 +495,15 @@ int loose_object_info(struct repository *r, if (oi->disk_sizep) *oi->disk_sizep = mapsize; - switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr), - allow_unknown ? &hdrbuf : NULL)) { + switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) { case ULHR_OK: - if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) + if (parse_loose_header(hdr, oi) < 0) status = error(_("unable to parse %s header"), oid_to_hex(oid)); - else if (!allow_unknown && *oi->typep < 0) + else if (*oi->typep < 0) die(_("invalid object type")); if (!oi->contentp) break; - if (hdrbuf.len) - BUG("unpacking content with unknown types not yet supported"); *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); if (*oi->contentp) goto cleanup; @@ -558,7 +529,6 @@ int loose_object_info(struct repository *r, munmap(map, mapsize); if (oi->sizep == &size_scratch) oi->sizep = NULL; - strbuf_release(&hdrbuf); if (oi->typep == &type_scratch) oi->typep = NULL; oi->whence = OI_LOOSE; @@ -1682,8 +1652,7 @@ int read_loose_object(const char *path, goto out; } - if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr), - NULL) != ULHR_OK) { + if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) != ULHR_OK) { error(_("unable to unpack header of %s"), path); goto out_inflate; } diff --git a/object-file.h b/object-file.h index fd715663fb4f3b..a979fd5e4da6ea 100644 --- a/object-file.h +++ b/object-file.h @@ -133,12 +133,7 @@ int format_object_header(char *str, size_t size, enum object_type type, * - ULHR_BAD on error * - ULHR_TOO_LONG if the header was too long * - * It will only parse up to MAX_HEADER_LEN bytes unless an optional - * "hdrbuf" argument is non-NULL. This is intended for use with - * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error) - * reporting. The full header will be extracted to "hdrbuf" for use - * with parse_loose_header(), ULHR_TOO_LONG will still be returned - * from this function to indicate that the header was too long. + * It will only parse up to MAX_HEADER_LEN bytes. */ enum unpack_loose_header_result { ULHR_OK, @@ -149,8 +144,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, - unsigned long bufsiz, - struct strbuf *hdrbuf); + unsigned long bufsiz); /** * parse_loose_header() parses the starting " \0" of an diff --git a/object-store.h b/object-store.h index c2fe5a19605040..cf908fe68e0131 100644 --- a/object-store.h +++ b/object-store.h @@ -240,8 +240,6 @@ struct object_info { /* Invoke lookup_replace_object() on the given hash */ #define OBJECT_INFO_LOOKUP_REPLACE 1 -/* Allow reading from a loose object file of unknown/bogus type */ -#define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2 /* Do not retry packed storage after checking packed and loose storage */ #define OBJECT_INFO_QUICK 8 /* diff --git a/streaming.c b/streaming.c index 127d6b5d6ac2d7..6d6512e2e0d6d9 100644 --- a/streaming.c +++ b/streaming.c @@ -238,7 +238,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r, return -1; switch (unpack_loose_header(&st->z, st->u.loose.mapped, st->u.loose.mapsize, st->u.loose.hdr, - sizeof(st->u.loose.hdr), NULL)) { + sizeof(st->u.loose.hdr))) { case ULHR_OK: break; case ULHR_BAD: From aac2abeca7077aa5f87f4132b98d37dd938b3573 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:47 -0400 Subject: [PATCH 25/52] cat-file: use type enum instead of buffer for -t option Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is no need to pass a strbuf into oid_object_info_extended() to record the type. The regular object_type enum is sufficient to capture all of the types we will allow. This simplifies the code a bit, and will eventually let us drop object_info's type_name strbuf support. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/cat-file.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 4adc19aa294cec..67a5ff2b9ebd29 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -109,7 +109,6 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) unsigned long size; struct object_context obj_context = {0}; struct object_info oi = OBJECT_INFO_INIT; - struct strbuf sb = STRBUF_INIT; unsigned flags = OBJECT_INFO_LOOKUP_REPLACE; unsigned get_oid_flags = GET_OID_RECORD_PATH | @@ -132,16 +131,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) buf = NULL; switch (opt) { case 't': - oi.type_name = &sb; + oi.typep = &type; if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0) die("git cat-file: could not get object info"); - if (sb.len) { - printf("%s\n", sb.buf); - strbuf_release(&sb); - ret = 0; - goto cleanup; - } - break; + printf("%s\n", type_name(type)); + ret = 0; + goto cleanup; case 's': oi.sizep = &size; From b32b434bfe241cde380c5f3aca48a1fdcd86961b Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:50 -0400 Subject: [PATCH 26/52] oid_object_info_convert(): stop using string for object type In oid_object_info_convert(), we convert objects between their sha1 and sha256 variants. To do this, we naturally need to know the type, which we get from oid_object_info_extended() using its type_name strbuf option. But getting the value as a string (versus an object_type enum) is not helpful. Since we do not allow unknown types, the regular enum is sufficient. And the resulting code is a bit simpler, as we no longer have to manage the extra allocation nor convert the string to an enum ourselves. Note that at first glance, it might seem like we should retain the error check for "type == -1" to catch bogus types found by the underlying parser. But we don't need it, as an unknown type would have yielded an error from the call to oid_object_info_extended(), which would already have caused us to return an error. In fact, I suspect this was always impossible to trigger. Even when we were converting the string to a type enum ourselves, an invalid type would never have escaped oid_object_info_extended(), since we never passed the (now removed) OBJECT_INFO_ALLOW_UNKNOWN_TYPE option. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- object-store.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/object-store.c b/object-store.c index 2f51d0e3b037e3..b8f6955ea74174 100644 --- a/object-store.c +++ b/object-store.c @@ -727,7 +727,7 @@ static int oid_object_info_convert(struct repository *r, { const struct git_hash_algo *input_algo = &hash_algos[input_oid->algo]; int do_die = flags & OBJECT_INFO_DIE_IF_CORRUPT; - struct strbuf type_name = STRBUF_INIT; + enum object_type type; struct object_id oid, delta_base_oid; struct object_info new_oi, *oi; unsigned long size; @@ -753,7 +753,7 @@ static int oid_object_info_convert(struct repository *r, if (input_oi->sizep || input_oi->contentp) { new_oi.contentp = &content; new_oi.sizep = &size; - new_oi.type_name = &type_name; + new_oi.typep = &type; } oi = &new_oi; } @@ -766,12 +766,7 @@ static int oid_object_info_convert(struct repository *r, if (new_oi.contentp) { struct strbuf outbuf = STRBUF_INIT; - enum object_type type; - type = type_from_string_gently(type_name.buf, type_name.len, - !do_die); - if (type == -1) - return -1; if (type != OBJ_BLOB) { ret = convert_object_file(the_repository, &outbuf, the_hash_algo, input_algo, @@ -788,10 +783,8 @@ static int oid_object_info_convert(struct repository *r, *input_oi->contentp = content; else free(content); - if (input_oi->type_name) - *input_oi->type_name = type_name; - else - strbuf_release(&type_name); + if (input_oi->typep) + *input_oi->typep = type; } if (new_oi.delta_base_oid == &delta_base_oid) { if (repo_oid_to_algop(r, &delta_base_oid, input_algo, From 4ae0e9423c95c63c17f66fb2de255c46dc14c4e5 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:53 -0400 Subject: [PATCH 27/52] fsck: stop using object_info->type_name strbuf When fsck-ing a loose object, we use object_info's type_name strbuf to record the parsed object type as a string. For most objects this is redundant with the object_type enum, but it does let us report the string when we encounter an object with an unknown type (for which there is no matching enum value). There are a few downsides, though: 1. The code to report these cases is not actually robust. Since we did not pass a strbuf to unpack_loose_header(), we only retrieved types from headers up to 32 bytes. In longer cases, we'd simply say "object corrupt or missing". 2. This is the last caller that uses object_info's type_name strbuf support. It would be nice to refactor it so that we can simplify that code. 3. Likewise, we'll check the hash of the object using its unknown type (again, as long as that type is short enough). That depends on the hash_object_file_literally() code, which we'd eventually like to get rid of. So we can simplify things by bailing immediately in read_loose_object() when we encounter an unknown type. This has a few user-visible effects: a. Instead of producing a single line of error output like this: error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 we'll now issue two lines (the first from read_loose_object() when we see the unparsable header, and the second from the fsck code, since we couldn't read the object): error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 This is a little more verbose, but this sort of error should be rare (such objects are almost impossible to work with, and cannot be transferred between repositories as they are not representable in packfiles). And as a bonus, reporting the broken header in full could help with debugging other cases (e.g., a header like "blob xyzzy\0" would fail in parsing the size, but previously we'd not have showed the offending bytes). b. An object with an unknown type will be reported as corrupt, without actually doing a hash check. Again, I think this is unlikely to matter in practice since such objects are totally unusable. We'll update one fsck test to match the new error strings. And we can remove another test that covered the case of an object with an unknown type _and_ a hash corruption. Since we'll skip the hash check now in this case, the test is no longer interesting. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/fsck.c | 13 ++----------- object-file.c | 12 +++++++++--- t/t1450-fsck.sh | 29 +++-------------------------- 3 files changed, 14 insertions(+), 40 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 6cac28356ce14f..e7d96a9c8ea586 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -614,12 +614,11 @@ static void get_default_heads(void) struct for_each_loose_cb { struct progress *progress; - struct strbuf obj_type; }; -static int fsck_loose(const struct object_id *oid, const char *path, void *data) +static int fsck_loose(const struct object_id *oid, const char *path, + void *data UNUSED) { - struct for_each_loose_cb *cb_data = data; struct object *obj; enum object_type type = OBJ_NONE; unsigned long size; @@ -629,8 +628,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) struct object_id real_oid = *null_oid(the_hash_algo); int err = 0; - strbuf_reset(&cb_data->obj_type); - oi.type_name = &cb_data->obj_type; oi.sizep = &size; oi.typep = &type; @@ -642,10 +639,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) err = error(_("%s: object corrupt or missing: %s"), oid_to_hex(oid), path); } - if (type != OBJ_NONE && type < 0) - err = error(_("%s: object is of unknown type '%s': %s"), - oid_to_hex(&real_oid), cb_data->obj_type.buf, - path); if (err < 0) { errors_found |= ERROR_OBJECT; free(contents); @@ -697,7 +690,6 @@ static void fsck_object_dir(const char *path) { struct progress *progress = NULL; struct for_each_loose_cb cb_data = { - .obj_type = STRBUF_INIT, .progress = progress, }; @@ -712,7 +704,6 @@ static void fsck_object_dir(const char *path) &cb_data); display_progress(progress, 256); stop_progress(&progress); - strbuf_release(&cb_data.obj_type); } static int fsck_head_link(const char *head_ref_name, diff --git a/object-file.c b/object-file.c index 1127e154f61da5..7a35bde96ef10a 100644 --- a/object-file.c +++ b/object-file.c @@ -1662,6 +1662,12 @@ int read_loose_object(const char *path, goto out_inflate; } + if (*oi->typep < 0) { + error(_("unable to parse type from header '%s' of %s"), + hdr, path); + goto out_inflate; + } + if (*oi->typep == OBJ_BLOB && *size > repo_settings_get_big_file_threshold(the_repository)) { if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0) @@ -1672,9 +1678,9 @@ int read_loose_object(const char *path, error(_("unable to unpack contents of %s"), path); goto out_inflate; } - hash_object_file_literally(the_repository->hash_algo, - *contents, *size, - oi->type_name->buf, real_oid); + hash_object_file(the_repository->hash_algo, + *contents, *size, + *oi->typep, real_oid); if (!oideq(expected_oid, real_oid)) goto out_inflate; } diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 0105045376245a..3f52dd5abc541b 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -71,30 +71,6 @@ test_expect_success 'object with hash mismatch' ' ) ' -test_expect_success 'object with hash and type mismatch' ' - git init --bare hash-type-mismatch && - ( - cd hash-type-mismatch && - - oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) && - oldoid=$oid && - old=$(test_oid_to_path "$oid") && - new=$(dirname $old)/$(test_oid ff_2) && - oid="$(dirname $new)$(basename $new)" && - - mv objects/$old objects/$new && - git update-index --add --cacheinfo 100644 $oid foo && - tree=$(git write-tree) && - cmt=$(echo bogus | git commit-tree $tree) && - git update-ref refs/heads/bogus $cmt && - - - test_must_fail git fsck 2>out && - grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out && - grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out - ) -' - test_expect_success 'zlib corrupt loose object output ' ' git init --bare corrupt-loose-output && ( @@ -1001,8 +977,9 @@ test_expect_success 'fsck error and recovery on invalid object type' ' test_must_fail git fsck 2>err && grep -e "^error" -e "^fatal" err >errors && - test_line_count = 1 errors && - grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err + test_line_count = 2 errors && + test_grep "unable to parse type from header .garbage" err && + test_grep "$garbage_blob: object corrupt or missing:" err ) ' From d2956385a9319155928e2d7bc5f9d90eeac5d0a5 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:56 -0400 Subject: [PATCH 28/52] oid_object_info(): drop type_name strbuf We provide a mechanism for callers to get the object type as a raw string, rather than an object_type enum. This was in theory useful for returning types that are not representable in the enum, but we consider any such type to be an error, and there are no callers that use the strbuf anymore. Let's drop support to simplify the code a bit. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- object-file.c | 4 +--- object-store.c | 2 -- object-store.h | 1 - packfile.c | 7 +------ 4 files changed, 2 insertions(+), 12 deletions(-) diff --git a/object-file.c b/object-file.c index 7a35bde96ef10a..b10e28352913c2 100644 --- a/object-file.c +++ b/object-file.c @@ -403,8 +403,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi) } type = type_from_string_gently(type_buf, type_len, 1); - if (oi->type_name) - strbuf_add(oi->type_name, type_buf, type_len); if (oi->typep) *oi->typep = type; @@ -466,7 +464,7 @@ int loose_object_info(struct repository *r, * return value implicitly indicates whether the * object even exists. */ - if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) { + if (!oi->typep && !oi->sizep && !oi->contentp) { struct stat st; if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK)) return quick_has_loose(r, oid) ? 0 : -1; diff --git a/object-store.c b/object-store.c index b8f6955ea74174..216c61dcf2330a 100644 --- a/object-store.c +++ b/object-store.c @@ -646,8 +646,6 @@ static int do_oid_object_info_extended(struct repository *r, *(oi->disk_sizep) = 0; if (oi->delta_base_oid) oidclr(oi->delta_base_oid, the_repository->hash_algo); - if (oi->type_name) - strbuf_addstr(oi->type_name, type_name(co->type)); if (oi->contentp) *oi->contentp = xmemdupz(co->buf, co->size); oi->whence = OI_CACHED; diff --git a/object-store.h b/object-store.h index cf908fe68e0131..6b55c245ebbc12 100644 --- a/object-store.h +++ b/object-store.h @@ -205,7 +205,6 @@ struct object_info { unsigned long *sizep; off_t *disk_sizep; struct object_id *delta_base_oid; - struct strbuf *type_name; void **contentp; /* Response */ diff --git a/packfile.c b/packfile.c index d91016f1c7ff40..80e35f1032d332 100644 --- a/packfile.c +++ b/packfile.c @@ -1598,17 +1598,12 @@ int packed_object_info(struct repository *r, struct packed_git *p, *oi->disk_sizep = pack_pos_to_offset(p, pos + 1) - obj_offset; } - if (oi->typep || oi->type_name) { + if (oi->typep) { enum object_type ptot; ptot = packed_to_object_type(r, p, obj_offset, type, &w_curs, curpos); if (oi->typep) *oi->typep = ptot; - if (oi->type_name) { - const char *tn = type_name(ptot); - if (tn) - strbuf_addstr(oi->type_name, tn); - } if (ptot < 0) { type = OBJ_BAD; goto out; From f2ed511a2f8f7339e21e4f2792ebe230e92dd669 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:49:59 -0400 Subject: [PATCH 29/52] t/helper: add zlib test-tool It's occasionally useful when testing or debugging to be able to do raw zlib inflate/deflate operations (e.g., to check the bytes of a specific loose or packed object). Even though zlib's deflate algorithm is used by many other programs, this is surprisingly hard to do in a portable way. E.g., gzip can do this if you manually munge some header bytes. But the result is somewhat arcane, and we don't assume gzip is available anyway. Likewise, pigz will handle raw zlib, but we can't assume it is available. So let's introduce a short test helper for just doing zlib operations. We'll use it in subsequent patches to add some new tests, but it would also have come in handy a few times in the past: - The hard-coded pack data from 3b910d0c5e (add tests for indexing packs with delta cycles, 2013-08-23) could probably be generated on the fly. - Likewise we could avoid the hard-coded data from 0b1493c2d4 (git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT, 2025-02-25). Though note this would require support for more zlib options. - It would have helped with the debugging documented in 41dfbb2dbe (howto: add article on recovering a corrupted object, 2013-10-25). I'll leave refactoring existing tests for another day, but I hope the examples above show the general utility. I aimed for simplicity in the code. In particular, it will read all input into a memory buffer, rather than streaming. That makes the zlib loops harder to get wrong (which has been a source of subtle bugs in the past). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- Makefile | 1 + t/helper/meson.build | 1 + t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/helper/test-zlib.c | 62 ++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 66 insertions(+) create mode 100644 t/helper/test-zlib.c diff --git a/Makefile b/Makefile index de73c6ddcd1e30..14616ff6255e58 100644 --- a/Makefile +++ b/Makefile @@ -859,6 +859,7 @@ TEST_BUILTINS_OBJS += test-wildmatch.o TEST_BUILTINS_OBJS += test-windows-named-pipe.o TEST_BUILTINS_OBJS += test-write-cache.o TEST_BUILTINS_OBJS += test-xml-encode.o +TEST_BUILTINS_OBJS += test-zlib.o # Do not add more tests here unless they have extra dependencies. Add # them in TEST_BUILTINS_OBJS above. diff --git a/t/helper/meson.build b/t/helper/meson.build index d4e8b26df8d6de..675e64c0101b61 100644 --- a/t/helper/meson.build +++ b/t/helper/meson.build @@ -77,6 +77,7 @@ test_tool_sources = [ 'test-windows-named-pipe.c', 'test-write-cache.c', 'test-xml-encode.c', + 'test-zlib.c', ] test_tool = executable('test-tool', diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 74812ed86d385a..a7abc618b3887e 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -91,6 +91,7 @@ static struct test_cmd cmds[] = { { "windows-named-pipe", cmd__windows_named_pipe }, #endif { "write-cache", cmd__write_cache }, + { "zlib", cmd__zlib }, }; static NORETURN void die_usage(void) diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 2571a3ccfe8991..7f150fa1eb9ad2 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -84,6 +84,7 @@ int cmd__wildmatch(int argc, const char **argv); int cmd__windows_named_pipe(int argc, const char **argv); #endif int cmd__write_cache(int argc, const char **argv); +int cmd__zlib(int argc, const char **argv); int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); diff --git a/t/helper/test-zlib.c b/t/helper/test-zlib.c new file mode 100644 index 00000000000000..de7e9edee12ff7 --- /dev/null +++ b/t/helper/test-zlib.c @@ -0,0 +1,62 @@ +#include "test-tool.h" +#include "git-zlib.h" +#include "strbuf.h" + +static const char *zlib_usage = "test-tool zlib [inflate|deflate]"; + +static void do_zlib(struct git_zstream *stream, + int (*zlib_func)(git_zstream *, int), + int fd_in, int fd_out) +{ + struct strbuf buf_in = STRBUF_INIT; + int status = Z_OK; + + if (strbuf_read(&buf_in, fd_in, 0) < 0) + die_errno("read error"); + + stream->next_in = (unsigned char *)buf_in.buf; + stream->avail_in = buf_in.len; + + while (status == Z_OK || + (status == Z_BUF_ERROR && !stream->avail_out)) { + unsigned char buf_out[4096]; + + stream->next_out = buf_out; + stream->avail_out = sizeof(buf_out); + + status = zlib_func(stream, Z_FINISH); + if (write_in_full(fd_out, buf_out, + sizeof(buf_out) - stream->avail_out) < 0) + die_errno("write error"); + } + + if (status != Z_STREAM_END) + die("zlib error %d", status); + + strbuf_release(&buf_in); +} + +int cmd__zlib(int argc, const char **argv) +{ + git_zstream stream; + + if (argc != 2) + usage(zlib_usage); + + memset(&stream, 0, sizeof(stream)); + + if (!strcmp(argv[1], "inflate")) { + git_inflate_init(&stream); + do_zlib(&stream, git_inflate, 0, 1); + git_inflate_end(&stream); + } else if (!strcmp(argv[1], "deflate")) { + git_deflate_init(&stream, Z_DEFAULT_COMPRESSION); + do_zlib(&stream, git_deflate, 0, 1); + git_deflate_end(&stream); + } else { + error("unknown mode: %s", argv[1]); + usage(zlib_usage); + } + + return 0; +} From b5643b60acb71e3c117558b37020a8db8fe17c69 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:50:02 -0400 Subject: [PATCH 30/52] t: add lib-loose.sh This commit adds a shell library for writing raw loose objects into the object database. Normally this is done with hash-object, but the specific intent here is to allow broken objects that hash-object may not support. We'll convert several cases that use "hash-object --literally" to write objects with invalid types. That works currently, but dropping this dependency will allow us to remove that feature and simplify the object-writing code. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/lib-loose.sh | 30 +++++++++++++++++++++++++++++ t/t1006-cat-file.sh | 5 +++-- t/t1450-fsck.sh | 3 ++- t/t1512-rev-parse-disambiguation.sh | 5 +++-- 4 files changed, 38 insertions(+), 5 deletions(-) create mode 100644 t/lib-loose.sh diff --git a/t/lib-loose.sh b/t/lib-loose.sh new file mode 100644 index 00000000000000..3613631eafa89b --- /dev/null +++ b/t/lib-loose.sh @@ -0,0 +1,30 @@ +# Support routines for hand-crafting loose objects. + +# Write a loose object into the odb at $1, with object type $2 and contents +# from stdin. Writes the oid to stdout. Example: +# +# oid=$(echo foo | loose_obj .git/objects blob) +# +loose_obj () { + cat >tmp_loose.content && + size=$(wc -c tmp_loose.raw && + + oid=$(test-tool $test_hash_algo tmp_loose.zlib && + mkdir -p "$dir" && + mv tmp_loose.zlib "$file" && + + rm tmp_loose.raw tmp_loose.content && + echo "$oid" +} diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index d96d02ad7dc4e2..317da6869c88ee 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -3,6 +3,7 @@ test_description='git cat-file' . ./test-lib.sh +. "$TEST_DIRECTORY/lib-loose.sh" test_cmdmode_usage () { test_expect_code 129 "$@" 2>err && @@ -657,12 +658,12 @@ test_expect_success 'setup bogus data' ' bogus_short_type="bogus" && bogus_short_content="bogus" && bogus_short_size=$(strlen "$bogus_short_content") && - bogus_short_oid=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) && + bogus_short_oid=$(echo_without_newline "$bogus_short_content" | loose_obj .git/objects $bogus_short_type) && bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" && bogus_long_content="bogus" && bogus_long_size=$(strlen "$bogus_long_content") && - bogus_long_oid=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin) + bogus_long_oid=$(echo_without_newline "$bogus_long_content" | loose_obj .git/objects $bogus_long_type) ' for arg1 in -s -t -p diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 3f52dd5abc541b..5ae86c42be55ac 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -7,6 +7,7 @@ test_description='git fsck random collection of tests ' . ./test-lib.sh +. "$TEST_DIRECTORY/lib-loose.sh" test_expect_success setup ' git config gc.auto 0 && @@ -973,7 +974,7 @@ test_expect_success 'fsck error and recovery on invalid object type' ' ( cd garbage-type && - garbage_blob=$(git hash-object --stdin -w -t garbage --literally err && grep -e "^error" -e "^fatal" err >errors && diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh index 70f1e0a998e103..1a380a418425a4 100755 --- a/t/t1512-rev-parse-disambiguation.sh +++ b/t/t1512-rev-parse-disambiguation.sh @@ -24,6 +24,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. "$TEST_DIRECTORY/lib-loose.sh" test_cmp_failed_rev_parse () { dir=$1 @@ -67,8 +68,8 @@ test_expect_success 'ambiguous loose bad object parsed as OBJ_BAD' ' cd blob.bad && # Both have the prefix "bad0" - echo xyzfaowcoh | git hash-object -t bad -w --stdin --literally && - echo xyzhjpyvwl | git hash-object -t bad -w --stdin --literally + echo xyzfaowcoh | loose_obj objects bad && + echo xyzhjpyvwl | loose_obj objects bad ) && test_cmp_failed_rev_parse blob.bad bad0 <<-\EOF From 65a6a79b4204a2038498fd14be993b89067a046a Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:50:05 -0400 Subject: [PATCH 31/52] hash-object: stop allowing unknown types When passed the "--literally" option, hash-object will allow any arbitrary string for its "-t" type option. Such objects are only useful for testing or debugging, as they cannot be used in the normal way (e.g., you cannot fetch their contents!). Let's drop this feature, which will eventually let us simplify the object-writing code. This is technically backwards incompatible, but since such objects were never really functional, it seems unlikely that anybody will notice. We will retain the --literally flag, as it also instructs hash-object not to worry about other format issues (e.g., type-specific things that fsck would complain about). The documentation does not need to be updated, as it was always vague about which checks we're loosening (it uses only the phrase "any garbage"). The code change is a bit hard to verify from just the patch text. We can drop our local hash_literally() helper, but it was really just wrapping write_object_file_literally(). We now replace that with calling index_fd(), as we do for the non-literal code path, but dropping the INDEX_FORMAT_CHECK flag. This ends up being the same semantically as what the _literally() code path was doing (modulo handling unknown types, which is our goal). We'll be able to clean up these code paths a bit more in subsequent patches. The existing test is flipped to show that we now reject the unknown type. The additional "extra-long type" test is now redundant, as we bail early upon seeing a bogus type. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/hash-object.c | 29 +++++------------------------ t/t1007-hash-object.sh | 11 ++--------- 2 files changed, 7 insertions(+), 33 deletions(-) diff --git a/builtin/hash-object.c b/builtin/hash-object.c index cd53fa3bde8dc3..3c6949b3faa029 100644 --- a/builtin/hash-object.c +++ b/builtin/hash-object.c @@ -24,26 +24,6 @@ enum { HASH_OBJECT_WRITE = (1 << 1), }; -/* - * This is to create corrupt objects for debugging and as such it - * needs to bypass the data conversion performed by, and the type - * limitation imposed by, index_fd() and its callees. - */ -static int hash_literally(struct object_id *oid, int fd, const char *type, unsigned flags) -{ - struct strbuf buf = STRBUF_INIT; - int ret; - - if (strbuf_read(&buf, fd, 4096) < 0) - ret = -1; - else - ret = write_object_file_literally(buf.buf, buf.len, type, oid, - (flags & HASH_OBJECT_WRITE) ? WRITE_OBJECT_FILE_PERSIST : 0); - close(fd); - strbuf_release(&buf); - return ret; -} - static void hash_fd(int fd, const char *type, const char *path, unsigned flags, int literally) { @@ -56,11 +36,12 @@ static void hash_fd(int fd, const char *type, const char *path, unsigned flags, if (flags & HASH_OBJECT_CHECK) index_flags |= INDEX_FORMAT_CHECK; + if (literally) + index_flags &= ~INDEX_FORMAT_CHECK; + if (fstat(fd, &st) < 0 || - (literally - ? hash_literally(&oid, fd, type, flags) - : index_fd(the_repository->index, &oid, fd, &st, - type_from_string(type), path, index_flags))) + index_fd(the_repository->index, &oid, fd, &st, + type_from_string(type), path, index_flags)) die((flags & HASH_OBJECT_WRITE) ? "Unable to add %s to database" : "Unable to hash %s", path); diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh index b3cf53ff8c9f79..dbbe9fb0d4b19b 100755 --- a/t/t1007-hash-object.sh +++ b/t/t1007-hash-object.sh @@ -248,15 +248,8 @@ test_expect_success 'hash-object complains about truncated type name' ' test_must_fail git hash-object -t bl --stdin Date: Fri, 16 May 2025 00:50:08 -0400 Subject: [PATCH 32/52] hash-object: merge HASH_* and INDEX_* flags The hash-object command has its own custom flag bits that it sets based on command-line options. But since we dropped hash_literally() in the previous commit, the only thing we do with those flag bits is convert them directly into "index_flags" to pass to index_fd(). This extra layer of indirection makes the code harder to read and reason about. Let's just use the INDEX_* flags directly. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/hash-object.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/builtin/hash-object.c b/builtin/hash-object.c index 3c6949b3faa029..1ecb70b551fe69 100644 --- a/builtin/hash-object.c +++ b/builtin/hash-object.c @@ -19,30 +19,19 @@ #include "strbuf.h" #include "write-or-die.h" -enum { - HASH_OBJECT_CHECK = (1 << 0), - HASH_OBJECT_WRITE = (1 << 1), -}; - static void hash_fd(int fd, const char *type, const char *path, unsigned flags, int literally) { - unsigned int index_flags = 0; struct stat st; struct object_id oid; - if (flags & HASH_OBJECT_WRITE) - index_flags |= INDEX_WRITE_OBJECT; - if (flags & HASH_OBJECT_CHECK) - index_flags |= INDEX_FORMAT_CHECK; - if (literally) - index_flags &= ~INDEX_FORMAT_CHECK; + flags &= ~INDEX_FORMAT_CHECK; if (fstat(fd, &st) < 0 || index_fd(the_repository->index, &oid, fd, &st, - type_from_string(type), path, index_flags)) - die((flags & HASH_OBJECT_WRITE) + type_from_string(type), path, flags)) + die((flags & INDEX_WRITE_OBJECT) ? "Unable to add %s to database" : "Unable to hash %s", path); printf("%s\n", oid_to_hex(&oid)); @@ -94,13 +83,13 @@ int cmd_hash_object(int argc, int no_filters = 0; int literally = 0; int nongit = 0; - unsigned flags = HASH_OBJECT_CHECK; + unsigned flags = INDEX_FORMAT_CHECK; const char *vpath = NULL; char *vpath_free = NULL; const struct option hash_object_options[] = { OPT_STRING('t', NULL, &type, N_("type"), N_("object type")), OPT_BIT('w', NULL, &flags, N_("write the object into the object database"), - HASH_OBJECT_WRITE), + INDEX_WRITE_OBJECT), OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")), OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")), OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")), @@ -114,7 +103,7 @@ int cmd_hash_object(int argc, argc = parse_options(argc, argv, prefix, hash_object_options, hash_object_usage, 0); - if (flags & HASH_OBJECT_WRITE) + if (flags & INDEX_WRITE_OBJECT) prefix = setup_git_directory(); else prefix = setup_git_directory_gently(&nongit); From f710fd7b49218ce3407a88b2c548704299c7c664 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:50:10 -0400 Subject: [PATCH 33/52] hash-object: handle --literally with OPT_NEGBIT Since we recently removed the hash_literally() function, the hash-object --literally option has been simplified to just removing the INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool, we can just have the option parser remove the bit from the set of flags directly. This simplifies the helper functions. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/hash-object.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/builtin/hash-object.c b/builtin/hash-object.c index 1ecb70b551fe69..6a99ec250d028f 100644 --- a/builtin/hash-object.c +++ b/builtin/hash-object.c @@ -19,15 +19,11 @@ #include "strbuf.h" #include "write-or-die.h" -static void hash_fd(int fd, const char *type, const char *path, unsigned flags, - int literally) +static void hash_fd(int fd, const char *type, const char *path, unsigned flags) { struct stat st; struct object_id oid; - if (literally) - flags &= ~INDEX_FORMAT_CHECK; - if (fstat(fd, &st) < 0 || index_fd(the_repository->index, &oid, fd, &st, type_from_string(type), path, flags)) @@ -39,15 +35,14 @@ static void hash_fd(int fd, const char *type, const char *path, unsigned flags, } static void hash_object(const char *path, const char *type, const char *vpath, - unsigned flags, int literally) + unsigned flags) { int fd; fd = xopen(path, O_RDONLY); - hash_fd(fd, type, vpath, flags, literally); + hash_fd(fd, type, vpath, flags); } -static void hash_stdin_paths(const char *type, int no_filters, unsigned flags, - int literally) +static void hash_stdin_paths(const char *type, int no_filters, unsigned flags) { struct strbuf buf = STRBUF_INIT; struct strbuf unquoted = STRBUF_INIT; @@ -59,8 +54,7 @@ static void hash_stdin_paths(const char *type, int no_filters, unsigned flags, die("line is badly quoted"); strbuf_swap(&buf, &unquoted); } - hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags, - literally); + hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags); } strbuf_release(&buf); strbuf_release(&unquoted); @@ -81,7 +75,6 @@ int cmd_hash_object(int argc, int hashstdin = 0; int stdin_paths = 0; int no_filters = 0; - int literally = 0; int nongit = 0; unsigned flags = INDEX_FORMAT_CHECK; const char *vpath = NULL; @@ -93,7 +86,9 @@ int cmd_hash_object(int argc, OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")), OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")), OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")), - OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")), + OPT_NEGBIT( 0, "literally", &flags, + N_("just hash any random garbage to create corrupt objects for debugging Git"), + INDEX_FORMAT_CHECK), OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")), OPT_END() }; @@ -139,7 +134,7 @@ int cmd_hash_object(int argc, } if (hashstdin) - hash_fd(0, type, vpath, flags, literally); + hash_fd(0, type, vpath, flags); for (i = 0 ; i < argc; i++) { const char *arg = argv[i]; @@ -148,12 +143,12 @@ int cmd_hash_object(int argc, if (prefix) arg = to_free = prefix_filename(prefix, arg); hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg, - flags, literally); + flags); free(to_free); } if (stdin_paths) - hash_stdin_paths(type, no_filters, flags, literally); + hash_stdin_paths(type, no_filters, flags); free(vpath_free); From 141f8c8c0535004fa5432d9a6d57bf08129a7dd8 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 16 May 2025 00:50:13 -0400 Subject: [PATCH 34/52] object-file: drop support for writing objects with unknown types Since "hash-object --literally" no longer supports objects with unknown types, there are now no callers of write_object_file_literally() and its helpers. Let's drop them to simplify the code. In particular, this gets rid of some ugly copy-and-paste code from write_object_file_literally(), which is a parallel implementation of write_object_file(). When the split was originally made, the two weren't that long, but commits like 63a6745a07 (object-file: update the loose object map when writing loose objects, 2023-10-01) ended up having to duplicate some tricky code. This patch drops all of that duplication and should make things less error-prone going forward. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- object-file.c | 81 ++++----------------------------------------------- object-file.h | 5 +--- 2 files changed, 6 insertions(+), 80 deletions(-) diff --git a/object-file.c b/object-file.c index b10e28352913c2..1ac04c2891634a 100644 --- a/object-file.c +++ b/object-file.c @@ -130,12 +130,6 @@ int has_loose_object(const struct object_id *oid) return check_and_freshen(oid, 0); } -static int format_object_header_literally(char *str, size_t size, - const char *type, size_t objsize) -{ - return xsnprintf(str, size, "%s %"PRIuMAX, type, (uintmax_t)objsize) + 1; -} - int format_object_header(char *str, size_t size, enum object_type type, size_t objsize) { @@ -144,7 +138,7 @@ int format_object_header(char *str, size_t size, enum object_type type, if (!name) BUG("could not get a type name for 'enum object_type' value %d", type); - return format_object_header_literally(str, size, name, objsize); + return xsnprintf(str, size, "%s %"PRIuMAX, name, (uintmax_t)objsize) + 1; } int check_object_signature(struct repository *r, const struct object_id *oid, @@ -558,17 +552,6 @@ static void write_object_file_prepare(const struct git_hash_algo *algo, hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen); } -static void write_object_file_prepare_literally(const struct git_hash_algo *algo, - const void *buf, unsigned long len, - const char *type, struct object_id *oid, - char *hdr, int *hdrlen) -{ - struct git_hash_ctx c; - - *hdrlen = format_object_header_literally(hdr, *hdrlen, type, len); - hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen); -} - #define CHECK_COLLISION_DEST_VANISHED -2 static int check_collision(const char *source, const char *dest) @@ -698,21 +681,14 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, return 0; } -static void hash_object_file_literally(const struct git_hash_algo *algo, - const void *buf, unsigned long len, - const char *type, struct object_id *oid) -{ - char hdr[MAX_HEADER_LEN]; - int hdrlen = sizeof(hdr); - - write_object_file_prepare_literally(algo, buf, len, type, oid, hdr, &hdrlen); -} - void hash_object_file(const struct git_hash_algo *algo, const void *buf, unsigned long len, enum object_type type, struct object_id *oid) { - hash_object_file_literally(algo, buf, len, type_name(type), oid); + char hdr[MAX_HEADER_LEN]; + int hdrlen = sizeof(hdr); + + write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen); } /* Finalize a file on disk, and close it. */ @@ -1114,53 +1090,6 @@ int write_object_file_flags(const void *buf, unsigned long len, return 0; } -int write_object_file_literally(const void *buf, unsigned long len, - const char *type, struct object_id *oid, - unsigned flags) -{ - char *header; - struct repository *repo = the_repository; - const struct git_hash_algo *algo = repo->hash_algo; - const struct git_hash_algo *compat = repo->compat_hash_algo; - struct object_id compat_oid; - int hdrlen, status = 0; - int compat_type = -1; - - if (compat) { - compat_type = type_from_string_gently(type, -1, 1); - if (compat_type == OBJ_BLOB) - hash_object_file(compat, buf, len, compat_type, - &compat_oid); - else if (compat_type != -1) { - struct strbuf converted = STRBUF_INIT; - convert_object_file(the_repository, - &converted, algo, compat, - buf, len, compat_type, 0); - hash_object_file(compat, converted.buf, converted.len, - compat_type, &compat_oid); - strbuf_release(&converted); - } - } - - /* type string, SP, %lu of the length plus NUL must fit this */ - hdrlen = strlen(type) + MAX_HEADER_LEN; - header = xmalloc(hdrlen); - write_object_file_prepare_literally(the_hash_algo, buf, len, type, - oid, header, &hdrlen); - - if (!(flags & WRITE_OBJECT_FILE_PERSIST)) - goto cleanup; - if (freshen_packed_object(oid) || freshen_loose_object(oid)) - goto cleanup; - status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0); - if (compat_type != -1) - return repo_add_loose_object_map(repo, oid, &compat_oid); - -cleanup: - free(header); - return status; -} - int force_object_loose(const struct object_id *oid, time_t mtime) { struct repository *repo = the_repository; diff --git a/object-file.h b/object-file.h index a979fd5e4da6ea..6f411424523932 100644 --- a/object-file.h +++ b/object-file.h @@ -159,7 +159,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi); enum { /* - * By default, `write_object_file_literally()` does not actually write + * By default, `write_object_file()` does not actually write * anything into the object store, but only computes the object ID. * This flag changes that so that the object will be written as a loose * object and persisted. @@ -187,9 +187,6 @@ struct input_stream { int is_finished; }; -int write_object_file_literally(const void *buf, unsigned long len, - const char *type, struct object_id *oid, - unsigned flags); int stream_loose_object(struct input_stream *in_stream, size_t len, struct object_id *oid); From ea8a71b40d3fdc91180b951c829cdf41bb6f7da0 Mon Sep 17 00:00:00 2001 From: Moumita Dhar Date: Fri, 16 May 2025 20:15:12 +0530 Subject: [PATCH 35/52] userdiff: extend Bash pattern to cover more shell function forms The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar Acked-by: Johannes Sixt Signed-off-by: Junio C Hamano --- .../bash-bashism-style-complete-line-capture | 4 +++ .../bash-posix-style-complete-line-capture | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 36 +++++++++++++++++++ t/t4034/bash/post | 31 ++++++++++++++++ t/t4034/bash/pre | 31 ++++++++++++++++ userdiff.c | 26 +++++++++----- 8 files changed, 128 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-complete-line-capture b/t/t4018/bash-bashism-style-complete-line-capture new file mode 100644 index 00000000000000..070b979fa6a93b --- /dev/null +++ b/t/t4018/bash-bashism-style-complete-line-capture @@ -0,0 +1,4 @@ +function myfunc # RIGHT +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-posix-style-complete-line-capture b/t/t4018/bash-posix-style-complete-line-capture new file mode 100644 index 00000000000000..b56942f322aff1 --- /dev/null +++ b/t/t4018/bash-posix-style-complete-line-capture @@ -0,0 +1,4 @@ +func() { # RIGHT + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 00000000000000..398ae1c5d2fb22 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f101cf..0be647c2fbc05c 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 00000000000000..1864ab25dc76bb --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,36 @@ +diff --git a/pre b/post +index 09ac008..60ba6a2 100644 +--- a/pre ++++ b/post +@@ -1,31 +1,31 @@ +my_varnew_var=10 +x=123456 +echo $1$2 +echo $USER$USERNAME +${HOMEHOMEDIR} +((a++=b)) +((a**=b)) +((a//=b)) +((a%%=b)) +((a||=b)) +((a^^=b)) +((a===b)) +((a!!=b)) +((a<<=b)) +((a>>=b)) +$((a<<<b)) +$((a>>>b)) +$((a&&&b)) +$((a|||b)) +${a::-b} +${a::=b} +${a::+b} +${a::?b} +${a###*/} +${a%%%.*} +${a^^^} +${a,,,} +${!a} +${a[*@]} +ls -a-x +ls --all--color diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 00000000000000..2bbee8936dc1a3 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,31 @@ +new_var=10 +x=456 +echo $2 +echo $USERNAME +${HOMEDIR} +((a+=b)) +((a*=b)) +((a/=b)) +((a%=b)) +((a|=b)) +((a^=b)) +((a==b)) +((a!=b)) +((a<=b)) +((a>=b)) +$((a<>b)) +$((a&&b)) +$((a||b)) +${a:-b} +${a:=b} +${a:+b} +${a:?b} +${a##*/} +${a%%.*} +${a^^} +${a,,} +${!a} +${a[@]} +ls -x +ls --color diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 00000000000000..8d22039c40a5de --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,31 @@ +my_var=10 +x=123 +echo $1 +echo $USER +${HOME} +((a+b)) +((a*b)) +((a/b)) +((a%b)) +((a|b)) +((a^b)) +((a=b)) +((a!b)) +((ab)) +$((ab)) +$((a&b)) +$((a|b)) +${a:b} +${a:b} +${a:b} +${a:b} +${a#*/} +${a%.*} +${a^} +${a,} +${a} +${a[*]} +ls -a +ls --all diff --git a/userdiff.c b/userdiff.c index da75625020e34c..05776ccd10401c 100644 --- a/userdiff.c +++ b/userdiff.c @@ -59,20 +59,30 @@ PATTERNS("bash", "(" "(" /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z0-9_]+|\\$\\{" + /*Command list separators and redirection operators */ + "|\\|\\||&&|<<|>>" + /* Operators ending in '=' (comparison + compound assignment) */ + "|==|!=|<=|>=|[-+*/%&|^]=" + /* Additional parameter expansion operators */ + "|:=|:-|:\\+|:\\?|##|%%|\\^\\^|,," + /* Command-line options (to avoid splitting -option) */ + "|[-a-zA-Z0-9_]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ From 952de281fe63eb03e0dcc8adf773ce54cb581b83 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 16 May 2025 14:55:27 +0000 Subject: [PATCH 36/52] apply: integrate with the sparse index The sparse index allows storing directory entries in the index, marked with the skip-wortkree bit and pointing to a tree object. This may be an unexpected data shape for some implementation areas, so we are rolling it out incrementally on a builtin-per-builtin basis. This change enables the sparse index for 'git apply'. The main motivation for this change is that 'git apply' is used as a child process of 'git add -p' and expanding the sparse index for each of those child processes can lead to significant performance issues. The good news is that the actual index manipulation code used by 'git apply' is already integrated with the sparse index, so the only product change is to mark the builtin as allowing the sparse index so it isn't inflated on read. The more involved part of this change is around adding tests that verify how 'git apply' behaves in a sparse-checkout environment and whether or not the index expands in certain operations. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/apply.c | 7 +++- t/t1092-sparse-checkout-compatibility.sh | 53 ++++++++++++++++++++++++ 2 files changed, 59 insertions(+), 1 deletion(-) diff --git a/builtin/apply.c b/builtin/apply.c index 84f1863d3ac349..a1e20c593d0903 100644 --- a/builtin/apply.c +++ b/builtin/apply.c @@ -12,7 +12,7 @@ static const char * const apply_usage[] = { int cmd_apply(int argc, const char **argv, const char *prefix, - struct repository *repo UNUSED) + struct repository *repo) { int force_apply = 0; int options = 0; @@ -35,6 +35,11 @@ int cmd_apply(int argc, &state, &force_apply, &options, apply_usage); + if (repo) { + prepare_repo_settings(repo); + repo->settings.command_requires_full_index = 0; + } + if (check_apply_state(&state, force_apply)) exit(128); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index a4c7c41fc00aa3..fa2472010d8abb 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' ' grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache ' +test_expect_success 'git apply functionality' ' + init_repos && + + test_all_match git checkout base && + + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse && + git -C full-checkout diff base..merge-right -- folder2 >patch-outside && + + # Apply a patch to a file inside the sparse definition + test_all_match git apply --index --stat ../patch-in-sparse && + test_all_match git status --porcelain=v2 && + + # Apply a patch to a file outside the sparse definition + test_sparse_match test_must_fail git apply ../patch-outside && + grep "No such file or directory" sparse-checkout-err && + + # But it works with --index and --cached + test_all_match git apply --index --stat ../patch-outside && + test_all_match git status --porcelain=v2 && + test_all_match git reset --hard && + test_all_match git apply --cached --stat ../patch-outside && + test_all_match git status --porcelain=v2 +' + # When working with a sparse index, some commands will need to expand the # index to operate properly. If those commands also write the index back # to disk, they need to convert the index to sparse before writing. @@ -2347,6 +2371,35 @@ test_expect_success 'sparse-index is not expanded: check-attr' ' ensure_not_expanded check-attr -a --cached -- folder1/a ' +test_expect_success 'sparse-index is not expanded: git apply' ' + init_repos && + + git -C sparse-index checkout base && + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse && + git -C full-checkout diff base..merge-right -- folder2 >patch-outside && + + # Apply a patch to a file inside the sparse definition + ensure_not_expanded apply --index --stat ../patch-in-sparse && + + # Apply a patch to a file outside the sparse definition + # Fails when caring about the worktree. + ensure_not_expanded ! apply ../patch-outside && + + # Expands when using --index. + ensure_expanded apply --index ../patch-outside && + + # Does not when index is partially expanded. + git -C sparse-index reset --hard && + ensure_not_expanded apply --cached ../patch-outside && + + # Try again with a reset and collapsed index. + git -C sparse-index reset --hard && + git -C sparse-index sparse-checkout reapply && + + # Expands when index is collapsed. + ensure_expanded apply --cached ../patch-outside +' + test_expect_success 'advice.sparseIndexExpanded' ' init_repos && From 02ed8555f68440c5f533ad3c098ac01fc8965861 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 16 May 2025 14:55:28 +0000 Subject: [PATCH 37/52] git add: make -p/-i aware of sparse index It is slow to expand a sparse index in-memory due to parsing of trees. We aim to minimize that performance cost when possible. 'git add -p' uses 'git apply' child processes to modify the index, but still there are some expansions that occur. It turns out that control flows out of cmd_add() in the interactive cases before the lines that confirm that the builtin is integrated with the sparse index. Moving that integration point earlier in cmd_add() allows 'git add -i' and 'git add -p' to operate without expanding a sparse index to a full one. Add test cases that confirm that these interactive add options work with the sparse index. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/add.c | 7 +-- t/t1092-sparse-checkout-compatibility.sh | 60 ++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/builtin/add.c b/builtin/add.c index 78dfb265776724..b96360dc5cf96d 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -391,6 +391,10 @@ int cmd_add(int argc, argc = parse_options(argc, argv, prefix, builtin_add_options, builtin_add_usage, PARSE_OPT_KEEP_ARGV0); + + prepare_repo_settings(repo); + repo->settings.command_requires_full_index = 0; + if (patch_interactive) add_interactive = 1; if (add_interactive) { @@ -427,9 +431,6 @@ int cmd_add(int argc, add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize; require_pathspec = !(take_worktree_changes || (0 < addremove_explicit)); - prepare_repo_settings(repo); - repo->settings.command_requires_full_index = 0; - repo_hold_locked_index(repo, &lock_file, LOCK_DIE_ON_ERROR); /* diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index fa2472010d8abb..f47cf8fa7fdfab 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -384,6 +384,38 @@ test_expect_success 'add, commit, checkout' ' test_all_match git checkout - ' +test_expect_success 'git add -p' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + # Does not expand when edits are within sparse checkout. + run_on_all ../edit-contents deep/a && + run_on_all ../edit-contents deep/deeper1/a && + + test_write_lines y n >in && + run_on_all git add -p in && + run_on_all git add -i in && + run_on_all git add -p in && + run_on_all git add -i sparse-index/deep/a && + echo "new content" >sparse-index/deep/deeper1/a && + test_write_lines y n >in && + ensure_not_expanded add -p sparse-index/folder1/a && + test_write_lines y n y >in && + ensure_expanded add -p sparse-index/folder1/a && + test_write_lines u 2 3 "" q >in && + ensure_expanded add -i Date: Fri, 16 May 2025 14:55:29 +0000 Subject: [PATCH 38/52] reset: integrate sparse index with --patch Similar to the previous change for 'git add -p', the reset builtin checked for integration with the sparse index after possibly redirecting its logic toward the interactive logic. This means that the builtin would expand the sparse index to a full one upon read. Move this check earlier within cmd_reset() to improve performance here. Add tests to guarantee that we are not universally expanding the index. Add behavior tests to check that we are doing the same operations as a full index. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/reset.c | 6 ++-- t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++-- 2 files changed, 43 insertions(+), 5 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 73b4537a9a567d..dc50ffc1ac59e8 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -420,6 +420,9 @@ int cmd_reset(int argc, oidcpy(&oid, &tree->object.oid); } + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + if (patch_mode) { if (reset_type != NONE) die(_("options '%s' and '%s' cannot be used together"), "--patch", "--{hard,mixed,soft}"); @@ -457,9 +460,6 @@ int cmd_reset(int argc, if (intent_to_add && reset_type != MIXED) die(_("the option '%s' requires '%s'"), "-N", "--mixed"); - prepare_repo_settings(the_repository); - the_repository->settings.command_requires_full_index = 0; - if (repo_read_index(the_repository) < 0) die(_("index file corrupt")); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index f47cf8fa7fdfab..e11dfd872ecacf 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -384,7 +384,7 @@ test_expect_success 'add, commit, checkout' ' test_all_match git checkout - ' -test_expect_success 'git add -p' ' +test_expect_success 'git add, checkout, and reset with -p' ' init_repos && write_script edit-contents <<-\EOF && @@ -398,7 +398,7 @@ test_expect_success 'git add -p' ' test_write_lines y n >in && run_on_all git add -p in && run_on_all git add -i in && run_on_all git add -i in && + test_sparse_match git checkout HEAD~1 --patch sparse-index/deep/a && + echo "new content" >sparse-index/deep/deeper1/a && + git -C sparse-index commit -a -m "inside-changes" && + + test_write_lines y y >in && + ensure_not_expanded checkout HEAD~1 --patch sparse-index/deep/a && + echo "new content" >sparse-index/deep/deeper1/a && + git -C sparse-index add . && + ensure_not_expanded reset --patch sparse-index/folder1/a && + git -C sparse-index add --sparse folder1 && + git -C sparse-index sparse-checkout reapply && + ensure_expanded reset --patch sparse-index/folder1/a && + git -C sparse-index add --sparse folder1 && + git -C sparse-index commit -m "folder1 change" && + git -C sparse-index sparse-checkout reapply && + ensure_expanded checkout HEAD~1 --patch Date: Fri, 16 May 2025 14:55:30 +0000 Subject: [PATCH 39/52] p2000: add performance test for patch-mode commands The previous three changes contributed performance improvements to 'git apply', 'git add -p', and 'git reset -p' when using a sparse index. The improvement to 'git apply' also improved 'git checkout -p'. Add performance tests to demonstrate this (and to help validate that performance remains good in the future). In the truncated test output below, we see that the full checkout performance changes within noise expectations, but the sparse index cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for 'git reset -p'. 'git checkout -p' improves immediatley by 91% because it does not need any change to its builtin. Test HEAD~4 HEAD~3 HEAD~2 HEAD~1 ------------------------------------------------------------------------------------- 2000.118: ... git add -p (full-v3) 0.79 0.79 +0.0% 0.82 +3.8% 0.82 +3.8% 2000.119: ... git add -p (full-v4) 0.74 0.76 +2.7% 0.74 +0.0% 0.76 +2.7% 2000.120: ... git add -p (sparse-v3) 1.94 1.28 -34.0% 0.07 -96.4% 0.07 -96.4% 2000.121: ... git add -p (sparse-v4) 1.93 1.28 -33.7% 0.06 -96.9% 0.06 -96.9% 2000.122: ... git checkout -p (full-v3) 1.18 1.18 +0.0% 1.18 +0.0% 1.19 +0.8% 2000.123: ... git checkout -p (full-v4) 1.10 1.12 +1.8% 1.11 +0.9% 1.11 +0.9% 2000.124: ... git checkout -p (sparse-v3) 1.31 0.11 -91.6% 0.11 -91.6% 0.11 -91.6% 2000.125: ... git checkout -p (sparse-v4) 1.29 0.11 -91.5% 0.11 -91.5% 0.11 -91.5% 2000.126: ... git reset -p (full-v3) 0.81 0.80 -1.2% 0.83 +2.5% 0.83 +2.5% 2000.127: ... git reset -p (full-v4) 0.78 0.77 -1.3% 0.77 -1.3% 0.78 +0.0% 2000.128: ... git reset -p (sparse-v3) 1.58 0.92 -41.8% 0.91 -42.4% 0.07 -95.6% 2000.129: ... git reset -p (sparse-v4) 1.58 0.92 -41.8% 0.92 -41.8% 0.07 -95.6% It is worth noting that if our test was more involved and had multiple hunks to evaluate, then the time spent in 'git apply' would dominate due to multiple index loads and writes. As it stands, we need the sparse index improvement in 'git add -p' itself to confirm this performance improvement. Since the change for 'git add -i' is identical, we avoid a second test case for that similar operation. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- t/perf/p2000-sparse-operations.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 39e92b0841437b..aadf22bc2f0bb2 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -135,5 +135,8 @@ test_perf_on_all git diff-tree HEAD test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a test_perf_on_all "git worktree add ../temp && git worktree remove ../temp" test_perf_on_all git check-attr -a -- $SPARSE_CONE/a +test_perf_on_all 'echo >>a && test_write_lines y | git add -p' +test_perf_on_all 'test_write_lines y y y | git checkout --patch -' +test_perf_on_all 'echo >>a && git add a && test_write_lines y | git reset --patch' test_done From e42667241de12840ef58c0ba1c060b86c850bae0 Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Fri, 16 May 2025 16:26:26 +0000 Subject: [PATCH 40/52] sequencer: make it clearer that commit descriptions are just comments Every once in a while, users report that editing the commit summaries in the todo list does not get reflected in the rebase operation, suggesting that users are (a) only using one-line commit messages, and (b) not understanding that the commit summaries are merely helpful comments to help them find the right hashes. It may be difficult to correct users' poor commit messages, but we can at least try to make it clearer that the commit summaries are not directives of some sort by inserting a comment character. Hopefully that leads to them looking a little further and noticing the hints at the bottom to use 'reword' or 'edit' directives. Yes, this change may look funny at first since it hardcodes '#' rather than using comment_line_str. However: * comment_line_str exists to allow disambiguation between lines in a commit message and lines that are instructions to users editing the commit message. No such disambiguation is needed for these comments that occur on the same line after existing directives * the exact "comment" character(s) on regular pick lines used aren't actually important; I could have used anything, including completely random variable length text for each line and it'd work because we ignore everything after 'pick' and the hash. * The whole point of this change is to signal to users that they should NOT be editing any part of the line after the hash (and if they do so, their edits will be ignored), while the whole point of comment_line_str is to allow highly flexible editing. So making it more general by using comment_line_str actually feels counterproductive. * The character for merge directives absolutely must be '#'; that has been deeply hardcoded for a long time (see below), and will break if some other comment character is used instead. In a desire to have pick and merge directives be similar, I use the same comment character for both. * Perhaps merge directives could be fixed to not be inflexible about the comment character used, if someone feels highly motivated, but I think that should be done in a separate follow-on patch. Here are (some of?) the locations where '#' has already been hardcoded for a long time for merges: 1) In check_label_or_ref_arg(): case TODO_LABEL: /* * '#' is not a valid label as the merge command uses it to * separate merge parents from the commit subject. */ 2) In do_merge(): /* * For octopus merges, the arg starts with the list of revisions to be * merged. The list is optionally followed by '#' and the oneline. */ merge_arg_len = oneline_offset = arg_len; for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) { if (!*p) break; if (*p == '#' && (!p[1] || isspace(p[1]))) { 3) In label_oid(): if ((buf->len == the_hash_algo->hexsz && !get_oid_hex(label, &dummy)) || (buf->len == 1 && *label == '#') || hashmap_get_from_hash(&state->labels, strihash(label), label)) { /* * If the label already exists, or if the label is a * valid full OID, or the label is a '#' (which we use * as a separator between merge heads and oneline), we * append a dash and a number to make it unique. */ Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- sequencer.c | 16 +++++-- t/t3404-rebase-interactive.sh | 54 +++++++++++----------- t/t3415-rebase-autosquash.sh | 14 +++--- t/t3430-rebase-merges.sh | 10 ++-- t/t5520-pull.sh | 2 +- t/t7512-status-help.sh | 86 +++++++++++++++++------------------ 6 files changed, 94 insertions(+), 88 deletions(-) diff --git a/sequencer.c b/sequencer.c index b5c4043757e948..2735966544bc8a 100644 --- a/sequencer.c +++ b/sequencer.c @@ -5901,11 +5901,11 @@ static int make_script_with_merges(struct pretty_print_context *pp, /* Create a label from the commit message */ strbuf_reset(&label_from_message); - if (skip_prefix(oneline.buf, "Merge ", &p1) && + if (skip_prefix(oneline.buf, "# Merge ", &p1) && (p1 = strchr(p1, '\'')) && (p2 = strchr(++p1, '\''))) strbuf_add(&label_from_message, p1, p2 - p1); - else if (skip_prefix(oneline.buf, "Merge pull request ", + else if (skip_prefix(oneline.buf, "# Merge pull request ", &p1) && (p1 = strstr(p1, " from "))) strbuf_addstr(&label_from_message, p1 + strlen(" from ")); @@ -5940,7 +5940,7 @@ static int make_script_with_merges(struct pretty_print_context *pp, strbuf_addstr(&buf, label_oid(oid, label, &state)); } - strbuf_addf(&buf, " # %s", oneline.buf); + strbuf_addf(&buf, " %s", oneline.buf); FLEX_ALLOC_STR(entry, string, buf.buf); oidcpy(&entry->entry.oid, &commit->object.oid); @@ -6022,7 +6022,7 @@ static int make_script_with_merges(struct pretty_print_context *pp, else { strbuf_reset(&oneline); pretty_print_commit(pp, commit, &oneline); - strbuf_addf(out, "%s %s # %s\n", + strbuf_addf(out, "%s %s %s\n", cmd_reset, to, oneline.buf); } } @@ -6090,8 +6090,14 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc, git_config_get_string("rebase.instructionFormat", &format); if (!format || !*format) { free(format); - format = xstrdup("%s"); + format = xstrdup("# %s"); } + if (*format != '#') { + char *temp = format; + format = xstrfmt("# %s", temp); + free(temp); + } + get_commit_format(format, &revs); free(format); pp.fmt = revs.commit_format; diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh index 2aee9789a2fae2..6bac217ed3555e 100755 --- a/t/t3404-rebase-interactive.sh +++ b/t/t3404-rebase-interactive.sh @@ -1468,7 +1468,7 @@ test_expect_success 'rebase -i respects rebase.missingCommitsCheck = warn' ' cat >expect <<-EOF && Warning: some commits may have been dropped accidentally. Dropped commits (newer to older): - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary) + - $(git log --format="%h # %s" -1 primary) To avoid this message, use "drop" to explicitly remove a commit. EOF test_config rebase.missingCommitsCheck warn && @@ -1486,8 +1486,8 @@ test_expect_success 'rebase -i respects rebase.missingCommitsCheck = error' ' cat >expect <<-EOF && Warning: some commits may have been dropped accidentally. Dropped commits (newer to older): - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary) - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary~2) + - $(git log --format="%h # %s" -1 primary) + - $(git log --format="%h # %s" -1 primary~2) To avoid this message, use "drop" to explicitly remove a commit. Use '\''git config rebase.missingCommitsCheck'\'' to change the level of warnings. @@ -1530,11 +1530,11 @@ test_expect_success 'rebase --edit-todo respects rebase.missingCommitsCheck = ig test_expect_success 'rebase --edit-todo respects rebase.missingCommitsCheck = warn' ' cat >expect <<-EOF && error: invalid command '\''pickled'\'' - error: invalid line 1: pickled $(git rev-list --pretty=oneline --abbrev-commit -1 primary~4) + error: invalid line 1: pickled $(git log --format="%h # %s" -1 primary~4) Warning: some commits may have been dropped accidentally. Dropped commits (newer to older): - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary) - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary~4) + - $(git log --format="%h # %s" -1 primary) + - $(git log --format="%h # %s" -1 primary~4) To avoid this message, use "drop" to explicitly remove a commit. EOF head -n5 expect >expect.2 && @@ -1565,11 +1565,11 @@ test_expect_success 'rebase --edit-todo respects rebase.missingCommitsCheck = wa test_expect_success 'rebase --edit-todo respects rebase.missingCommitsCheck = error' ' cat >expect <<-EOF && error: invalid command '\''pickled'\'' - error: invalid line 1: pickled $(git rev-list --pretty=oneline --abbrev-commit -1 primary~4) + error: invalid line 1: pickled $(git log --format="%h # %s" -1 primary~4) Warning: some commits may have been dropped accidentally. Dropped commits (newer to older): - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary) - - $(git rev-list --pretty=oneline --abbrev-commit -1 primary~4) + - $(git log --format="%h # %s" -1 primary) + - $(git log --format="%h # %s" -1 primary~4) To avoid this message, use "drop" to explicitly remove a commit. Use '\''git config rebase.missingCommitsCheck'\'' to change the level of warnings. @@ -1642,11 +1642,11 @@ test_expect_success 'respects rebase.abbreviateCommands with fixup, squash and e test_commit "fixup! first" file2.txt "first line again" first_fixup && test_commit "squash! second" file1.txt "another line here" second_squash && cat >expected <<-EOF && - p $(git rev-list --abbrev-commit -1 first) first - f $(git rev-list --abbrev-commit -1 first_fixup) fixup! first + p $(git rev-list --abbrev-commit -1 first) # first + f $(git rev-list --abbrev-commit -1 first_fixup) # fixup! first x git show HEAD - p $(git rev-list --abbrev-commit -1 second) second - s $(git rev-list --abbrev-commit -1 second_squash) squash! second + p $(git rev-list --abbrev-commit -1 second) # second + s $(git rev-list --abbrev-commit -1 second_squash) # squash! second x git show HEAD EOF git checkout abbrevcmd && @@ -1665,7 +1665,7 @@ test_expect_success 'static check of bad command' ' set_fake_editor && test_must_fail env FAKE_LINES="1 2 3 bad 4 5" \ git rebase -i --root 2>actual && - test_grep "pickled $(git rev-list --oneline -1 primary~1)" \ + test_grep "pickled $(git log --format="%h # %s" -1 primary~1)" \ actual && test_grep "You can fix this with .git rebase --edit-todo.." \ actual && @@ -1865,15 +1865,15 @@ test_expect_success '--update-refs adds label and update-ref commands' ' set_cat_todo_editor && cat >expect <<-EOF && - pick $(git log -1 --format=%h J) J - fixup $(git log -1 --format=%h update-refs) fixup! J # empty + pick $(git log -1 --format=%h J) # J + fixup $(git log -1 --format=%h update-refs) # fixup! J # empty update-ref refs/heads/second update-ref refs/heads/first - pick $(git log -1 --format=%h K) K - pick $(git log -1 --format=%h L) L - fixup $(git log -1 --format=%h is-not-reordered) fixup! L # empty + pick $(git log -1 --format=%h K) # K + pick $(git log -1 --format=%h L) # L + fixup $(git log -1 --format=%h is-not-reordered) # fixup! L # empty update-ref refs/heads/third - pick $(git log -1 --format=%h M) M + pick $(git log -1 --format=%h M) # M update-ref refs/heads/no-conflict-branch update-ref refs/heads/is-not-reordered update-ref refs/heads/shared-tip @@ -1905,19 +1905,19 @@ test_expect_success '--update-refs adds commands with --rebase-merges' ' cat >expect <<-EOF && label onto reset onto - pick $(git log -1 --format=%h branch2~1) F - pick $(git log -1 --format=%h branch2) I + pick $(git log -1 --format=%h branch2~1) # F + pick $(git log -1 --format=%h branch2) # I update-ref refs/heads/branch2 label branch2 reset onto - pick $(git log -1 --format=%h refs/heads/second) J + pick $(git log -1 --format=%h refs/heads/second) # J update-ref refs/heads/second update-ref refs/heads/first - pick $(git log -1 --format=%h refs/heads/third~1) K - pick $(git log -1 --format=%h refs/heads/third) L - fixup $(git log -1 --format=%h update-refs-with-merge) fixup! L # empty + pick $(git log -1 --format=%h refs/heads/third~1) # K + pick $(git log -1 --format=%h refs/heads/third) # L + fixup $(git log -1 --format=%h update-refs-with-merge) # fixup! L # empty update-ref refs/heads/third - pick $(git log -1 --format=%h HEAD~2) M + pick $(git log -1 --format=%h HEAD~2) # M update-ref refs/heads/no-conflict-branch merge -C $(git log -1 --format=%h HEAD~1) branch2 # merge update-ref refs/heads/merge-branch diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh index fcc40d6fe1fd5b..26b42a526a1944 100755 --- a/t/t3415-rebase-autosquash.sh +++ b/t/t3415-rebase-autosquash.sh @@ -257,8 +257,8 @@ test_expect_success 'auto squash of fixup commit that matches branch name which GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ && sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p;}" tmp >actual && cat <<-EOF >expect && - pick HASH second commit - pick HASH fixup! self-cycle # empty + pick HASH # second commit + pick HASH # fixup! self-cycle # empty EOF test_cmp expect actual ' @@ -311,10 +311,10 @@ test_auto_fixup_fixup () { parent2=$(git rev-parse --short HEAD^^) && parent3=$(git rev-parse --short HEAD^^^) && cat >expected <<-EOF && - pick $parent3 first commit - $1 $parent1 $1! first - $1 $head $1! $2! first - pick $parent2 second commit + pick $parent3 # first commit + $1 $parent1 # $1! first + $1 $head # $1! $2! first + pick $parent2 # second commit EOF test_cmp expected actual ) && @@ -389,7 +389,7 @@ test_expect_success 'autosquash with empty custom instructionFormat' ' set_cat_todo_editor && test_must_fail git -c rebase.instructionFormat= \ rebase --autosquash --force-rebase -i HEAD^ >actual && - git log -1 --format="pick %h %s" >expect && + git log -1 --format="pick %h # %s" >expect && test_cmp expect actual ) ' diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh index b84d68c4b96bc9..5f8fa05420c269 100755 --- a/t/t3430-rebase-merges.sh +++ b/t/t3430-rebase-merges.sh @@ -106,18 +106,18 @@ test_expect_success 'generate correct todo list' ' label onto reset onto - pick $b B + pick $b # B label first reset onto - pick $c C + pick $c # C label branch-point - pick $f F - pick $g G + pick $f # F + pick $g # G label second reset branch-point # C - pick $d D + pick $d # D merge -C $e first # E merge -C $h second # H diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh index 47534f1062d203..63c9a8f04b1cb9 100755 --- a/t/t5520-pull.sh +++ b/t/t5520-pull.sh @@ -813,7 +813,7 @@ test_expect_success 'git pull --rebase does not reapply old patches' ' cd dst && test_must_fail git pull --rebase && cat .git/rebase-merge/done .git/rebase-merge/git-rebase-todo >work && - grep -v -e \# -e ^$ work >patches && + grep -v -e ^\# -e ^$ work >patches && test_line_count = 1 patches && rm -f work ) diff --git a/t/t7512-status-help.sh b/t/t7512-status-help.sh index 802f8f704c62eb..25e8e9711f8fef 100755 --- a/t/t7512-status-help.sh +++ b/t/t7512-status-help.sh @@ -139,7 +139,7 @@ test_expect_success 'status during rebase -i when conflicts unresolved' ' cat >expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected <expected < Date: Fri, 16 May 2025 20:04:17 +0000 Subject: [PATCH 41/52] merge-ort: add a new mergeability_only option Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new mergeability_only option. This option allows the merge machinery to, in the "outer layer" of the merge: * exit upon first[-ish] conflict * avoid (not prevent) writing merged blobs/trees to the object store I have a number of qualifiers there, so let me explain each: "outer layer": Note that since the recursive merge of merge bases (corresponding to call_depth > 0) can conflict without the outer final merge (corresponding to call_depth == 0) conflicting, we can't short-circuit nor avoid writing merged blobs/trees to the object store during those inner merges. "first-ish conflict": The current patch only exits early from process_entries() on the first conflict it detects, but conflicts could have been detected in a previous function call, namely detect_and_process_renames(). However: * conflicts detected by detect_and_process_renames() are quite rare conflict types * the detection would still come after regular rename detection (which is the expensive part of detect_and_process_renames()), so it is not saving us much in computation time given that process_entries() directly follows detect_and_process_renames() * [this overlaps with the next bullet point] process_entries() is the place where virtually all object writing occurs (object writing is sometimes more of a concern for Forges than computation time), so exiting early here isn't saving us much in object writes either * the code changes needed to handle an earlier exit are slightly more invasive in detect_and_process_renames() than for process_entries(). Given the rareness of the even earlier conflicts, the limited savings we'd get from exiting even earlier, and in an attempt to keep this patch simpler, we don't guarantee that we actually exit on the first conflict detected. We can always revisit this decision later if we decide that a further micro-optimization to exit slightly earlier in rare cases is worthwhile. "avoid (not prevent) writing objects": The detect_and_process_renames() call can also write objects to the object store, when rename/rename conflicts involve one (or more) files that have also been modified on both sides. Because of this alternate call path leading to handle_content_merges(), our "early exit" does not prevent writing objects entirely, even within the "outer layer" (i.e. even within call_depth == 0). I figure that's fine though, since we're already writing objects for the inner merges (i.e. for call_depth > 0), which are likely going to represent vastly more objects than files involved in rename/rename+modify/modify cases in the outer merge, on average. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-ort.c | 38 +++++++++++++++++++++++++++++++------- merge-ort.h | 1 + 2 files changed, 32 insertions(+), 7 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 77310a4a52c972..47b3d1730ece36 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2127,6 +2127,7 @@ static int handle_content_merge(struct merge_options *opt, const struct version_info *b, const char *pathnames[3], const int extra_marker_size, + const int record_object, struct version_info *result) { /* @@ -2214,7 +2215,7 @@ static int handle_content_merge(struct merge_options *opt, ret = -1; } - if (!ret && + if (!ret && record_object && write_object_file(result_buf.ptr, result_buf.size, OBJ_BLOB, &result->oid)) { path_msg(opt, ERROR_OBJECT_WRITE_FAILED, 0, @@ -2897,6 +2898,7 @@ static int process_renames(struct merge_options *opt, struct version_info merged; struct conflict_info *base, *side1, *side2; unsigned was_binary_blob = 0; + const int record_object = true; pathnames[0] = oldpath; pathnames[1] = newpath; @@ -2947,6 +2949,7 @@ static int process_renames(struct merge_options *opt, &side2->stages[2], pathnames, 1 + 2 * opt->priv->call_depth, + record_object, &merged); if (clean_merge < 0) return -1; @@ -3061,6 +3064,7 @@ static int process_renames(struct merge_options *opt, struct conflict_info *base, *side1, *side2; int clean; + const int record_object = true; pathnames[0] = oldpath; pathnames[other_source_index] = oldpath; @@ -3080,6 +3084,7 @@ static int process_renames(struct merge_options *opt, &side2->stages[2], pathnames, 1 + 2 * opt->priv->call_depth, + record_object, &merged); if (clean < 0) return -1; @@ -3931,9 +3936,12 @@ static int write_completed_directory(struct merge_options *opt, * Write out the tree to the git object directory, and also * record the mode and oid in dir_info->result. */ + int record_tree = (!opt->mergeability_only || + opt->priv->call_depth); dir_info->is_null = 0; dir_info->result.mode = S_IFDIR; - if (write_tree(&dir_info->result.oid, &info->versions, offset, + if (record_tree && + write_tree(&dir_info->result.oid, &info->versions, offset, opt->repo->hash_algo->rawsz) < 0) ret = -1; } @@ -4231,10 +4239,13 @@ static int process_entry(struct merge_options *opt, struct version_info *o = &ci->stages[0]; struct version_info *a = &ci->stages[1]; struct version_info *b = &ci->stages[2]; + int record_object = (!opt->mergeability_only || + opt->priv->call_depth); clean_merge = handle_content_merge(opt, path, o, a, b, ci->pathnames, opt->priv->call_depth * 2, + record_object, &merged_file); if (clean_merge < 0) return -1; @@ -4395,6 +4406,8 @@ static int process_entries(struct merge_options *opt, STRING_LIST_INIT_NODUP, NULL, 0 }; int ret = 0; + const int record_tree = (!opt->mergeability_only || + opt->priv->call_depth); trace2_region_enter("merge", "process_entries setup", opt->repo); if (strmap_empty(&opt->priv->paths)) { @@ -4454,6 +4467,12 @@ static int process_entries(struct merge_options *opt, ret = -1; goto cleanup; }; + if (!ci->merged.clean && opt->mergeability_only && + !opt->priv->call_depth) { + ret = 0; + goto cleanup; + } + } } trace2_region_leave("merge", "processing", opt->repo); @@ -4468,7 +4487,8 @@ static int process_entries(struct merge_options *opt, fflush(stdout); BUG("dir_metadata accounting completely off; shouldn't happen"); } - if (write_tree(result_oid, &dir_metadata.versions, 0, + if (record_tree && + write_tree(result_oid, &dir_metadata.versions, 0, opt->repo->hash_algo->rawsz) < 0) ret = -1; cleanup: @@ -4715,6 +4735,8 @@ void merge_display_update_messages(struct merge_options *opt, if (opt->record_conflict_msgs_as_headers) BUG("Either display conflict messages or record them as headers, not both"); + if (opt->mergeability_only) + BUG("Displaying conflict messages incompatible with mergeability-only checks"); trace2_region_enter("merge", "display messages", opt->repo); @@ -5171,10 +5193,12 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, result->path_messages = &opt->priv->conflicts; if (result->clean >= 0) { - result->tree = parse_tree_indirect(&working_tree_oid); - if (!result->tree) - die(_("unable to read tree (%s)"), - oid_to_hex(&working_tree_oid)); + if (!opt->mergeability_only) { + result->tree = parse_tree_indirect(&working_tree_oid); + if (!result->tree) + die(_("unable to read tree (%s)"), + oid_to_hex(&working_tree_oid)); + } /* existence of conflicted entries implies unclean */ result->clean &= strmap_empty(&opt->priv->conflicted); } diff --git a/merge-ort.h b/merge-ort.h index 30750c03962f2c..6045579825da8b 100644 --- a/merge-ort.h +++ b/merge-ort.h @@ -83,6 +83,7 @@ struct merge_options { /* miscellaneous control options */ const char *subtree_shift; unsigned renormalize : 1; + unsigned mergeability_only : 1; /* exit early, write fewer objects */ unsigned record_conflict_msgs_as_headers : 1; const char *msg_header_prefix; From 29d7bf19512d8ca97be5cf708ca2e0bcc29408ab Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Fri, 16 May 2025 20:04:18 +0000 Subject: [PATCH 42/52] merge-tree: add a new --quiet flag Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new --quiet flag which will make use of the new mergeability_only flag added to merge-ort in the previous commit. This option allows the merge machinery to, in the outer layer of the merge: * exit early when a conflict is detected * avoid writing (most) merged blobs/trees to the object store Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- Documentation/git-merge-tree.adoc | 6 +++++ builtin/merge-tree.c | 18 +++++++++++++++ t/t4301-merge-tree-write-tree.sh | 38 +++++++++++++++++++++++++++++++ 3 files changed, 62 insertions(+) diff --git a/Documentation/git-merge-tree.adoc b/Documentation/git-merge-tree.adoc index cf0578f9b5e86d..f824eea61f1e06 100644 --- a/Documentation/git-merge-tree.adoc +++ b/Documentation/git-merge-tree.adoc @@ -65,6 +65,12 @@ OPTIONS default is to include these messages if there are merge conflicts, and to omit them otherwise. +--quiet:: + Disable all output from the program. Useful when you are only + interested in the exit status. Allows merge-tree to exit + early when it finds a conflict, and allows it to avoid writing + most objects created by merges. + --allow-unrelated-histories:: merge-tree will by default error out if the two branches specified share no common history. This flag can be given to override that diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index 4aafa73c61559e..7f41665dfd7e67 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -490,6 +490,9 @@ static int real_merge(struct merge_tree_options *o, if (result.clean < 0) die(_("failure to merge")); + if (o->merge_options.mergeability_only) + goto cleanup; + if (show_messages == -1) show_messages = !result.clean; @@ -522,6 +525,8 @@ static int real_merge(struct merge_tree_options *o, } if (o->use_stdin) putchar(line_termination); + +cleanup: merge_finalize(&opt, &result); clear_merge_options(&opt); return !result.clean; /* result.clean < 0 handled above */ @@ -538,6 +543,7 @@ int cmd_merge_tree(int argc, int original_argc; const char *merge_base = NULL; int ret; + int quiet = 0; const char * const merge_tree_usage[] = { N_("git merge-tree [--write-tree] [] "), @@ -552,6 +558,10 @@ int cmd_merge_tree(int argc, N_("do a trivial merge only"), MODE_TRIVIAL), OPT_BOOL(0, "messages", &o.show_messages, N_("also show informational/conflict messages")), + OPT_BOOL_F(0, "quiet", + &quiet, + N_("suppress all output; only exit status wanted"), + PARSE_OPT_NONEG), OPT_SET_INT('z', NULL, &line_termination, N_("separate paths with the NUL character"), '\0'), OPT_BOOL_F(0, "name-only", @@ -583,6 +593,14 @@ int cmd_merge_tree(int argc, argc = parse_options(argc, argv, prefix, mt_options, merge_tree_usage, PARSE_OPT_STOP_AT_NON_OPTION); + if (quiet && o.show_messages == -1) + o.show_messages = 0; + o.merge_options.mergeability_only = quiet; + die_for_incompatible_opt2(quiet, "--quiet", o.show_messages, "--messages"); + die_for_incompatible_opt2(quiet, "--quiet", o.name_only, "--name-only"); + die_for_incompatible_opt2(quiet, "--quiet", o.use_stdin, "--stdin"); + die_for_incompatible_opt2(quiet, "--quiet", !line_termination, "-z"); + if (xopts.nr && o.mode == MODE_TRIVIAL) die(_("--trivial-merge is incompatible with all other options")); for (size_t x = 0; x < xopts.nr; x++) diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh index f9c5883a7f7cd6..6e117ee93c8b5b 100755 --- a/t/t4301-merge-tree-write-tree.sh +++ b/t/t4301-merge-tree-write-tree.sh @@ -54,6 +54,25 @@ test_expect_success setup ' git commit -m first-commit ' +test_expect_success '--quiet on clean merge' ' + # Get rid of loose objects to start with + git gc && + echo "0 objects, 0 kilobytes" >expect && + git count-objects >actual && + test_cmp expect actual && + + # Ensure merge is successful (exit code of 0) + git merge-tree --write-tree --quiet side1 side3 >output && + + # Ensure there is no output + test_must_be_empty output && + + # Ensure no loose objects written (all new objects written would have + # been in "outer layer" of the merge) + git count-objects >actual && + test_cmp expect actual +' + test_expect_success 'Clean merge' ' TREE_OID=$(git merge-tree --write-tree side1 side3) && q_to_tab <<-EOF >expect && @@ -72,6 +91,25 @@ test_expect_success 'Failed merge without rename detection' ' grep "CONFLICT (modify/delete): numbers deleted" out ' +test_expect_success '--quiet on conflicted merge' ' + # Get rid of loose objects to start with + git gc && + echo "0 objects, 0 kilobytes" >expect && + git count-objects >actual && + test_cmp expect actual && + + # Ensure merge has conflict + test_expect_code 1 git merge-tree --write-tree --quiet side1 side2 >output && + + # Ensure there is no output + test_must_be_empty output && + + # Ensure no loose objects written (all new objects written would have + # been in "outer layer" of the merge) + git count-objects >actual && + test_cmp expect actual +' + test_expect_success 'Content merge and a few conflicts' ' git checkout side1^0 && test_must_fail git merge side2 && From 3749b8a795347443286bb7c1d36489ea14b1f03f Mon Sep 17 00:00:00 2001 From: K Jayatheerth Date: Sun, 18 May 2025 13:13:15 +0530 Subject: [PATCH 43/52] docs: remove unused mentoring mailing list reference The git-mentoring group was initially created to help newcomers with their development itches. However, in practice, most of their questions were already being addressed directly on the mailing list, and contributors consistently received helpful responses there. Remove the mentoring group details from the Documentation. Signed-off-by: K Jayatheerth Signed-off-by: Junio C Hamano --- Documentation/MyFirstContribution.adoc | 8 -------- 1 file changed, 8 deletions(-) diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc index ca1d688c9ba5e1..ef190d87481bb5 100644 --- a/Documentation/MyFirstContribution.adoc +++ b/Documentation/MyFirstContribution.adoc @@ -40,14 +40,6 @@ the list by sending an email to The https://lore.kernel.org/git[archive] of this mailing list is available to view in a browser. -==== https://groups.google.com/forum/#!forum/git-mentoring[git-mentoring@googlegroups.com] - -This mailing list is targeted to new contributors and was created as a place to -post questions and receive answers outside of the public eye of the main list. -Veteran contributors who are especially interested in helping mentor newcomers -are present on the list. In order to avoid search indexers, group membership is -required to view messages; anyone can join and no approval is required. - ==== https://web.libera.chat/#git-devel[#git-devel] on Libera Chat This IRC channel is for conversations between Git contributors. If someone is From a1dcf6b2897e34b684249e6a823221a063ae3910 Mon Sep 17 00:00:00 2001 From: K Jayatheerth Date: Sun, 18 May 2025 13:13:16 +0530 Subject: [PATCH 44/52] docs: clarify cmd_psuh signature and explain UNUSED macro The sample program, as written, would no longer build for at least two reasons: - Since this document was first written, the convention to call a subcommand implementation has changed, and cmd_psuh() now needs to accept the fourth parameter, repository. - These days, compiler warning options for developers include one that detects and complains about unused parameters, so ones that are deliberately unused have to be marked as such. Update the old-style examples to adjust to the current practices, with explanations as needed. Signed-off-by: K Jayatheerth Signed-off-by: Junio C Hamano --- Documentation/MyFirstContribution.adoc | 28 +++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc index ef190d87481bb5..7a3e913f363870 100644 --- a/Documentation/MyFirstContribution.adoc +++ b/Documentation/MyFirstContribution.adoc @@ -142,15 +142,31 @@ command in `builtin/psuh.c`. Create that file, and within it, write the entry point for your command in a function matching the style and signature: ---- -int cmd_psuh(int argc, const char **argv, const char *prefix) +int cmd_psuh(int argc UNUSED, const char **argv UNUSED, + const char *prefix UNUSED, struct repository *repo UNUSED) ---- +A few things to note: + +* A subcommand implementation takes its command line arguments + in `int argc` + `const char **argv`, like `main()` would. + +* It also takes two extra parameters, `prefix` and `repo`. What + they mean will not be discussed until much later. + +* Because this first example will not use any of the parameters, + your compiler will give warnings on unused parameters. As the + list of these four parameters is mandated by the API to add + new built-in commands, you cannot omit them. Instead, you add + `UNUSED` to each of them to tell the compiler that you *know* + you are not (yet) using it. + We'll also need to add the declaration of psuh; open up `builtin.h`, find the declaration for `cmd_pull`, and add a new line for `psuh` immediately before it, in order to keep the declarations alphabetically sorted: ---- -int cmd_psuh(int argc, const char **argv, const char *prefix); +int cmd_psuh(int argc, const char **argv, const char *prefix, struct repository *repo); ---- Be sure to `#include "builtin.h"` in your `psuh.c`. You'll also need to @@ -166,7 +182,8 @@ Throughout the tutorial, we will mark strings for translation as necessary; you should also do so when writing your user-facing commands in the future. ---- -int cmd_psuh(int argc, const char **argv, const char *prefix) +int cmd_psuh(int argc UNUSED, const char **argv UNUSED, + const char *prefix UNUSED, struct repository *repo UNUSED) { printf(_("Pony saying hello goes here.\n")); return 0; @@ -279,8 +296,9 @@ on the reference implementation linked at the top of this document. It's probably useful to do at least something besides printing out a string. Let's start by having a look at everything we get. -Modify your `cmd_psuh` implementation to dump the args you're passed, keeping -existing `printf()` calls in place: +Modify your `cmd_psuh` implementation to dump the args you're passed, +keeping existing `printf()` calls in place; because the args are now +used, remove the `UNUSED` macro from them: ---- int i; From 7649d316ce1b71911dce71fdffd843a71732b827 Mon Sep 17 00:00:00 2001 From: K Jayatheerth Date: Sun, 18 May 2025 13:13:17 +0530 Subject: [PATCH 45/52] docs: replace git_config to repo_config Since this document was written, the built-in API has been updated a few times, but the document was left stale. Adjust to the current best practices by calling repo_config() on the repository instance the subcommand implementation receives as a parameter, instead of calling git_config() that used to be the common practice. Signed-off-by: K Jayatheerth Signed-off-by: Junio C Hamano --- Documentation/MyFirstContribution.adoc | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc index 7a3e913f363870..aca7212cfe2a42 100644 --- a/Documentation/MyFirstContribution.adoc +++ b/Documentation/MyFirstContribution.adoc @@ -322,7 +322,8 @@ on the command line, including the name of our command. (If `prefix` is empty for you, try `cd Documentation/ && ../bin-wrappers/git psuh`). That's not so helpful. So what other context can we get? -Add a line to `#include "config.h"`. Then, add the following bits to the +Add a line to `#include "config.h"` and `#include "repository.h"`. +Then, add the following bits to the function body: function body: ---- @@ -330,18 +331,18 @@ function body: ... - git_config(git_default_config, NULL); - if (git_config_get_string_tmp("user.name", &cfg_name) > 0) + repo_config(repo, git_default_config, NULL); + if (repo_config_get_string_tmp(repo, "user.name", &cfg_name)) printf(_("No name is found in config\n")); else printf(_("Your name: %s\n"), cfg_name); ---- -`git_config()` will grab the configuration from config files known to Git and -apply standard precedence rules. `git_config_get_string_tmp()` will look up +`repo_config()` will grab the configuration from config files known to Git and +apply standard precedence rules. `repo_config_get_string_tmp()` will look up a specific key ("user.name") and give you the value. There are a number of single-key lookup functions like this one; you can see them all (and more info -about how to use `git_config()`) in `Documentation/technical/api-config.adoc`. +about how to use `repo_config()`) in `Documentation/technical/api-config.adoc`. You should see that the name printed matches the one you see when you run: @@ -374,7 +375,7 @@ status_init_config(&s, git_status_config); ---- But as we drill down, we can find that `status_init_config()` wraps a call -to `git_config()`. Let's modify the code we wrote in the previous commit. +to `repo_config()`. Let's modify the code we wrote in the previous commit. Be sure to include the header to allow you to use `struct wt_status`: @@ -390,8 +391,8 @@ prepare it, and print its contents: ... - wt_status_prepare(the_repository, &status); - git_config(git_default_config, &status); + wt_status_prepare(repo, &status); + repo_config(repo, git_default_config, &status); ... From cddcee7f64263922770bbe5c528ba6af4bf81fb5 Mon Sep 17 00:00:00 2001 From: Eli Schwartz Date: Mon, 19 May 2025 13:09:42 -0400 Subject: [PATCH 46/52] meson: reformat default options to workaround bug in `meson configure` Since 13cb20fc46 ("meson: fix compilation with Visual Studio", 2025-01-22) it has not been possible to list build options via `meson configure`. This is due to Meson's static analysis of build options failing to handle constant folding, and thinking we set a totally invalid default `-std=`. This is reported upstream but we anyways need to work with existing versions. It turns out there is a simple solution: turn the entire default option into a conditional branch, which means Meson sees either nothing, or everything. As a result, Git users can once again see pretty-printed options before building. Reported-by: Ramsay Jones Bug: https://github.com/mesonbuild/meson/issues/14623 Signed-off-by: Eli Schwartz Signed-off-by: Junio C Hamano --- meson.build | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/meson.build b/meson.build index efe2871c9dba13..71396f3436bb44 100644 --- a/meson.build +++ b/meson.build @@ -178,14 +178,12 @@ project('git', 'c', capture: true, check: true, ).stdout().strip() : 'unknown', - default_options: [ - # Git requires C99 with GNU extensions, which of course isn't supported by - # MSVC. Funny enough, C99 doesn't work with MSVC either, as it has only - # learned to define __STDC_VERSION__ with C11 and later. We thus require - # GNU C99 and fall back to C11. Meson only learned to handle the fallback - # with version 1.3.0, so on older versions we use GNU C99 unconditionally. - 'c_std=' + (meson.version().version_compare('>=1.3.0') ? 'gnu99,c11' : 'gnu99'), - ], + # Git requires C99 with GNU extensions, which of course isn't supported by + # MSVC. Funny enough, C99 doesn't work with MSVC either, as it has only + # learned to define __STDC_VERSION__ with C11 and later. We thus require + # GNU C99 and fall back to C11. Meson only learned to handle the fallback + # with version 1.3.0, so on older versions we use GNU C99 unconditionally. + default_options: meson.version().version_compare('>=1.3.0') ? ['c_std=gnu99,c11'] : ['c_std=gnu99'], ) fs = import('fs') From f783b3fe740eeb021f8386df2de2ab9fa32eed1b Mon Sep 17 00:00:00 2001 From: Ramsay Jones Date: Mon, 19 May 2025 17:25:19 +0100 Subject: [PATCH 47/52] meson.build: quote the GITWEBDIR build configuration The build configuration options with (non-empty) values, for example filesystem paths potentially containing spaces, have been set using the '.set_quoted()' method. However, the GITWEBDIR value has been set using the '.set()' method instead. In order to correctly quote the GITWEBDIR value, replace the '.set()' method with '.set_quoted()'. Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- meson.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/meson.build b/meson.build index 270ce933d0f5d6..48f31157a03e37 100644 --- a/meson.build +++ b/meson.build @@ -739,7 +739,7 @@ build_options_config.set('GIT_TEST_OPTS', '') build_options_config.set('GIT_TEST_PERL_FATAL_WARNINGS', '') build_options_config.set_quoted('GIT_TEST_UTF8_LOCALE', get_option('test_utf8_locale')) build_options_config.set_quoted('LOCALEDIR', fs.as_posix(get_option('prefix') / get_option('localedir'))) -build_options_config.set('GITWEBDIR', fs.as_posix(get_option('prefix') / get_option('datadir') / 'gitweb')) +build_options_config.set_quoted('GITWEBDIR', fs.as_posix(get_option('prefix') / get_option('datadir') / 'gitweb')) if get_option('sane_tool_path').length() != 0 sane_tool_path = (host_machine.system() == 'windows' ? ';' : ':').join(get_option('sane_tool_path')) From bdb38432f383ad397447bcfd80d1659f3c978644 Mon Sep 17 00:00:00 2001 From: Ramsay Jones Date: Mon, 19 May 2025 17:25:20 +0100 Subject: [PATCH 48/52] meson: correct install location of YAML.pm When executing an 'meson install' the YAML.pm file is incorrectly placed in the /share/perl5/Git/SVN directory. The YAML.pm file should be placed in a 'Memoize' subdirectory instead. In order to correct the location, update the 'install_dir' of the relevant target in the 'perl/Git/SVN/Memoize/meson.build' file. Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- perl/Git/SVN/Memoize/meson.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/perl/Git/SVN/Memoize/meson.build b/perl/Git/SVN/Memoize/meson.build index 233ec670d7de91..8c2e80d2d261cd 100644 --- a/perl/Git/SVN/Memoize/meson.build +++ b/perl/Git/SVN/Memoize/meson.build @@ -3,6 +3,6 @@ test_dependencies += custom_target( output: 'YAML.pm', command: generate_perl_command, install: true, - install_dir: get_option('datadir') / 'perl5/Git/SVN', + install_dir: get_option('datadir') / 'perl5/Git/SVN/Memoize', depends: [git_version_file], ) From 46a626c3891ad39f8534c5e649c38affa1f4e7e1 Mon Sep 17 00:00:00 2001 From: Ramsay Jones Date: Mon, 19 May 2025 17:25:21 +0100 Subject: [PATCH 49/52] meson: correct path to system config/attribute files The path to the system-wide config and attributes files are not being set correctly in the meson build. Unless explicitly overridden on the command line during setup, the 'gitconfig' and 'gitattributes' options are defaulting to absolute paths in the '/etc' system directory. This is only appropriate if the is set specifically to '/usr'. The directory in which these files are placed is generally referred to as the 'system configuration directory' or 'sysconfdir' for short. When the prefix is '/usr' then the sysconfdir is usually set to '/etc', but any other value for prefix results in the relative directory value 'etc' instead. (eg if prefix is '/usr/local', then the 'etc' relative value results in a system configuration directory of '/usr/local/etc'). When setting the 'sysconfdir' builtin option value, the meson system uses exactly this algorithm, so we can use get_option('sysconfdir') directly when setting the (non-overridden) build variables. In order to allow for overriding from the command line, remove the default values specified for the 'gitconfig' and 'gitattributes' options in the 'meson_options.txt' file. This allows the user to specify any pathname for those options, while being able to test for the unset (empty) value. An absolute pathname will be used unchanged and a relative pathname will be appended to '/'. These values are then used to set the 'ETC_GITCONFIG' and 'ETC_GITATTRIBUTES' build variables which are, in turn, passed to the compiler as '-D' arguments. When the 'gitconfig' or 'gitattributes' options are not used, then use the built-in 'sysconfdir' and set the ETC_GITCONFIG build variable to the string "/gitconfig". Similarly, set ETC_ATTRIBUTES to "/gitattributes". Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- meson.build | 16 ++++++++++++++-- meson_options.txt | 8 ++++---- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/meson.build b/meson.build index 48f31157a03e37..8e8f228a374bc0 100644 --- a/meson.build +++ b/meson.build @@ -757,8 +757,6 @@ endif libgit_c_args = [ '-DBINDIR="' + get_option('bindir') + '"', '-DDEFAULT_GIT_TEMPLATE_DIR="' + get_option('datadir') / 'git-core/templates' + '"', - '-DETC_GITATTRIBUTES="' + get_option('gitattributes') + '"', - '-DETC_GITCONFIG="' + get_option('gitconfig') + '"', '-DFALLBACK_RUNTIME_PREFIX="' + get_option('prefix') + '"', '-DGIT_HOST_CPU="' + host_machine.cpu_family() + '"', '-DGIT_HTML_PATH="' + get_option('datadir') / 'doc/git-doc"', @@ -769,6 +767,20 @@ libgit_c_args = [ '-DSHELL_PATH="' + fs.as_posix(target_shell.full_path()) + '"', ] +system_attributes = get_option('gitattributes') +if system_attributes != '' + libgit_c_args += '-DETC_GITATTRIBUTES="' + system_attributes + '"' +else + libgit_c_args += '-DETC_GITATTRIBUTES="' + get_option('sysconfdir') / 'gitattributes"' +endif + +system_config = get_option('gitconfig') +if system_config != '' + libgit_c_args += '-DETC_GITCONFIG="' + system_config + '"' +else + libgit_c_args += '-DETC_GITCONFIG="' + get_option('sysconfdir') / 'gitconfig"' +endif + editor_opt = get_option('default_editor') if editor_opt != '' and editor_opt != 'vi' libgit_c_args += '-DDEFAULT_EDITOR="' + editor_opt + '"' diff --git a/meson_options.txt b/meson_options.txt index 8547c0eb47f8f4..5afbf8ec00e9d1 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -3,10 +3,10 @@ option('default_pager', type: 'string', value: 'less', description: 'Fall-back pager.') option('default_editor', type: 'string', value: 'vi', description: 'Fall-back editor.') -option('gitconfig', type: 'string', value: '/etc/gitconfig', - description: 'Path to the global git configuration file.') -option('gitattributes', type: 'string', value: '/etc/gitattributes', - description: 'Path to the global git attributes file.') +option('gitconfig', type: 'string', + description: 'Path to the global git configuration file. (default: etc/gitconfig)') +option('gitattributes', type: 'string', + description: 'Path to the global git attributes file. (default: etc/gitattributes)') option('pager_environment', type: 'string', value: 'LESS=FRX LV=-c', description: 'Environment used when spawning the pager') option('perl_cpan_fallback', type: 'boolean', value: true, From 837f637cf51ee066e98ceefea76cc6e9c3277469 Mon Sep 17 00:00:00 2001 From: Ramsay Jones Date: Mon, 19 May 2025 17:25:22 +0100 Subject: [PATCH 50/52] meson.build: correct setting of GIT_EXEC_PATH For the non-'runtime prefix' case, the meson build sets the GIT_EXEC_PATH build variable to an absolute path equivalent to /libexec/git-core. In comparison, the default make build sets it to a relative path equivalent to 'libexec/git-core'. Indeed, the make build requires the use of some means outside of the Makefile (eg. config.mak[.*] or the command-line) to set GIT_EXEC_PATH to anything other than 'libexec/git-core'. For example, the make invocation: $ make gitexecdir=/some/other/bin all install will build git with GIT_EXEC_PATH set to '/some/other/bin' and install the 'library' executables to that location. However, without setting the 'gitexecdir' make variable, irrespective of the 'runtime prefix' setting, the GIT_EXEC_PATH is always set to 'libexec/git-core'. The meson built-in 'libexecdir' option can be used to provide a similar configurability. The default value for the option is 'libexec'. Attempting to set the option to '' on the command-line, will reset it to the '.' string, presumably to ensure a relative path value. This commit allows the meson build, similar to the above, to configure the project like: $ meson setup --buildtype=debugoptimized -Dprefix=$HOME -Dpcre2=disabled \ -Dlibexecdir=/some/other/bin build so that the GIT_EXEC_PATH is set to '/some/other/bin'. Absent the -Dlibexecdir argument, the GIT_EXEC_PATH is set to 'libexec/git-core'. In order to correct the value of GIT_EXEC_PATH, default the value to the static string value 'libexec/git-core', and only override if the value of the 'libexecdir' option has a value different to 'libexec' or '.'. Also, like the Makefile, add a check for an absolute path when the runtime prefix option is true (and if so, error out). Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- meson.build | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/meson.build b/meson.build index 8e8f228a374bc0..bd14bc15a171b5 100644 --- a/meson.build +++ b/meson.build @@ -1592,10 +1592,19 @@ else error('Unsupported CSPRNG backend: ' + csprng_backend) endif +git_exec_path = 'libexec/git-core' +libexec = get_option('libexecdir') +if libexec != 'libexec' and libexec != '.' + git_exec_path = libexec +endif + if get_option('runtime_prefix') libgit_c_args += '-DRUNTIME_PREFIX' build_options_config.set('RUNTIME_PREFIX', 'true') - git_exec_path = get_option('libexecdir') / 'git-core' + + if git_exec_path.startswith('/') + error('runtime_prefix requires a relative libexecdir not:', libexec) + endif if compiler.has_header('mach-o/dyld.h') libgit_c_args += '-DHAVE_NS_GET_EXECUTABLE_PATH' @@ -1632,7 +1641,6 @@ if get_option('runtime_prefix') endif else build_options_config.set('RUNTIME_PREFIX', 'false') - git_exec_path = get_option('prefix') / get_option('libexecdir') / 'git-core' endif libgit_c_args += '-DGIT_EXEC_PATH="' + git_exec_path + '"' From 187ce0222f73dd5e8e8c0f5d0b764b4820cc9143 Mon Sep 17 00:00:00 2001 From: Ramsay Jones Date: Mon, 19 May 2025 17:25:23 +0100 Subject: [PATCH 51/52] configure.ac: upgrade to a compilation check for sysinfo Commit f5e3c6c57d ("meson: do a full usage-based compile check for sysinfo", 2025-04-25) updated the 'sysinfo()' check, as part of the meson build, due to the failure of the check on Solaris. Prior to that commit, the meson build only checked the availability of the '' header file. On Solaris, both the header and the 'sysinfo()' function exist, but are completely unrelated to the same function on Linux (and cygwin). Commit 50dec7c566 ("config.mak.uname: add sysinfo() configuration for cygwin", 2025-04-17) added a similar 'sysinfo()' check to the autoconf build. This check looked for the 'sysinfo()' function itself, rather than just the header, but it will fail (incorrectly set HAVE_SYSINFO) for the same reason. In order to correctly identify the 'sysinfo()' function we require as part of 'git-gc' (used in the 'total_ram() function), we also upgrade to a compilation check, in a similar way to the meson commit. Note that since commit c9a51775a3 ("builtin/gc.c: correct RAM calculation when using sysinfo", 2025-04-17) both the 'totalram' and 'mem_unit' fields of the 'struct sysinfo' are used, so the new check includes both of those fields in the compile check. Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- configure.ac | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index d7e0503f1ec580..f6caab919a3e0e 100644 --- a/configure.ac +++ b/configure.ac @@ -1069,9 +1069,28 @@ GIT_CONF_SUBST([CHARSET_LIB]) # # Define HAVE_SYSINFO=YesPlease if sysinfo is available. -GIT_CHECK_FUNC(sysinfo, - [HAVE_SYSINFO=YesPlease], - [HAVE_SYSINFO=]) +# +AC_DEFUN([HAVE_SYSINFO_SRC], [ +AC_LANG_PROGRAM([[ +#include +#include +]], [[ +struct sysinfo si; +uint64_t t = 0; +if (!sysinfo(&si)) { + t = si.totalram; + if (si.mem_unit > 1) + t *= (uint64_t)si.mem_unit; +} +return t; +]])]) + +AC_MSG_CHECKING([for sysinfo]) +AC_COMPILE_IFELSE([HAVE_SYSINFO_SRC], + [AC_MSG_RESULT([yes]) + HAVE_SYSINFO=YesPlease], + [AC_MSG_RESULT([no]) + HAVE_SYSINFO=]) GIT_CONF_SUBST([HAVE_SYSINFO]) # From 34673cd0e81df9ccc075dd5e25ec92bf3128b3e9 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Tue, 27 May 2025 13:58:38 -0700 Subject: [PATCH 52/52] The eighteenth batch Signed-off-by: Junio C Hamano --- Documentation/RelNotes/2.50.0.adoc | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/Documentation/RelNotes/2.50.0.adoc b/Documentation/RelNotes/2.50.0.adoc index f721ea350d7af5..4bcd3ed38362ae 100644 --- a/Documentation/RelNotes/2.50.0.adoc +++ b/Documentation/RelNotes/2.50.0.adoc @@ -76,6 +76,15 @@ UI, Workflows & Features been under "scalar"'s control are taught an option not to enable the scheduled maintenance on it. + * The userdiff pattern for shell scripts has been updated to cope + with more bash-isms. + + * "git merge-tree" learned an option to see if it resolves cleanly + without actually creating a result. + + * The commit title in the "rebase -i" todo file are now prefixed with + '#', just like a merge commit being replayed. + Performance, Internal Implementation, Development Support etc. -------------------------------------------------------------- @@ -167,6 +176,15 @@ Performance, Internal Implementation, Development Support etc. * The dependency on the_repository variable has been reduced from the code paths in "git replay". + * Support to create a loose object file with unknown object type has + been dropped. + + * The code path to access the "packed-refs" file while "fsck" is + taught to mmap the file, instead of reading the whole file in the + memory. + + * Assorted fixes for issues found with CodeQL. + Fixes since v2.49 ----------------- @@ -325,6 +343,10 @@ Fixes since v2.49 automatically (as opposed to be done only upon manual request). (merge 6389579b2f ps/ci-gitlab-enable-msvc-meson-job later to maint). + * "git apply" and "git add -i/-p" code paths no longer unnecessarily + expand sparse-index while working. + (merge ecf9ba20e3 ds/sparse-apply-add-p later to maint). + * Other code cleanup, docfix, build fix, etc. (merge 227c4f33a0 ja/doc-block-delimiter-markup-fix later to maint). (merge 2bfd3b3685 ab/decorate-code-cleanup later to maint). @@ -355,3 +377,5 @@ Fixes since v2.49 (merge e5dd0a05ed ly/am-split-stgit-leakfix later to maint). (merge bac220e154 rc/t1001-test-path-is-file later to maint). (merge 91db6c735d ly/reftable-writer-leakfix later to maint). + (merge 20e4e9ad0b jc/doc-synopsis-option-markup later to maint). + (merge cddcee7f64 es/meson-configure-build-options-fix later to maint).