Commit Graph

76105 Commits

Author SHA1 Message Date
Patrick Steinhardt
8ee018d863 rerere: let rerere_path() write paths into a caller-provided buffer
Same as with `get_worktree_git_dir()` a couple of commits ago, the
`rerere_path()` function returns paths that need not be free'd by the
caller because `git_path()` internally uses `get_pathname()`.

Refactor the function to instead accept a caller-provided buffer that
the path will be written into, passing on ownership to the caller. This
refactoring prepares us for the removal of `git_path()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Patrick Steinhardt
07242c2a5a path: drop git_common_path() in favor of repo_common_path()
Remove `git_common_path()` in favor of the `repo_common_path()` family
of functions, which makes the implicit dependency on `the_repository` go
away.

Note that `git_common_path()` used to return a string allocated via
`get_pathname()`, which uses a rotating set of statically allocated
buffers. Consequently, callers didn't have to free the returned string.
The same isn't true for `repo_common_path()`, so we also have to add
logic to free the returned strings.

This refactoring also allows us to remove `repo_common_pathv()` from the
public interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:23 -08:00
Patrick Steinhardt
8e4710f011 worktree: return allocated string from get_worktree_git_dir()
The `get_worktree_git_dir()` function returns a string constant that
does not need to be free'd by the caller. This string is computed for
three different cases:

  - If we don't have a worktree we return a path into the Git directory.
    The returned string is owned by `the_repository`, so there is no
    need for the caller to free it.

  - If we have a worktree, but no worktree ID then the caller requests
    the main worktree. In this case we return a path into the common
    directory, which again is owned by `the_repository` and thus does
    not need to be free'd.

  - In the third case, where we have an actual worktree, we compute the
    path relative to "$GIT_COMMON_DIR/worktrees/". This string does not
    need to be released either, even though `git_common_path()` ends up
    allocating memory. But this doesn't result in a memory leak either
    because we write into a buffer returned by `get_pathname()`, which
    returns one out of four static buffers.

We're about to drop `git_common_path()` in favor of `repo_common_path()`,
which doesn't use the same mechanism but instead returns an allocated
string owned by the caller. While we could adapt `get_worktree_git_dir()`
to also use `get_pathname()` and print the derived common path into that
buffer, the whole schema feels a lot like premature optimization in this
context. There are some callsites where we call `get_worktree_git_dir()`
in a loop that iterates through all worktrees. But none of these loops
seem to be even remotely in the hot path, so saving a single allocation
there does not feel worth it.

Refactor the function to instead consistently return an allocated path
so that we can start using `repo_common_path()` in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:23 -08:00
Patrick Steinhardt
3859e39659 path: drop git_path_buf() in favor of repo_git_path_replace()
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The
latter does essentially the same, with the only exception that it does
not rely on `the_repository` but takes the repo as separate parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt
bba59f58a4 path: drop git_pathdup() in favor of repo_git_path()
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does
essentially the same, with the only exception that it does not rely on
`the_repository` but takes the repo as separate parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt
7f17900b5b path: drop unused strbuf_git_path() function
The `strbuf_git_path()` function isn't used anywhere, and neither should
it grow any callers because it depends on `the_repository`. Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt
f5c714e2a7 path: refactor repo_submodule_path() family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "submodule" family of
functions accordingly.

Note that in contrast to the other `repo_*_path()` families, we have to
pass in the repository as a non-constant pointer. This is because we end
up calling `repo_read_gitmodules()` deep down in the callstack, which
may end up modifying the repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt
f9467895d8 submodule: refactor submodule_to_gitdir() to accept a repo
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:21 -08:00
Patrick Steinhardt
93a8cfaf3c path: refactor repo_worktree_path() family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "worktree" family of
functions accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:21 -08:00
Patrick Steinhardt
bdfc07bfdf path: refactor repo_git_path() family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "gitdir" family of
functions accordingly.

Note that the `repo_git_pathv()` function is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:21 -08:00
Patrick Steinhardt
70a16ff8a1 path: refactor repo_common_path() family of functions
The functions provided by the "path" subsystem to derive repository
paths for the commondir, gitdir, worktrees and submodules are quite
inconsistent. Some functions have a `strbuf_` prefix, others have
different return values, some don't provide a variant working on top of
`strbuf`s.

We're thus about to refactor all of these family of functions so that
they follow a common pattern:

  - `repo_*_path()` returns an allocated string.

  - `repo_*_path_append()` appends the path to the caller-provided
    buffer while returning a constant pointer to the buffer. This
    clarifies whether the buffer is being appended to or rewritten,
    which otherwise wasn't immediately obvious.

  - `repo_*_path_replace()` replaces contents of the buffer with the
    computed path, again returning a pointer to the buffer contents.

The returned constant pointer isn't being used anywhere yet, but it will
be used in subsequent commits. Its intent is to allow calling patterns
like the following somewhat contrived example:

    if (!stat(&st, repo_common_path_replace(repo, &buf, ...)) &&
        !unlink(repo_common_path_replace(repo, &buf, ...)))
            ...

Refactor the commondir family of functions accordingly and adapt all
callers.

Note that `repo_common_pathv()` is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:21 -08:00
Junio C Hamano
bc204b7427 The seventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 10:23:35 -08:00
Junio C Hamano
1f124f3024 Merge branch 'kn/reflog-migration-fix-fix'
Fix bugs in an earlier attempt to fix "git refs migration".

* kn/reflog-migration-fix-fix:
  refs/reftable: fix uninitialized memory access of `max_index`
  reftable: write correct max_update_index to header
2025-02-03 10:23:35 -08:00
Junio C Hamano
b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
Junio C Hamano
f49905d47d Merge branch 'ps/build-meson-fixes'
More build fixes and enhancements on meson based build procedure.

* ps/build-meson-fixes:
  ci: wire up Visual Studio build with Meson
  ci: raise error when Meson generates warnings
  meson: fix compilation with Visual Studio
  meson: make the CSPRNG backend configurable
  meson: wire up fuzzers
  meson: wire up generation of distribution archive
  meson: wire up development environments
  meson: fix dependencies for generated headers
  meson: populate project version via GIT-VERSION-GEN
  GIT-VERSION-GEN: allow running without input and output files
  GIT-VERSION-GEN: simplify computing the dirty marker
2025-02-03 10:23:34 -08:00
Junio C Hamano
803b5acaa7 Merge branch 'ps/3.0-remote-deprecation'
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.

* ps/3.0-remote-deprecation:
  remote: announce removal of "branches/" and "remotes/"
  builtin/pack-redundant: remove subcommand with breaking changes
  ci: repurpose "linux-gcc" job for deprecations
  ci: merge linux-gcc-default into linux-gcc
  Makefile: wire up build option for deprecated features
2025-02-03 10:23:33 -08:00
Junio C Hamano
c43136d67b Merge branch 'jk/combine-diff-cleanup'
Code clean-up for code paths around combined diff.

* jk/combine-diff-cleanup:
  tree-diff: make list tail-passing more explicit
  tree-diff: simplify emit_path() list management
  tree-diff: use the name "tail" to refer to list tail
  tree-diff: drop list-tail argument to diff_tree_paths()
  combine-diff: drop public declaration of combine_diff_path_size()
  tree-diff: inline path_appendnew()
  tree-diff: pass whole path string to path_appendnew()
  tree-diff: drop path_appendnew() alloc optimization
  run_diff_files(): de-mystify the size of combine_diff_path struct
  diff: add a comment about combine_diff_path.parent.path
  combine-diff: use pointer for parent paths
  tree-diff: clear parent array in path_appendnew()
  combine-diff: add combine_diff_path_new()
  run_diff_files(): delay allocation of combine_diff_path
2025-02-03 10:23:33 -08:00
Junio C Hamano
caf17423d3 Merge branch 'tb/unsafe-hash-cleanup'
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.

* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-02-03 10:23:32 -08:00
Junio C Hamano
58b5801aa9 The sixth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 09:44:16 -08:00
Junio C Hamano
81309f424b Merge branch 'jc/show-index-h-update'
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.

* jc/show-index-h-update:
  show-index: the short help should say the command reads from its input
2025-01-31 09:44:16 -08:00
Junio C Hamano
bdd1988eb3 Merge branch 'ja/doc-notes-markup-updates'
Doc mark-up updates.

* ja/doc-notes-markup-updates:
  doc: convert git-notes to new documentation format
2025-01-31 09:44:15 -08:00
Junio C Hamano
ecba2c181c Merge branch 'sk/strlen-returns-size_t'
Code clean-up.

* sk/strlen-returns-size_t:
  date.c: Fix type missmatch warings from msvc
2025-01-31 09:44:15 -08:00
Junio C Hamano
dccd9c5cf2 Merge branch 'ja/doc-restore-markup-update'
Doc mark-up updates.

* ja/doc-restore-markup-update:
  doc: convert git-restore to new style format
2025-01-31 09:44:15 -08:00
Junio C Hamano
3b0d05c4a7 The fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29 14:05:10 -08:00
Junio C Hamano
296cf82f93 Merge branch 'ps/reflog-migration-with-logall-fix'
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.

* ps/reflog-migration-with-logall-fix:
  refs: fix migration of reflogs respecting "core.logAllRefUpdates"
2025-01-29 14:05:10 -08:00
Junio C Hamano
c5216a1bc6 Merge branch 'am/trace2-with-valueless-true'
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.

* am/trace2-with-valueless-true:
  trace2: prevent segfault on config collection with valueless true
2025-01-29 14:05:10 -08:00
Junio C Hamano
d205f06ae0 Merge branch 'kn/reflog-symref-fix'
reflog entries for symbolic ref updates were broken, which has been
corrected.

* kn/reflog-symref-fix:
  refs: fix creation of reflog entries for symrefs
2025-01-29 14:05:10 -08:00
Junio C Hamano
8d6240d4c6 Merge branch 'rs/ref-fitler-used-atoms-value-fix'
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.

* rs/ref-fitler-used-atoms-value-fix:
  ref-filter: remove ref_format_clear()
  ref-filter: move is-base tip to used_atom
  ref-filter: move ahead-behind bases into used_atom
2025-01-29 14:05:09 -08:00
Junio C Hamano
de56e1d746 Merge branch 'ja/doc-commit-markup-updates'
Doc updates.

* ja/doc-commit-markup-updates:
  doc: migrate git-commit manpage secondary files to new format
  doc: convert git commit config to new format
  doc: make more direct explanations in git commit options
  doc: the mode param of -u of git commit is optional
  doc: apply new documentation guidelines to git commit
2025-01-29 14:05:09 -08:00
Junio C Hamano
f046ab2dd4 Merge branch 'ds/path-walk-1'
Introduce a new API to visit objects in batches based on a common
path, or by type.

* ds/path-walk-1:
  path-walk: drop redundant parse_tree() call
  path-walk: reorder object visits
  path-walk: mark trees and blobs as UNINTERESTING
  path-walk: visit tags and cached objects
  path-walk: allow consumer to specify object types
  t6601: add helper for testing path-walk API
  test-lib-functions: add test_cmp_sorted
  path-walk: introduce an object walk by path
2025-01-29 14:05:09 -08:00
Junio C Hamano
da898a5c64 The fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28 13:02:25 -08:00
Junio C Hamano
b09b10ad26 Merge branch 'jp/t8002-printf-fix'
Test fix.

* jp/t8002-printf-fix:
  t8002: fix ambiguous printf conversion specifications
2025-01-28 13:02:24 -08:00
Junio C Hamano
a17fd7dd3a Merge branch 'ps/reftable-sign-compare'
The reftable/ library code has been made -Wsign-compare clean.

* ps/reftable-sign-compare:
  reftable: address trivial -Wsign-compare warnings
  reftable/blocksource: adjust `read_block()` to return `ssize_t`
  reftable/blocksource: adjust type of the block length
  reftable/block: adjust type of the restart length
  reftable/block: adapt header and footer size to return a `size_t`
  reftable/basics: adjust `hash_size()` to return `uint32_t`
  reftable/basics: adjust `common_prefix_size()` to return `size_t`
  reftable/record: handle overflows when decoding varints
  reftable/record: drop unused `print` function pointer
  meson: stop disabling -Wsign-compare
2025-01-28 13:02:24 -08:00
Junio C Hamano
73e055d71e Merge branch 'mh/credential-cache-authtype-request-fix'
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.

* mh/credential-cache-authtype-request-fix:
  credential-cache: respect authtype capability
2025-01-28 13:02:24 -08:00
Junio C Hamano
f8b9821f7d Merge branch 'jk/pack-header-parse-alignment-fix'
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.

* jk/pack-header-parse-alignment-fix:
  index-pack, unpack-objects: use skip_prefix to avoid magic number
  index-pack, unpack-objects: use get_be32() for reading pack header
  parse_pack_header_option(): avoid unaligned memory writes
  packfile: factor out --pack_header argument parsing
  bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-28 13:02:23 -08:00
Junio C Hamano
3ddeb7f337 Merge branch 'ps/build-meson-subtree'
The meson-driven build is now aware of "git-subtree" housed in
contrib/subtree hierarchy.

* ps/build-meson-subtree:
  meson: wire up the git-subtree(1) command
  meson: introduce build option for contrib
  contrib/subtree: fix building docs
2025-01-28 13:02:23 -08:00
Junio C Hamano
63d555a2dc Merge branch 'mh/connect-sign-compare'
The code in connect.c has been updated to work around complaints
from -Wsign-compare.

* mh/connect-sign-compare:
  connect: address -Wsign-compare warnings
2025-01-28 13:02:23 -08:00
Junio C Hamano
8d335468ec Merge branch 'sk/unit-tests'
Move a few more unit tests to the clar test framework.

* sk/unit-tests:
  t/unit-tests: convert reftable tree test to use clar test framework
  t/unit-tests: adapt priority queue test to use clar test framework
  t/unit-tests: convert mem-pool test to use clar test framework
  t/unit-tests: handle dashes in test suite filenames
2025-01-28 13:02:22 -08:00
Junio C Hamano
f0a371a39d Merge branch 'jc/show-usage-help'
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others.  The built-in commands
have been fixed to show them on the standard output consistently.

* jc/show-usage-help:
  builtin: send usage() help text to standard output
  oddballs: send usage() help text to standard output
  builtins: send usage_with_options() help text to standard output
  usage: add show_usage_if_asked()
  parse-options: add show_usage_with_options_if_asked()
  t0012: optionally check that "-h" output goes to stdout
2025-01-28 13:02:22 -08:00
Karthik Nayak
f11f0a5a2d refs/reftable: fix uninitialized memory access of max_index
When migrating reflogs between reference backends, maintaining the
original order of the reflog entries is crucial. To achieve this, an
`index` field is stored within the `ref_update` struct that encodes the
relative order of reflog entries. This field is used by the reftable
backend as update index for the respective reflog entries to maintain
that ordering.

These update indices must be respected when writing table headers, which
encode the minimum and maximum update index of contained records in the
header and footer. This logic was added in commit bc67b4ab5f (reftable:
write correct max_update_index to header, 2025-01-15), which started to
use `reftable_writer_set_limits()` to propagate the mininum and maximum
update index of all records contained in a ref transaction.

However, we only set the maximum update index for the first transaction
argument, even though there can be multiple such arguments. This is the
case when we write to multiple stacks in a single transaction, e.g. when
updating references in two different worktrees at once. Consequently,
the update index for all but the first argument remain uninitialized,
which may cause undefined behaviour.

Fix this by moving the assignment of the maximum update index in
`reftable_be_transaction_finish()` inside the loop, which ensures that
all elements of the array are correctly initialized.

Furthermore, initialize the `max_index` field to 0 when queueing a new
transaction argument. This is not strictly necessary, as all elements of
`write_transaction_table_arg.max_index` are now assigned correctly.
However, this initialization is added for consistency and to safeguard
against potential future changes that might inadvertently introduce
uninitialized memory access.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 08:21:41 -08:00
Patrick Steinhardt
8ccc75c245 remote: announce removal of "branches/" and "remotes/"
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.

Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24 08:08:56 -08:00
Junio C Hamano
5f8f7081f7 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 15:07:03 -08:00
Junio C Hamano
39ba2e8e56 Merge branch 'jc/cli-doc-option-and-config'
Doc update.

* jc/cli-doc-option-and-config:
  gitcli: document that command line trumps config and env
2025-01-23 15:07:02 -08:00
Junio C Hamano
6ecb4fc149 Merge branch 'mh/doc-credential-helpers-with-pat'
Document that it is insecure to use Personal Access Tokens, which
some hosting providers take as username/password, embedded in URLs.

* mh/doc-credential-helpers-with-pat:
  docs: discuss caching personal access tokens
  docs: list popular credential helpers
2025-01-23 15:07:02 -08:00
Junio C Hamano
294673a17e Merge branch 'ak/instaweb-python-port-binding-fix'
The "instaweb" bound only to local IP address without "--local" and
to all addresses with "--local", which was the other way around, when
using Python's http.server class, which has been corrected.

* ak/instaweb-python-port-binding-fix:
  instaweb: fix ip binding for the python http.server
2025-01-23 15:07:02 -08:00
Junio C Hamano
aa31820d9d Merge branch 'sj/meson-doc-technical-dependency-fix'
The meson build procedure for Documentation/technical/ hierarchy was
missing necessary dependencies, which has been corrected.

* sj/meson-doc-technical-dependency-fix:
  meson: fix missing deps for technical articles
2025-01-23 15:07:02 -08:00
Junio C Hamano
d8093fd6c1 Merge branch 'tc/meson-use-our-version-def-h'
The meson build procedure looked for the 'version-def.h' file in a
wrong directory, which has been corrected.

* tc/meson-use-our-version-def-h:
  meson: ensure correct version-def.h is used
2025-01-23 15:07:01 -08:00
Junio C Hamano
7e3cb2e515 Merge branch 'en/object-name-with-funny-refname-fix'
Extended SHA-1 expression parser did not work well when a branch
with an unusual name (e.g. "foo{bar") is involved.

* en/object-name-with-funny-refname-fix:
  object-name: be more strict in parsing describe-like output
  object-name: fix resolution of object names containing curly braces
2025-01-23 15:07:01 -08:00
Taylor Blau
04292c3796 hash.h: drop unsafe_ function variants
Now that all callers have been converted from:

    the_hash_algo->unsafe_init_fn();

to

    unsafe_hash_algo(the_hash_algo)->init_fn();

and similar, we can remove the scaffolding for the unsafe_ function
variants and force callers to use the new unsafe_hash_algo() mechanic
instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Taylor Blau
a8dd3821fe csum-file: introduce hashfile_checkpoint_init()
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.

While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.

Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.

A few things worth noting:

  - In the change to builtin/fast-import.c::stream_blob(), we can see
    that by removing the explicit reference to
    'the_hash_algo->unsafe_init_fn()', we are hardened against the
    hashfile API changing away from the_hash_algo (or its unsafe
    variant) in the future.

  - The bulk-checkin code no longer needs to explicitly zero-initialize
    the hashfile_checkpoint, since it is now done as a result of calling
    'hashfile_checkpoint_init()'.

  - Also in the bulk-checkin code, we add an additional call to
    prepare_to_stream() outside of the main loop in order to initialize
    'state->f' so we know which hash function implementation to use when
    calling 'hashfile_checkpoint_init()'.

    This is OK, since subsequent 'prepare_to_stream()' calls are noops.
    However, we only need to call 'prepare_to_stream()' when we have the
    HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
    'prepare_to_stream()' does not assign 'state->f', so we have nothing
    to initialize.

  - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
    appropriately guarded.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00