Commit Graph

76616 Commits

Author SHA1 Message Date
Karthik Nayak
7e873eb390 meson: rename 'third_party_sources' to 'third_party_excludes'
The 'third_party_sources' variable was moved to the root 'meson.build'
file in the previous commit. The variable is actually used to exclude
third party sources, so rename it accordingly to 'third_party_excludes'
to avoid confusion. While here, remove a duplicate from the list.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-23 10:36:18 -07:00
Karthik Nayak
1597b6e86e meson: move headers definition from 'contrib/coccinelle'
The Meson build for coccinelle static analysis lists all headers to
analyse. Due to the way Meson exports variables between subdirs, this
variable is also available in the root Meson build.

An upcoming commit, will add a new check complimenting 'hdr-check' in
the Makefile. This would require the list of headers. So move the
'coccinelle_headers' to the root Meson build and rename it to 'headers',
remove the root path being appended to each header and retain that in
the coccinelle Meson build since it is specific to the coccinelle build.

Also move the 'third_party_sources' variable to the root Meson build
since it is also a dependency for the 'headers' variable. This also
makes it easier to understand as the variable is now propagated from the
top level to the bottom.

While 'headers_to_check' is only computed when we have a repository and
the 'git' executable is present, the variable itself is exposed as an
empty array. This allows dependencies in upcoming commits to simply
check for length of the array and not worry about dependencies required
to actually populate the array.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-23 10:36:18 -07:00
Karthik Nayak
8e980b7f25 coccinelle: meson: rename variables to be more specific
In Meson, included subdirs export their variables to top level Meson
builds. In 'contrib/coccinelle/meson.build', we define two such
variables `sources` and `headers`. While these variables are specific to
the checks in the 'contrib/coccinelle/' directory, they also pollute the
top level 'meson.build'.

Rename them to be more specific, this ensures that they aren't
mistakenly used in the upper levels and avoid variable name collisions.

While here, change the empty list denotation to be consistent with other
places.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-23 10:34:12 -07:00
Karthik Nayak
a064b0be55 ci/github: install git before checking out the repository
The GitHub's CI workflow uses 'actions/checkout@v4' to checkout the
repository. This action defaults to using the GitHub REST API to obtain
the repository if the `git` executable isn't available.

The step to build Git in the GitHub workflow can be summarized as:

  ...
  - uses: actions/checkout@v4 #1
  - run: ci/install-dependencies.sh #2
  ...
  - run: sudo --preserve-env --set-home --user=builder ci/run-build-and-tests.sh #3
  ...

Step #1, clones the repository, since the `git` executable isn't present
at this step, it uses GitHub's REST API to obtain a tar of the
repository.

Step #2, installs all dependencies, which includes the `git` executable.

Step #3, sets up the build, which includes setting up meson in the meson
job. At this point the `git` executable is present.

This means while the `git` executable is present, the repository doesn't
contain the '.git' folder. To keep both the CI's (GitLab and GitHub)
behavior consistent and to ensure that the build is performed on a
real-world scenario, install `git` before the repository is checked out.
This ensures that 'actions/checkout@v4' will clone the repository
instead of using a tarball. We also update the package cache while
installing `git`, this is because some distros will fail to locate the
package without updating the cache.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-23 10:34:12 -07:00
Junio C Hamano
5a4d746bb6 Merge branch 'es/meson-build-skip-coccinelle' into kn/meson-hdr-check
* es/meson-build-skip-coccinelle:
  meson: disable coccinelle configuration when building from a tarball
2025-04-08 14:11:20 -07:00
Junio C Hamano
9d22ac5122 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:23:21 -07:00
Junio C Hamano
7b420ef2c0 Merge branch 'js/imap-send-peer-cert-verify'
* js/imap-send-peer-cert-verify:
  imap-send: explicitly verify the peer certificate
2025-04-07 14:23:20 -07:00
Junio C Hamano
45e31f0bac Merge branch 'js/mingw-admins-are-special'
"Dubious ownership" checks on Windows has been tightened up.

* js/mingw-admins-are-special:
  test-tool path-utils: support debugging "dubious ownership" issues
  mingw: special-case administrators even more
2025-04-07 14:23:20 -07:00
Junio C Hamano
97b747ca11 Merge branch 'tb/bitamp-typofix'
Typofix.

* tb/bitamp-typofix:
  pseudo-merge.h: fix a typo
2025-04-07 14:23:20 -07:00
Junio C Hamano
3bc7f869f0 Merge branch 'dm/completion-remote-names-fix'
The bash command line completion script (in contrib/) has been
updated to cope with remote repository nicknames with slashes in
them.

* dm/completion-remote-names-fix:
  completion: fix bugs with slashes in remote names
  completion: add helper to count path components
2025-04-07 14:23:19 -07:00
Junio C Hamano
bc5be63c4c Merge branch 'pw/doc-pack-refs-markup-fix'
Doc markup fix.

* pw/doc-pack-refs-markup-fix:
  pack-refs doc: fix indentation for --exclude
2025-04-07 14:23:19 -07:00
Junio C Hamano
2a4f95969a Merge branch 'pw/build-breaking-changes-doc'
A documentation page was left out from formatting and installation,
which has been corrected.

* pw/build-breaking-changes-doc:
  docs: add BreakingChanges to TECH_DOCS target
2025-04-07 14:23:19 -07:00
Junio C Hamano
8a753b9a44 Merge branch 'jh/hash-init-fixes'
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.

* jh/hash-init-fixes:
  index-pack, unpack-objects: restore missing ->init_fn
2025-04-07 14:23:18 -07:00
Junio C Hamano
58a8c38226 Merge branch 'tb/combine-cruft-below-size'
"git repack" learned "--combine-cruft-below-size" option that
controls how cruft-packs are combined.

* tb/combine-cruft-below-size:
  repack: begin combining cruft packs with `--combine-cruft-below-size`
  repack: avoid combining cruft packs with `--max-cruft-size`
  t/t7704-repack-cruft.sh: consolidate `write_blob()`
  t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
  t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2025-04-07 14:23:18 -07:00
Junio C Hamano
6a9e1c3507 Merge branch 'ja/doc-branch-markup'
Doc mark-up updates.

* ja/doc-branch-markup:
  doc: apply new format to git-branch man page
  completion: take into account the formatting backticks for options
2025-04-07 14:23:18 -07:00
Junio C Hamano
68c048c84c Merge branch 'cc/lop-remote'
Bugfix in newly introduced large-object-promisor remote support.

* cc/lop-remote:
  promisor-remote: compare remote names case sensitively
  promisor-remote: fix possible issue when no URL is advertised
  promisor-remote: fix segfault when remote URL is missing
  t5710: arrange to delete the client before cloning
2025-04-07 14:23:17 -07:00
Junio C Hamano
477cc3b6c7 Merge branch 'jc/name-rev-stdin'
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.

* jc/name-rev-stdin:
  name-rev: remove "--stdin" support
  t6120: further modernize
  t6120: avoid hiding "git" exit status
  t: introduce WITH_BREAKING_CHANGES prerequisite
  t: extend test_lazy_prereq
  t: document test_lazy_prereq
2025-04-07 14:23:17 -07:00
Junio C Hamano
e63c3e4296 Merge branch 'kn/ci-meson-check-build-docs-fix'
GitHub Actions CI switched on a CI/CD variable that does not exist
when choosing what packages to install etc., which has been
corrected.

* kn/ci-meson-check-build-docs-fix:
  ci/github: add missing 'CI_JOB_IMAGE' env variable
2025-04-07 14:23:17 -07:00
Junio C Hamano
fbca35381b Merge branch 'aj/doc-restore-p-update'
Stale description in "git restore -p" documentation has been
updated.

* aj/doc-restore-p-update:
  doc: restore: remove note on --patch w/ pathspecs
2025-04-07 14:23:16 -07:00
Junio C Hamano
5b97a56fa0 The second batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29 16:39:11 +09:00
Junio C Hamano
30eadc9d59 Merge branch 'hj/doc-rev-list-ancestry-fix'
Doc update.

* hj/doc-rev-list-ancestry-fix:
  doc: add missing commit C to the graph for --ancestry-path=H D..M
2025-03-29 16:39:11 +09:00
Junio C Hamano
81d8747cea Merge branch 'es/meson-building-docs-requires-perl'
Build update.

* es/meson-building-docs-requires-perl:
  meson: fix perl detection when docs are enabled, but perl bindings aren't
2025-03-29 16:39:11 +09:00
Junio C Hamano
ff926a6d1b Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.

* en/random-cleanups:
  merge-ort: remove extraneous word in comment
  merge-ort: fix accidental strset<->strintmap
  t7615: be more explicit about diff algorithm used
  t6423: fix a comment that accidentally reversed two commits
  stash: remove merge-recursive.h include
2025-03-29 16:39:10 +09:00
Junio C Hamano
6767149eca Merge branch 'rs/xdiff-context-length-fix'
The xdiff code on 32-bit platform misbehaved when an insanely large
context size is given, which has been corrected.

* rs/xdiff-context-length-fix:
  xdiff: avoid arithmetic overflow in xdl_get_hunk()
2025-03-29 16:39:10 +09:00
Junio C Hamano
f76fe4ab06 Merge branch 'jk/use-wunreachable-code-for-devs'
Enable -Wunreachable-code for developer builds.

* jk/use-wunreachable-code-for-devs:
  config.mak.dev: enable -Wunreachable-code
  git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
  run-command: use errno to check for sigfillset() error
2025-03-29 16:39:10 +09:00
Junio C Hamano
b9b404fa1c Merge branch 'en/diff-rename-follow-fix'
A corner-case bug in "git log --follow -B" has been fixed.

* en/diff-rename-follow-fix:
  diffcore-rename: fix BUG when break detection and --follow used together
2025-03-29 16:39:09 +09:00
Junio C Hamano
27fe152e88 Merge branch 'tb/multi-cruft-pack-refresh-fix'
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.

* tb/multi-cruft-pack-refresh-fix:
  builtin/pack-objects.c: freshen objects from existing cruft packs
2025-03-29 16:39:09 +09:00
Junio C Hamano
f3db666cca Merge branch 'am/dir-dedup-decl-of-repository'
Code cleanup.

* am/dir-dedup-decl-of-repository:
  dir.h: remove duplicate forward declaration of struct repository
2025-03-29 16:39:09 +09:00
Junio C Hamano
64aef9217f Merge branch 'ps/meson-with-breaking-changes'
Update meson based build procedure for breaking changes support.

* ps/meson-with-breaking-changes:
  meson: don't install git-pack-redundant(1) docs with breaking changes
  meson: don't compile git-pack-redundant(1) with breaking changes
  meson: define WITH_BREAKING_CHANGES when enabling breaking changes
2025-03-29 16:39:08 +09:00
Junio C Hamano
650b2e2fdb Merge branch 'jk/fetch-ref-prefix-cleanup'
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs.  Revamp the logic to simplify.

* jk/fetch-ref-prefix-cleanup:
  fetch: use ref prefix list to skip ls-refs
  fetch: avoid ls-refs only to ask for HEAD symref update
  fetch: stop protecting additions to ref-prefix list
  fetch: ask server to advertise HEAD for config-less fetch
  refspec_ref_prefixes(): clean up refspec_item logic
  t5516: beef up exact-oid ref prefixes test
  t5516: drop NEEDSWORK about v2 reachability behavior
  t5516: prefer "oid" to "sha1" in some test titles
  t5702: fix typo in test name
2025-03-29 16:39:08 +09:00
Junio C Hamano
ef0d6b7151 Merge branch 'ab/decorate-code-cleanup'
Code clean-up.

* ab/decorate-code-cleanup:
  decorate: fix sign comparison warnings
2025-03-29 16:39:08 +09:00
Junio C Hamano
eb7923be1f Merge branch 'en/merge-ort-prepare-to-remove-recursive'
First step of deprecating and removing merge-recursive.

* en/merge-ort-prepare-to-remove-recursive:
  am: switch from merge_recursive_generic() to merge_ort_generic()
  merge-ort: fix merge.directoryRenames=false
  t3650: document bug when directory renames are turned off
  merge-ort: support having merge verbosity be set to 0
  merge-ort: allow rename detection to be disabled
  merge-ort: add new merge_ort_generic() function
2025-03-29 16:39:07 +09:00
Junio C Hamano
8d6413a1be Merge branch 'ps/refname-avail-check-optim'
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.

* ps/refname-avail-check-optim:
  refs: reuse iterators when determining refname availability
  refs/iterator: implement seeking for files iterators
  refs/iterator: implement seeking for packed-ref iterators
  refs/iterator: implement seeking for ref-cache iterators
  refs/iterator: implement seeking for reftable iterators
  refs/iterator: implement seeking for merged iterators
  refs/iterator: provide infrastructure to re-seek iterators
  refs/iterator: separate lifecycle from iteration
  refs: stop re-verifying common prefixes for availability
  refs/files: batch refname availability checks for initial transactions
  refs/files: batch refname availability checks for normal transactions
  refs/reftable: batch refname availability checks
  refs: introduce function to batch refname availability checks
  builtin/update-ref: skip ambiguity checks when parsing object IDs
  object-name: allow skipping ambiguity checks in `get_oid()` family
  object-name: introduce `repo_get_oid_with_flags()`
2025-03-29 16:39:07 +09:00
Junio C Hamano
01d17c0530 Merge branch 'cc/signed-fast-export-import'
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.

* cc/signed-fast-export-import:
  fast-export, fast-import: add support for signed-commits
  fast-export: do not modify memory from get_commit_buffer
  git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
  fast-export: rename --signed-tags='warn' to 'warn-verbatim'
  fast-export: fix missing whitespace after switch
  git-fast-import.adoc: add missing LF in the BNF
2025-03-29 16:39:07 +09:00
Junio C Hamano
66b90d9bad Start 2.50 cycle (batch #1)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-26 16:26:11 +09:00
Junio C Hamano
87a8e533e3 Merge branch 'ja/doc-block-delimiter-markup-fix'
Doc markup updates.

* ja/doc-block-delimiter-markup-fix:
  doc: add a blank line around block delimiters
2025-03-26 16:26:11 +09:00
Junio C Hamano
52241c96c7 Merge branch 'en/merge-process-renames-crash-fix'
The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.

* en/merge-process-renames-crash-fix:
  merge-ort: fix slightly overzealous assertion for rename-to-self
  t6423: add a testcase causing a failed assertion in process_renames
2025-03-26 16:26:11 +09:00
Junio C Hamano
1a764cdbdc Merge branch 'ua/some-builtins-wo-the-repository'
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.

* ua/some-builtins-wo-the-repository:
  builtin/checkout-index: stop using `the_repository`
  builtin/for-each-ref: stop using `the_repository`
  builtin/ls-files: stop using `the_repository`
  builtin/pack-refs: stop using `the_repository`
  builtin/send-pack: stop using `the_repository`
  builtin/verify-commit: stop using `the_repository`
  builtin/verify-tag: stop using `the_repository`
  config: teach repo_config to allow `repo` to be NULL
2025-03-26 16:26:10 +09:00
Junio C Hamano
def5e32bc5 Merge branch 'tb/refs-exclude-fixes'
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs.  This has been corrected.

* tb/refs-exclude-fixes:
  refs.c: stop matching non-directory prefixes in exclude patterns
  refs.c: remove empty '--exclude' patterns
2025-03-26 16:26:10 +09:00
Junio C Hamano
de35b7b3ff Merge branch 'sj/ref-consistency-checks-more'
"git fsck" becomes more careful when checking the refs.

* sj/ref-consistency-checks-more:
  builtin/fsck: add `git refs verify` child process
  packed-backend: check whether the "packed-refs" is sorted
  packed-backend: add "packed-refs" entry consistency check
  packed-backend: check whether the refname contains NUL characters
  packed-backend: add "packed-refs" header consistency check
  packed-backend: check if header starts with "# pack-refs with: "
  packed-backend: check whether the "packed-refs" is regular file
  builtin/refs: get worktrees without reading head information
  t0602: use subshell to ensure working directory unchanged
2025-03-26 16:26:10 +09:00
Junio C Hamano
f50df872a4 Merge branch 'jt/diff-pairs'
A post-processing filter for "diff --raw" output has been
introduced.

* jt/diff-pairs:
  builtin/diff-pairs: allow explicit diff queue flush
  builtin: introduce diff-pairs command
  diff: add option to skip resolving diff statuses
  diff: return diff_filepair from diff queue helpers
2025-03-26 16:26:09 +09:00
Eli Schwartz
d2827dc31e meson: disable coccinelle configuration when building from a tarball
Wiring up coccinelle in the build, depends on running git commands to
get the list of files to operate on. Reasonable, for a feature mainly
used by people developing on git. If building git itself from a tarball
distribution of git's own source code, one likely does not need to run
coccinelle.

But running those git commands failed, and caused the build to error
out, if `spatch` was installed -- because the build assumed that its
presence indicated a desire to use it on this source tree. Instead, we
can expand the conditional to check for both `spatch` and the `.git`
file or directory.

Meson's `opt.require()` method allows us to add a prerequisite for the
feature option. If the prerequisite fails, then the option either:

- converts autodetection to disabled

- emits an informative error if the feature was set to enabled:
  ```
  ERROR: Feature coccinelle cannot be enabled: coccinelle can only be run from a git checkout
  ```

Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-25 16:20:15 -07:00
Johannes Schindelin
fa8cd29676 imap-send: explicitly verify the peer certificate
It is a bug to obtain the peer certificate without verifying it.

Having said that, from my reading of
https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html, it would
appear that Git is saved by the fact that it calls
`SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL)` already early on.

In other words, that `SSL_VERIFY_PEER` combined with the `NULL`
parameter (i.e. no overridden callback) would _already_ verify the peer
certificate.  The fact that we later call `SSL_get_peer_certificate()`
is mistaken by CodeQL to mean that that peer certificate still needs to
be verified, but that had already happened at that point.

Nevertheless, it is better to verify the peer certificate explicitly
than to rely on some side effect that is really hard to reason about
(and that took me more than one business day to analyze fully). It also
makes it easier for static analyzers to validate the correctness of the
code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-25 15:48:58 -07:00
Johannes Schindelin
5bb88e89ef test-tool path-utils: support debugging "dubious ownership" issues
This adds a new sub-sub-command for `test-tool`, simply passing through
the command-line arguments to the `is_path_owned_by_current_user()`
function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-25 04:45:56 -07:00
Johannes Schindelin
03a4e46d12 mingw: special-case administrators even more
The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.

The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.

There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!

Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-25 04:45:56 -07:00
David Mandelberg
778d2f1760 completion: fix bugs with slashes in remote names
Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:

git push github/dseomn :com<Tab>

Result in:

git push github/dseomn :dseomn/completion-remote-slash

With this patch, it instead results in:

git push github/dseomn :completion-remote-slash

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-23 23:03:13 -07:00
David Mandelberg
5637bdc352 completion: add helper to count path components
A follow-up commit will use this with for-each-ref to strip the right
number of path components from refnames.

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-23 23:03:12 -07:00
Taylor Blau
484d7adcda repack: begin combining cruft packs with --combine-cruft-below-size
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.

Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.

This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.

The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).

But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.

There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 03:42:07 -07:00
Taylor Blau
0855ed966c repack: avoid combining cruft packs with --max-cruft-size
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.

This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.

But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).

So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.

This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.

The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.

Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:

    if (args->max_pack_size && !cruft_expiration) {
        collapse_small_cruft_packs(in, args->max_pack_size, existing);
    } else {
        for_each_string_list_item(item, &existing->non_kept_packs)
            fprintf(in, "-%s.pack\n", item->string);
        for_each_string_list_item(item, &existing->cruft_packs)
            fprintf(in, "-%s.pack\n", item->string);
    }

This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.

Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).

Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 03:42:07 -07:00
Taylor Blau
7fb12bb27e t/t7704-repack-cruft.sh: consolidate write_blob()
A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.

Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).

Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 03:42:06 -07:00