Commit Graph

71640 Commits

Author SHA1 Message Date
Taylor Blau
e5d48bf38b ewah: implement bitmap_is_empty()
In a future commit, we will want to check whether or not a bitmap has
any bits set in any of its words. The best way to do this (prior to the
existence of this patch) is to call `bitmap_popcount()` and check
whether the result is non-zero.

But this is semi-wasteful, since we do not need to know the exact number
of bits set, only whether or not there is at least one of them.

Implement a new helper function to check just that.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
dab60934e3 pack-bitmap: pass bitmapped_pack struct to pack-reuse functions
When trying to assemble a pack with bitmaps using `--use-bitmap-index`,
`pack-objects` asks the pack-bitmap machinery for a bitmap which
indicates the set of objects we can "reuse" verbatim from on-disk.

This set is roughly comprised of: a prefix of objects in the bitmapped
pack (or preferred pack, in the case of a multi-pack reachability
bitmap), plus any other objects not included in the prefix, excluding
any deltas whose base we are not sending in the resulting pack.

The pack-bitmap machinery is responsible for computing this bitmap, and
does so with the following functions:

  - reuse_partial_packfile_from_bitmap()
  - try_partial_reuse()

In the existing implementation, the first function is responsible for
(a) marking the prefix of objects in the reusable pack, and then (b)
calling try_partial_reuse() on any remaining objects to ensure that they
are also reusable (and removing them from the bitmapped set if they are
not).

Likewise, the `try_partial_reuse()` function is responsible for checking
whether an isolated object (that is, an object from the bitmapped
pack/preferred pack not contained in the prefix from earlier) may be
reused, i.e. that it isn't a delta of an object that we are not sending
in the resulting pack.

These functions are based on two core assumptions, which we will unwind
in this and the following commits:

  1. There is only a single pack from the bitmap which is eligible for
     verbatim pack-reuse. For single-pack bitmaps, this is trivially the
     bitmapped pack. For multi-pack bitmaps, this is (currently) the
     MIDX's preferred pack.

  2. The pack eligible for reuse has its first object in bit position 0,
     and all objects from that pack follow in pack-order from that first
     bit position.

In order to perform verbatim pack reuse over multiple packs, we must
unwind these two assumptions. Most notably, in order to reuse bits from
a given packfile, we need to know the first bit position occupied by
an object form that packfile. To propagate this information around, pass
a `struct bitmapped_pack *` anywhere we previously passed a `struct
packed_git *`, since the former contains the bitmap position we're
interested in (as well as a pointer to the latter).

As an additional step, factor out a sub-routine from the main
`reuse_partial_packfile_from_bitmap()` function, called
`reuse_partial_packfile_from_bitmap_1()`. This new function will be
responsible for figuring out which objects may be reused from a single
pack, and the existing function will dispatch multiple calls to its new
helper function for each reusable pack.

Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain
an array of reusable packs instead of a single such pack. We currently
expect that array to have only a single element, so this awkward state
is short-lived. It will serve as useful scaffolding in subsequent
commits as we begin to work towards enabling multi-pack reuse.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
307d75bbe6 midx: implement midx_locate_pack()
The multi-pack index API exposes a `midx_contains_pack()` function that
takes in a string ending in either ".idx" or ".pack" and returns whether
or not the MIDX contains a given pack corresponding to that string.

There is no corresponding function to locate the position of a pack
within the MIDX's pack order (sorted lexically by pack filename).

We could add an optional out parameter to `midx_contains_pack()` that is
filled out with the pack's position when the parameter is non-NULL. To
minimize the amount of fallout from this change, instead introduce a new
function by renaming `midx_contains_pack()` to `midx_locate_pack()`,
adding that output parameter, and then reimplementing
`midx_contains_pack()` in terms of it.

Future patches will make use of this new function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
5f5ccd9595 midx: implement BTMP chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that
is, when verbatim chunks from an on-disk packfile are copied
directly[^1]), it does so by using its "preferred pack" as the source
for pack-reuse.

This allows repositories to pack the majority of their objects into a
single (often large) pack, and then use it as the single source for
verbatim pack reuse. This increases the amount of objects that are
reused verbatim (and consequently, decrease the amount of time it takes
to generate many packs). But this performance comes at a cost, which is
that the preferred packfile must pace its growth with that of the entire
repository in order to maintain the utility of verbatim pack reuse.

As repositories grow beyond what we can reasonably store in a single
packfile, the utility of verbatim pack reuse diminishes. Or, at the very
least, it becomes increasingly more expensive to maintain as the pack
grows larger and larger.

It would be beneficial to be able to perform this same optimization over
multiple packs, provided some modest constraints (most importantly, that
the set of packs eligible for verbatim reuse are disjoint with respect
to the subset of their objects being sent).

If we assume that the packs which we treat as candidates for verbatim
reuse are disjoint with respect to any of their objects we may output,
we need to make only modest modifications to the verbatim pack-reuse
code itself. Most notably, we need to remove the assumption that the
bits in the reachability bitmap corresponding to objects from the single
reuse pack begin at the first bit position.

Future patches will unwind these assumptions and reimplement their
existing functionality as special cases of the more general assumptions
(e.g. that reuse bits can start anywhere within the bitset, but happen
to start at 0 for all existing cases).

This patch does not yet relax any of those assumptions. Instead, it
implements a foundational data-structure, the "Bitampped Packs" (`BTMP`)
chunk of the multi-pack index. The `BTMP` chunk's contents are described
in detail here. Importantly, the `BTMP` chunk contains information to
map regions of a multi-pack index's reachability bitmap to the packs
whose objects they represent.

For now, this chunk is only written, not read (outside of the test-tool
used in this patch to test the new chunk's behavior). Future patches
will begin to make use of this new chunk.

[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the
  pack that wasn't used verbatim.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
fba68184b8 midx: factor out fill_pack_info()
When selecting which packfiles will be written while generating a MIDX,
the MIDX internals fill out a 'struct pack_info' with various pieces of
book-keeping.

Instead of filling out each field of the `pack_info` structure
individually in each of the two spots that modify the array of such
structures (`ctx->info`), extract a common routine that does this for
us.

This reduces the code duplication by a modest amount. But more
importantly, it zero-initializes the structure before assigning values
into it. This hardens us for a future change which will add additional
fields to this structure which (until this patch) was not
zero-initialized.

As a result, any new fields added to the `pack_info` structure need only
be updated in a single location, instead of at each spot within midx.c.

There are no functional changes in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
a96015a517 pack-bitmap: plug leak in find_objects()
The `find_objects()` function creates an object_list for any tips of the
reachability query which do not have corresponding bitmaps.

The object_list is not used outside of `find_objects()`, but we never
free it with `object_list_free()`, resulting in a leak. Let's plug that
leak by calling `object_list_free()`, which results in t6113 becoming
leak-free.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
6cdb67b97d pack-bitmap-write: deep-clear the bb_commit slab
The `bb_commit` commit slab is used by the pack-bitmap-write machinery
to track various pieces of bookkeeping used to generate reachability
bitmaps.

Even though we clear the slab when freeing the bitmap_builder struct
(with `bitmap_builder_clear()`), there are still pointers which point to
locations in memory that have not yet been freed, resulting in a leak.

Plug the leak by introducing a suitable `free_fn` for the `struct
bb_commit` type, and make sure it is called on each member of the slab
via the `deep_clear_bb_data()` function.

Note that it is possible for both of the arguments to `bitmap_free()` to
be NULL, but `bitmap_free()` is a noop for NULL arguments, so it is OK
to pass them unconditionally.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Taylor Blau
66f0c71073 pack-objects: free packing_data in more places
The pack-objects internals use a packing_data struct to track what
objects are part of the pack(s) being formed.

Since these structures contain allocated fields, failing to
appropriately free() them results in a leak. Plug that leak by
introducing a clear_packing_data() function, and call it in the
appropriate spots.

This is a fairly straightforward leak to plug, since none of the callers
expect to read any values or have any references to parts of the address
space being freed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Junio C Hamano
1a87c842ec Start the 2.44 cycle
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 16:37:51 -08:00
Junio C Hamano
1ef1cce9c2 Merge branch 'tz/send-email-negatable-options'
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email".  Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.

* tz/send-email-negatable-options:
  send-email: avoid duplicate specification warnings
  perl: bump the required Perl version to 5.8.1 from 5.8.0
2023-12-09 16:37:51 -08:00
Junio C Hamano
f8f87e0827 Merge branch 'ak/rebase-autosquash'
"git rebase --autosquash" is now enabled for non-interactive rebase,
but it is still incompatible with the apply backend.

* ak/rebase-autosquash:
  rebase: rewrite --(no-)autosquash documentation
  rebase: support --autosquash without -i
  rebase: fully ignore rebase.autoSquash without -i
2023-12-09 16:37:50 -08:00
Junio C Hamano
98d0a1f93e Merge branch 'vd/for-each-ref-unsorted-optimization'
"git for-each-ref --no-sort" still sorted the refs alphabetically
which paid non-trivial cost.  It has been redefined to show output
in an unspecified order, to allow certain optimizations to take
advantage of.

* vd/for-each-ref-unsorted-optimization:
  t/perf: add perf tests for for-each-ref
  ref-filter.c: use peeled tag for '*' format fields
  for-each-ref: clean up documentation of --format
  ref-filter.c: filter & format refs in the same callback
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.h: move contains caches into filter
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.c: really don't sort when using --no-sort
2023-12-09 16:37:50 -08:00
Junio C Hamano
e020e55a62 Merge branch 'ps/ban-a-or-o-operator-with-test'
Test and shell scripts clean-up.

* ps/ban-a-or-o-operator-with-test:
  Makefile: stop using `test -o` when unlinking duplicate executables
  contrib/subtree: convert subtree type check to use case statement
  contrib/subtree: stop using `-o` to test for number of args
  global: convert trivial usages of `test <expr> -a/-o <expr>`
2023-12-09 16:37:50 -08:00
Junio C Hamano
4297485172 Merge branch 'ss/format-patch-use-encode-headers-for-cover-letter'
"git format-patch --encode-email-headers" ignored the option when
preparing the cover letter, which has been corrected.

* ss/format-patch-use-encode-headers-for-cover-letter:
  format-patch: fix ignored encode_email_headers for cover letter
2023-12-09 16:37:49 -08:00
Junio C Hamano
340581bcf1 Merge branch 'ps/ref-tests-update'
Update ref-related tests.

* ps/ref-tests-update:
  t: mark several tests that assume the files backend with REFFILES
  t7900: assert the absence of refs via git-for-each-ref(1)
  t7300: assert exact states of repo
  t4207: delete replace references via git-update-ref(1)
  t1450: convert tests to remove worktrees via git-worktree(1)
  t: convert tests to not access reflog via the filesystem
  t: convert tests to not access symrefs via the filesystem
  t: convert tests to not write references via the filesystem
  t: allow skipping expected object ID in `ref-store update-ref`
2023-12-09 16:37:49 -08:00
Junio C Hamano
d8b0ec44b1 Merge branch 'jw/git-add-attr-pathspec'
"git add" and "git stash" learned to support the ":(attr:...)"
magic pathspec.

* jw/git-add-attr-pathspec:
  attr: enable attr pathspec magic for git-add and git-stash
2023-12-09 16:37:49 -08:00
Junio C Hamano
34401b7ddb Merge branch 'jk/chunk-bounds-more'
Code clean-up for jk/chunk-bounds topic.

* jk/chunk-bounds-more:
  commit-graph: mark chunk error messages for translation
  commit-graph: drop verify_commit_graph_lite()
  commit-graph: check order while reading fanout chunk
  commit-graph: use fanout value for graph size
  commit-graph: abort as soon as we see a bogus chunk
  commit-graph: clarify missing-chunk error messages
  commit-graph: drop redundant call to "lite" verification
  midx: check consistency of fanout table
  commit-graph: handle overflow in chunk_size checks
2023-12-09 16:37:48 -08:00
Junio C Hamano
b4e6618fdf Merge branch 'js/ci-discard-prove-state'
The way CI testing used "prove" could lead to running the test
suite twice needlessly, which has been corrected.

* js/ci-discard-prove-state:
  ci: avoid running the test suite _twice_
2023-12-09 16:37:48 -08:00
Junio C Hamano
14a4445d18 Merge branch 'ps/ci-gitlab'
Add support for GitLab CI.

* ps/ci-gitlab:
  ci: add support for GitLab CI
  ci: install test dependencies for linux-musl
  ci: squelch warnings when testing with unusable Git repo
  ci: unify setup of some environment variables
  ci: split out logic to set up failed test artifacts
  ci: group installation of Docker dependencies
  ci: make grouping setup more generic
  ci: reorder definitions for grouping functions
2023-12-09 16:37:48 -08:00
Junio C Hamano
712177ed04 Merge branch 'js/doc-unit-tests-with-cmake'
Update the base topic to work with CMake builds.

* js/doc-unit-tests-with-cmake:
  cmake: handle also unit tests
  cmake: use test names instead of full paths
  cmake: fix typo in variable name
  artifacts-tar: when including `.dll` files, don't forget the unit-tests
  unit-tests: do show relative file paths
  unit-tests: do not mistake `.pdb` files for being executable
  cmake: also build unit tests
2023-12-09 16:37:47 -08:00
Junio C Hamano
8bf6fbd00d Merge branch 'js/doc-unit-tests'
Process to add some form of low-level unit tests has started.

* js/doc-unit-tests:
  ci: run unit tests in CI
  unit tests: add TAP unit test framework
  unit tests: add a project plan document
2023-12-09 16:37:47 -08:00
Junio C Hamano
0946acfb50 Merge branch 'ps/httpd-tests-on-nixos'
Portability tweak.

* ps/httpd-tests-on-nixos:
  t9164: fix inability to find basename(1) in Subversion hooks
  t/lib-httpd: stop using legacy crypt(3) for authentication
  t/lib-httpd: dynamically detect httpd and modules path
2023-12-09 16:37:47 -08:00
Junio C Hamano
564d0252ca Git 2.43
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-20 10:28:15 +09:00
Junio C Hamano
c3cc3e1da7 Merge tag 'l10n-2.43.0-rnd2' of https://github.com/git-l10n/git-po
l10n-2.43.0-rnd2

* tag 'l10n-2.43.0-rnd2' of https://github.com/git-l10n/git-po:
  l10n: zh-TW: Git 2.43.0-rc1
  l10n: Update German translation
  l10n: bg.po: Updated Bulgarian translation (5579t)
  l10n: zh_CN: for git 2.43.0-rc1
  l10n: Update Catalan translation
  l10n: po-id for 2.43 (round 1)
  l10n: fr: v2.43.0 rnd 2
  l10n: update uk localization for v2.43
  l10n: sv.po: Update Swedish translation (5579t)
  l10n: tr: v2.43.0
2023-11-20 10:27:33 +09:00
Junio C Hamano
d003a26cca Merge branch 'vd/glossary-dereference-peel'
"To dereference" and "to peel" were sometimes used in in-code
comments and documentation but without description in the glossary.

* vd/glossary-dereference-peel:
  glossary: add definitions for dereference & peel
2023-11-20 09:57:23 +09:00
Junio C Hamano
e9eb93bb2a Merge branch 'tz/send-email-helpfix'
Typoes in "git send-email -h" have been corrected.

* tz/send-email-helpfix:
  send-email: remove stray characters from usage
2023-11-20 09:57:22 +09:00
Jiang Xin
d303432667 Merge branch 'l10n/zh-TW/2023-11-19' of github.com:l10n-tw/git-po
* 'l10n/zh-TW/2023-11-19' of github.com:l10n-tw/git-po:
  l10n: zh-TW: Git 2.43.0-rc1
2023-11-20 07:57:09 +08:00
Yi-Jyun Pan
3d735322df l10n: zh-TW: Git 2.43.0-rc1
Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>
2023-11-19 23:35:21 +08:00
Jiang Xin
88d26d3fa3 Merge branch 'master' of github.com:alshopov/git-po
* 'master' of github.com:alshopov/git-po:
  l10n: bg.po: Updated Bulgarian translation (5579t)
2023-11-19 20:56:21 +08:00
Ralf Thielow
ed8b3c3078 l10n: Update German translation
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
2023-11-18 14:19:34 +01:00
Alexander Shopov
f42a8bb329 l10n: bg.po: Updated Bulgarian translation (5579t)
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
2023-11-18 13:25:32 +01:00
Teng Long
64294acb07 l10n: zh_CN: for git 2.43.0-rc1
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2023-11-18 12:05:13 +08:00
Jiang Xin
4385297f0d Merge branch '2.43-uk-update' of github.com:arkid15r
* '2.43-uk-update' of github.com:arkid15r/git-ukrainian-l10n:
  l10n: update uk localization for v2.43
2023-11-18 10:51:56 +08:00
Jiang Xin
e3bd57a674 Merge branch 'catalan' of github.com:Softcatala/git-po
* 'catalan' of github.com:Softcatala/git-po:
  l10n: Update Catalan translation
2023-11-18 10:48:39 +08:00
Jiang Xin
ec93805ce3 Merge branch 'tr-l10n' of github.com:bitigchi/git-po
* 'tr-l10n' of github.com:bitigchi/git-po:
  l10n: tr: v2.43.0
2023-11-18 10:45:56 +08:00
Jiang Xin
4dcb316637 Merge branch 'fr_v2.43.0' of github.com:jnavila/git
* 'fr_v2.43.0' of github.com:jnavila/git:
  l10n: fr: v2.43.0 rnd 2
2023-11-18 10:43:22 +08:00
Jiang Xin
4dedb407b1 Merge branch 'po-id' of github.com:bagasme/git-po
* 'po-id' of github.com:bagasme/git-po:
  l10n: po-id for 2.43 (round 1)
2023-11-18 10:42:48 +08:00
Jiang Xin
1af36a14a4 Merge branch 'master' of github.com:nafmo/git-l10n-sv
* 'master' of github.com:nafmo/git-l10n-sv:
  l10n: sv.po: Update Swedish translation (5579t)
2023-11-18 10:42:04 +08:00
Todd Zullinger
6ff658cc78 send-email: avoid duplicate specification warnings
A warning is issued for options which are specified more than once
beginning with perl-Getopt-Long >= 2.55.  In addition to causing users
to see warnings, this results in test failures which compare the output.
An example, from t9001-send-email.37:

  | +++ diff -u expect actual
  | --- expect      2023-11-14 10:38:23.854346488 +0000
  | +++ actual      2023-11-14 10:38:23.848346466 +0000
  | @@ -1,2 +1,7 @@
  | +Duplicate specification "no-chain-reply-to" for option "no-chain-reply-to"
  | +Duplicate specification "to-cover|to-cover!" for option "to-cover"
  | +Duplicate specification "cc-cover|cc-cover!" for option "cc-cover"
  | +Duplicate specification "no-thread" for option "no-thread"
  | +Duplicate specification "no-to-cover" for option "no-to-cover"
  |  fatal: longline.patch:35 is longer than 998 characters
  |  warning: no patches were sent
  | error: last command exited with $?=1
  | not ok 37 - reject long lines

Remove the duplicate option specs.  These are primarily the explicit
'--no-' prefix opts which were added in f471494303 (git-send-email.perl:
support no- prefix with older GetOptions, 2015-01-30).  This was done
specifically to support perl-5.8.0 which includes Getopt::Long 2.32[1].

Getopt::Long 2.33 added support for the '--no-' prefix natively by
appending '!' to the option specification string, which was included in
perl-5.8.1 and is not present in perl-5.8.0.  The previous commit bumped
the minimum supported Perl version to 5.8.1 so we no longer need to
provide the '--no-' variants for negatable options manually.

Teach `--git-completion-helper` to output the '--no-' options.  They are
not included in the options hash and would otherwise be lost.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17 07:26:34 +09:00
Todd Zullinger
d13a73e383 perl: bump the required Perl version to 5.8.1 from 5.8.0
The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1.  Document that as the minimum version we
support.

Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.

The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24).  At that time,
5.8.0 was 8 years old.  It is now over 21 years old.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17 07:26:32 +09:00
Victoria Dye
294bfc2441 t/perf: add perf tests for for-each-ref
Add performance tests for 'for-each-ref'. The tests exercise different
combinations of filters/formats/options, as well as the overall performance
of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
performance difference vs. 'git for-each-ref' with "%(*fieldname)" format
specifiers.

All tests are run against a repository with 40k loose refs - 10k commits,
each having a unique:

- branch
- custom ref (refs/custom/special_*)
- annotated tag pointing at the commit
- annotated tag pointing at the other annotated tag (i.e., a nested tag)

After those tests are finished, the refs are packed with 'pack-refs --all'
and the same tests are rerun.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:01 +09:00
Victoria Dye
188782ecb1 ref-filter.c: use peeled tag for '*' format fields
In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
"dereferencing" a tag refers to a recursive peel of the tag object. Unlike
these cases, the dereferencing prefix ('*') in 'for-each-ref' format
specifiers triggers only a single, non-recursive dereference of a given tag
object. For most annotated tags, a single dereference is all that is needed
to access the tag's associated commit or tree; "recursive" and
"non-recursive" dereferencing are functionally equivalent in these cases.
However, nested tags (annotated tags whose target is another annotated tag)
dereferenced once return another tag, where a recursive dereference would
return the commit or tree.

Currently, if a user wants to filter & format refs and include information
about a recursively-dereferenced tag, they can do so with something like
'cat-file --batch-check':

    git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
        git cat-file --batch-check="%(objectname) %(rest)"

But the combination of commands is inefficient. So, to improve the
performance of this use case and align the defererencing behavior of
'for-each-ref' with that of other commands, update the ref formatting code
to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields
rather than the tag's immediate target object (from 'get_tagged_oid()').

Additionally, add a test to 't6300-for-each-ref' to verify new nested tag
behavior and update 't6302-for-each-ref-filter.sh' to print the correct
value for nested dereferenced fields.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:01 +09:00
Victoria Dye
d1dfe6e936 for-each-ref: clean up documentation of --format
Move the description of the `*` prefix from the --format option
documentation to the part of the command documentation that deals with other
object type-specific modifiers. Also reorganize and reword the remaining
--format documentation so that the explanation of the default format doesn't
interrupt the details on format string interpolation.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:00 +09:00
Victoria Dye
bd98f9774e ref-filter.c: filter & format refs in the same callback
Update 'filter_and_format_refs()' to try to perform ref filtering &
formatting in a single ref iteration, without an intermediate 'struct
ref_array'. This can only be done if no operations need to be performed on a
pre-filtered array; specifically, if the refs are

- filtered on reachability,
- sorted, or
- formatted with ahead-behind information

they cannot be filtered & formatted in the same iteration. In that case,
fall back on the current filter-then-sort-then-format flow.

This optimization substantially improves memory usage due to no longer
storing a ref array in memory. In some cases, it also dramatically reduces
runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
all refs into a 'struct ref_array' to printing only the first ref).

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:00 +09:00
Victoria Dye
613d991549 ref-filter.c: refactor to create common helper functions
Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
'filter_refs()' into new helper functions:

* Extract the code to grow a 'struct ref_array' and append a given 'struct
  ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'.
* Extract the code to filter a given ref by refname & object ID then create
  a new 'struct ref_array_item *' from 'filter_one()' into
  'apply_ref_filter()'.
* Extract the code for filter pre-processing, contains cache creation, and
  ref iteration from 'filter_refs()' into 'do_filter_refs()'.

In later patches, these helpers will be used by new ref-filter API
functions. This patch does not result in any user-facing behavior changes or
changes to callers outside of 'ref-filter.c'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:00 +09:00
Victoria Dye
9c575b0202 ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
Rename 'ref_filter_handler()' to 'filter_one()' to more clearly distinguish
it from other ref filtering callbacks that will be added in later patches.
The "*_one()" naming convention is common throughout the codebase for
iteration callbacks.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:03:00 +09:00
Victoria Dye
e7574b0c6b ref-filter.h: add functions for filter/format & format-only
Add two new public methods to 'ref-filter.h':

* 'print_formatted_ref_array()' which, given a format specification & array
  of ref items, formats and prints the items to stdout.
* 'filter_and_format_refs()' which combines 'filter_refs()',
  'ref_array_sort()', and 'print_formatted_ref_array()' into a single
  function.

This consolidates much of the code used to filter and format refs in
'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing
duplication and simplifying the future changes needed to optimize the filter
& format process.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:02:59 +09:00
Victoria Dye
6d6e5c53b0 ref-filter.h: move contains caches into filter
Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into
an 'internal' struct of the 'struct ref_filter'. In later patches, the
'struct ref_filter *' will be a common data structure across multiple
filtering functions. These caches are part of the common functionality the
filter struct will support, so they are updated to be internally accessible
wherever the filter is used.

The design used here mirrors what was introduced in 576de3d956
(unpack_trees: start splitting internal fields from public API, 2023-02-27)
for 'unpack_trees_options'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:02:59 +09:00
Victoria Dye
9d4fcfe1ff ref-filter.h: add max_count and omit_empty to ref_format
Add an internal 'array_opts' struct to 'struct ref_format' containing
formatting options that pertain to the formatting of an entire ref array:
'max_count' and 'omit_empty'. These values are specified by the '--count'
and '--omit-empty' options, respectively, to 'for-each-ref'/'tag'/'branch'.
Storing these values in the 'ref_format' will simplify the consolidation of
ref array formatting logic across builtins in later patches.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:02:59 +09:00
Victoria Dye
56d26ade97 ref-filter.c: really don't sort when using --no-sort
When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
printed refs are still sorted by ascending refname. Change the handling of
sort options in these commands so that '--no-sort' to truly disables
sorting.

'--no-sort' does not disable sorting in these commands is because their
option parsing does not distinguish between "the absence of '--sort'"
(and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
an empty 'sorting_options' string list, which is parsed by
'ref_sorting_options()' to create the 'struct ref_sorting *' for the
command. If the string list is empty, 'ref_sorting_options()' interprets
that as "the absence of '--sort'" and returns the default ref sorting
structure (equivalent to "refname" sort).

To handle '--no-sort' properly while preserving the "refname" sort in the
"absence of --sort'" case, first explicitly add "refname" to the string list
*before* parsing options. This alone doesn't actually change any behavior,
since 'compare_refs()' already falls back on comparing refnames if two refs
are equal w.r.t all other sort keys.

Now that the string list is populated by default, '--no-sort' is the only
way to empty the 'sorting_options' string list. Update
'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
string list is empty, and add a condition to 'ref_array_sort()' to skip the
sort altogether if the sort structure is NULL. Note that other functions
using 'struct ref_sorting *' do not need any changes because they already
ignore NULL values.

Finally, remove the condition around sorting in 'ls-remote', since it's no
longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
sorting by default. This default is preserved by simply leaving its sort key
string list empty before parsing options; if no additional sort keys are
set, 'struct ref_sorting *' is NULL and sorting is skipped.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-16 14:02:58 +09:00