Commit Graph

75288 Commits

Author SHA1 Message Date
Patrick Steinhardt
ef46ad0815 reftable: rename scratch buffer
Both `struct block_writer` and `struct reftable_writer` have a `buf`
member that is being reused to optimize the number of allocations.
Rename the variable to `scratch` to clarify its intend and provide a
comment explaining why it exists.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 08:39:38 +09:00
Patrick Steinhardt
0f5762b043 refs: adapt initial_transaction flag to be unsigned
The `initial_transaction` flag is tracked as a signed integer, but we
typically pass around flags via unsigned integers. Adapt the type
accordingly.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 08:39:38 +09:00
Patrick Steinhardt
d94ac23d3b reftable/block: optimize allocations by using scratch buffer
The block writer needs to compute the key for every record that one adds
to the writer. The buffer for this key is stored on the stack and thus
reallocated on every call to `block_writer_add()`, which is inefficient.

Refactor the code so that we store the buffer in the `block_writer`
struct itself so that we can reuse it. This reduces the number of
allocations when writing many refs, e.g. when migrating one million refs
from the "files" backend to the "reftable backend. Before this change:

    HEAP SUMMARY:
        in use at exit: 80,048 bytes in 49 blocks
      total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated

After this change:

    HEAP SUMMARY:
        in use at exit: 80,048 bytes in 49 blocks
      total heap usage: 2,013,250 allocs, 2,013,201 frees, 347,543,583 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:17 +09:00
Patrick Steinhardt
aa248b8ab2 reftable/block: rename block_writer::buf variable
Adapt the name of the `block_writer::buf` variable to instead be called
`block`. This aligns it with the existing `block_len` variable, which
tracks the length of this buffer, and is generally a bit more tied to
the actual context where this variable gets used.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:17 +09:00
Patrick Steinhardt
66ed011bf7 reftable/writer: optimize allocations by using a scratch buffer
Both `writer_add_record()` and `reftable_writer_add_ref()` get executed
for every single ref record we're adding to the reftable writer. And as
both functions use a local buffer to write data, the allocations we have
to do here add up during larger transactions.

Refactor the code to use a scratch buffer part of the `reftable_writer`
itself such that we can reuse it. This signifcantly reduces the number
of allocations during large transactions, e.g. when migrating refs from
the "files" backend to the "reftable" backend. Before this change:

    HEAP SUMMARY:
        in use at exit: 80,048 bytes in 49 blocks
      total heap usage: 5,032,171 allocs, 5,032,122 frees, 418,792,092 bytes allocated

After this change:

    HEAP SUMMARY:
        in use at exit: 80,048 bytes in 49 blocks
      total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated

This also translate into a small speedup:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     827.2 ms ±  16.5 ms    [User: 689.4 ms, System: 124.9 ms]
      Range (min … max):   809.0 ms … 924.7 ms    50 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     813.6 ms ±  11.6 ms    [User: 679.0 ms, System: 123.4 ms]
      Range (min … max):   786.7 ms … 833.5 ms    50 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.02 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:16 +09:00
Patrick Steinhardt
a7004abd0b refs: don't normalize log messages with REF_SKIP_CREATE_REFLOG
When the `REF_SKIP_CREATE_REFLOG` flag is set we skip the creation of
the reflog entry, but we still normalize the reflog message when we
queue the update. This is a waste of resources as the normalized message
will never get used in the first place.

Fix this issue by skipping the normalization in case the flag is set.
This leads to a surprisingly large speedup when migrating from the
"files" to the "reftable" backend:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     878.5 ms ±  14.9 ms    [User: 726.5 ms, System: 139.2 ms]
      Range (min … max):   858.4 ms … 941.3 ms    50 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     831.1 ms ±  10.5 ms    [User: 694.1 ms, System: 126.3 ms]
      Range (min … max):   812.4 ms … 851.4 ms    50 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.06 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

And an ever larger speedup when migrating the other way round:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     923.6 ms ±  11.6 ms    [User: 705.5 ms, System: 208.1 ms]
      Range (min … max):   905.3 ms … 946.5 ms    50 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     818.5 ms ±   9.0 ms    [User: 627.6 ms, System: 180.6 ms]
      Range (min … max):   802.2 ms … 842.9 ms    50 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.13 ± 0.02 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:16 +09:00
Patrick Steinhardt
e4929cdf79 refs: skip collision checks in initial transactions
Reference transactions use `refs_verify_refname_available()` to check
for colliding references. This check consists of two parts:

  - Checks for whether multiple ref updates in the same transaction
    conflict with each other.

  - Checks for whether existing refs conflict with any refs part of the
    transaction.

While we generally cannot avoid the first check, the second check is
superfluous in cases where the transaction is an initial one in an
otherwise empty ref store. The check results in multiple ref reads as
well as the creation of a ref iterator for every ref we're checking,
which adds up quite fast when performing the check for many refs.

Introduce a new flag that allows us to skip this check and wire it up in
such that the backends pass it when running an initial transaction. This
leads to significant speedups when migrating ref storage backends. From
"files" to "reftable":

    Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     472.4 ms ±   6.7 ms    [User: 175.9 ms, System: 285.2 ms]
      Range (min … max):   463.5 ms … 483.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      86.1 ms ±   1.9 ms    [User: 67.9 ms, System: 16.0 ms]
      Range (min … max):    82.9 ms …  90.9 ms    29 runs

    Summary
      migrate files:reftable (refcount = 100000, revision = HEAD) ran
        5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~)

And from "reftable" to "files":

    Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     452.7 ms ±   3.4 ms    [User: 209.9 ms, System: 235.4 ms]
      Range (min … max):   445.9 ms … 457.5 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      95.2 ms ±   2.2 ms    [User: 73.6 ms, System: 20.6 ms]
      Range (min … max):    91.7 ms … 100.8 ms    28 runs

    Summary
      migrate reftable:files (refcount = 100000, revision = HEAD) ran
        4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:16 +09:00
Patrick Steinhardt
00bd6c3e46 refs: use "initial" transaction semantics to migrate refs
Until now, we couldn't use "initial" transaction semantics to migrate
refs because the "files" backend only supported writing regular refs via
the initial transaction because it simply mapped the transaction to a
"packed-refs" transaction. But with the preceding commit, the "files"
backend has learned to also write symbolic and root refs in the initial
transaction by creating a second transaction for all refs that need to
be written as loose refs.

Adapt the code to migrate refs to commit the transaction as an initial
transaction. This results in a signiticant speedup when migrating many
refs:

    Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):      3.247 s ±  0.034 s    [User: 0.485 s, System: 2.722 s]
      Range (min … max):    3.216 s …  3.309 s    10 runs

    Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
      Time (mean ± σ):     453.6 ms ±   1.9 ms    [User: 214.6 ms, System: 230.5 ms]
      Range (min … max):   451.5 ms … 456.4 ms    10 runs

    Summary
      migrate reftable:files (refcount = 100000, revision = HEAD) ran
        7.16 ± 0.08 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)

As the reftable backend doesn't (yet) special-case initial transactions
there is no comparable speedup for that backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:16 +09:00
Patrick Steinhardt
c0b9cf3b55 refs/files: support symbolic and root refs in initial transaction
The "files" backend has implemented special logic when committing
the first transactions in an otherwise empty ref store: instead of
writing all refs as separate loose files, it instead knows to write them
all into a "packed-refs" file directly. This is significantly more
efficient than having to write each of the refs as separate "loose" ref.

The only user of this optimization is git-clone(1), which only uses this
mechanism to write regular refs. Consequently, the implementation does
not know how to handle both symbolic and root refs. While fine in the
context of git-clone(1), this keeps us from using the mechanism in more
cases.

Adapt the logic to also support symbolic and root refs by using a second
transaction that we use for all of the refs that need to be written as
loose refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
Patrick Steinhardt
1c299d03e5 refs: introduce "initial" transaction flag
There are two different ways to commit a transaction:

  - `ref_transaction_commit()` can be used to commit a regular
    transaction and is what almost every caller wants.

  - `initial_ref_transaction_commit()` can be used when it is known that
    the ref store that the transaction is committed for is empty and
    when there are no concurrent processes. This is used when cloning a
    new repository.

Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.

Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
Patrick Steinhardt
83b8ed8bba refs/files: move logic to commit initial transaction
Move the logic to commit initial transactions such that we can start to
call it in `files_transaction_finish()` in a subsequent commit without
requiring a separate function declaration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
Patrick Steinhardt
a0efef1446 refs: allow passing flags when setting up a transaction
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.

Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:14 +09:00
Junio C Hamano
facbe4f633 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-08 12:56:28 +09:00
Junio C Hamano
02a2d5706d Merge branch 'jk/left-right-bitmap'
When called with '--left-right' and '--use-bitmap-index', 'rev-list'
will produce output without any left/right markers, which has been
corrected.

* jk/left-right-bitmap:
  rev-list: skip bitmap traversal for --left-right
2024-11-08 12:56:28 +09:00
Junio C Hamano
1ee7dbde67 Merge branch 'ps/upgrade-clar'
Buildfix and upgrade of Clar to a newer version.

* ps/upgrade-clar:
  cmake: set up proper dependencies for generated clar headers
  cmake: fix compilation of clar-based unit tests
  Makefile: extract script to generate clar declarations
  Makefile: adjust sed command for generating "clar-decls.h"
  t/unit-tests: update clar to 206accb
2024-11-08 12:56:28 +09:00
Junio C Hamano
31fe1390cd Merge branch 'cw/config-extensions'
Centralize documentation for repository extensions into a single place.

* cw/config-extensions:
  doc: consolidate extensions in git-config documentation
2024-11-08 12:56:27 +09:00
Junio C Hamano
a2ac8b0707 Merge branch 'kn/ci-clang-format-tidy'
Updates the '.clang-format' to match project conventions.

* kn/ci-clang-format-tidy:
  clang-format: align consecutive macro definitions
  clang-format: re-adjust line break penalties
2024-11-08 12:56:26 +09:00
Junio C Hamano
c14fa9a511 Merge branch 'kn/arbitrary-suffixes'
Update the project's CodingGuidelines to discourage naming functions
with a "_1()" suffix.

* kn/arbitrary-suffixes:
  CodingGuidelines: discourage arbitrary suffixes in function names
2024-11-08 12:56:26 +09:00
Taylor Blau
8f8d6eee53 The seventh batch 2024-11-01 12:59:31 -04:00
Taylor Blau
1c5a712f26 Merge branch 'jk/dumb-http-finalize'
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.

* jk/dumb-http-finalize:
  packfile: use oidread() instead of hashcpy() to fill object_id
  packfile: use object_id in find_pack_entry_one()
  packfile: convert find_sha1_pack() to use object_id
  http-walker: use object_id instead of bare hash
  packfile: warn people away from parse_packed_git()
  packfile: drop sha1_pack_index_name()
  packfile: drop sha1_pack_name()
  packfile: drop has_pack_index()
  dumb-http: store downloaded pack idx as tempfile
  t5550: count fetches in "previously-fetched .idx" test
  midx: avoid duplicate packed_git entries
2024-11-01 12:53:32 -04:00
Taylor Blau
6d81fe64dd Merge branch 'kh/update-ref'
Documentation updates to 'git-update-ref(1)'.

* kh/update-ref:
  Documentation: mutually link update-ref and symbolic-ref
  Documentation/git-update-ref.txt: discuss symbolic refs
  Documentation/git-update-ref.txt: remove confusing paragraph
  Documentation/git-update-ref.txt: demote symlink to last section
  Documentation/git-update-ref.txt: remove safety paragraphs
  Documentation/git-update-ref.txt: drop “flag”
2024-11-01 12:53:30 -04:00
Taylor Blau
aebc4bd8ce Merge branch 'ak/more-typofixes'
More typofixes.

* ak/more-typofixes:
  t: fix typos
2024-11-01 12:53:29 -04:00
Taylor Blau
43ac23945c Merge branch 'rs/grep-lookahead'
Fix 'git grep' regression on macOS by disabling lookahead when
encountering invalid UTF-8 byte sequences.

* rs/grep-lookahead:
  grep: disable lookahead on error
2024-11-01 12:53:28 -04:00
Taylor Blau
81a5461518 Merge branch 'ak/t1016-cleanup'
Test cleanup.

* ak/t1016-cleanup:
  t1016: clean up style
2024-11-01 12:53:27 -04:00
Taylor Blau
59dc0ab83c Merge branch 'ua/atoi'
Replace various calls to atoi() with strtol_i() and strtoul_ui(), and
add improved error handling.

* ua/atoi:
  imap: replace atoi() with strtol_i() for UIDVALIDITY and UIDNEXT parsing
  merge: replace atoi() with strtol_i() for marker size validation
  daemon: replace atoi() with strtoul_ui() and strtol_i()
2024-11-01 12:53:26 -04:00
Taylor Blau
20ab7fa3b6 Merge branch 'sa/notes-edit'
Teach 'git notes add' and 'git notes append' a new '-e' flag,
instructing them to open the note in $GIT_EDITOR before saving.

* sa/notes-edit:
  notes: teach the -e option to edit messages in editor
2024-11-01 12:53:25 -04:00
Taylor Blau
8237b49ade Merge branch 'sk/t9101-cleanup'
Test cleanup.

* sk/t9101-cleanup:
  t9101: ensure no whitespace after redirect
2024-11-01 12:53:24 -04:00
Taylor Blau
7b1f01f02e Merge branch 'ss/duplicate-typos'
Typofixes.

* ss/duplicate-typos:
  global: Fix duplicate word typos
2024-11-01 12:53:23 -04:00
Taylor Blau
aabbcf2783 Merge branch 'ps/upload-pack-doc'
Documentation update to clarify that 'uploadpack.allowAnySHA1InWant'
implies both 'allowTipSHA1InWant' and 'allowReachableSHA1InWant'.

* ps/upload-pack-doc:
  doc: document how uploadpack.allowAnySHA1InWant impact other allow options
2024-11-01 12:53:22 -04:00
Taylor Blau
07c6066f82 Merge branch 'kh/mv-breakage'
Demonstrate an assertion failure in 'git mv'.

* kh/mv-breakage:
  t7001: add failure test which triggers assertion
2024-11-01 12:53:21 -04:00
Taylor Blau
787297b396 Merge branch 'rj/cygwin-exit'
Treat ECONNABORTED the same as ECONNRESET in 'git credential-cache' to
work around a possible Cygwin regression. This resolves a race condition
caused by changes in Cygwin's handling of socket closures, allowing the
client to exit cleanly when encountering ECONNABORTED.

* rj/cygwin-exit:
  credential-cache: treat ECONNABORTED like ECONNRESET
2024-11-01 12:53:19 -04:00
Taylor Blau
a524cc77ad Merge branch 'ua/t3404-cleanup'
Test update.

* ua/t3404-cleanup:
  t3404: replace test with test_line_count()
  t3404: avoid losing exit status with focus on `git show` and `git cat-file`
2024-11-01 12:53:18 -04:00
Taylor Blau
268fd2fe58 Merge branch 'ps/platform-compat-fixes'
Various platform compatibility fixes split out of the larger effort
to use Meson as the primary build tool.

* ps/platform-compat-fixes:
  t6006: fix prereq handling with `test_format ()`
  http: fix build error on FreeBSD
  builtin/credential-cache: fix missing parameter for stub function
  t7300: work around platform-specific behaviour with long paths on MinGW
  t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin
  t3404: work around platform-specific behaviour on macOS 10.15
  t1401: make invocation of tar(1) work with Win32-provided one
  t/lib-gpg: fix setup of GNUPGHOME in MinGW
  t/lib-gitweb: test against the build version of gitweb
  t/test-lib: wire up NO_ICONV prerequisite
  t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE
2024-11-01 12:53:17 -04:00
Taylor Blau
47c3170a3e Merge branch 'jc/breaking-changes-early-adopter-option'
Describe the policy to introduce breaking changes.

* jc/breaking-changes-early-adopter-option:
  BreakingChanges: early adopter option
2024-11-01 12:53:14 -04:00
Jeff King
16a186fede rev-list: skip bitmap traversal for --left-right
Running:

  git rev-list --left-right --use-bitmap-index one...two

will produce output without any left-right markers, since the bitmap
traversal returns only a single set of reachable commits. Instead we
should refuse to use bitmaps here and produce the correct output using a
traditional traversal.

This is probably not the only remaining option that misbehaves with
bitmaps, but it's particularly egregious in that it feels like it
_could_ work. Doing two separate traversals for the left/right sides and
then taking the symmetric set differences should yield the correct
answer, but our traversal code doesn't know how to do that.

It's not clear if naively doing two separate traversals would always be
a performance win. A traditional traversal only needs to walk down to
the merge base, but bitmaps always fill out the full reachability set.
So depending on your bitmap coverage, we could end up walking old bits
of history twice to fill out the same uninteresting bits on both sides.
We'd also of course end up with a very large --boundary set, if the user
asked for that.

So this might or might not be something worth implementing later. But
for now, let's make sure we don't produce the wrong answer if somebody
tries it.

The test covers this, but also the same thing with "--count" (which is
what I originally tried in a real-world case). Ironically the
try_bitmap_count() code already realizes that "--left-right" won't work
there. But that just causes us to fall back to the regular bitmap
traversal code, which itself doesn't handle counting (we produce a list
of objects rather than a count).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-11-01 11:02:27 -04:00
Taylor Blau
23d289d273 The sixth batch 2024-10-30 13:36:44 -04:00
Taylor Blau
6aa984af3b Merge branch 'sk/t7011-cleanup'
Test cleanup.

* sk/t7011-cleanup:
  t7011: ensure no whitespace after redirect
2024-10-30 13:08:07 -04:00
Taylor Blau
8d305fdaac Merge branch 'co/t6050-pipefix'
Avoid losing exit status by having Git command being tested on the
upstream side of a pipe.

* co/t6050-pipefix:
  t6050: avoid pipes with upstream Git commands
2024-10-30 13:08:06 -04:00
Taylor Blau
cec4461e3f Merge branch 'ks/t4205-fixup'
Testfix.

* ks/t4205-fixup:
  t4205: fix typo in 'NUL termination with --stat'
2024-10-30 13:08:05 -04:00
Taylor Blau
9947803926 Merge branch 'kh/submitting-patches'
Docfix.

* kh/submitting-patches:
  SubmittingPatches: tags -> trailers
2024-10-30 13:08:04 -04:00
Taylor Blau
6f763d798b Merge branch 'ps/ref-filter-sort'
Teaches the ref-filter machinery to recognize and avoid cases where
sorting would be redundant.

* ps/ref-filter-sort:
  ref-filter: format iteratively with lexicographic refname sorting
2024-10-30 13:08:02 -04:00
Taylor Blau
bc627658b0 Merge branch 'ps/reftable-strbuf'
Implements a new reftable-specific strbuf replacement to reduce
reftable's dependency on Git-specific data structures.

* ps/reftable-strbuf:
  reftable: handle trivial `reftable_buf` errors
  reftable/stack: adapt `stack_filename()` to handle allocation failures
  reftable/record: adapt `reftable_record_key()` to handle allocation failures
  reftable/stack: adapt `format_name()` to handle allocation failures
  t/unit-tests: check for `reftable_buf` allocation errors
  reftable/blocksource: adapt interface name
  reftable: convert from `strbuf` to `reftable_buf`
  reftable/basics: provide new `reftable_buf` interface
  reftable: stop using `strbuf_addf()`
  reftable: stop using `strbuf_addbuf()`
2024-10-30 13:08:01 -04:00
Patrick Steinhardt
dd6003f200 t6006: fix prereq handling with test_format ()
In df383b5842 (t/test-lib: wire up NO_ICONV prerequisite, 2024-10-16) we
have introduced a new NO_ICONV prerequisite that makes us skip tests in
case Git is not compiled with support for iconv. This change subtly
broke t6006: while the test suite still passes, some of its tests won't
execute because they run into an error.

    ./t6006-rev-list-format.sh: line 92: test_expect_%e: command not found

The broken tests use `test_format ()`, and the mentioned commit simply
prepended the new prerequisite to its arguments. But that does not work,
as the function is not aware of prereqs at all and will now treat all of
its arguments incorrectly.

Fix this by making the function aware of prereqs by accepting an
optional fourth argument. Adapt the callsites accordingly.

Reported-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-28 13:44:38 -04:00
Jeff King
863f2459a2 packfile: use oidread() instead of hashcpy() to fill object_id
When chasing a REF_DELTA, we need to pull the raw hash bytes out of the
mmap'd packfile into an object_id struct. We do that with a raw
hashcpy() of the appropriate length (that happens directly now, though
before the previous commit it happened inside find_pack_entry_one(),
also using a hashcpy).

But I think this creates a potentially dangerous situation due to
d4d364b2c7 (hash: convert `oidcmp()` and `oideq()` to compare whole
hash, 2024-06-14). When using sha1, we'll have uninitialized bytes in
the latter part of the object_id.hash buffer, which could fool oideq(),
etc.

We should use oidread() instead, which correctly zero-pads the extra
bytes, as of c98d762ed9 (global: ensure that object IDs are always
padded, 2024-06-14).

As far as I can see, this has not been a problem in practice because the
object_id we feed to find_pack_entry_one() is never used with oideq(),
etc. It is being compared to the bytes mmap'd from a pack idx file,
which of course do not have the extra padding bytes themselves. So
there's no bug here, but this just puzzled me while looking at the code.
We should do the more obviously safe thing, both for future-proofing and
to avoid confusing readers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
479ab76c9f packfile: use object_id in find_pack_entry_one()
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.

As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.

The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
4d99559147 packfile: convert find_sha1_pack() to use object_id
The find_sha1_pack() function has a few problems:

  - it's badly named, since it works with any object hash

  - it takes the hash as a bare pointer rather than an object_id struct

We can fix both of these easily, as all callers actually have a real
object_id anyway.

I also found the existence of this function somewhat confusing, as it is
about looking in an arbitrary set of linked packed_git structs. It's
good for things like dumb-http which are looking in downloaded remote
packs, and not our local packs. But despite the name, it is not a good
way to find the pack which contains a local object (it skips the use of
the midx, the pack mru list, and so on).

So let's also add an explanatory comment above the declaration that may
point people in the right direction.

I suspect the calls in fast-import.c, which use the packed_git list from
the repository struct, could actually just be using find_pack_entry().
But since we'd need to keep it anyway for dumb-http, I didn't dig
further there. If we eventually drop dumb-http support, then it might be
worth examining them to see if we can get rid of the function entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
0af861e0c8 http-walker: use object_id instead of bare hash
We long ago switched most code to using object_id structs instead of
bare "unsigned char *" hashes. This gives us more type safety from the
compiler, and generally makes it easier to understand what we expect in
each parameter.

But the dumb-http code has lagged behind. And indeed, the whole "walker"
subsystem interface has the same problem, though http-walker is the only
user left.

So let's update the walker interface to pass object_id structs (which we
already have anyway at all call sites!), and likewise use those within
the http-walker methods that it calls.

This cleans up the dumb-http code a bit, but will also let us fix a few
more commonly used helper functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
6b2fc22050 packfile: warn people away from parse_packed_git()
With a name like parse_packed_git(), you might think it's the right way
to access a local pack index and its associated objects. But not so!
It's a one-off used by the dumb-http code to access pack idx files we've
downloaded from the remote, but whose packs we might not have.

There's only one caller left for this function, and ideally we'd drop it
completely and just inline it there. But that would require exposing
other internals from packfile.[ch], like alloc_packed_git() and
check_packed_git_idx().

So let's leave it be for now, and just warn people that it's probably
not what they're looking for. Perhaps in the long run if we eventually
drop dumb-http support, we can remove the function entirely then.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
4390fea963 packfile: drop sha1_pack_index_name()
Like sha1_pack_name() that we dropped in the previous commit, this
function uses an error-prone static strbuf and the somewhat misleading
name "sha1".

The only caller left is in pack-redundant.c. While this command is
marked for potential removal in our BreakingChanges document, we still
have it for now. But it's simple enough to convert it to use its own
strbuf with the underlying odb_pack_name() function, letting us drop the
otherwise obsolete function.

Note that odb_pack_name() does its own strbuf_reset(), so it's safe to
use directly within a loop like this.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King
c2dc4c9fbb packfile: drop sha1_pack_name()
The sha1_pack_name() function has a few ugly bits:

  - it writes into a static strbuf (and not even a ring buffer of them),
    which can lead to subtle invalidation problems

  - it uses the term "sha1", but it's really using the_hash_algo, which
    could be sha256

There's only one caller of it left. And in fact that caller is better
off using the underlying odb_pack_name() function itself, since it's
just copying the result into its own strbuf anyway.

Converting that caller lets us get rid of this now-obselete function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00