Similar as the preceding commit, we may try to do a zero-sized
allocation when reloading a reftable stack that ain't got any tables.
It is implementation-defined whether malloc(3p) returns a NULL pointer
in that case or a zero-sized object. In case it does return a NULL
pointer though it causes us to think we have run into an out-of-memory
situation, and thus we return an error.
Fix this by only allocating arrays when they have at least one entry.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported [1] that Git started to fail with an out-of-memory error
when initializing repositories with the reftable backend on NonStop
platforms. A bisect led to 802c0646ac (reftable/merged: handle
allocation failures in `merged_table_init_iter()`, 2024-10-02), which
changed how we allocate memory when initializing a merged table.
The root cause of this seems to be that NonStop returns a `NULL` pointer
when doing a zero-sized allocation. This would've already happened
before the above change, but we never noticed because we did not check
the result. Now we do notice and thus return an out-of-memory error to
the caller.
Fix the issue by skipping the allocation altogether in case there are no
readers.
[1]: <00ad01db5017$aa9ce340$ffd6a9c0$@nexbridge.com>
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to compact tables we need at least two tables. Bail out early
from `reftable_stack_auto_compact()` in case we have less than two
tables.
In the original, `stack_table_sizes_for_compaction()` yields an array
that has the same length as the number of tables. This array is then
passed on to `suggest_compaction_segment()`, which returns an empty
segment in case we have less than two tables. The segment is then passed
to `segment_size()`, which will return `0` because both start and end of
the segment are `0`. And because we only call `stack_compact_range()` in
case we have a positive segment size we don't perform auto-compaction at
all. Consequently, this change does not result in a user-visible change
in behaviour when called with a single table.
But when called with no tables this protects us against a potential
out-of-memory error: `stack_table_sizes_for_compaction()` would try to
allocate a zero-byte object when there aren't any tables, and that may
lead to a `NULL` pointer on some platforms like NonStop which causes us
to bail out with an out-of-memory error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When growing the `names` array fails we would end up with a `NULL`
pointer. This causes two problems:
- We would run into a segfault because we try to free names that we
have assigned to the array already.
- We lose track of the old array and cannot free its contents.
Fix this issue by using a temporary variable. Like this we do not
clobber the old array that we tried to reallocate, which will remain
valid when a call to realloc(3P) fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library uses pluggable allocators, which means that we
shouldn't ever use the standard allocator functions. But it is an easy
mistake to make to accidentally use e.g. free(3P) instead of the
reftable-specific `reftable_free()` function, and we do not have any
mechanism to detect this misuse right now.
Introduce a couple of macros that ban the standard allocators, similar
to how we do it in "banned.h".
Note that we do not ban the following two classes of functions:
- Macros like `FREE_AND_NULL()` or `REALLOC_ARRAY()`. As those expand
to code that contains already-banned functions we'd get a compiler
error even without banning those macros explicitly.
- Git-specific allocators like `xmalloc()` and friends. The primary
reason is that there are simply too many of them, so we're rather
aiming for best effort here. Furthermore, the eventual goal is to
make them unavailable in the reftable library place by not pulling
them in via "git-compat-utils.h" anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several calls to `FREE_AND_NULL()` in the reftable library,
which of course uses free(3P). As the reftable allocators are pluggable
we should rather call the reftable specific function, which is
`reftable_free()`.
Introduce a new macro `REFTABLE_FREE_AND_NULL()` and adapt the callsites
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a small set of calls to free(3P) in the reftable library. As
the reftable allocators are pluggable we should rather call the reftable
specific function, which is `reftable_free()`.
Convert the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle trivial allocation failures in the reftable library and its unit
tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree interfaces of the reftable library handle both insertion and
searching of tree nodes with a single function, where the behaviour is
altered between the two via an `insert` bit. This makes it quit awkward
to handle allocation failures because on inserting we'd have to check
for `NULL` pointers and return an error, whereas on searching entries we
don't have to handle it as an allocation error.
Split up concerns of this function into two separate functions, one for
inserting entries and one for searching entries. This makes it easy for
us to check for allocation errors as `tree_insert()` should never return
a `NULL` pointer now. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when adding entries to the pqueue. Adapt its
only caller accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `block_writer_init()` and
`block_reader_init()`. This requires us to bubble up error codes into
`writer_reinit_block_writer()`. Adapt call sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `new_indexed_table_ref_iter()`. While at
it, rename the function to match our coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_stack_auto_compact()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_stack_reload_once()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reader_init_iter()`. This requires us to
also adapt `reftable_reader_init_*_iterator()` to bubble up the new
error codes. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when creating unindexed readers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `merged_table_init_iter()`. While at it,
merge `merged_iter_init()` into the function. It only has a single
caller and merging them makes it easier to handle allocation failures
consistently.
This change also requires us to adapt `reftable_stack_init_*_iterator()`
to bubble up the new error codes of `merged_table_iter_init()`. Adapt
callsites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_new_writer()`. Adapt the
function to return an error code to return such failures. While at it,
rename it to match our code style as we have to touch up every callsite
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation errors in `writer_index_hash()`. Adjust its only
caller in `reftable_writer_add_ref()` accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when decoding records. While at it, fix some
error codes to be `REFTABLE_FORMAT_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when copying records. While at it, convert
from `xstrdup()` to `reftable_strdup()`. Adapt callsites to check for
error codes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `parse_names()` by returning `NULL` in
case any allocation fails. While at it, refactor the function to return
the array directly instead of assigning it to an out-pointer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_calloc()`.
While at it, remove our use of `st_mult()` that would cause us to die on
an overflow. From the caller's point of view there is not much of a
difference between arguments that are too large to be multiplied and a
request that is too big to handle by the allocator: in both cases the
allocation cannot be fulfilled. And in neither of these cases do we want
the reftable library to die.
While we could use `unsigned_mult_overflows()` to handle the overflow
gracefully, we instead open-code it to further our goal of converting
the reftable codebase to become a standalone library that can be reused
by external projects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library provides the ability to swap out allocators. There
is a gap here though, because we continue to use `xstrdup()` even in the
case where all the other allocators have been swapped out.
Introduce `reftable_strdup()` that uses `reftable_malloc()` to do the
allocation.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split between "basics" and "publicbasics" is somewhat arbitrary and
not in line with how we typically structure code in the reftable
library. While we do indeed split up headers into a public and internal
part, we don't do that for the compilation unit itself. Furthermore, the
declarations for "publicbasics.c" are in "reftable-malloc.h", which
isn't in line with our naming schema, either.
Fix these inconsistencies by:
- Merging "publicbasics.c" into "basics.c".
- Renaming "reftable-malloc.h" to "reftable-basics.h" as the public
header.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library does not use the same memory allocation functions
as the rest of the Git codebase. Instead, as the reftable library is
supposed to be usable as a standalone library without Git, it provides a
set of pluggable memory allocators.
Compared to `xmalloc()` and friends these allocators are _not_ expected
to die when an allocation fails. This design choice is concious, as a
library should leave it to its caller to handle any kind of error. While
it is very likely that the caller cannot really do much in the case of
an out-of-memory situation anyway, we are not the ones to make that
decision.
Curiously though, we never handle allocation errors even though memory
allocation functions are allowed to fail. And as we do not plug in Git's
memory allocator via `reftable_set_alloc()` either the consequence is
that we'd instead segfault as soon as we run out of memory.
While the easy fix would be to wire up `xmalloc()` and friends, it
would only fix the usage of the reftable library in Git itself. Other
users like libgit2, which is about to revive its efforts to land a
backend for reftables, wouldn't be able to benefit from this solution.
Instead, we are about to do it the hard way: adapt all allocation sites
to perform error checking. Introduce a new error code for out-of-memory
errors that we will wire up in subsequent steps.
This commit also serves as the motivator for all the remaining steps in
this series such that we do not have to repeat the same arguments in
every single subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cp/unit-test-reftable-stack:
t-reftable-stack: add test for stack iterators
t-reftable-stack: add test for non-default compaction factor
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
t-reftable-stack: use Git's tempfile API instead of mkstemp()
t: harmonize t-reftable-stack.c with coding guidelines
t: move reftable/stack_test.c to the unit testing framework
Exclude patterns can be used by reference backends to skip over blocks
of references that are uninteresting to the caller. Reference backends
do not have to wire up support for them, and all callers are expected to
behave as if the backend didn't support them. In fact, the only backend
that supports exclude patterns right now is the "packed" backend.
Exclude patterns can be quite an important performance optimization in
repositories that have loads of references. The patterns are set up in
case "transfer.hideRefs" and friends are configured during a fetch, so
handling these patterns becomes important once there are lots of hidden
refs in a served repository.
Now that we have properly re-seekable reftable iterators we can also
wire up support for these patterns in the "reftable" backend. Doing so
is conceptually simple: once we hit a reference whose prefix matches the
current exclude pattern we re-seek the iterator to the first reference
that doesn't match the pattern anymore. This schema only works for
trivial patterns that do not have any globbing characters in them, but
this restriction also applies do the "packed" backend.
This makes t1419 work with the "reftable" backend with some slight
modifications. Of course it also speeds up listing of references with
hidden refs. The following benchmark prints one reference with 1 million
hidden references:
Benchmark 1: HEAD~
Time (mean ± σ): 93.3 ms ± 2.1 ms [User: 90.3 ms, System: 2.5 ms]
Range (min … max): 89.8 ms … 97.2 ms 33 runs
Benchmark 2: HEAD
Time (mean ± σ): 4.2 ms ± 0.6 ms [User: 2.2 ms, System: 1.8 ms]
Range (min … max): 3.1 ms … 8.1 ms 765 runs
Summary
HEAD ran
22.15 ± 3.19 times faster than HEAD~
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 67ce50ba26 (Merge branch 'ps/reftable-reusable-iterator', 2024-05-30)
we have refactored the interface of reftable iterators such that they
can be reused in theory. This patch series only landed the required
changes on the interface level, but didn't yet implement the actual
logic to make iterators reusable.
As it turns out almost all of the infrastructure already does support
re-seeking. The only exception is the table iterator, which does not
reset its `is_finished` bit. Do so and add a couple of tests that verify
that we can re-seek iterators.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have recently migrated all of the reftable unit tests that were part
of the reftable library into our own unit testing framework. As part of
that migration we have duplicated some of the functionality that was
part of the reftable test framework into each of the migrated test
suites. This was a sensible decision to not have all of the migrations
dependent on each other, but now that the migration is done it makes
sense to deduplicate the functionality again.
Introduce a new reftable test library that hosts some shared code and
adapt tests to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever one adds another test library compilation unit one has to wire
it up twice in the Makefile: once to append it to `UNIT_TEST_OBJS`, and
once to append it to the `UNIT_TEST_PROGS` target. Ideally, we'd just
reuse the `UNIT_TEST_OBJS` variable in the target so that we can avoid
the duplication. But it also contains all the objects for our test
programs, each of which contains a `cmd_main()`, and thus we cannot link
them all into the target executable.
Refactor the code such that `UNIT_TEST_OBJS` does not contain the unit
test program objects anymore, which we can instead manually append to
the `OBJECTS` variable. Like this, the former variable now only contains
objects for test libraries and can thus be reused.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `write_head_info()` we announce references to the remote client. We
need to honor "transfer.hideRefs" here so that we do not announce any
references that the client shouldn't be able to learn about. This is
done via two separate mechanisms:
- We hand over exclude patterns to the reference backend. We can only
honor "plain" exclude patterns here that do not have prefixes with
special meaning such as "^" or "!". Filtering down the references is
handled by `hidden_refs_to_excludes()`.
- In `show_ref_cb()` we perform a second check against hidden refs.
For one this is done such that we can handle those special prefixes.
And second, handling exclude patterns in ref backends is optional,
so we also have to handle "normal" patterns.
The special-meaning "^" prefix alters whether a hidden ref applies to
the namespace-stripped reference name or the full name. So while we
would usually call `refs_for_each_namespaced_ref()` to only get those
references in the current namespace, we can't because we'd get the
already-rewritten reference names. Instead, we are forced to use
`refs_for_each_fullref_in()` and then manually strip away the namespace
prefix such that we have access to both names.
But this also means that we do not get namespace handling for exclude
patterns, which `refs_for_each_namespaced_ref()` brings for free. This
results in a bug because we potentially end up hiding away references
based on their namespaced name and not on the stripped name as we really
should be doing.
Fix this by manually rewriting the exclude patterns to their namespaced
variants.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference namespaces allow commands like git-upload-pack(1) to serve
different sets of references to the client depending on which namespace
is enabled, which is for example useful in fork networks. Namespaced
refs are stored with a `refs/namespaces/$namespace` prefix, but all the
user will ultimately see is a stripped version where that prefix is
removed.
The way that this interacts with "transfer.hideRefs" is not immediately
obvious: the hidden refs can either apply to the stripped references, or
to the non-stripped ones that still have the namespace prefix. In fact,
the "transfer.hideRefs" machinery does the former and applies to the
stripped reference by default, but rules can have "^" prefixed to switch
this behaviour to instead match against the full reference name.
Namespaces are exclusively handled at the generic "refs" layer, the
respective backends have no clue that such a thing even exists. This
also has the consequence that they cannot handle hiding references as
soon as reference namespaces come into play because they neither know
whether a namespace is active, nor do they know how to strip references
if they are active.
Handling such exclude patterns in `refs_for_each_namespaced_ref()` and
`refs_for_each_fullref_in_prefixes()` is broken though, as both support
that the user passes both namespaces and exclude patterns. In the case
where both are set we will exclude references with unstripped names,
even though we really wanted to exclude references based on their
stripped names.
This only surfaces when:
- A repository uses reference namespaces.
- "transfer.hideRefs" is active.
- The namespaced references are packed into the "packed-refs" file.
None of our tests exercise this scenario, and thus we haven't ever hit
it. While t5509 exercises both (1) and (2), it does not happen to hit
(3). It is trivial to demonstrate the bug though by explicitly packing
refs in the tests, and then we indeed surface the breakage.
Fix this bug by prefixing exclude patterns with the namespace in the
generic layer. The newly introduced function will be used outside of
"refs.c" in the next patch, so we add a declaration to "refs.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interpret-trailers command failed to recognise the end of the
message when the commit log ends in an incomplete line.
* bl/trailers-and-incomplete-last-line-fix:
interpret-trailers: handle message without trailing newline
Cygwin does have /dev/tty support that is needed by things like
single-key input mode.
* rj/cygwin-has-dev-tty:
config.mak.uname: add HAVE_DEV_TTY to cygwin config section
In a few corner cases "git diff --exit-code" failed to report
"changes" (e.g., renamed without any content change), which has
been corrected.
* rs/diff-exit-code-fix:
diff: report dirty submodules as changes in builtin_diff()
diff: report copies and renames as changes in run_diff_cmd()
The environment GIT_ADVICE has been intentionally kept undocumented
to discourage its use by interactive users. Add documentation to
help tool writers.
* ds/doc-wholesale-disabling-advice-messages:
advice: recommend GIT_ADVICE=0 for tools
A file descriptor left open is now properly closed when "git
sparse-checkout" updates the sparse patterns.
* jk/sparse-fdleak-fix:
sparse-checkout: use fdopen_lock_file() instead of xfdopen()
sparse-checkout: check commit_lock_file when writing patterns
sparse-checkout: consolidate cleanup when writing patterns