As submodules are proper self-contained repositories, it is perfectly
valid for them to have a different ref storage format than their parent
repository. There is no obvious way for users to choose the ref storage
format when initializing submodules, though. Whether the setup of such
mixed-ref-storage-format constellations is all that useful remains to be
seen. But there is no good reason not to expose such an option, and we
will require it in a subsequent patch.
Introduce a new `--ref-format=` option for git-submodule(1) that allows
the user to pick the ref storage format. This option will also be used
in a subsequent commit, where we start to propagate the same flag from
git-clone(1) to cloning submodules with the `--recursive` switch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For most of the subcommands of git-submodule(1), we end up passing a
bunch of arguments to the submodule helper. This quickly leads to overly
long lines, where it becomes hard to spot what has changed when one
needs to modify them.
Break up these lines into one argument per line, similarly to how it is
done for the "clone" subcommand already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Transport URLs can be prefixed with "foo::", which would tell us that
the transport uses a remote helper called "foo". We extract the helper
name by `xstrndup()`ing the prefix before the double-colons, but never
free that string.
Fix this leak by assigning the result to a separate local variable that
we can then free upon returning.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/refs-wo-the-repository:
refs/reftable: stop using `the_repository`
refs/packed: stop using `the_repository`
refs/files: stop using `the_repository`
refs/files: stop using `the_repository` in `parse_loose_ref_contents()`
refs: stop using `the_repository`
Now that the rest of the MIDX subsystem and relevant callers have been
updated to learn about how to read and process incremental MIDX chains,
let's finally update the implementation in `write_midx_internal()` to be
able to write incremental MIDX chains.
This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:
$ git multi-pack-index write --incremental
The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:
- The `compute_sorted_entries()` function is taught to reject objects
which appear in any existing MIDX layer.
- Functions like `write_midx_revindex()` are adjusted to write
pack_order values which are offset by the number of objects in the
base MIDX layer.
- The end of `write_midx_internal()` is adjusted to move
non-incremental MIDX files when necessary (i.e. when creating an
incremental chain with an existing non-incremental MIDX in the
repository).
A handful of other changes are introduced as well, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).
The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:
1. Most of the "interesting" behavior is already thoroughly covered in
t5319-multi-pack-index.sh, which handles the core logic of reading
objects through a MIDX.
The new tests in t5334-incremental-multi-pack-index.sh are mostly
focused on creating and destroying incremental MIDXs, as well as
stitching their results together across layers.
2. A new GIT_TEST environment variable is added called
"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
entire test suite to write incremental MIDXs after repacking when
combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.
This exercises the long tail of other interesting behavior that is
defined implicitly throughout the rest of the CI suite. It is
likewise added to the linux-TEST-vars job.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for sub-directories to appear in $GIT_DIR/objects/pack by
adjusting the copy, remove, and chmod invocations to perform their
behavior recursively.
This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d
directory which will be added in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in ff1e653c8e (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31), a new
environment variable was introduced which caused the test suite to write
MIDX bitmaps after any 'git repack' invocation.
At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmap.sh
script.
Since then, that flag has served us well, but it is no longer providing
meaningful coverage, as the script in t5326 has matured substantially
and covers many more interesting cases than it did back when ff1e653c8e
was originally written.
Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.
Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the verification implementation used by `git multi-pack-index
verify` to perform verification for incremental MIDX chains by
independently validating each layer within the chain.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.
(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)
The core of this change involves following the order specified in the
MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
adding them to the previous layer's `->base_midx` pointer at each step.
In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.
When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. When
a layer is successfully read, it is added to the MIDX chain by calling
`add_midx_to_chain()` which validates the contents of the `BASE` chunk,
performs some bounds checks on the number of combined packs and objects,
and attaches the new MIDX by assigning its `base_midx` pointer to the
existing part of the chain.
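In rough outline, the chain-loading loop described above looks something
like the following sketch. Note that `load_midx_chain()` here is a
simplified stand-in for `load_midx_chain_fd_st()`, the real functions
take additional parameters (object directory, flags, etc.), the exact
signature of `add_midx_to_chain()` is assumed, and error cleanup is
omitted:
    static struct multi_pack_index *load_midx_chain(FILE *fp)
    {
            struct multi_pack_index *chain = NULL;
            struct strbuf line = STRBUF_INIT;
            while (strbuf_getline(&line, fp) != EOF) {
                    struct multi_pack_index *m =
                            load_multi_pack_index_one(line.buf);
                    if (!m || !add_midx_to_chain(m, chain))
                            return NULL; /* reject the whole chain */
                    /* 'm' is now the tip; its base_midx points at the
                       layers read so far */
                    chain = m;
            }
            strbuf_release(&line);
            return chain;
    }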
As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:
$ test-tool read-midx .git/objects [checksum]
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.
Prepare this function to work with incremental MIDXs by making a few
changes:
- The bounds checks need to be adjusted to start object lookups taking
into account the number of objects in the previous MIDX layer (i.e.,
by starting the lookups at position `m->num_objects_in_base` instead
of position 0).
- Likewise, the bounds checks need to end at
`m->num_objects_in_base + m->num_objects`.
- Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
MIDX layers when dealing with an incremental MIDX chain by calling
itself when given a MIDX with a non-NULL `base_midx`.
Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11), we do not use this function with an
existing MIDX (incremental or not) when generating a MIDX with
--stdin-packs, and likewise for incremental MIDXs.
But it is still used when adding the fanout table from an incremental
MIDX when generating a non-incremental MIDX (without --stdin-packs, of
course).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `midx_preferred_pack()` is used to determine the identity
of the preferred pack, that is, the unique pack within the MIDX which is
used as a tie-breaker when deciding which pack should represent an
object that appears in multiple packs within the MIDX.
Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.
This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.
Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
has been cleaned up, teach the former how to operate in an incremental
MIDX-aware world, in a similar fashion to previous commits.
Instead of using either of the two `midx_for_object()` or
`midx_for_pack()` helpers, this function is split into two: one that
determines whether a pack is contained in a single MIDX, and another
which calls the former in a loop over all MIDXs.
This approach does not require that we change any of the implementation
in what is now `midx_contains_pack_1()` as it still operates over a
single MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
introduced `midx_locate_pack()`, which was described at the time as a
complement to the function `midx_contains_pack()` which allowed
callers to determine where in the MIDX lexical order a pack appeared, as
opposed to whether or not it was simply contained.
307d75bbe6 suggests that future patches would be added which would
introduce callers for this new function, but none ever were, meaning the
function has gone unused since its introduction.
Clean this up by in effect reverting 307d75bbe6, which removes the
unused function and inlines its definition back into
`midx_contains_pack()`.
(Looking back through the list archives from when 307d75bbe6 was
written, this was in preparation for this[1] patch from back when we had
the concept of "disjoint" packs while developing multi-pack verbatim
reuse. That concept was abandoned before the series was merged, but I
never dropped what would become 307d75bbe6 from the series, leading to
the state prior to this commit.)
[1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in an incremental MIDX-aware fashion.
This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.
The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).
Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.
After that point, we only need to make two special considerations within
this function:
- First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
is a position in the concatenated lexical order of packs, so we must
ensure that we subtract `m->num_packs_in_base` before accessing the
MIDX-local `packs` array.
- Second, we must avoid translating the `pos` back to a MIDX-local
index, since we use it as an argument to `nth_midxed_offset()` which
expects a position relative to the concatenated lexical order of
objects.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as in previous commits, teach the function
`nth_midxed_offset()` about incremental MIDXs.
The given object `pos` is used to find the containing MIDX, and
translated back into a MIDX-local position by assigning the return value
of `midx_for_object()` to it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the special-case callers of `bsearch_midx()` have been dealt
with, teach `bsearch_midx()` to handle incremental MIDX chains.
The incremental MIDX-aware version of `bsearch_midx()` works by
repeatedly searching for a given OID in each layer along the
`->base_midx` pointer, stopping either when an exact match is found, or
the end of the chain is reached.
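The loop described above boils down to something like the following
sketch, using the `bsearch_one_midx()` helper introduced in the previous
commit (the exact signature here is illustrative and may differ from the
final code):
    int bsearch_midx(const struct object_id *oid,
                     struct multi_pack_index *m, uint32_t *result)
    {
            for (; m; m = m->base_midx)
                    if (bsearch_one_midx(oid, m, result))
                            return 1;
            return 0;
    }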
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).
While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.
One such example is the object abbreviation code, which cares about
object IDs near to where we'd expect to find a match in a MIDX. In that
case, we want to look at the nearby matches in each layer of the MIDX
chain, not just in a single layer.
Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.
At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as in previous commits, teach the function
`nth_bitmapped_pack()` about incremental MIDXs by translating the given
`pack_int_id` from the concatenated lexical order to a MIDX-local
lexical position.
When accessing the containing MIDX's array of packs, use the local pack
ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
when accessing the data within that chunk.
(Note that both the call to prepare_midx_pack() and the assignment
of bp->pack_int_id care about the global pack_int_id, so avoid
shadowing the given 'pack_int_id' parameter.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.
Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it can be used
with incremental MIDXs.
To do this, we need to both (a) adjust the bounds check for the given
'n', and (b) record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `prepare_midx_pack()` is part of the midx.h API and
loads the pack identified by the MIDX-local 'pack_int_id'. This patch
prepares that function to be aware of an incremental MIDX world.
To do this, introduce the second of the two general purpose helpers
mentioned in the previous commit. This commit introduces
`midx_for_pack()`, which is the pack-specific analog of
`midx_for_object()`, and works in the same fashion.
Like `midx_for_object()`, this function chases down the '->base_midx'
field until it finds the MIDX layer within the chain that contains the
given pack.
Use this function within `prepare_midx_pack()` so that the `pack_int_id`
it expects is now relative to the entire MIDX chain, and that it
prepares the given pack in the appropriate MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `nth_midxed_pack_int_id()` takes in an object position in
MIDX lexicographic order and returns an identifier of the pack from
which that object was selected in the MIDX.
Currently, the given object position is an index into the lexicographic
order of objects in a single MIDX. Change this position to instead refer
into the concatenated lexicographic order of all MIDXs in a MIDX chain.
This has two visible effects within the implementation of
`nth_midxed_pack_int_id()`:
- First, the given position is now an index into the concatenated
lexicographic order of all MIDXs in the order in which they appear
in the MIDX chain.
- Second, the pack ID returned from this function is now also in the
concatenated order of packs among all layers of the MIDX chain in
the same order that they appear in the MIDX chain.
To do this, introduce the first of two general purpose helpers, this one
being `midx_for_object()`. `midx_for_object()` takes a double pointer to
a `struct multi_pack_index` as well as an object `pos` in terms of the
entire MIDX chain[^1].
The function chases down the '->base_midx' field until it finds the MIDX
layer within the chain that contains the given object. It then:
- modifies the double pointer to point to the containing MIDX, instead
of the tip of the chain, and
- returns the MIDX-local position[^2] at which the given object can be
found.
Use this function within `nth_midxed_pack_int_id()` so that the `pos` it
expects is now relative to the entire MIDX chain, and that it returns
the appropriate pack position for that object.
[^1]: As a reminder, this means that the object is identified among the
objects contained in all layers of the incremental MIDX chain, not any
particular layer. For example, consider a MIDX chain with two individual
MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
4 objects appears earlier in the chain, then asking for object 6 would
return the second object in the MIDX with 3 objects.
[^2]: Building on the previous example, asking for object 6 in a MIDX
chain with (4, 3) objects, respectively, this would set the double
pointer to point at the MIDX containing three objects, and would
return an index to the second object within that MIDX.
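Put into code, the behavior described above (and in the footnotes) could
look roughly like the following sketch, written in terms of the
`base_midx` and `num_objects_in_base` fields introduced earlier in this
series; bounds checking is reduced to a comment here:
    static uint32_t midx_for_object(struct multi_pack_index **_m,
                                    uint32_t pos)
    {
            struct multi_pack_index *m = *_m;
            /* walk towards the base until 'pos' falls into this layer */
            while (m && pos < m->num_objects_in_base)
                    m = m->base_midx;
            /* a valid 'pos' also satisfies:
               pos < m->num_objects_in_base + m->num_objects */
            *_m = m;                             /* the containing layer */
            return pos - m->num_objects_in_base; /* MIDX-local position */
    }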
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental MIDX chain feature is designed around the idea of
indexing into a concatenated lexicographic ordering of object IDs
present in the MIDX.
When given an object position, the MIDX machinery needs to be able to
locate both (a) which MIDX layer contains the given object, and (b) at
what position *within that MIDX layer* that object appears.
To do this, three new fields are added to the `struct multi_pack_index`:
- struct multi_pack_index *base_midx;
- uint32_t num_objects_in_base;
- uint32_t num_packs_in_base;
These three fields store the pieces of information suggested by their
respective field names. In turn, the `num_objects_in_base` and
`num_packs_in_base` fields are used to crawl backwards along the
`base_midx` pointer to locate the appropriate position for a given
object within the MIDX that contains it.
The following commits will update various parts of the MIDX machinery
(as well as their callers from outside of midx.c and midx-write.c) to be
aware of and make use of these fields when performing object lookups.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to implement incremental multi-pack indexes (MIDXs) over the
next several commits by first describing the relevant prerequisites
(like a new chunk in the MIDX format, the directory structure for
incremental MIDXs, etc.)
The format is described in detail in the patch contents below, but the
high-level description is as follows.
Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and
each `*.midx` within that directory has a single "parent" MIDX, which is
the MIDX layer immediately before it in the MIDX chain. The chain order
resides in a file 'multi-pack-index-chain' in the same directory.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many existing tests in this script perform operation(s) and then use
test_when_finished to define how to undo the effect of the
operation(s).
This is backwards. If your operation(s) fail before you manage to
successfully call test_when_finished (remember that these commands
must all be &&-chained, so a failure of an earlier operation means
your test_when_finished may not be executed at all), your clean-up
will never be registered. You must establish how to clean up your
mess with test_when_finished before you create the mess to be
cleaned up.
Also make sure that the body of test_when_finished deals with the
case where the cruft it wants to remove failed to be created, by
using "rm -f" (instead of "rm") to remove potential cruft files, and
having "|| :" after "git notes remove" to remove potential cruft
notes; both of these commands by default fail when asked to remove
something that does not exist, instead of being silently idempotent
no-ops.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-hashmap.c, along with t0011-hashmap.sh, tests the hashmap.h
library. Migrate them to the unit testing framework for easier
debugging, better runtime performance, and more concise code.
Along with the migration, make 'add' tests from the shell script order
agnostic in unit tests, since they iterate over entries with the same
keys and we do not guarantee the order. This was already done for the
'iterate' tests[1].
helper/test-hashmap.c is not removed yet, because it contains a
performance test meant to be run by the user directly (not used in
t/perf), and it makes sense for such a utility to remain a helper.
[1]: e1e7a77141 (t: sort output of hashmap iteration, 2019-07-30)
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Josh Steadmon <steadmon@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On systems where running t7704.09 takes longer than 10 seconds, the test
can fail.
The test works by doing the following:
- First write three unreachable objects, backdating the mtime for a
single object ($foo) which we expect to prune.
- Repack the repository into a pack containing reachable objects, and
another three cruft packs, each containing one of the objects
written in the previous step.
- Backdate the mtimes of the cruft pack *.mtimes files themselves.
(Note that this does not affect what is pruned further down in the
test, but is done to ensure that the cruft packs are rewritten
during that step).
- Then repack with --cruft-expiration=10.seconds.ago, expecting to
prune one of the three unreachable objects written in the first
step.
- Assert that the surviving cruft packs were rewritten, object $foo is
pruned, and unreachable objects $bar, and $baz remain in the
repository.
If longer than 10 seconds pass between writing the three unreachable
objects (the first step) and the "git repack --cruft" (the fourth step),
we will mistakenly prune more objects than expected, causing the test to
fail.
The $foo object which we expect to prune has its mtime set back to
10,000 seconds relative to the current time, but we prune it with a
cutoff of 10.seconds.ago.
Instead, set the cutoff to 1,000 seconds to give the test much more
time to run without failing. This helps platforms where individual
tests can run slowly; on my machine, this test runs much more quickly
than the new cutoff:
$ hyperfine './t7704-repack-cruft.sh --run=9'
Benchmark 1: ./t7704-repack-cruft.sh --run=9
Time (mean ± σ): 647.4 ms ± 30.7 ms [User: 528.5 ms, System: 124.1 ms]
Range (min … max): 594.1 ms … 696.5 ms 10 runs
Reported-by: Randall Becker <randall.becker@nexbridge.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.
A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorithm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.
Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.
For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
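For illustration, a minimal standalone example of the pattern being
applied here (not code from the patch itself):
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    int main(void)
    {
            const char *s = "1e999999"; /* far out of range for a double */
            char *end;
            double d;
            errno = 0;                  /* strtod() does not reset errno on success */
            d = strtod(s, &end);
            if (errno == ERANGE)
                    fprintf(stderr, "value out of range\n");
            else if (end == s)
                    fprintf(stderr, "no conversion performed\n");
            else
                    printf("parsed %g\n", d);
            return 0;
    }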
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce code duplication by calling decimal_width() to count the digits
in the number of commits instead of calculating it locally.
It also has the advantage of returning int, which is the exact type
expected by the printf()-like function strbuf_addf() for field width
arguments.
Additionally, decimal_width() supports numbers bigger than 1410065407,
which is (hopefully) just a theoretical advantage.
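For illustration, counting decimal digits and feeding the result to a
"%*d"-style field width boils down to the following standalone sketch
(an approximation of the idea, not the code being changed):
    #include <stdio.h>
    #include <stdint.h>
    /* same idea as decimal_width(): count the decimal digits of 'number' */
    static int width_of(uintmax_t number)
    {
            int width;
            for (width = 1; number >= 10; number /= 10)
                    width++;
            return width;
    }
    int main(void)
    {
            uintmax_t total = 1234;
            /* right-align a counter in a field as wide as the total */
            printf("%*d/%ju\n", width_of(total), 7, total); /* "   7/1234" */
            return 0;
    }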
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This complements 64a6dd8ffc (refs: implement removal of ref storages,
2024-06-06).
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git stores only canonical modes for blobs. So for a regular file, we
care about only "100644" or "100755" (depending only on the executable
bit), but never modes where the group or other permissions are more
exotic. So never "100664", "100700", etc. When a file in the working
tree has such a mode, we quietly turn it into one of the two canonical
modes, and that's what is stored both in the index and in tree objects.
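As a small illustration of that rule, here is a sketch covering only the
regular-file case (Git's own helpers also handle symlinks, directories
and gitlinks):
    #include <stdio.h>
    #include <sys/stat.h>
    /* only the executable bit of the original mode survives */
    static unsigned int canonical_reg_mode(unsigned int mode)
    {
            return S_IFREG | ((mode & 0100) ? 0755 : 0644);
    }
    int main(void)
    {
            printf("%o\n", canonical_reg_mode(0100664)); /* 100644 */
            printf("%o\n", canonical_reg_mode(0100700)); /* 100755 */
            return 0;
    }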
However, we don't canonicalize modes we read from incoming patches in
git-apply. These may appear in a few lines:
- "old mode" / "new mode" lines for mode changes
- "new file mode" lines for newly created files
- "deleted file mode" for removing files
For "new mode" and for "new file mode", this is harmless. The patch is
asking the result to have a certain mode, but:
- when we add an index entry (for --index or --cached), it is
canonicalized as we create the entry, via create_ce_mode().
- for a working tree file, try_create_file() passes either 0777 or
0666 to open(), so what you get depends only on your umask, not any
other bits (aside from the executable bit) in the original mode.
However, for "old mode" and "deleted file mode", there is a minor
annoyance. We compare the patch's expected preimage mode with the
current state. But that current state is always going to be a canonical
mode itself:
- updating an index entry via --cached will have the canonical mode in
the index
- for updating a working tree file, check_preimage() runs the mode
through ce_mode_from_stat(), which does the usual canonicalization
So if the patch feeds a non-canonical mode, it's impossible for it to
match, and we will always complain with something like:
file has type 100644, expected 100664
Since this is just a warning, the operation proceeds, but it's
confusing and annoying.
These cases should be pretty rare in practice. Git would never produce a
patch with non-canonical modes itself (since it doesn't store them).
And while we do accept patches from other programs, all of those lines
were invented by Git. So you'd need a program trying to be Git
compatible, but not handling canonicalization the same way. Reportedly
"quilt" is such a program.
We should canonicalize the modes as we read them so that the user never
sees the useless warning.
A few notes on the tests:
- I've covered instances of all lines for completeness, even though
the "new mode" / "new file mode" ones behave OK currently.
- the tests apply patches to both the index and working tree, and
check the result of both. Again, we know that all of these paths
canonicalize anyway, but it's giving us extra coverage (although we
are even less likely to have such a bug now since we canonicalize up
front).
- the test patches are missing "index" lines, which is also something
Git would never produce. But they don't matter for the test, they do
match the case from quilt we saw in the wild, and they avoid some
sha1/sha256 complexity.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the current testing setup for infix_walk(), the following
properties of an infix traversal of a tree remain untested:
- every node of the tree must be visited
- every node must be visited exactly once
In fact, only the property 'traversal in increasing order' is tested.
Modify test_infix_walk() to check for all the properties above.
This can be achieved by storing the nodes' keys linearly, in a nullified
buffer, as we visit them and then checking the input keys against this
buffer in increasing order. By checking that the element just after
the last input key is 'NULL' in the output buffer, we ensure that
every node is traversed exactly once.
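The scheme can be illustrated with a small standalone sketch,
independent of the reftable tree API (names here are illustrative only,
and the walk itself is faked with a plain loop):
    #include <assert.h>
    #include <stddef.h>
    #define NKEYS 4
    int main(void)
    {
            int keys[NKEYS] = { 1, 2, 3, 4 };
            const int *seen[NKEYS + 1] = { NULL }; /* "nullified" output buffer */
            size_t count = 0, i;
            /* stand-in for the infix walk visiting each node in order */
            for (i = 0; i < NKEYS; i++)
                    seen[count++] = &keys[i];
            /* every key was visited, in increasing order, exactly once ... */
            for (i = 0; i < NKEYS; i++)
                    assert(seen[i] == &keys[i]);
            /* ... and nothing was visited beyond the last input key */
            assert(!seen[NKEYS]);
            return 0;
    }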
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the current testing setup for tree_search(), the case of a
non-existent key is not exercised. Improve this by adding a
test case for it.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the current testing setup, tests for both tree_search() and
infix_walk() defined by reftable/tree.{c, h} are performed by
a single test function, test_tree(). Split test_tree() into
test_tree_search() and test_infix_walk(), which are responsible for
independently testing tree_search() and infix_walk() respectively.
This improves the overall readability of the test file and
simplifies debugging.
Note that the last parameter of the tree_search() function is
'int insert' which, when set, inserts the key if it is not found
in the tree. Otherwise, the function returns NULL for such cases.
While at it, use 'func' to pass function pointers and not '&func'.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/tree_test.c exercises the functions defined in
reftable/tree.{c, h}. Migrate reftable/tree_test.c to the unit
testing framework. Migration involves refactoring the tests to use
the unit testing framework instead of reftable's test framework and
renaming the tests to align with unit-tests' standards.
Also add a comment to help understand the test routine.
Note that this commit mostly moves the test from reftable/ to
t/unit-tests/; most of the refactoring is performed by the
following commits.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to Documentation/CodingGuidelines, single-line control-flow
statements must omit curly braces (except for some special cases).
Make reftable/tree.c adhere to this guideline.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible that the stat information of tracked files is modified
without actually modifying the content. Plumbing commands would detect
such files as modified, so Git GUI runs `git update-index --refresh` in
order to synchronize the cached stat info with reality. However, this
can be an expensive operation in large repositories. As a remediation,
e534f3a886 (git-gui: Allow the user to disable update-index --refresh
during rescan, 2006-11-07) introduced an option to skip the expensive
part.
The option was named "trust file modification timestamp". But the catch
is that sometimes file timestamps can't be trusted. In this case, a file
would remain listed in Unstaged Changes although there are no changes.
So 16403d0b1f (git-gui: Refresh a file if it has an empty diff,
2006-11-11) introduced a popup message informing the user about the
situation and then removed the file from the Unstaged Changes list.
Now users had to click away the message box for every file that was
stat-dirty. Under the assumption that a file in such a state is not
the only one, 124355d32c (git-gui: Always start a rescan on an empty
diff, 2007-01-22) introduced a forced (potentially expensive) refresh
that would de-list all stat-dirty files after the first notification was
dismissed.
Along came 6c510bee20 (Lazy man's auto-CRLF, 2007-02-13) in Git. It
introduced a new case where a file in the worktree can have no essential
differences to the staged version, but still be detected as modified by
plumbing commands. This time, however, the index cannot be synchronized
fully by `git update-index --refresh`, so that the file remains listed
in Unstaged Changes until it is staged manually.
Needless to say that the message box now becomes an annoyance, because
it must be dismissed every time an affected file is selected, and the
file remains listed nevertheless.
Remove the message box. Instead, write a notice into the diff panel
saying that no differences were found. Also include a link that, when clicked,
initiates the rescan. With this scheme, the rescan does not happen
automatically anymore, but requires an additional click. (This is now
two clicks in total for users who encounter stat-dirty files after
enabling the "trust file modification timestamps" option.) However,
users whom the rescan does not help (autocrlf-related dirty files) save
half the clicks because there is no message box to dismiss.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Supporting many platforms is only possible when we have the right tools to
ensure that support.
Teach platform maintainers how they can help us to help them, by
explaining what kind of tooling support we would like to have, and what
level of support becomes available as a result. Provide examples so that
platform maintainers can see what we're asking for in practice.
With this policy in place, we can make changes with stronger assurance
that we are not breaking anybody we promised not to. Instead, we can
feel confident that our existing testing and integration practices
protect those who care from breakage.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c8f815c208 (refs: remove functions without ref store, 2024-05-07), we
removed functions of the refs subsystem that do not take a ref store as
an input parameter. In order to make it easier for folks to figure
out how to replace calls to such functions in in-flight patch series, we
kept their definitions around in an ifdeffed block.
Now that Git v2.46 is out, it is rather unlikely that anybody still has
references to these old functions in their unreleased patches. Let's
thus drop them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for `http.proxy` describes that option, and the
environment variables it overrides, as supporting "the syntax understood
by curl". curl allows SOCKS proxies to use a path to a Unix domain
socket, like `socks5h://localhost/path/to/socket.sock`. Git should
therefore include, if present, the path part of the proxy URL in what it
passes to libcurl.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have stopped setting the default hash algorithm for
`the_repository`. Consequently, code that relies on `the_hash_algo` will
now crash when it hasn't explicitly been initialized, which may be the
case when running outside of a Git repository.
It was reported that git-ls-remote(1) may crash in such a way when using
a remote helper that advertises refspecs. This is because the refspec
announced by the helper will get parsed during capability negotiation.
At that point we haven't yet figured out what object format the remote
uses, though, so when run outside of a repository we will fail.
This course of action is somewhat dubious in the first place. Ideally, we
should only parse object IDs once we have asked the remote helper for
the object format, and if the helper didn't announce the "object-format"
capability, then we should assume SHA1. But instead, we used to either
assume SHA1 if there was no repository, or use the hash algorithm of the
local repository, which is wrong.
Arguably though, crashing hard may not be in the best interest of our
users, either. So while the old behaviour was buggy, let's restore it
for now as a short-term fix. We should eventually revisit, potentially
by deferring the point in time when we parse the refspec until after we
have figured out the remote's object hash.
Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The actual file is copied out to /tmp, presumably so that the tester
can inspect it after the test is done, which may have been a useful
debugging aid.
But in the final shape of the test suite, such code should not
exist. We cannot even assume that we are allowed to write into /tmp
(our TMPDIR may not even be pointing at it) or read from it for that
matter.
Noticed-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using `--fixed-value` with a key whose value is left empty (implied
as being "true"), 'git config' may crash when invoked like either of:
$ git config set --file=config --value=value --fixed-value \
section.key pattern
$ git config --file=config --fixed-value section.key value pattern
The original bug report[1] bisects to 00bbdde141 (builtin/config:
introduce "set" subcommand, 2024-05-06), which is a red herring, since
the original bug report uses the new 'git config set' invocation.
The behavior likely bisects back to c90702a1f6 (config: plumb
--fixed-value into config API, 2020-11-25), which introduces the new
--fixed-value option in the first place.
Looking at the relevant frame from a failed process's coredump, the
crash appears in config.c::matches() like so:
(gdb) up
#1 0x000055b3e8b06022 in matches (key=0x55b3ea894360 "section.key", value=0x0,
store=0x7ffe99076eb0) at config.c:2884
2884 return !strcmp(store->fixed_value, value);
where we are trying to compare the `--fixed-value` argument to `value`,
which is NULL.
Avoid attempting to match `--fixed-value` for configuration keys with no
explicit value. A future patch could consider the empty value to mean
"true", "yes", "on", etc. when invoked with `--type=bool`, but let's
punt on that for now in the name of avoiding the segfault.
[1]: https://lore.kernel.org/git/CANrWfmTek1xErBLrnoyhHN+gWU+rw14y6SQ+abZyzGoaBjmiKA@mail.gmail.com/
Reported-by: Han Jiang <jhcarl0814@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reviewing guidelines document now explicitly encourages people
to give positive reviews and explains how to do so.
* jc/doc-reviewing-guidelines-positive-reviews:
ReviewingGuidelines: encourage positive reviews more
"git rebase --help" referred to "offset" (the difference between
the location a change was taken from and the change gets replaced)
incorrectly and called it "fuzz", which has been corrected.
* jc/doc-rebase-fuzz-vs-offset-fix:
doc: difference in location to apply is "offset", not "fuzz"
merged_iter_pqueue_top() as defined by reftable/pq.{c, h} returns
the element at the top of a priority-queue's heap without removing
it. Since there are no tests for this function in the existing
setup, add tests for it.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>