Code clean-up.
* js/comma-semicolon-confusion:
detect-compiler: detect clang even if it found CUDA
clang: warn when the comma operator is used
compat/regex: explicitly mark intentional use of the comma operator
wildmatch: avoid using of the comma operator
diff-delta: avoid using the comma operator
xdiff: avoid using the comma operator unnecessarily
clar: avoid using the comma operator unnecessarily
kwset: avoid using the comma operator unnecessarily
rebase: avoid using the comma operator unnecessarily
remote-curl: avoid using the comma operator unnecessarily
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.
* jt/clone-guess-remote-head-fix:
advice: allow disabling default branch name advice
builtin/clone: suppress unexpected default branch advice
remote: allow `guess_remote_head()` to suppress advice
The job to coalesce loose objects into packfiles in "git
maintenance" now has configurable batch size.
* ds/maintenance-loose-objects-batchsize:
maintenance: add loose-objects.batchSize config
maintenance: force progress/no-quiet to children
"git reflog" learns "drop" subcommand, that discards the entire
reflog data for a ref.
* kn/reflog-drop:
reflog: implement subcommand to drop reflogs
reflog: improve error for when reflog is not found
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
Incrementally updating multi-pack index files.
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.
* jh/hash-init-fixes:
index-pack, unpack-objects: restore missing ->init_fn
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.
* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
In 199f44cb2e (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For similar reasons as in the previous refactoring of `refspec_init()`
into `refspec_init_fetch()` and `refspec_init_push()`, apply the same
refactoring to `refspec_item_init()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two callers of this function, which ensures that a dispatched
call to refspec_item_init() does not fail.
In the following commit, we're going to add fetch/push-specific variants
of refspec_item_init(), which will turn one function into two. To avoid
introducing yet another pair of new functions (such as
refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()),
let's remove the thin wrapper entirely.
This duplicates a single line of code among two callers, but thins the
refspec.h API by one function, and prevents introducing two more in the
following commit.
Note that we still have a trailing Boolean argument in the function
`refspec_item_init()`. The following commit will address this.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 6d4c057859 (refspec: introduce struct refspec, 2018-05-16), we
have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly
suggests that we might introduce other modes in the future, which, while
possible, is highly unlikely.
But these values are treated as a Boolean, and stored in a struct field
called 'fetch'. So the following:
if (refspec->fetch == REFSPEC_FETCH) { ... }
, and
if (refspec->fetch) { ... }
are equivalent. Let's avoid renaming the Boolean values "true" and
"false" here and remove the two REFSPEC_ macros mentioned above.
Since this value is truly a Boolean and will only ever take on a value
of 0 or 1, we can declare it as a single bit unsigned field. In
practice this won't shrink the size of 'struct refspec', but it more
clearly indicates the intent.
Note that this introduces some awkwardness like:
refspec_item_init_or_die(&spec, refspec, 1);
, where it's unclear what the final "1" does. This will be addressed in
the following commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.
This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output
Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t4151: This test left an untracked file in the way of the merge.
merge-recursive could only sometimes tell when untracked files were
in the way, and by the time it discovers others, it has already made
too many changes to back out of the merge. So, instead of writing the
results to e.g. 'file1' it would instead write them to
'file1~branch1'. This is confusing for users, because they might not
notice 'file1~branch1' and accidentally add and commit 'file1'.
In contrast, merge-ort correctly notices the file in the way before
making any changes and aborts. Since this test didn't care about the
file in the way, just remove it before calling git-am.
* t4255: Usage of merge-ort allows us to change two known failures into
successes.
* t6427: As noted a few commits ago, the choice of conflict label for
diff3 markers for the ancestor commit was previously handled by
merge-recursive.c rather than by callers. Since that has now changed,
`git am` needs to specify that label. Although the previous conflict
label ("constructed merge base") was already fairly somewhat slanted
towards `git am`, let's use wording more along the lines of the
related command-line flag from `git apply` and function involved to
tie it more closely to `git am`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.
Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.
While here, remove an extraneous newline in the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git reflog expire' prints the error message '<ref> points nowhere!'
when used with a non-existent ref. This message is a bit confusing and
vague. Modify the message to be more clear and direct.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a604 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.
Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).
But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
over the packs we're going to retain (which are marked as kept
in-core by 'read_cruft_objects()').
This means that we will avoid enumerating additional packed copies
of objects found in any cruft packs which are larger than the given
size threshold. Thus there is no opportunity to call
'create_object_entry()' whatsoever.
2. We likewise will discard the loose copy (if one exists) of any
unreachable object packed in a cruft pack that is larger than the
threshold. Here our call path is 'add_unreachable_loose_objects()',
which uses the 'add_loose_object()' callback.
That function will eventually land us in 'want_object_in_pack()'
(via 'add_cruft_object_entry()'), and we'll discard the object as it
appears in one of the packs which we marked as kept in-core.
This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.
In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.
But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- exists in a cruft pack with an mtime at least as recent as the copy
we are debating whether or not to pack, in which case freshening
would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.
One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:
- The manpage is explicitly structured around object IDs. So if we see
a fully blown object ID, the intent should be quite clear in
general.
- The command is part of our plumbing layer and not a tool that users
would generally use in interactive workflows. As such, the warning
will likely not be visible to anybody in the first place.
- Users can and should use the fully-qualified refname in case there
is any potential for ambiguity. And given that this command is part
of our plumbing layer, one should always try to be as defensive as
possible and use fully-qualified refnames.
Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.
The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).
Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.
The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms]
Range (min … max): 461.9 ms … 479.3 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms]
Range (min … max): 384.9 ms … 405.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
And with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms]
Range (min … max): 142.7 ms … 150.8 ms 19 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms]
Range (min … max): 61.1 ms … 66.6 ms 41 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export has a --signed-tags= option that controls how to handle tag
signatures. However, there is no equivalent for commit signatures; it
just silently strips the signature out of the commit (analogously to
--signed-tags=strip).
While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.
So, implement a --signed-commits= option that mirrors the --signed-tags=
option.
On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface. This will change the
default behavior to '--signed-commits=abort' from what is now
'--signed-commits=strip'. In order to provide an escape hatch for users
of third-party tools that call fast-export and do not yet know of the
--signed-commits= option, add an environment variable
'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to
'--signed-commits=warn-strip'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export's helper function find_encoding() takes a `const char *`, but
modifies that memory despite the `const`. Ultimately, this memory came
from get_commit_buffer(), and you're not supposed to modify the memory
that you get from get_commit_buffer().
So, get rid of find_encoding() in favor of commit.h:find_commit_header(),
which gives back a string length, rather than mutating the memory to
insert a '\0' terminator.
Because find_commit_header() detects the "\n\n" string that separates the
headers and the commit message, move the call to be above the
`message = strstr(..., "\n\n")` call. This helps readability, and allows
for the value of `encoding` to be used for a better value of "..." so that
the same memory doesn't need to be checked twice. Introduce a
`commit_buffer_cursor` variable to avoid writing an awkward
`encoding ? encoding + encoding_len : committer_end` expression.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --signed-tags= option takes one of five arguments specifying how to
handle signed tags during export. Among these arguments, 'strip' is to
'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is
'abort', which stops the fast-export process entirely). That is,
signatures are either stripped or copied verbatim while exporting, with
or without a warning.
Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear
that it instructs fast-export to copy signatures verbatim.
To maintain backwards compatibility, 'warn' is still recognized as
deprecated synonym of 'warn-verbatim'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Documentation/CodingGuidelines" says that there should be whitespaces
around operators like 'if', 'switch', 'for', etc.
Let's fix this in "builtin/fast-export.c".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "delta-islands.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`.
Refactor the code to stop using `the_repository`. In most cases this is
trivial because we already had a repository available in the calling
context, with the only exception being `propagate_island_marks()`. Adapt
it so that the repository gets passed in via a parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "object-file-convert.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`. All of these callsites are transitively called
from `convert_object_file()`, which indeed has no repo as input.
Refactor the function so that it receives a repository as a parameter
and pass it through to all internal functions to get rid of the
dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.
Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions in "pack-write.c" that implicitly depend
on `the_repository` or `the_hash_algo`. Remove this dependency by
injecting the repository via a parameter and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>