More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
The commit object parser has been taught to be a bit more lenient
to parse timestamps on the author/committer line with a malformed
author/committer ident.
* jk/parse-commit-with-malformed-ident:
parse_commit(): describe more date-parsing failure modes
parse_commit(): handle broken whitespace-only timestamp
parse_commit(): parse timestamp from end of line
t4212: avoid putting git on left-hand side of pipe
Remove full index requirement for `git diff-files`. Refactor the
ensure_expanded and ensure_not_expanded functions by introducing a
common helper function, ensure_index_state. Add test to ensure the index
is no expanded in `git diff-files`.
The `p2000` tests demonstrate a ~96% execution time reduction for 'git
diff-files' and a ~97% execution time reduction for 'git diff-files'
for a file using a sparse index:
Test before after
-----------------------------------------------------------------------
2000.94: git diff-files (full-v3) 0.09 0.08 -11.1%
2000.95: git diff-files (full-v4) 0.09 0.09 +0.0%
2000.96: git diff-files (sparse-v3) 0.52 0.02 -96.2%
2000.97: git diff-files (sparse-v4) 0.51 0.02 -96.1%
2000.98: git diff-files -- f2/f4/a (full-v3) 0.06 0.07 +16.7%
2000.99: git diff-files -- f2/f4/a (full-v4) 0.08 0.08 +0.0%
2000.100: git diff-files -- f2/f4/a (sparse-v3) 0.46 0.01 -97.8%
2000.101: git diff-files -- f2/f4/a (sparse-v4) 0.51 0.02 -96.1%
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before integrating the 'git diff-files' builtin with the sparse index
feature, add tests to t1092-sparse-checkout-compatibility.sh to ensure
it currently works with sparse-checkout and will still work with sparse
index after that integration.
When adding tests against a sparse-checkout definition, we test two
modes: all changes are within the sparse-checkout cone and some changes
are outside the sparse-checkout cone.
In order to have staged changes outside of the sparse-checkout cone,
make a directory called 'folder1' and copy `a` into 'folder1/a'.
'folder1/a' is identical to `a` in the base commit. These make
'folder1/a' in the index, while leaving it outside of the
sparse-checkout definition. Change content inside 'folder1/a' in order
to test 'folder1/a' being present on-disk with modifications.
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a small helper function called "verbose", with the idea that you
can write:
verbose foo
to get a message to stderr when the "foo" command fails, even if it does
not produce any output itself. This goes back to 8ad1652418 (t5304: use
helper to report failure of "test foo = bar", 2014-10-10). It does work,
but overall it has not been a big success for two reasons:
1. Test writers have to remember to put it there (and the resulting
test code is longer as a result).
2. It doesn't handle the opposite case (we expect "foo" to fail, but
it succeeds), leading to inconsistencies in tests (which you can
see in many hunks of this patch, e.g. ones involving "has_cr").
Most importantly, we added a136f6d8ff (test-lib.sh: support -x option
for shell-tracing, 2014-10-10) at the same time, and it does roughly the
same thing. The output is not quite as succinct as "verbose", and you
have to watch out for stray shell-traces ending up in stderr. But it
solves both of the problems above, and has clearly become the preferred
tool.
Let's consider the "verbose" function a failed experiment and remove the
last few callers (which are all many years old, and have been dwindling
as we remove them from scripts we touch for other reasons). It will be
one less thing for new test writers to see and wonder if they should be
using themselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since ls-files recently learned a "--format" option, we can use that
rather than asking for all of "--stage" and then pulling out the bits we
want with "cut". That's simpler and avoids two extra processes (one for
cut, and one for the subshell to hold the intermediate result).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid git on the left-hand side of a pipe, because it loses
the exit code of the command (and thus we'd miss things like segfaults
or unexpected failures). In the cases in t7001, we wouldn't expect
failures (they are just inspecting the repository state, and are not the
main point of the test), but it doesn't hurt to be careful.
In all but one case here we're piping "ls-files --stage" to cut off the
pathname (since we compare entries before and after moving). Let's pull
that into a helper function to avoid repeating the slightly awkward
replacement.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'on all' in the title of the test 'write-tree on all' was unclear;
remove it.
* Add a baseline 'test_all_match git write-tree' before making any
changes to the index, providing a reference point for the 'write-tree'
prior to any modifications.
* Add a comparison of the output of 'git status --porcelain=v2' to test
the working tree after 'write-tree' exits.
* Ensure SKIP_WORKTREE files weren't materialized on disk by using
"test_path_is_missing".
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to with
their standard bitmap parlance by "wants", and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
pathspec limiting traversals can't be computed using bitmaps (since
they don't know which objects are at which paths). The same is true
of certain kinds of non-trivial object filters.
2. If we can compute the traversal with bitmaps, partition the
(dereferenced) tips into two object lists, "haves", and "wants",
based on whether or not the objects have the UNINTERESTING flag,
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
more than zero haves, none of which are in the bitmapped pack or
MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
bitmaps as they are found.
5. Then do the same for the "wants" side, stopping at any objects that
appear in the "haves" bitmap.
6. Filter the results if any object filter (that can be easily
computed with bitmaps alone) was given, and then return back to the
caller.
When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.
But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.
One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.
But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.
Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.
The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.
The new algorithm works as follows:
1. Build a (partial) bitmap of the haves side by first OR-ing any
bitmap(s) that already exist for UNINTERESTING commits between the
haves and the boundary.
2. For each commit along the boundary, add it as a fill-in traversal
tip (where the traversal terminates once an existing bitmap is
found), and perform fill-in traversal.
3. Build up a complete bitmap of the wants side as usual, stopping any
time we intersect the (partial) haves side.
4. Return the results.
And is more-or-less equivalent to using the *old* algorithm with this
invocation:
$ git rev-list --objects --use-bitmap-index $WANTS --not \
$(git rev-list --objects --boundary $WANTS --not $HAVES |
perl -lne 'print $1 if /^-(.*)/')
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.
Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed faster with bitmaps than without (at the cost of over-counting
the true number of objects in a similar fashion as the non-bitmap
traversal):
# (Computing the number of tagged objects not on any branches
# without bitmaps).
$ time git rev-list --count --objects --tags --not --branches
20
real 0m1.388s
user 0m1.092s
sys 0m0.296s
# (Computing the same query using the old bitmap traversal).
$ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m22.709s
user 0m21.628s
sys 0m1.076s
# (this commit)
$ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m1.518s
user 0m1.234s
sys 0m0.284s
The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:
$ argv="--count --objects --branches --not --tags"
hyperfine \
-n 'no bitmaps' "git.compile rev-list $argv" \
-n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
-n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms]
Range (min … max): 122.6 ms … 133.1 ms 22 runs
Benchmark 2: existing traversal
Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms]
Range (min … max): 365.1 ms … 374.8 ms 10 runs
Benchmark 3: boundary traversal
Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms]
Range (min … max): 166.1 ms … 169.2 ms 17 runs
Summary
'no bitmaps' ran
1.34 ± 0.02 times faster than 'boundary traversal'
2.96 ± 0.05 times faster than 'existing traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:
- Decrease the cost to decompress and OR together many bitmaps
together (particularly when enumerating the uninteresting side of
the walk). Here we could explore more efficient bitmap storage
techniques, like Roaring+Run and/or use SIMD instructions to speed
up ORing them together.
- Store pseudo-merge bitmaps, which could allow us to OR together
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--all' option of git-push built-in cmd support to push all branches
(refs under refs/heads) to remote. Under the usage, a user can easlily
work in some scenarios, for example, branches synchronization and batch
upload.
The '--all' was introduced for a long time, meanwhile, git supports to
customize the storage location under "refs/". when a new git user see
the usage like, 'git push origin --all', we might feel like we're
pushing _all_ the refs instead of just branches without looking at the
documents until we found the related description of it or '--mirror'.
To ensure compatibility, we cannot rename '--all' to another name
directly, one way is, we can try to add a new option '--heads' which be
identical with the functionality of '--all' to let the user understand
the meaning of representation more clearly. Actually, We've more or less
named options this way already, for example, in 'git-show-ref' and 'git
ls-remote'.
At the same time, we fix a related issue about the wrong help
information of '--all' option in code and add some test cases in
t5523, t5543 and t5583.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command. Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.
Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.
Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is $(git show -s --raw --pretty=format:%at HEAD) in this test
that is meant to grab the author time of the commit. We used to
have a bug in the command line option parser of the diff family of
commands, where "show -s --raw" was identical to "show -s".
With the "-s" bug fixed, "show -s --raw" would mean the same thing
as "show --raw", i.e. show the output from the diff machinery in the
"raw" format. And this test will start failing, so fix it before
that happens.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov noticed and reported "--patch --no-patch --raw"
behaves differently from just "--raw". It turns out that there are
a few interesting bugs in the implementation and documentation.
* First, the documentation for "--no-patch" was unclear that it
could be read to mean "--no-patch" countermands an earlier
"--patch" but not other things. The intention of "--no-patch"
ever since it was introduced at d09cd15d (diff: allow --no-patch
as synonym for -s, 2013-07-16) was to serve as a synonym for
"-s", so "--raw --patch --no-patch" should have produced no
output, but it can be (mis)read to allow showing only "--raw"
output.
* Then the interaction between "-s" and other format options were
poorly implemented. Modern versions of Git uses one bit each to
represent formatting options like "--patch", "--stat" in a single
output_format word, but for historical reasons, "-s" also is
represented as another bit in the same word. This allows two
interesting bugs to happen, and we have both X-<.
(1) After setting a format bit, then setting NO_OUTPUT with "-s",
the code to process another "--<format>" option drops the
NO_OUTPUT bit to allow output to be shown again. However,
the code to handle "-s" only set NO_OUTPUT without unsetting
format bits set earlier, so the earlier format bit got
revealed upon seeing the second "--<format>" option. This is
the problem Sergey observed.
(2) After setting NO_OUTPUT with "-s", code to process
"--<format>" option can forget to unset NO_OUTPUT, leaving
the command still silent.
It is tempting to change the meaning of "--no-patch" to mean
"disable only the patch format output" and reimplement "-s" as "not
showing anything", but it would be an end-user visible change in
behavior. Let's fix the interactions of these bits to first make
"-s" work as intended.
The fix is conceptually very simple.
* Whenever we set DIFF_FORMAT_FOO because we saw the "--foo"
option (e.g. DIFF_FORMAT_RAW is set when the "--raw" option is
given), we make sure we drop DIFF_FORMAT_NO_OUTPUT. We forgot to
do so in some of the options and caused (2) above.
* When processing "-s" option, we should not just set
DIFF_FORMAT_NO_OUTPUT bit, but clear other DIFF_FORMAT_* bits.
We didn't do so and retained format bits set by options
previously seen, causing (1) above.
It is even more tempting to lose NO_OUTPUT bit and instead take
output_format word being 0 as its replacement, but that would break
the mechanism "git show" uses to default to "--patch" output, where
the distinction between telling the command to be silent with "-s"
and having no output format specified on the command line matters,
and an explicit output format given on the command line should not
be "combined" with the default "--patch" format.
So, while we cannot lose the NO_OUTPUT bit, as a follow-up work, we
may want to replace it with OPTION_GIVEN bit, and
* make "--patch", "--raw", etc. set DIFF_FORMAT_$format bit and
DIFF_FORMAT_OPTION_GIVEN bit on for each format. "--no-raw",
etc. will set off DIFF_FORMAT_$format bit but still record the
fact that we saw an option from the command line by setting
DIFF_FORMAT_OPTION_GIVEN bit.
* make "-s" (and its synonym "--no-patch") clear all other bits
and set only the DIFF_FORMAT_OPTION_GIVEN bit on.
which I suspect would make the code much cleaner without breaking
any end-user expectations.
Once that is in place, transitioning "--no-patch" to mean the
counterpart of "--patch", just like "--no-raw" only defeats an
earlier "--raw", would be quite simple at the code level. The
social cost of migrating the end-user expectations might be too
great for it to be worth, but at least the "GIVEN" bit clean-up
alone may be worth it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The array of dirstat_file contained in the dirstat_dir structure is
not freed after the processing ends. Unfortunately, the member that
points at the array, .files, is incremented as the gather_dirstat()
function recursively walks it, and this needs to be plugged by
remembering the beginning of the array before gather_dirstat() mucks
with it and freeing it after we are done.
We can mark t4047 as leak-free. t4000, which is marked as
leak-free, now can exercise dirstat in it, which will happen next.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests in t2019-checkout-ambiguous-ref.sh redirect two invocations of
"git checkout" to files "stdout" and "stderr". Several assertions are
made using file "stderr". File "stdout", however, is unused.
Don't redirect standard output of "git checkout" to file "stdout" in
t2019-checkout-ambiguous-ref.sh to avoid creating unnecessary files.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three tests in file t1502-rev-parse-parseopt.sh use three redirections
with invocation of "git rev-parse --parseopt --". All three tests
redirect standard output to file "out" and file "spec" to standard
input. Two of the tests redirect standard output a second time to file
"actual", and the third test redirects standard error to file "err".
These tests check contents of files "actual" and "err", but don't use
the files named "out" for assertions. The two tests that redirect to
standard output twice might also be confusing to the reader.
Don't redirect standard output of "git rev-parse" to file "out" in
t1502-rev-parse-parseopt.sh to avoid creating unnecessary files.
Acked-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test 'fsck error and recovery on invalid object type' in file
t1450-fsck.sh redirects output of a failing "git fsck" invocation to
files "out" and "err" to assert presence of error messages in the output
of the command. Commit 31deb28f5e (fsck: don't hard die on invalid
object types, 2021-10-01) changed the way assertions in this test are
performed. The test doesn't compare the whole standard error with
prepared file "err.expect" and it doesn't assert that standard output is
empty.
Don't create unused files "err.expect" and "out" in test 'fsck error and
recovery on invalid object type'.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three tests in t1300-config.sh check that "git config --get" barfs when
syntax errors are present in the config file. The tests redirect
standard output and standard error of "git config --get" to files,
"actual" and "error" correspondingly. They assert presence of an error
message in file "error". However, these tests don't use file "actual"
for assertions.
Don't redirect standard output of "git config --get" to file "actual" in
t1300-config.sh to avoid creating unnecessary files.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three tests in t1300-config.sh check that "git config --get" barfs when
the config file contains various syntax errors: key=value pair without
equals sign, broken section line, and broken value string. The sample
config files include a comment describing the kind of broken syntax.
This description seems to have been copy-pasted from the "broken section
line" sample to the other two samples.
Fix descriptions of broken config file syntax in samples used in
t1300-config.sh.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test 'credential config with partial URLs' in t0300-credentials.sh
contains three "git credential fill" invocations. For two of the
invocations, the test asserts presence or absence of string "yep" in the
standard output. For the third test it checks for an error message in
standard error.
Don't redirect standard output of "git credential" to file "stdout" in
t0300-credentials.sh to avoid creating an unnecessary file when only
standard error is checked.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark strtok() and strtok_r() to be banned.
* tb/ban-strtok:
banned.h: mark `strtok()` and `strtok_r()` as banned
t/helper/test-json-writer.c: avoid using `strtok()`
t/helper/test-oidmap.c: avoid using `strtok()`
t/helper/test-hashmap.c: avoid using `strtok()`
string-list: introduce `string_list_setlen()`
string-list: multi-delimiter `string_list_split_in_place()`
The output given by "git blame" that attributes a line to contents
taken from the file specified by the "--contents" option shows it
differently from a line attributed to the working tree file.
* jk/blame-fake-commit-label:
blame: use different author name for fake commit generated by --contents
The character classifiers are supposed to allow passing EOF to them, a
negative value. It isn't part of any character class. Extend the tests
to cover that.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a filesystem-level corruption occurs in a .bitmap file, Git can react
poorly. This could take the form of a run-time error due to failing to
parse an EWAH bitmap or be more subtle such as returning the wrong set
of objects to a fetch or clone.
A natural first response to either of these kinds of errors is to run
'git fsck' to see if any files are corrupt. This currently ignores all
.bitmap files.
Add checks to 'git fsck' for all .bitmap files that are currently
associated with a multi-pack-index or pack file. Verify their checksums
using the hashfile API.
We iterate through all multi-pack-indexes and pack-files to be sure to
check all .bitmap files, not just the one that would be read by the
process. For example, a multi-pack-index bitmap overrules a pack-bitmap.
However, if the multi-pack-index is removed, the pack-bitmap may be
selected instead. Be thorough to include every file that could become
active in such a way. This includes checking files in alternates.
There is potential that we could extend this effort to check the
structure of the reachability bitmaps themselves, but it is very
expensive to do so. At minimum, it's as expensive as generating the
bitmaps in the first place, and that's assuming that we don't use the
trivial algorithm of verifying each bitmap individually. The trivial
algorithm will result in quadratic behavior (number of objects times
number of bitmapped commits) while the bitmap building operation
constructs a lattice of commits to build bitmaps incrementally and then
generate the final bitmaps from a subset of those commits.
If we were to extend 'git fsck' to check .bitmap file contents more
closely like this, then we would likely want to hide it behind an option
that signals the user is more willing to do expensive operations such as
this.
For testing, set up a repository with a pack-bitmap _and_ a
multi-pack-index bitmap. This requires some file movement to avoid
deleting the pack-bitmap during the repack that creates the
multi-pack-index bitmap. We can then verify that 'git fsck' is checking
all files, not just the "active" bitmap.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
safe.bareRepository=explicit is a safer default mode of operation, since
it guards against the embedded bare repository attack [1]. Most end
users don't use bare repositories directly, so they should be able to
set safe.bareRepository=explicit, with the expectation that they can
reenable bare repositories by specifying GIT_DIR or --git-dir.
However, the user might use a tool that invokes Git on bare repositories
without setting GIT_DIR (e.g. "go mod" will clone bare repositories
[2]), so even if a user wanted to use safe.bareRepository=explicit, it
wouldn't be feasible until their tools learned to set GIT_DIR.
To make this transition easier, add a trace message to note when we
attempt to set up a bare repository without setting GIT_DIR. This allows
users and tool developers to audit which of their tools are problematic
and report/fix the issue. When they are sufficiently confident, they
would switch over to "safe.bareRepository=explicit".
Note that this uses trace2_data_string(), which isn't supported by the
"normal" GIT_TRACE2 target, only _EVENT or _PERF.
[1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
[2] https://go.dev/ref/mod
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test ensuring that the "wwwauth[]" field cannot be used to
inject malicious data into the credential helper stream.
Many of the credential helpers in contrib/credential read the
newline-delimited protocol stream one line at a time by repeatedly
calling fgets() into a fixed-size buffer.
This assumes that each line is no more than 1024 characters long, since
each iteration of the loop assumes that it is parsing starting at the
beginning of a new line in the stream. However, similar to a5bb10fd5e
(config: avoid fixed-sized buffer when renaming/deleting a section,
2023-04-06), if a line is longer than 1024 characters, a malicious actor
can embed another command within an existing line, bypassing the usual
checks introduced in 9a6bbee800 (credential: avoid writing values with
newlines, 2020-03-11).
As with the problem fixed in that commit, specially crafted input can
cause the helper to return the credential for the wrong host, letting an
attacker trick the victim into sending credentials for one host to
another.
Luckily, all parts of the credential helper protocol that are available
in a tagged release of Git are immune to this attack:
- "protocol" is restricted to known values, and is thus immune.
- "host" is immune because curl will reject hostnames that have a '='
character in them, which would be required to carry out this attack.
- "username" is immune, because the buffer characters to fill out the
first `fgets()` call would pollute the `username` field, causing the
credential helper to return nothing (because it would match a
username if present, and the username of the credential to be stolen
is likely not 1024 characters).
- "password" is immune because providing a password instructs
credential helpers to avoid filling credentials in the first place.
- "path" is similar to username; if present, it is not likely to match
any credential the victim is storing. It's also not enabled by
default; the victim would have to set credential.useHTTPPath
explicitly.
However, the new "wwwauth[]" field introduced via 5f2117b24f
(credential: add WWW-Authenticate header to cred requests, 2023-02-27)
can be used to inject data into the credential helper stream. For
example, running:
{
printf 'HTTP/1.1 401\r\n'
printf 'WWW-Authenticate: basic realm='
perl -e 'print "a" x 1024'
printf 'host=victim.com\r\n'
} | nc -Nlp 8080
in one terminal, and then:
git clone http://localhost:8080
in another would result in a line like:
wwwauth[]=basic realm=aaa[...]aaahost=victim.com
being sent to the credential helper. If we tweak that "1024" to align
our output with the helper's buffer size and the rest of the data on the
line, it can cause the helper to see "host=victim.com" on its own line,
allowing motivated attackers to exfiltrate credentials belonging to
"victim.com".
The below test demonstrates these failures and provides us with a test
to ensure that our fix is correct. That said, it has a couple of
shortcomings:
- it's in t0303, since that's the only mechanism we have for testing
random helpers. But that means nobody is going to run it under
normal circumstances.
- to get the attack right, it has to line up the stuffed name with the
buffer size, so we depend on the exact buffer size. I parameterized
it so it could be used to test other helpers, but in practice it's
not likely for anybody to do that.
Still, it's the best we can do, and will help us confirm the presence of
the problem (and our fixes) in the new few patches.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The email format does not allow blank lines in headers; detect such
input and report it as malformed and add a test for it.
Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, adding a header different than CC or TO is desirable; for
example, when using Debbugs, it is best to use 'X-Debbugs-Cc' headers
to keep people in CC; this is an example use case enabled by the new
'--header-cmd' option.
The header unfolding logic is extracted to a subroutine so that it can
be reused; a test is added for coverage.
Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file started out as a test for picks and reverts with renames, but
has been subsequently populated with all kinds of basic tests, in
accordance with its generic name. Adjust the description to reflect
that.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time. Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.
* tb/enable-cruft-packs-by-default:
repository.h: drop unused `gc_cruft_packs`
builtin/gc.c: make `gc.cruftPacks` enabled by default
t/t9300-fast-import.sh: prepare for `gc --cruft` by default
t/t6500-gc.sh: add additional test cases
t/t6500-gc.sh: refactor cruft pack tests
t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
t/t5304-prune.sh: prepare for `gc --cruft` by default
builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
builtin/repack.c: fix incorrect reference to '-C'
pack-write.c: plug a leak in stage_tmp_packfiles()
The "%GT" placeholder for the "--format" option of "git log" and
friends caused BUG() to trigger on a commit signed with an unknown
key, which has been corrected.
* jk/gpg-trust-level-fix:
gpg-interface: set trust level of missing key to "undefined"
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We wouldn't expect cat-file to fail here, but it's good practice to
avoid putting git on the upstream of a pipe, as we otherwise ignore its
exit code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.
* ps/fix-geom-repack-with-alternates:
repack: disable writing bitmaps when doing a local repack
repack: honor `-l` when calculating pack geometry
t/helper: allow chmtime to print verbosely without modifying mtime
pack-objects: extend test coverage of `--stdin-packs` with alternates
pack-objects: fix error when same packfile is included and excluded
pack-objects: fix error when packing same pack twice
pack-objects: split out `--stdin-packs` tests into separate file
repack: fix generating multi-pack-index with only non-local packs
repack: fix trying to use preferred pack in alternates
midx: fix segfault with no packs and invalid preferred pack
The sendemail-validate validate hook learned to pass the total
number of input files and where in the sequence each invocation is
via environment variables.
* rj/send-email-validate-hook-count-messages:
send-email: export patch counters in validate environment
The code to parse capability list for v0 on-wire protocol fell into
an infinite loop when a capability appears multiple times, which
has been corrected.
* jk/protocol-cap-parse-fix:
v0 protocol: use size_t for capability length/offset
t5512: test "ls-remote --heads --symref" filtering with v0 and v2
t5512: allow any protocol version for filtered symref test
t5512: add v2 support for "ls-remote --symref" test
v0 protocol: fix sha1/sha256 confusion for capabilities^{}
t5512: stop referring to "v1" protocol
v0 protocol: fix infinite loop when parsing multi-valued capabilities
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
Apply similar treatment as in the previous commit to remove usage of
`strtok()` from the "oidmap" test helper.
Each of the different commands that the "json-writer" helper accepts
pops the next space-delimited token from the current line and interprets
it as a string, integer, or double (with the exception of the very first
token, which is the command itself).
To accommodate this, split the line in place by the space character, and
pass the corresponding string_list to each of the specialized `get_s()`,
`get_i()`, and `get_d()` functions.
`get_i()` and `get_d()` are thin wrappers around `get_s()` that convert
their result into the appropriate type by either calling `strtol()` or
`strtod()`, respectively. In `get_s()`, we mark the token as "consumed"
by incrementing the `consumed_nr` counter, indicating how many tokens we
have read up to that point.
Because each of these functions needs the string-list parts, the number
of tokens consumed, and the line number, these three are wrapped up in
to a struct representing the line state.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply similar treatment as in the previous commit to remove usage of
`strtok()` from the "oidmap" test helper.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid using the non-reentrant `strtok()` to separate the parts of each
incoming command. Instead of replacing it with `strtok_r()`, let's
instead use the more friendly pair of `string_list_split_in_place()` and
`string_list_remove_empty_items()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.
When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.
This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:
string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)
would yield a string list of:
["foo", "", "", "bar", "", "", "baz"]
Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.
To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.
[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the --contents option is used with git blame, and the contents of
the file have lines which can't be annotated by the history being
blamed, the user will see an author of "Not Committed Yet". This is
similar to the way blame handles working tree contents when blaming
without a revision.
This is slightly confusing since this data isn't the working copy and
while it is technically "not committed yet", its also coming from an
external file. Replace this author name with "External file
(--contents)" to better differentiate such lines from actual working
copy lines.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Glen Choo <chooglen@google.com>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several tests in t1300-config.sh that validate failing
invocations of "git config". However, there are no tests that check
what happens when "git config" is asked to retrieve a value for a
missing key.
Add tests that check this for various combinations of "<section>.<key>"
and "<section>.<subsection>.<key>".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>