The code to parse the From e-mail header has been updated to avoid
recursion.
* jk/mailinfo-iterative-unquote-comment:
mailinfo: avoid recursion when unquoting From headers
t5100: make rfc822 comment test more careful
Code clean-up for sanity checking of command line options for "git
show-ref".
* rs/show-ref-incompatible-options:
show-ref: use die_for_incompatible_opt3()
Bunch of small fix-ups to the reftable code.
* ps/reftable-fixes:
reftable/block: reuse buffer to compute record keys
reftable/block: introduce macro to initialize `struct block_iter`
reftable/merged: reuse buffer to compute record keys
reftable/stack: fix use of unseeded randomness
reftable/stack: fix stale lock when dying
reftable/stack: reuse buffers when reloading stack
reftable/stack: perform auto-compaction with transactional interface
reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
reftable: handle interrupted writes
reftable: handle interrupted reads
reftable: wrap EXPECT macros in do/while
The optimization based on fsmonitor in the "diff --cached"
codepath is resurrected with the "fake-lstat" introduced earlier.
* jc/diff-cached-fsmonitor-fix:
diff-lib: fix check_removed() when fsmonitor is active
A new helper to let us pretend that we called lstat() when we know
our cache_entry is up-to-date via fsmonitor.
* jc/fake-lstat:
cache: add fake_lstat()
"git checkout -B <branch> [<start-point>]" allowed a branch that is
in use in another worktree to be updated and checked out, which
might be a bit unexpected. The rule has been tightened, which is a
breaking change. "--ignore-other-worktrees" option is required to
unbreak you, if you are used to the current behaviour that "-B"
overrides the safety.
* jc/checkout-B-branch-in-use:
checkout: forbid "-B <branch>" from touching a branch used elsewhere
checkout: refactor die_if_checked_out() caller
Code clean-up around use of configuration variables.
* jk/config-cleanup:
sequencer: simplify away extra git_config_string() call
gpg-interface: drop pointless config_error_nonbool() checks
push: drop confusing configset/callback redundancy
config: use git_config_string() for core.checkRoundTripEncoding
diff: give more detailed messages for bogus diff.* config
config: use config_error_nonbool() instead of custom messages
imap-send: don't use git_die_config() inside callback
git_xmerge_config(): prefer error() to die()
config: reject bogus values for core.checkstat
Some codepaths did not correctly parse configuration variables
specified with valueless "true", which has been corrected.
* jk/implicit-true:
fsck: handle NULL value when parsing message config
trailer: handle NULL value when parsing trailer-specific config
submodule: handle NULL value when parsing submodule.*.branch
help: handle NULL value for alias.* config
trace2: handle NULL values in tr2_sysenv config callback
setup: handle NULL value when parsing extensions
config: handle NULL value when parsing non-bools
"git bisect reset" has been taught to clean up state files and refs
even when BISECT_START file is gone.
* jk/bisect-reset-fix:
bisect: always clean on reset
"git $cmd --end-of-options --rev -- --path" for some $cmd failed
to interpret "--rev" as a rev, and "--path" as a path. This was
fixed for many programs like "reset" and "checkout".
* jk/end-of-options:
parse-options: decouple "--end-of-options" and "--"
Clean-up code that handles combinations of incompatible options.
* rs/incompatible-options-messages:
worktree: simplify incompatibility message for --orphan and commit-ish
worktree: standardize incompatibility messages
clean: factorize incompatibility message
revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
repack: use die_for_incompatible_opt3() for -A/-k/--cruft
push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.
* jc/revision-parse-int:
revision: parse integer arguments to --max-count, --skip, etc., more carefully
Tests update.
* ps/ref-tests-update-more:
t6301: write invalid object ID via `test-tool ref-store`
t5551: stop writing packed-refs directly
t5401: speed up creation of many branches
t4013: simplify magic parsing and drop "failure"
t3310: stop checking for reference existence via `test -f`
t1417: make `reflog --updateref` tests backend agnostic
t1410: use test-tool to create empty reflog
t1401: stop treating FETCH_HEAD as real reference
t1400: split up generic reflog tests from the reffile-specific ones
t0410: mark tests to require the reffiles backend
The sample pre-commit hook that tries to catch introduction of new
paths that use potentially non-portable characters did not notice
an existing path getting renamed to such a problematic path, when
rename detection was enabled.
* jp/use-diff-index-in-pre-commit-sample:
hooks--pre-commit: detect non-ASCII when renaming
Command line completion (in contrib/) learned to complete path
arguments to the "add/set" subcommands of "git sparse-checkout"
better.
* en/complete-sparse-checkout:
completion: avoid user confusion in non-cone mode
completion: avoid misleading completions in cone mode
completion: fix logic for determining whether cone mode is active
completion: squelch stray errors in sparse-checkout completion
trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.
* jh/trace2-redact-auth:
t0212: test URL redacting in EVENT format
t0211: test URL redacting in PERF format
trace2: redact passwords from https:// URLs by default
trace2: fix signature of trace2_def_param() macro
"git merge-file" learned to take the "--diff-algorithm" option to
use algorithm different from the default "myers" diff.
* ad/merge-file-diff-algo:
merge-file: add --diff-algorithm option
Clean-up code that handles combinations of incompatible options.
* rs/i18n-cannot-be-used-together:
i18n: factorize even more 'incompatible options' messages
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.
* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
Introduce "git replay", a tool meant on the server side without
working tree to recreate a history.
* cc/git-replay:
replay: stop assuming replayed branches do not diverge
replay: add --contained to rebase contained branches
replay: add --advance or 'cherry-pick' mode
replay: use standard revision ranges
replay: make it a minimal server side command
replay: remove HEAD related sanity check
replay: remove progress and info output
replay: add an important FIXME comment about gpg signing
replay: change rev walking options
replay: introduce pick_regular_commit()
replay: die() instead of failing assert()
replay: start using parse_options API
replay: introduce new builtin
t6429: remove switching aspects of fast-rebase
Simplify API implementation to delete references by eliminating
duplication.
* ps/ref-deletion-updates:
refs: remove `delete_refs` callback from backends
refs: deduplicate code to delete references
refs/files: use transactions to delete references
t5510: ensure that the packed-refs file needs locking
With one exception, the synopsis for `git add` consistently lists the
short counterpart alongside the long-form of each option (for instance,
"[--edit | -e]"). The exception is that -A is not mentioned alongside
--all. Fix this inconsistency
Reported-by: Benjamin Lehmann <ben.lehmann@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.
The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.
To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.
Instead of improving the detection logic, fix our ".expect" files so
that we do not need any post-processing at all anymore. This allows us
to drop the `-w` flag when diffing so that we can always use diff(1)
now.
Note that we keep some of the post-processing of `chainlint.pl` output
intact to strip leading line numbers generated by the script. Having
these would cause a rippling effect whenever we add a new test that
sorts into the middle of existing tests and would require us to
renumerate all subsequent lines, which seems rather pointless.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our unquote_comment() function is recursive; when it sees a comment
within a comment, like:
(this is an (embedded) comment)
it recurses to handle the inner comment. This is fine for practical use,
but it does mean that you can easily run out of stack space with a
malicious header. For example:
perl -e 'print "From: ", "(" x 2**18;' |
git mailinfo /dev/null /dev/null
segfaults on my system. And since mailinfo is likely to be fed untrusted
input from the Internet (if not by human users, who might recognize a
garbage header, but certainly there are automated systems that apply
patches from a list) it may be possible for an attacker to trigger the
problem.
That said, I don't think there's an interesting security vulnerability
here. All an attacker can do is make it impossible to parse their email
and apply their patch, and there are lots of ways to generate bogus
emails. So it's more of an annoyance than anything.
But it's pretty easy to fix it. The recursion is not helping us preserve
any particular state from each level. The only flag in our parsing is
take_next_literally, and we can never recurse when it is set (since the
start of a new comment implies it was not backslash-escaped). So it is
really only useful for finding the end of the matched pair of
parentheses. We can do that easily with a simple depth counter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When processing "From" headers in an email, mailinfo "unquotes" quoted
strings and rfc822 parenthesized comments. For quoted strings, we
actually remove the double-quotes, so:
From: "A U Thor" <someone@example.com>
become:
Author: A U Thor
Email: someone@example.com
But for comments, we leave the outer parentheses in place, so:
From: A U (this is a comment) Thor <someone@example.com>
becomes:
Author: A U (this is a comment) Thor
Email: someone@example.com
So what is the comment "unquoting" actually doing? In our code, being in
a comment section has exactly two effects:
1. We'll unquote backslash-escaped characters inside a comment
section.
2. We _won't_ unquote double-quoted strings inside a comment section.
Our test for comments in t5100 checks this:
From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))
So it is covering (1), but not (2). Let's add in a quoted string to
cover this.
Moreover, because the comment appears at the end of the From header,
there's nothing to confirm that we correctly found the end of the
comment section (and not just the end-of-string). Let's instead move it
to the beginning of the header, which means we can confirm that the
existing quoted string is detected (which will only happen if we know
we've left the comment block).
As expected, the test continues to pass, but this will give us more
confidence as we refactor the code in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout -B <branch> [<start-point>]", being a "forced" version
of "-b", switches to the <branch>, after optionally resetting its
tip to the <start-point>, even if the <branch> is in use in another
worktree, which is somewhat unexpected.
Protect the <branch> using the same logic that forbids "git checkout
<branch>" from touching a branch that is in use elsewhere.
This is a breaking change that may deserve backward compatibliity
warning in the Release Notes. The "--ignore-other-worktrees" option
can be used as an escape hatch if the finger memory of existing
users depend on the current behaviour of "-B".
Reported-by: Willem Verstraeten <willem.verstraeten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.
But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:
while ((c = *in++) != 0)
So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.
We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).
The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.
Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.
Refactor the code to reuse the buffer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.
Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.
Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.
Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.
Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do. Note that we do not have to manually reset
the buffer because `stack_filename()` does this for us already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. The stack can grow to become quite long rather quickly, leading to
performance issues when trying to read records. But besides performance
issues, this can also lead to exhaustion of file descriptors very
rapidly as every single table requires a separate descriptor when
opening the stack.
While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
its added the new table, which will auto-compact the stack as required.
But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequentially, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.
Fix this issue by calling `reftable_stack_auto_compact()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have several tests that check whether we correctly perform
auto-compaction when manually calling `reftable_stack_auto_compact()`,
we don't have any tests that verify whether `reftable_stack_add()` does
call it automatically. Add one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `EXPECT` macros used by the reftable test framework are all using a
single `if` statement with the actual condition. This results in weird
syntax when using them in if/else statements like the following:
```
if (foo)
EXPECT(foo == 2)
else
EXPECT(bar == 2)
```
Note that there need not be a trailing semicolon. Furthermore, it is not
immediately obvious whether the else now belongs to the `if (foo)` or
whether it belongs to the expanded `if (foo == 2)` from the macro.
Fix this by wrapping the macros in a do/while loop.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the standard message for reporting the use of multiple mutually
exclusive options by calling die_for_incompatible_opt3() instead of
rolling our own. This has the benefits of showing only the actually
given options, reducing the number of strings to translate and making
the UI slightly more consistent.
Adjust the test to no longer insist on a specific order of the
reported options, as this implementation detail does not affect the
usefulness of the error message.
Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>