"git $cmd --end-of-options --rev -- --path" for some $cmd failed
to interpret "--rev" as a rev, and "--path" as a path. This was
fixed for many programs like "reset" and "checkout".
* jk/end-of-options:
parse-options: decouple "--end-of-options" and "--"
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.
* jc/revision-parse-int:
revision: parse integer arguments to --max-count, --skip, etc., more carefully
The sample pre-commit hook that tries to catch introduction of new
paths that use potentially non-portable characters did not notice
an existing path getting renamed to such a problematic path, when
rename detection was enabled.
* jp/use-diff-index-in-pre-commit-sample:
hooks--pre-commit: detect non-ASCII when renaming
trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.
* jh/trace2-redact-auth:
t0212: test URL redacting in EVENT format
t0211: test URL redacting in PERF format
trace2: redact passwords from https:// URLs by default
trace2: fix signature of trace2_def_param() macro
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.
* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email". Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.
* tz/send-email-negatable-options:
send-email: avoid duplicate specification warnings
perl: bump the required Perl version to 5.8.1 from 5.8.0
The way CI testing used "prove" could lead to running the test
suite twice needlessly, which has been corrected.
* js/ci-discard-prove-state:
ci: avoid running the test suite _twice_
ci: add support for GitLab CI
ci: install test dependencies for linux-musl
ci: squelch warnings when testing with unusable Git repo
ci: unify setup of some environment variables
ci: split out logic to set up failed test artifacts
ci: group installation of Docker dependencies
ci: make grouping setup more generic
ci: reorder definitions for grouping functions
The "rev-list" and other commands in the "log" family, being the
oldest part of the system, use their own custom argument parsers,
and integer values of some options are parsed with atoi(), which
allows a non-digit after the number (e.g., "1q") to be silently
ignored. As a natural consequence, an argument that does not begin
with a digit (e.g., "q") silently becomes zero, too.
Switch to use strtol_i() and parse_timestamp() appropriately to
catch bogus input.
Note that one may naïvely expect that --max-count, --skip, etc., to
only take non-negative values, but we must allow them to also take
negative values, as an escape hatch to countermand a limit set by an
earlier option on the command line; the underlying variables are
initialized to (-1) and "--max-count=-1", for example, is a
legitimate way to reinitialize the limit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we added generic end-of-options support in 51b4594b40
(parse-options: allow --end-of-options as a synonym for "--",
2019-08-06), we made them true synonyms. They both stop option parsing,
and they are both returned in the resulting argv if the KEEP_DASHDASH
flag is used.
The hope was that this would work for all callers:
- most generic callers would not pass KEEP_DASHDASH, and so would just
do the right thing (stop parsing there) without needing to know
anything more.
- callers with KEEP_DASHDASH were generally going to rely on
setup_revisions(), which knew to handle --end-of-options specially
But that turned out miss quite a few cases that pass KEEP_DASHDASH but
do their own manual parsing. For example, "git reset", "git checkout",
and so on want pass KEEP_DASHDASH so they can support:
git reset $revs -- $paths
but of course aren't going to actually do a traversal, so they don't
call setup_revisions(). And those cases currently get confused by
--end-of-options being left in place, like:
$ git reset --end-of-options HEAD
fatal: option '--end-of-options' must come before non-option arguments
We could teach each of these callers to handle the leftover option
explicitly. But let's try to be a bit more clever and see if we can
solve it centrally in parse-options.c.
The bogus assumption here is that KEEP_DASHDASH tells us the caller
wants to see --end-of-options in the result. But really, the callers
which need to know that --end-of-options was reached are those that may
potentially parse more options from argv. In other words, those that
pass the KEEP_UNKNOWN_OPT flag.
If such a caller is aware of --end-of-options (e.g., because they call
setup_revisions() with the result), then this will continue to do the
right thing, treating anything after --end-of-options as a non-option.
And if the caller is not aware of --end-of-options, they are better off
keeping it intact, because either:
1. They are just passing the options along to somebody else anyway, in
which case that somebody would need to know about the
--end-of-options marker.
2. They are going to parse the remainder themselves, at which point
choking on --end-of-options is much better than having it silently
removed. The point is to avoid option injection from untrusted
command line arguments, and bailing is better than quietly treating
the untrusted argument as an option.
This fixes bugs with --end-of-options across several commands, but I've
focused on two in particular here:
- t7102 confirms that "git reset --end-of-options --foo" now works.
This checks two things. One, that we no longer barf on
"--end-of-options" itself (which previously we did, even if the rev
was something vanilla like "HEAD" instead of "--foo"). And two, that
we correctly treat "--foo" as a revision rather than an option.
This fix applies to any other cases which pass KEEP_DASHDASH but not
KEEP_UNKNOWN_OPT, like "git checkout", "git check-attr", "git grep",
etc, which would previously choke on "--end-of-options".
- t9350 shows the opposite case: fast-export passed KEEP_UNKNOWN_OPT
but not KEEP_DASHDASH, but then passed the result on to
setup_revisions(). So it never saw --end-of-options, and would
erroneously parse "fast-export --end-of-options --foo" as having a
"--foo" option. This is now fixed.
Note that this does shut the door for callers which want to know if we
hit end-of-options, but don't otherwise need to keep unknown opts. The
obvious thing here is feeding it to the DWIM verify_filename()
machinery. And indeed, this is a problem even for commands which do
understand --end-of-options already. For example, without this patch,
you get:
$ git log --end-of-options --foo
fatal: option '--foo' must come before non-option arguments
because we refuse to accept "--foo" as a filename (because it starts
with a dash) even though we could know that we saw end-of-options. The
verify_filename() function simply doesn't accept this extra information.
So that is the status quo, and this patch doubles down further on that.
Commands like "git reset" have the same problem, but they won't even
know that parse-options saw --end-of-options! So even if we fixed
verify_filename(), they wouldn't have anything to pass to it.
But in practice I don't think this is a big deal. If you are being
careful enough to use --end-of-options, then you should also be using
"--" to disambiguate and avoid the DWIM behavior in the first place. In
other words, doing:
git log --end-of-options --this-is-a-rev -- --this-is-a-path
works correctly, and will continue to do so. And likewise, with this
patch now:
git reset --end-of-options --this-is-a-rev -- --this-is-a-path
will work, as well.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diff.renames is turned on, the diff-filter will not return renamed
files (or copied ones with diff.renames=copy) and potential non-ASCII
characters would not be caught by this hook.
Use the plumbing command diff-index instead of the porcelain one to not
be affected by diff.rename.
Signed-off-by: Julian Prein <druckdev@protonmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:
Benchmark 1: git.v2.42.1 rev-list --count HEAD
Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
Range (min … max): 650.2 ms … 666.0 ms 10 runs
Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
Range (min … max): 1.302 s … 1.361 s 10 runs
Summary
git.v2.42.1 rev-list --count HEAD ran
2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
Range (min … max): 2.943 s … 2.949 s 3 runs
Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
Range (min … max): 2.704 s … 2.759 s 3 runs
Summary
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These pages are no longer reachable from their original locations,
which makes things difficult for readers. Instead, switch to linking to
the Internet Archive for the content.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Beyond the fact that it's somewhat traditional to respect sites'
self-identification, it's helpful for links to point to the things
that people expect them to reference. Here that means linking to
specific pages instead of a domain.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These sites offer https versions of their content.
Using the https versions provides some protection for users.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's somewhat traditional to respect sites' self-identification.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the added tests cases, skip testing the `GIT_TRACE2_REDACT=0` case
because we would need to exactly model the full JSON event stream like
we did in the preceding basic tests and I do not think it is worth it.
Furthermore, the Trace2 routines print the same content in normal, perf,
or event format, and in t0210 and t0211 we already tested the basic
functionality, so no need to repeat it here.
In this test, we use the test-helper to unit test each of the event
messages where URLs can appear and confirm that they are redacted in
each event.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This transmogrifies the test case that was just added to t0210, to also
cover the `GIT_TRACE2_PERF` backend.
Just like t0211, we now have to toggle the `TEST_PASSES_SANITIZE_LEAK`
annotation.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is an unsafe practice to call something like
git clone https://user:password@example.com/
This not only risks leaking the password "over the shoulder" or into the
readline history of the current Unix shell, it also gets logged via
Trace2 if enabled.
Let's at least avoid logging such secrets via Trace2, much like we avoid
logging secrets in `http.c`. Much like the code in `http.c` is guarded
via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
`GIT_TRACE2_REDACT` (also defaulting to `true`).
The new tests added in this commit uncover leaks in `builtin/clone.c`
and `remote.c`. Therefore we need to turn off
`TEST_PASSES_SANITIZE_LEAK`. The reasons:
- We observed that `the_repository->remote_status` is not released
properly.
- We are using `url...insteadOf` and that runs into a code path where an
allocated URL is replaced with another URL, and the original URL is
never released.
- `remote_states` contains plenty of `struct remote`s whose refspecs
seem to be usually allocated by never released.
More investigation is needed here to identify the exact cause and
proper fixes for these leaks/bugs.
Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add `struct key_value_info` argument to `trace2_def_param()`.
In dc90208497 (trace2: plumb config kvi, 2023-06-28) a `kvi`
argument was added to `trace2_def_param_fl()` but the macro
was not up updated. Let's fix that.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"To dereference" and "to peel" were sometimes used in in-code
comments and documentation but without description in the glossary.
* vd/glossary-dereference-peel:
glossary: add definitions for dereference & peel
A warning is issued for options which are specified more than once
beginning with perl-Getopt-Long >= 2.55. In addition to causing users
to see warnings, this results in test failures which compare the output.
An example, from t9001-send-email.37:
| +++ diff -u expect actual
| --- expect 2023-11-14 10:38:23.854346488 +0000
| +++ actual 2023-11-14 10:38:23.848346466 +0000
| @@ -1,2 +1,7 @@
| +Duplicate specification "no-chain-reply-to" for option "no-chain-reply-to"
| +Duplicate specification "to-cover|to-cover!" for option "to-cover"
| +Duplicate specification "cc-cover|cc-cover!" for option "cc-cover"
| +Duplicate specification "no-thread" for option "no-thread"
| +Duplicate specification "no-to-cover" for option "no-to-cover"
| fatal: longline.patch:35 is longer than 998 characters
| warning: no patches were sent
| error: last command exited with $?=1
| not ok 37 - reject long lines
Remove the duplicate option specs. These are primarily the explicit
'--no-' prefix opts which were added in f471494303 (git-send-email.perl:
support no- prefix with older GetOptions, 2015-01-30). This was done
specifically to support perl-5.8.0 which includes Getopt::Long 2.32[1].
Getopt::Long 2.33 added support for the '--no-' prefix natively by
appending '!' to the option specification string, which was included in
perl-5.8.1 and is not present in perl-5.8.0. The previous commit bumped
the minimum supported Perl version to 5.8.1 so we no longer need to
provide the '--no-' variants for negatable options manually.
Teach `--git-completion-helper` to output the '--no-' options. They are
not included in the options hash and would otherwise be lost.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1. Document that as the minimum version we
support.
Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.
The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24). At that time,
5.8.0 was 8 years old. It is now over 21 years old.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few stray single quotes crept into the usage string in a2ce608244
(send-email docs: add format-patch options, 2021-10-25). Remove them.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 'gitglossary' definitions for "dereference" (as it used for both symrefs
and objects) and "peel". These terms are used in options and documentation
throughout Git, but they are not clearly defined anywhere and the behavior
they refer to depends heavily on context. Provide explicit definitions to
clarify existing documentation to users and help contributors to use the
most appropriate terminology possible in their additions to Git.
Update other definitions in the glossary that use the term "dereference" to
link to 'def_dereference'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a late amendment of 4a6e4b9602 (CI: remove Travis CI support,
2021-11-23), whereby the `.prove` file (being written by the `prove`
command that is used to run the test suite) is no longer retained
between CI builds: This feature was only ever used in the Travis CI
builds, we tried for a while to do the same in Azure Pipelines CI runs
(but I gave up on it after a while), and we never used that feature in
GitHub Actions (nor does the new GitLab CI code use it).
Retaining the Prove cache has been fragile from the start, even though
the idea seemed good at the time, the idea being that the `.prove` file
caches information about previous `prove` runs (`save`) and uses them
(`slow`) to run the tests in the order from longer-running to shorter
ones, making optimal use of the parallelism implied by `--jobs=<N>`.
However, using a Prove cache can cause some surprising behavior: When
the `prove` caches information about a test script it has run,
subsequent `prove` runs (with `--state=slow`) will run the same test
script again even if said script is not specified on the `prove`
command-line!
So far, this bug did not matter. Right until d8f416bbb8 (ci: run unit
tests in CI, 2023-11-09) did it not matter.
But starting with that commit, we invoke `prove` _twice_ in CI, once to
run the regular test suite of regression test scripts, and once to run
the unit tests. Due to the bug, the second invocation re-runs all of the
tests that were already run as part of the first invocation. This not
only wastes build minutes, it also frequently causes the `osx-*` jobs to
fail because they already take a long time and now are likely to run
into a timeout.
The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.
So let's just drop that Prove option and live happily ever after.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already support Azure Pipelines and GitHub Workflows in the Git
project, but until now we do not have support for GitLab CI. While it is
arguably not in the interest of the Git project to maintain a ton of
different CI platforms, GitLab has recently ramped up its efforts and
tries to contribute to the Git project more regularly.
Part of a problem we hit at GitLab rather frequently is that our own,
custom CI setup we have is so different to the setup that the Git
project has. More esoteric jobs like "linux-TEST-vars" that also set a
couple of environment variables do not exist in GitLab's custom CI
setup, and maintaining them to keep up with what Git does feels like
wasted time. The result is that we regularly send patch series upstream
that fail to compile or pass tests in GitHub Workflows. We would thus
like to integrate the GitLab CI configuration into the Git project to
help us send better patch series upstream and thus reduce overhead for
the maintainer. Results of these pipeline runs will be made available
(at least) in GitLab's mirror of the Git project at [1].
This commit introduces the integration into our regular CI scripts so
that most of the setup continues to be shared across all of the CI
solutions. Note that as the builds on GitLab CI run as unprivileged
user, we need to pull in both sudo and shadow packages to our Alpine
based job to set this up.
[1]: https://gitlab.com/gitlab-org/git
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The linux-musl CI job executes tests on Alpine Linux, which is based on
musl libc instead of glibc. We're missing some test dependencies though,
which causes us to skip a subset of tests.
Install these test dependencies to increase our test coverage on this
platform. There are still some missing test dependecies, but these do
not have a corresponding package in the Alpine repositories:
- p4 and p4d, both parts of the Perforce version control system.
- cvsps, which generates patch sets for CVS.
- Subversion and the SVN::Core Perl library, the latter of which is
not available in the Alpine repositories. While the tool itself is
available, all Subversion-related tests are skipped without the
SVN::Core Perl library anyway.
The Apache2-based tests require a bit more care though. For one, the
module path is different on Alpine Linux, which requires us to add it to
the list of known module paths to detect it. But second, the WebDAV
module on Alpine Linux is broken because it does not bundle the default
database backend [1]. We thus need to skip the WebDAV-based tests on
Alpine Linux for now.
[1]: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13112
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>