Unlike other environment variables that took the usual
true/false/yes/no as well as 0/1, GIT_FLUSH only understood 0/1,
which has been corrected.
* cp/git-flush-is-an-env-bool:
write-or-die: make GIT_FLUSH a Boolean environment variable
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.
* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email". Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.
* tz/send-email-negatable-options:
send-email: avoid duplicate specification warnings
perl: bump the required Perl version to 5.8.1 from 5.8.0
Among Git's environment variables, the ones marked as "Boolean"
accept values in a way similar to Boolean configuration variables,
i.e. values like 'yes', 'on', 'true' and positive numbers are
taken as "on" and values like 'no', 'off', 'false' are taken as
"off".
GIT_FLUSH can be used to force Git to use non-buffered I/O when
writing to stdout. It can only accept two values, '1' which causes
Git to flush more often and '0' which makes all output buffered.
Make GIT_FLUSH accept more values besides '0' and '1' by turning it
into a Boolean environment variable, modifying the required logic.
Update the related documentation.
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:
Benchmark 1: git.v2.42.1 rev-list --count HEAD
Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
Range (min … max): 650.2 ms … 666.0 ms 10 runs
Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
Range (min … max): 1.302 s … 1.361 s 10 runs
Summary
git.v2.42.1 rev-list --count HEAD ran
2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
Range (min … max): 2.943 s … 2.949 s 3 runs
Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
Range (min … max): 2.704 s … 2.759 s 3 runs
Summary
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Beyond the fact that it's somewhat traditional to respect sites'
self-identification, it's helpful for links to point to the things
that people expect them to reference. Here that means linking to
specific pages instead of a domain.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These sites offer https versions of their content.
Using the https versions provides some protection for users.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's somewhat traditional to respect sites' self-identification.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"To dereference" and "to peel" were sometimes used in in-code
comments and documentation but without description in the glossary.
* vd/glossary-dereference-peel:
glossary: add definitions for dereference & peel
The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1. Document that as the minimum version we
support.
Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.
The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24). At that time,
5.8.0 was 8 years old. It is now over 21 years old.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 'gitglossary' definitions for "dereference" (as it used for both symrefs
and objects) and "peel". These terms are used in options and documentation
throughout Git, but they are not clearly defined anywhere and the behavior
they refer to depends heavily on context. Provide explicit definitions to
clarify existing documentation to users and help contributors to use the
most appropriate terminology possible in their additions to Git.
Update other definitions in the glossary that use the term "dereference" to
link to 'def_dereference'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-file" learns a mode to read three contents to be merged
from blob objects.
* bc/merge-file-object-input:
merge-file: add an option to process object IDs
git-merge-file doc: drop "-file" from argument placeholders
Teach "git show-ref" a mode to check the existence of a ref.
* ps/show-ref:
t: use git-show-ref(1) to check for ref existence
builtin/show-ref: add new mode to check for reference existence
builtin/show-ref: explicitly spell out different modes in synopsis
builtin/show-ref: ensure mutual exclusiveness of subcommands
builtin/show-ref: refactor options for patterns subcommand
builtin/show-ref: stop using global vars for `show_one()`
builtin/show-ref: stop using global variable to count matches
builtin/show-ref: refactor `--exclude-existing` options
builtin/show-ref: fix dead code when passing patterns
builtin/show-ref: fix leaking string buffer
builtin/show-ref: split up different subcommands
builtin/show-ref: convert pattern to a local variable
The codepath to traverse the commit-graph learned to notice that a
commit is missing (e.g., corrupt repository lost an object), even
though it knows something about the commit (like its parents) from
what is in commit-graph.
* ps/do-not-trust-commit-graph-blindly-for-existence:
commit: detect commits that exist in commit-graph but not in the ODB
commit-graph: introduce envvar to disable commit existence checks
Offer a slightly more verbose description of the issue fixed by
7144dee3ec (credential/libsecret: erase matching creds only, 2023-07-26)
and cb626f8e5c (credential/wincred: erase matching creds only,
2023-07-26).
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
References from description of the `--patch` option in various
manual pages have been simplified and improved.
* so/diff-doc-for-patch-update:
doc/diff-options: fix link to generating patch section
Overly long label names used in the sequencer machinery are now
chopped to fit under filesystem limitation.
* mp/rebase-label-length-limit:
rebase: allow overriding the maximal length of the generated labels
sequencer: truncate labels to accommodate loose refs
git merge-file knows how to merge files on the file system already. It
would be helpful, however, to allow it to also merge single blobs.
Teach it an `--object-id` option which means that its arguments are
object IDs and not files to allow it to do so.
We handle the empty blob specially since read_mmblob doesn't read it
directly and otherwise users cannot specify an empty ancestor.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git merge-file` takes three positional arguments. Each of them is
documented as `<foo-file>`. In preparation for teaching this command to
alternatively take three object IDs, make these placeholders a bit more
generic by dropping the "-file" parts. Instead, clarify early that the
three arguments are filenames. Even after the next commit, we can afford
to present this file-centric view up front and in the general
discussion, since it will remain the default one.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in 32f3c541e3 (multi-pack-index: write pack names in chunk,
2018-07-12) the MIDX's "Packfile Names" (or "PNAM", for short) chunk was
described as containing an array of string entries. e0d1bcf825 notes
that this is the only chunk in the MIDX format's specification that is
not guaranteed to be 4-byte aligned, and so should be placed last.
This isn't quite accurate: the entries within the PNAM chunk are not
guaranteed to be 4-byte aligned since they are arbitrary strings, but
the chunk itself is 4-byte aligned since the ending is padded with NUL
bytes.
That padding has always been there since 32f3c541e3 via
midx.c::write_midx_pack_names(), which ended with:
i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT)
if (i < MIDX_CHUNK_ALIGNMENT) {
unsigned char padding[MIDX_CHUNK_ALIGNMENT];
memset(padding, 0, sizeof(padding))
hashwrite(f, padding, i);
written += i;
}
In fact, 32f3c541e3's log message itself describes the chunk in its
first paragraph with:
Since filenames are not well structured, add padding to keep good
alignment in later chunks.
So these have always been externally aligned. Correct the corresponding
part of our documentation to reflect that.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
e0d1bcf825 (multi-pack-index: add format details, 2018-07-12) describes
the MIDX's "PNAM" chunk as having entries which are "null-terminated
strings".
This is a typo, as strings are terminated with a NUL character, which is
a distinct concept from "NULL" or "null", which we typically reserve for
the void pointer to address 0.
Correct the documentation accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have multiple ways to show the value of a given reference, we
do not have any way to check whether a reference exists at all. While
commands like git-rev-parse(1) or git-show-ref(1) can be used to check
for reference existence in case the reference resolves to something
sane, neither of them can be used to check for existence in some other
scenarios where the reference does not resolve cleanly:
- References which have an invalid name cannot be resolved.
- References to nonexistent objects cannot be resolved.
- Dangling symrefs can be resolved via git-symbolic-ref(1), but this
requires the caller to special case existence checks depending on
whether or not a reference is symbolic or direct.
Furthermore, git-rev-list(1) and other commands do not let the caller
distinguish easily between an actually missing reference and a generic
error.
Taken together, this seems like sufficient motivation to introduce a
separate plumbing command to explicitly check for the existence of a
reference without trying to resolve its contents.
This new command comes in the form of `git show-ref --exists`. This
new mode will exit successfully when the reference exists, with a
specific exit code of 2 when it does not exist, or with 1 when there
has been a generic error.
Note that the only way to properly implement this command is by using
the internal `refs_read_raw_ref()` function. While the public function
`refs_resolve_ref_unsafe()` can be made to behave in the same way by
passing various flags, it does not provide any way to obtain the errno
with which the reference backend failed when reading the reference. As
such, it becomes impossible for us to distinguish generic errors from
the explicit case where the reference wasn't found.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis treats the `--verify` and the implicit mode the same. They
are slightly different though:
- They accept different sets of flags.
- The implicit mode accepts patterns while the `--verify` mode
accepts references.
Split up the synopsis for these two modes such that we can disambiguate
those differences.
While at it, drop "--quiet" from the pattern mode's synopsis. It does
not make a lot of sense to list patterns, but squelch the listing output
itself. The description for "--quiet" is adapted accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our `lookup_commit_in_graph()` helper tries to look up commits from the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.
We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.
In practice, there is no inherent reason why either of those functions
should have the sanity check whereas the other one does not: either both
of them are able to detect this issue or none of them should be. This
also means that the default of whether we do the check should likely be
the same for both. To err on the side of caution, we thus rather want to
make `repo_parse_commit_internal()` stricter than to loosen the checks
that we already have in `lookup_commit_in_graph()`.
The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>