Most of our tree traversal algorithms use recursion to visit sub-trees.
For pathologically large trees, this can cause us to run out of stack
space and abort in an uncontrolled way. Let's put our own limit here so
that we can fail gracefully rather than segfaulting.
In similar cases where we recursed along the commit graph, we rewrote
the algorithms to avoid recursion and keep any stack data on the heap.
But the commit graph is meant to grow without bound, whereas it's not an
imposition to put a limit on the maximum size of tree we'll handle.
And this has a bonus side effect: coupled with a limit on individual
tree entry names, this limits the total size of a path we may encounter.
This gives us an extra protection against code handling long path names
which may suffer from integer overflows in the size (which could then be
exploited by malicious trees).
The default of 4096 is set to be much longer than anybody would care
about in the real world. Even with single-letter interior tree names
(like "a/b/c"), such a path is at least 8191 bytes. While most operating
systems will let you create such a path incrementally, trying to
reference the whole thing in a system call (as Git would do when
actually trying to access it) will result in ENAMETOOLONG. Coupled with
the recent fsck.largePathname warning, the maximum total pathname Git
will handle is (by default) 16MB.
This config option doesn't do anything yet; future patches will convert
various algorithms to respect the limit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In general, Git tries not to arbitrarily limit what it will store, and
there are currently no limits at all on the size of the path we find in
a tree. In theory you could have one that is gigabytes long.
But in practice this freedom is not really helping anybody, and is
potentially harmful:
1. Most operating systems have much lower limits for the size of a
single pathname component (e.g., on Linux you'll generally get
ENAMETOOLONG for anything over 255 bytes). And while you _can_ use
Git in a way that never touches the filesystem (manipulating the
index and trees directly), it's still probably not a good idea to
have gigantic tree names. Many operations load and traverse them,
so any clever Git-as-a-database scheme is likely to perform poorly
in that case.
2. We still have a lot of code which assumes strings are reasonably
sized, and I won't be at all surprised if you can trigger some
interesting integer overflows with gigantic pathnames. Stopping
malicious trees from entering the repository provides an extra line
of defense, protecting downstream code.
This patch implements an fsck check so that such trees can be rejected
by transfer.fsckObjects. I've picked a reasonably high maximum depth
here (4096) that hopefully should not bother anybody in practice. I've
also made it configurable, as an escape hatch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "error" variable in traverse_trees() shadows the global error()
function (meaning we can't call error() from here). Let's call the local
variable "ret" instead, which matches the idiom in other functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the previous commit dropped the hard-coded limit in
traverse_trees(), we don't need this macro there anymore (the code can
handle any number of trees in parallel).
We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to
5290d45134 (tree-walk.c: break circular dependency with unpack-trees,
2020-02-01). So we can just directly define that as "8" now; we know
traverse_trees() can handle whatever we throw at it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.
There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.
Interestingly, we used to allocate these on the heap, prior to
8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree with ~3600 entries.
That's deep enough we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.
We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).
The possible downsides are:
1. We now have to remember to free(). But both functions have an easy
single exit (and already had to clean up other bits anyway).
2. The extra malloc()/free() overhead might be measurable. I tested
this by setting up a 3000-depth tree with a single blob and running
"git archive" on it. After switching to the heap, it consistently
runs 2-3% faster! Presumably this is because the 1K+ of wasted
stack space penalized memory caches.
On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a minor regression that some compiler might notice.
* 'jk/function-pointer-mismatches-fix' (early part):
fsck: use enum object_type for fsck_walk callback
We switched the function interface for fsck callbacks in a1aad71601
(fsck.h: use "enum object_type" instead of "int", 2021-03-28). However,
we accidentally flipped the type back to "int" as part of 0b4e9013f1
(fsck: mark unused parameters in various fsck callbacks, 2023-07-03).
The mistake happened because that commit was written before a1aad71601
and rebased forward, and I screwed up while resolving the conflict.
Curiously, the compiler does not warn about this mismatch, at least not
when using gcc and clang on Linux (nor in any of our CI environments).
Based on 28abf260a5 (builtin/fsck.c: don't conflate "int" and "enum" in
callback, 2021-06-01), I'd guess that this would cause the AIX xlc
compiler to complain. I noticed because clang-18's UBSan now identifies
mis-matched function calls at runtime, and does complain of this case
when running the test suite.
I'm not entirely clear on whether this mismatch is a problem in
practice. Compilers are certainly free to make enums smaller than "int"
if they don't need the bits, but I suspect that they have to promote
back to int for function calls (though I didn't dig in the standard, and
I won't be surprised if I'm simply wrong and the real-world impact would
depend on the ABI).
Regardless, switching it back to enum is obviously the right thing to do
here; the switch to "int" was simply a mistake.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Typo/grammofix to documentation added during this cycle.
* tl/notes-separator:
notes doc: tidy up `--no-stripspace` paragraph
notes doc: split up run-on sentences
With `--stdin`, we read *from* standard input, not *for*.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit 00bf685975 (show-ref doc: update for internal consistency,
2023-05-19) switched from double quotes to backticks around our {caret}
macro, we started rendering "{caret}" literally. Fix this by replacing
by a "^" character.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Where we document the `--no-stripspace` option, remove a superfluous
"For" to fix the grammar. Mark option names and command names using
`backticks` to set them in monospace.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit c4e2aa7d45 (notes.c: introduce "--[no-]stripspace" option,
2023-05-27) mentioned the new `--no-stripspace` in the documentation for
`-m` and `-F`, it created run-on sentences. It also used slightly
different language in the two sections for no apparent reason. Split the
sentences in two to improve readability, and while touching the two
sites, make them more similar.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of github.com:git/git: (34 commits)
Git 2.42-rc2
t4053: avoid writing to unopened pipe
t4053: avoid race when killing background processes
Git 2.42-rc1
git maintenance: avoid console window in scheduled tasks on Windows
win32: add a helper to run `git.exe` without a foreground window
t9001: remove excessive GIT_SEND_EMAIL_NOTTY=1
mv: handle lstat() failure correctly
parse-options: disallow negating OPTION_SET_INT 0
repack: free geometry struct
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
t0040: declare non-tab indentation to be okay in this script
advice: handle "rebase" in error_resolve_conflict()
A few more topics before -rc1
mailmap: change primary address for Glen Choo
gitignore: ignore clangd .cache directory
docs: update when `git bisect visualize` uses `gitk`
compat/mingw: implement a native locate_in_PATH()
run-command: conditionally define locate_in_PATH()
...
Windows updates.
* ds/maintenance-on-windows-fix:
git maintenance: avoid console window in scheduled tasks on Windows
win32: add a helper to run `git.exe` without a foreground window
Adjust to newer Term::ReadLine to prevent it from breaking
the interactive prompt code in send-email.
* jk/send-email-with-new-readline:
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
Developer support to detect meaningless combination of options.
* rs/parse-opt-forbid-set-int-0-without-noneg:
parse-options: disallow negating OPTION_SET_INT 0
This fixes an occasional hang I see when running t4053 with
--verbose-log using dash.
Commit 1e3f26542a (diff --no-index: support reading from named pipes,
2023-07-05) added a test that "diff --no-index" will complain when
comparing a named pipe and a directory. The minimum we need to test this
is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
But the test does one thing more: it spawns a background shell process
that opens the pipe for writing, like this:
{
(>pipe) &
} &&
This extra writer _could_ be useful if Git misbehaves and tries to open
the pipe for reading. Without the writer, Git would block indefinitely
and the test would never end. But since we do not have such a bug, Git
does not open the pipe and it is the writing process which will block
indefinitely, since there are no readers. The test addresses this by
running "kill $!" in a test_when_finished block. Since the writer should
be blocking forever, this kill command will reliably find it waiting.
However, this seems to be somewhat racy, in that the writing process
sometimes hangs around even after the "kill". In a normal run of the
test script without options, this doesn't have any effect; the
main test script completes anyway. But with --verbose-log, we spawn a
"tee" process that reads the script output, and it won't end until all
descriptors pointing to its input pipe are closed. And the background
process that is hanging around still has its stderr, etc, pointed into
that pipe.
You can reproduce the situation like this:
cd t
./t4053-diff-no-index.sh --verbose-log --stress
Let that run for a few minutes, and then you'll find that some of the
runs have hung. For example, at 11:53, I ran:
$ ps xk start o pid,start,command | grep tee | head
713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
All of these have been hung for several minutes. We can investigate one
and see that it's waiting to get EOF on its input:
$ strace -p 713459
strace: Process 713459 attached
read(0,
^C
Who else has that descriptor open?
$ lsof -a -p 713459 -d 0 +E
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
It's a shell, presumably a subshell spawned by the main script. Though
it may seem odd, having the same descriptor open several times is not
unreasonable (they're all basically the original stdout/stderr of the
script that has been copied). And they should all close when the process
exits. So what's it doing? Curiously, it will exit as soon as we strace
it:
$ strace -s 64 -p 719203
strace: Process 719203 attached
openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
write(2, "cannot create pipe: Directory nonexistent", 41) = 41
write(2, "\n", 1) = 1
exit_group(2) = ?
+++ exited with 2 +++
I think what happens is this:
- it is blocking in the openat() call for the pipe, as we expect (so
this is definitely the backgrounded subshell mentioned above)
- strace sends signals (probably STOP/CONT); those cause the kernel to
stop blocking, but libc will restart the system call automatically
- by this time, the "pipe" fifo is gone, so we'll actually try to
create a regular file. But of course the surrounding directory is
gone, too! So we get ENOENT, and then exit as normal.
So the blocking is something we expect to happen. But what we didn't
expect is for the process to still exist at all! It should have been
killed earlier when the parent process called "kill", but it wasn't. And
we can't catch the race at this point, because it happened much earlier.
One can guess, though, that there is some race with the shell setting up
the signal handling in the backgrounded subshell, and possibly blocking
or ignoring signals at the time that the "kill" is received. Curiously,
the race does not seem to happen if I use "bash" instead of "dash", so
presumably bash's setup here is more atomic.
One fix might be to try killing the subshell more aggressively, either
using SIGKILL, or looping on kill/wait. But that seems complex and
likely to introduce new problems/races. Instead, we can observe that the
writer is not needed at all. Git will notice the pipe via stat() before
it is ever opened. So we can simply drop the writer subshell entirely.
If we ever changed Git to open the path and fstat() it, this would
result in the test hanging. But we're not likely to do that. After all,
we have to stat() paths to see if they are openable at all (e.g., it
could be a directory), so this seems like a low risk. And anybody who
does make such a change will immediately see the issue, as Git would
hang consistently.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test 'diff --no-index reads from pipes' starts a couple of
background processes that write to the pipes that are passed to "diff
--no-index". If the test passes then we expect these processes to exit
as all their output will have been read. However if the test fails
then we want to make sure they do not hang about on the users machine
and the test remembers they should be killed by calling
test_when_finished "! kill $!"
after each background process is created. Unfortunately there is a
race where test_when_finished may run before the background process
exits even when all its output has been read resulting in the kill
command succeeding which causes the test to fail. Fix this by ignoring
the exit status of the kill command. If the diff is successful we
could instead wait for the background process to exit and check their
status but that feels like it is testing the platform's printf
implementation rather than git's code.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase -i" with a series of squash/fixup, when one of the
steps stopped in conflicts and ended up getting skipped, did not
handle the accumulated commit log messages, which has been
corrected.
* pw/rebase-skip-commit-message-fix:
rebase --skip: fix commit message clean up when skipping squash
"git bisect visualize" stopped running "gitk" on Git for Windows
when the command was reimplemented in C around Git 2.34 timeframe.
This has been corrected.
* ma/locate-in-path-for-windows:
docs: update when `git bisect visualize` uses `gitk`
compat/mingw: implement a native locate_in_PATH()
run-command: conditionally define locate_in_PATH()