The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits. To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.
This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't deemed
worth dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to fix
this.
Given that this is just around the corner (geologically
speaking, anyway), and because it's easy to fix, let's just
make it work. Unlike the previous fix for "size", where we
had to write an individual extended header for each file, we
can write one global header (since we have only one mtime
for the whole archive).
There's a slight bit of trickiness there. We may already be
writing a global header with a "comment" field for the
commit sha1. So we need to write our new field into the same
header. To do this, we push the decision of whether to write
such a header down into write_global_extended_header(),
which will now assemble the header as it sees fit, and will
return early if we have nothing to write (in practice, we'll
only have a large mtime if it comes from a commit, but this
makes it also work if you set your system clock ahead such
that time() returns a huge value).
Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.
After writing the extended header, we munge the timestamp in
the ustar headers to the maximum-allowable size. This is
wrong, but it's the least-wrong thing we can provide to a
tar implementation that doesn't understand pax headers (it's
also what GNU tar does).
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.
These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).
This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.
Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.
If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.
But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.
This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:
1. You have a file between 8GB and 64GB.
2. Your tar implementation _doesn't_ know about pax
extended headers.
3. Your tar implementation _does_ parse 12-byte sizes from
the ustar header without a delimiter.
It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. For values larger than this, we have to add pax
extended headers to specify the real data, and git does not
yet know how to do so.
Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.
We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:
1. being able to read pax headers
2. being able to handle huge sizes or mtimes
3. supporting a "t" format we can parse
So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:
dd if=/dev/zero seek=64G bs=1 count=1 of=huge
touch -d @68719476737 huge
tar cf - --format=pax |
head -c 2048
using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.
Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then took only
the first 2k of it. That should be enough to generate 2MB of
data before hitting an inflate error, which is plenty for us
to generate the tar header (and then die of SIGPIPE while
streaming the rest out).
The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.
Helped-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is sometimes useful to be able to read exactly N bytes from a
pipe. Doing this portably turns out to be surprisingly difficult
in shell scripts.
We want a solution that:
- is portable
- never reads more than N bytes due to buffering (which
would mean those bytes are not available to the next
program to read from the same pipe)
- handles partial reads by looping until N bytes are read
(or we see EOF)
- is resilient to stray signals giving us EINTR while
trying to read (even though we don't send them, things
like SIGWINCH could cause apparently-random failures)
Some possible solutions are:
- "head -c" is not portable, and implementations may
buffer (though GNU head does not)
- "read -N" is a bash-ism, and thus not portable
- "dd bs=$n count=1" does not handle partial reads. GNU dd
has iflags=fullblock, but that is not portable
- "dd bs=1 count=$n" fixes the partial read problem (all
reads are 1-byte, so there can be no partial response).
It does make a lot of write() calls, but for our tests
that's unlikely to matter. It's fairly portable. We
already use it in our tests, and it's unlikely that
implementations would screw up any of our criteria. The
most unknown one would be signal handling.
- perl can do a sysread() loop pretty easily. On my Linux
system, at least, it seems to restart the read() call
automatically. If that turns out not to be portable,
though, it would be easy for us to handle it.
That makes the perl solution the least bad (because we
conveniently omitted "length of code" as a criterion).
It's also what t9300 is currently using, so we can just pull
the implementation from there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The one in 'master' has a brown-paper-bag bug that breaks the perf
test when used inside a usual Git repository with a working tree.
* js/perf-rebase-i:
perf: make the tests work without a worktree
When we run scripted Porcelains, "git" potty has set up the $PATH by
prepending $GIT_EXEC_PATH, the path given by "git --exec-path=$there
$cmd", etc. already. Because of this, scripted Porcelains can
dot-source shell script library like git-sh-setup with simple dot
without specifying any path.
t2300 however dot-sources git-sh-setup without adjusting $PATH like
the real "git" potty does. This has not been a problem so far, but
once git-sh-setup wants to rely on the $PATH adjustment, just like
any scripted Porcelains already do, it would become one. It cannot
for example dot-source another shell library without specifying the
full path to it by prefixing $(git --exec-path).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t5500::check_prot_host_port_path(), diagport is not a variable
used elsewhere and the function is not recursively called so this
can simply lose the "local", which may not be supported by shell
(besides, the function liberally clobbers other variables without
making them "local").
t7403::reset_submodule_urls() overrides the "root" variable used
in the test framework for no good reason; its use is not about
temporarily relocating where the test repositories are created.
This assignment can be made not to clobber the variable by moving
them into the subshells it already uses. Its value is always
$TRASH_DIRECTORY, so we could use it instead there, and this
function that is called only once and its two subshells may not be
necessary (instead, the caller can use "git -C $there config" and
set a value that is derived from $TRASH_DIRECTORY), but this is a
minimum fix that is needed to lose "local".
Helped-by: John Keeping <john@keeping.me.uk>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The one-shot environment variable syntax:
FOO=BAR some-program
is unportable when some-program is actually a shell
function, like test_must_fail (on some shells FOO remains
set after the function returns, and on others it does not).
We sometimes get around this by using env, like:
test_must_fail env FOO=BAR some-program
But that only works because test_must_fail's arguments are
themselves a command which can be run. You can't run:
env FOO=BAR test_must_fail some-program
because env does not know about our shell functions. So
there is no equivalent for test_commit, for example, and one
must resort to:
(
FOO=BAR
export FOO
test_commit
)
which is a bit verbose. Let's add a version of "env" that
works _inside_ the shell, by creating a subshell, exporting
variables from its argument list, and running the command.
Its use is demonstrated on a currently-unportable case in
t4014.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct faulty recommendation to use "git submodule deinit ." when
de-initialising all submodules, which would result in a strange
error message in a pathological corner case.
* sb/submodule-deinit-all:
submodule deinit: require '--all' instead of '.' for all submodules
Running tests with '-x' option to trace the individual command
executions is a useful way to debug test scripts, but some tests
that capture the standard error stream and check what the command
said can be broken with the trace output mixed in. When running
our tests under "bash", however, we can redirect the trace output
to another file descriptor to keep the standard error of programs
being tested intact.
* jk/test-send-sh-x-trace-elsewhere:
test-lib: set BASH_XTRACEFD automatically
"git describe --contains" often made a hard-to-justify choice of
tag to give name to a given commit, because it tried to come up
with a name with smallest number of hops from a tag, causing an old
commit whose close descendant that is recently tagged were not
described with respect to an old tag but with a newer tag. It did
not help that its computation of "hop" count was further tweaked to
penalize being on a side branch of a merge. The logic has been
updated to favor using the tag with the oldest tagger date, which
is a lot easier to explain to the end users: "We describe a commit
in terms of the (chronologically) oldest tag that contains the
commit."
* js/name-rev-use-oldest-ref:
name-rev: include taggerdate in considering the best name
In regular repositories $source_git and $objects_dir contain relative
paths based on $source. Go there to allow cp to resolve them.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test updates to make it more readable and maintainable.
* es/t1500-modernize:
t1500: avoid setting environment variables outside of tests
t1500: avoid setting configuration options outside of tests
t1500: avoid changing working directory outside of tests
t1500: test_rev_parse: facilitate future test enhancements
t1500: be considerate to future potential tests
"git fast-import --export-marks" would overwrite the existing marks
file even when it makes a dump from its custom die routine.
Prevent it from doing so when we have an import-marks file but
haven't finished reading it.
* fc/fast-import-broken-marks-file:
fast-import: do not truncate exported marks file
"git fsck" learned to catch NUL byte in a commit object as
potential error and warn.
* jc/fsck-nul-in-commit:
fsck: detect and warn a commit with embedded NUL
fsck_commit_buffer(): do not special case the last validation
On Windows, .git and optionally any files whose name starts with a
dot are now marked as hidden, with a core.hideDotFiles knob to
customize this behaviour.
* js/windows-dotgit:
mingw: remove unnecessary definition
mingw: introduce the 'core.hideDotFiles' setting
Mark several messages for translation.
* va/i18n-misc-updates:
i18n: unpack-trees: avoid substituting only a verb in sentences
i18n: builtin/pull.c: split strings marked for translation
i18n: builtin/pull.c: mark placeholders for translation
i18n: git-parse-remote.sh: mark strings for translation
i18n: branch: move comment for translators
i18n: branch: unmark string for translation
i18n: builtin/rm.c: remove a comma ',' from string
i18n: unpack-trees: mark strings for translation
i18n: builtin/branch.c: mark option for translation
i18n: index-pack: use plural string instead of normal one
test_patch_id_file_order shell function uses $name variable to hold
one filename, and calls another shell function calc_patch_id as a
downstream of one pipeline. The called function, however, also uses
the same $name variable. With a shell implementation that runs the
callee in the current shell environment, the caller's $name would
be clobbered by the callee's use of the same variable.
This hasn't been an issue with dash and bash. ksh93 reveals the
breakage in the test script.
Fix it by using a distinct variable name in the callee.
Reported-by: Armin Kunaschik <megabreit@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9xxx series has been updated primarily for readability, while
fixing small bugs in it. A few scripted Porcelains have also been
updated to fix possible bugs around their use of "test -z" and
"test -n".
* jk/test-z-n-unquoted:
always quote shell arguments to test -z/-n
t9103: modernize test style
t9107: switch inverted single/double quotes in test
t9107: use "return 1" instead of "exit 1"
t9100,t3419: enclose all test code in single-quotes
t/lib-git-svn: drop $remote_git_svn and $git_svn_id
Many commands normalize command line arguments from NFD to NFC
variant of UTF-8 on OSX, but commands in the "diff" family did
not, causing "git diff $path" to complain that no such path is
known to Git. They have been taught to do the normalization.
* ar/diff-args-osx-precompose:
diff: run arguments through precompose_argv
Add perf test for "rebase -i"
* js/perf-rebase-i:
perf: run "rebase -i" under perf
perf: make the tests work in worktrees
perf: let's disable symlinks when they are not available
t0040 had too many unnecessary repetitions in its test data. Teach
test-parse-options program so that a caller can tell what it
expects in its output, so that these repetitions can be cleaned up.
* jc/test-parse-options-expect:
t0040: convert a few tests to use test-parse-options --expect
t0040: remove unused test helpers
test-parse-options: --expect=<string> option to simplify tests
test-parse-options: fix output when callback option fails
"git commit" learned to pay attention to "commit.verbose"
configuration variable and act as if "--verbose" option was
given from the command line.
* pb/commit-verbose-config:
commit: add a commit.verbose config variable
t7507-commit-verbose: improve test coverage by testing number of diffs
parse-options.c: make OPTION_COUNTUP respect "unspecified" values
t/t7507: improve test coverage
t0040-parse-options: improve test coverage
test-parse-options: print quiet as integer
t0040-test-parse-options.sh: fix style issues
"git format-patch" learned a new "--base" option to record what
(public, well-known) commit the original series was built on in
its output.
* xy/format-patch-base:
format-patch: introduce format.useAutoBase configuration
format-patch: introduce --base=auto option
format-patch: add '--base' option to record base tree info
patch-ids: make commit_patch_id() a public helper function
A couple of bugs around core.autocrlf have been fixed.
* tb/core-eol-fix:
convert.c: ident + core.autocrlf didn't work
t0027: test cases for combined attributes
convert: allow core.autocrlf=input and core.eol=crlf
t0027: make commit_chk_wrnNNO() reliable
The experimental "multiple worktree" feature gains more safety to
forbid operations on a branch that is checked out or being actively
worked on elsewhere, by noticing that e.g. it is being rebased.
* nd/worktree-various-heads:
branch: do not rename a branch under bisect or rebase
worktree.c: check whether branch is bisected in another worktree
wt-status.c: split bisect detection out of wt_status_get_state()
worktree.c: check whether branch is rebased in another worktree
worktree.c: avoid referencing to worktrees[i] multiple times
wt-status.c: make wt_status_check_rebase() work on any worktree
wt-status.c: split rebase detection out of wt_status_get_state()
path.c: refactor and add worktree_git_path()
worktree.c: mark current worktree
worktree.c: make find_shared_symref() return struct worktree *
worktree.c: store "id" instead of "git_dir"
path.c: add git_common_path() and strbuf_git_common_path()
dir.c: rename str(n)cmp_icase to fspath(n)cmp
"git commit --dry-run" reported "No, no, you cannot commit." in one
case where "git commit" would have allowed you to commit, and this
improves it a little bit ("git commit --dry-run --short" still does
not give you the correct answer, for example). This is a stop-gap
measure in that "commit --short --dry-run" still gives an incorrect
result.
* ss/commit-dry-run-resolve-merge-to-no-op:
wt-status.c: set commitable bit if there is a meaningful merge.
In t0008, we have
cat <<-EOF
...
a/b/.gitignore:8:!on* "a/b/one\"three"
...
EOF
and expect that the backslash-dq is passed through literally.
ksh88 eats the backslash and produces a wrong expect file to
compare the actual output with.
Using \\" works this around without breaking other POSIX shells
(which collapse backslash-backslash to a single backslash), and
ksh88 does so, too.
It makes it easier to read, too, because the reason why we are
writing backslash there is *not* because we think dq is special and
want to quote it (if that were the case we would have two more
backslashes on that line). It is simply because we want a single
literal backslash there. Since backslash is treated specially in
unquoted here-document, explicitly doubling it to quote it expresses
our intent better than relying on the character that immediately
comes after it (i.e. '"') not being a special character.
Signed-off-by: Armin Kunaschik <megabreit@googlemail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent update to Git LFS broke "git p4" by changing the output from
its "lfs pointer" subcommand.
* ls/p4-lfs:
git-p4: fix Git LFS pointer parsing
travis-ci: express Linux/OS X dependency versions more clearly
travis-ci: update Git-LFS and P4 to the latest version
Some Windows SDK lacks pthread_sigmask() implementation and fails
to compile the recently updated "git push" codepath that uses it.
* jk/push-client-deadlock-fix:
Windows: only add a no-op pthread_sigmask() when needed
Windows: add pthread_sigmask() that does nothing
t5504: drop sigpipe=ok from push tests
fetch-pack: isolate sigpipe in demuxer thread
send-pack: isolate sigpipe in demuxer thread
run-command: teach async threads to ignore SIGPIPE
send-pack: close demux pipe before finishing async process
"git mv old new" did not adjust the path for a submodule that lives
as a subdirectory inside old/ directory correctly.
* sb/mv-submodule-fix:
mv: allow moving nested submodules
The test scripts for "git p4" (but not "git p4" implementation
itself) has been updated so that they would work even on a system
where the installed version of Python is python 3.
* ld/p4-test-py3:
git-p4 tests: time_in_seconds should use $PYTHON_PATH
git-p4 tests: work with python3 as well as python2
git-p4 tests: cd to / before running python
Ideally, each test should be responsible for setting up state it needs
rather than relying upon transient global state. Toward this end, teach
test_rev_parse() to accept a "-g <dir>" option to allow callers to
specify the value of the GIT_DIR environment variable explicitly. Take
advantage of this new option to avoid polluting the global scope with
GIT_DIR assignments.
Implementation note: Typically, tests avoid polluting the global state
by wrapping transient environment variable assignments within a
subshell, however, this technique doesn't work here since test_config()
and test_unconfig() need to know GIT_DIR, as well, but neither function
can be used within a subshell. Consequently, GIT_DIR is instead cleared
manually via test_when_finished().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally, each test should be responsible for setting up state it needs
rather than relying upon transient global state. Toward this end, teach
test_rev_parse() to accept a "-b <value>" option to allow callers to set
"core.bare" explicitly or undefine it. Take advantage of this new option
to avoid setting "core.bare" outside of tests.
Under the hood, "-b <value>" invokes "test_config -C <dir>" (or
"test_unconfig -C <dir>"), thus git-config knows explicitly where to
find its configuration file. Consequently, the global GIT_CONFIG
environment variable required by the manual git-config invocations
outside of tests is no longer needed, and is thus dropped.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally, each test should be responsible for setting up state it needs
rather than relying upon transient global state. Toward this end, teach
test_rev_parse() to accept a "-C <dir>" option to allow callers to
instruct it explicitly in which directory its tests should be run. Take
advantage of this new option to avoid changing the working directory
outside of tests.
Implementation note: test_rev_parse() passes "-C <dir>" along to
git-rev-parse with <dir> properly quoted. The natural and POSIX way to
do so is via ${dir:+-C "$dir"}, however, with some older broken shells,
this expression evaluates incorrectly to a single argument ("-C <dir>")
rather than the expected two (-C and "<dir>"). Work around this problem
with the slightly ungainly expression: ${dir:+-C} ${dir:+"$dir"}
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>