When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?
We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.
Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:
1. If "C" is already a ref tip, then the client can deduce
that it needs "A" itself (via find_non_local_tags), and
will ask for it explicitly rather than relying on the
include-tag capability. Everything works.
2. If "C" is not already a ref tip, then we hope for
include-tag to send us the correct tag. But it doesn't;
it generates a broken pack. However, the next step is
to do a follow-up run of find_non_local_tags(),
followed by fetch_refs() to backfill any tags we
learned about.
In the normal case, fetch_refs() calls quickfetch(),
which does a connectivity check and sees we have no
new objects to fetch. We just write the refs.
But for the broken-pack case, the connectivity check
fails, and quickfetch will follow-up with the remote,
asking explicitly for each of the ref tips. This picks
up the missing object in a new pack.
For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted! Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.
The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.
Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?
Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).
But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.
This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generate a series of packfiles test-1-$pack,
test-2-$pack, with different properties and then examine
them. However we always store the packname generated by
pack-objects in the variable packname_1. This probably was
meant to be packname_2 in the second test, but it turns out
that it doesn't matter: once we are done with the first
pack, we can just keep using the same $packname variable.
So let's drop the confusing "_1" parameter. At the same
time, let's give test-1 and test-2 more descriptive names,
which can help keep them straight (note that we _could_
likewise overwrite the packfiles in each test, but by using
separate filenames, we are sure that test 2 does not
accidentally use the packfile from test 1).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test unpacks objects into a separate repository, and
accesses it by setting GIT_DIR in a subshell. We can do the
same thing these days by using "git init <repo>" and "git
-C". In most cases this is shorter, though when there are
multiple commands, we may end up repeating the "-C".
However, this repetition can actually be a good thing. This
patch also fixes a bug introduced by 512477b (tests: use
"env" to run commands with temporary env-var settings,
2014-03-18). That commit essentially converted:
(GIT_DIR=...; export GIT_DIR
cmd1 &&
cmd2)
into:
(GIT_DIR=... cmd1 &&
cmd2)
which obviously loses the GIT_DIR setting for cmd2 (we never
noticed the bug because it simply runs "cmd2" in the parent
repo, which means we were simply failing to test anything
interesting). By using "git -C" rather than a subshell, it
becomes quite obvious where each command is supposed to be
running.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each test we do a dry-run of unpack-objects, followed by
a real run, followed by confirming that it contained the
objects we expected. The dry-run is telling us nothing, as
any errors it encounters would be found in the real run.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We usually try to avoid doing any significant actions
outside of test blocks. Although "rm -rf" is unlikely to
either fail or to generate output, moving these to the
point of use makes it more clear that they are part of the
overall setup of "clone.git".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git difftool <paths>..." started in a subdirectory failed to
interpret the paths relative to that directory, which has been
fixed.
* jk/difftool-in-subdir:
difftool: use Git::* functions instead of passing around state
difftool: avoid $GIT_DIR and $GIT_WORK_TREE
difftool: fix argument handling in subdirs
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.
* jk/reset-ident-time-per-commit:
am: reset cached ident date for each patch
A minor documentation update.
This was split out from a stalled jh/clean-smudge-annex topic
before discarding it.
* jh/clean-smudge-f-doc:
clarify %f documentation
The API documentation for hashmap was unclear if hashmap_entry
can be safely discarded without any other consideration. State
that it is safe to do so.
* jc/hashmap-doc-init:
hashmap: clarify that hashmap_entry can safely be discarded
FreeBSD can lie when asked mtime of a directory, which made the
untracked cache code to fall back to a slow-path, which in turn
caused tests in t7063 to fail because it wanted to verify the
behaviour of the fast-path.
* nd/fbsd-lazy-mtime:
t7063: work around FreeBSD's lazy mtime update feature
The characters in the label shown for tags/refs for commits in
"gitweb" output are now properly escaped for proper HTML output.
* ab/gitweb-link-html-escape:
gitweb: escape link body in format_ref_marker
Windows port was failing some tests in t4130, due to the lack of
inum in the returned values by its lstat(2) emulation.
* js/t4130-rename-without-ino:
t4130: work around Windows limitation
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.
* jc/grep-commandline-vs-configuration:
grep: further simplify setting the pattern type
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
The build procedure for "git persistent-https" helper (in contrib/)
has been updated so that it can be built with more recent versions
of Go.
* pm/build-persistent-https-with-recent-go:
contrib/persistent-https: use Git version for build label
contrib/persistent-https: update ldflags syntax for Go 1.7+
"git merge" in Git v2.9 was taught to forbid merging an unrelated
lines of history by default, but that is exactly the kind of thing
the "--rejoin" mode of "git subtree" (in contrib/) wants to do.
"git subtree" has been taught to use the "--allow-unrelated-histories"
option to override the default.
* da/subtree-2.9-regression:
subtree: fix "git subtree split --rejoin"
t7900-subtree.sh: fix quoting and broken && chains
"git commit --help" said "--no-verify" is only about skipping the
pre-commit hook, and failed to say that it also skipped the
commit-msg hook.
* os/no-verify-skips-commit-msg-too:
commit: describe that --no-verify skips the commit-msg hook in the help text
The use of strbuf in "git rm" to build filename to remove was a bit
suboptimal, which has been fixed.
* rs/rm-strbuf-optim:
rm: reuse strbuf for all remove_dir_recursively() calls
Users of the parse_options_concat() API function need to allocate
extra slots in advance and fill them with OPT_END() when they want
to decide the set of supported options dynamically, which makes the
code error-prone and hard to read. This has been corrected by tweaking
the API to allocate and return a new copy of "struct option" array.
* jk/parse-options-concat:
parse_options: allocate a new array when concatenating
Existing autoconf generated test for the need to link with pthread
library did not check all the functions from pthread libraries;
recent FreeBSD has some functions in libc but not others, and we
mistakenly thought linking with libc is enough when it is not.
* ew/autoconf-pthread:
configure.ac: stronger test for pthread linkage
The .c/.h sources are marked as such in our .gitattributes file so
that "git diff -W" and friends would work better.
* rs/help-c-source-with-gitattributes:
.gitattributes: set file type for C files
"git status" learned to suggest "merge --abort" during a conflicted
merge, just like it already suggests "rebase --abort" during a
conflicted rebase.
* mm/status-suggest-merge-abort:
status: suggest 'git merge --abort' when appropriate
Code cleanup.
* rs/submodule-config-code-cleanup:
submodule-config: fix test binary crashing when no arguments given
submodule-config: combine early return code into one goto
submodule-config: passing name reference for .gitmodule blobs
submodule-config: use explicit empty string instead of strbuf in config_from()
Build clean-up.
* nd/test-helpers:
t/test-lib.sh: fix running tests with --valgrind
Makefile: use VCSSVN_LIB to refer to svn library
Makefile: drop extra dependencies for test helpers
Code cleanup.
* rs/use-strbuf-addbuf:
strbuf: avoid calling strbuf_grow() twice in strbuf_addbuf()
use strbuf_addbuf() for appending a strbuf to another
A small internal API cleanup.
* lf/sideband-returns-void:
upload-pack.c: make send_client_data() return void
sideband.c: make send_sideband() return void
Recent FreeBSD stopped making perl available at /usr/bin/perl;
switch the default the built-in path to /usr/local/bin/perl on not
too ancient FreeBSD releases.
* ew/find-perl-on-freebsd-in-local:
config.mak.uname: correct perl path on FreeBSD
Recent update to "git daemon" tries to enable the socket-level
KEEPALIVE, but when it is spawned via inetd, the standard input
file descriptor may not necessarily be connected to a socket.
Suppress an ENOTSOCK error from setsockopt().
* ew/daemon-socket-keepalive:
Windows: add missing definition of ENOTSOCK
daemon: ignore ENOTSOCK from setsockopt
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
"git notes merge" had a code to see if a path exists (and fails if
it does) and then open the path for writing (when it doesn't).
Replace it with open with O_EXCL.
* rs/notes-merge-no-toctou:
notes-merge: use O_EXCL to avoid overwriting existing files
An age old bug that caused "git diff --ignore-space-at-eol"
misbehave has been fixed.
* js/ignore-space-at-eol:
diff: fix a double off-by-one with --ignore-space-at-eol
diff: demonstrate a bug with --patience and --ignore-space-at-eol
"git fetch http://user:pass@host/repo..." scrubbed the userinfo
part, but "git push" didn't.
* jk/push-scrub-url:
t5541: fix url scrubbing test when GPG is not set
push: anonymize URL in status output
"git add -N dir/file && git write-tree" produced an incorrect tree
when there are other paths in the same directory that sorts after
"file".
* nd/cache-tree-ita:
cache-tree: do not generate empty trees as a result of all i-t-a subentries
cache-tree.c: fix i-t-a entry skipping directory updates sometimes
test-lib.sh: introduce and use $EMPTY_BLOB
test-lib.sh: introduce and use $EMPTY_TREE
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit. When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.
* mh/blame-worktree:
t/t8003-blame-corner-cases.sh: Use here documents
blame: allow to blame paths freshly added to the index