The Bloom filter used for path limited history traversal was broken
on systems whose "char" is unsigned; update the implementation and
bump the format version to 2.
* tb/path-filter-fix:
bloom: introduce `deinit_bloom_filters()`
commit-graph: reuse existing Bloom filters where possible
object.h: fix mis-aligned flag bits table
commit-graph: new Bloom filter version that fixes murmur3
commit-graph: unconditionally load Bloom filters
bloom: prepare to discard incompatible Bloom filters
bloom: annotate filters with hash version
repo-settings: introduce commitgraph.changedPathsVersion
t4216: test changed path filters with high bit paths
t/helper/test-read-graph: implement `bloom-filters` mode
bloom.h: make `load_bloom_filter_from_graph()` public
t/helper/test-read-graph.c: extract `dump_graph_info()`
gitformat-commit-graph: describe version 2 of BDAT
commit-graph: ensure Bloom filters are read with consistent settings
revision.c: consult Bloom filters for root commits
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
"git archive --add-virtual-file=<path>:<contents>" never paid
attention to the --prefix=<prefix> option but the documentation
said it would. The documentation has been corrected.
* jc/archive-prefix-with-add-virtual-file:
archive: document that --add-virtual-file takes full path
Memory ownership rules for the in-core representation of
remote.*.url configuration values have been straightened out, which
resulted in a few leak fixes and code clarification.
* jk/remote-wo-url:
remote: drop checks for zero-url case
remote: always require at least one url in a remote
t5801: test remote.*.vcs config
t5801: make remote-testgit GIT_DIR setup more robust
remote: allow resetting url list
config: document remote.*.url/pushurl interaction
remote: simplify url/pushurl selection
remote: use strvecs to store remote url/pushurl
remote: transfer ownership of memory in add_url(), etc
remote: refactor alias_url() memory ownership
archive: fix check for missing url
Earlier we stopped using the tree of HEAD as the default source of
attributes in a bare repository, but failed to document it. This
has been corrected.
* jc/no-default-attr-tree-in-bare:
attr.tree: HEAD:.gitattributes is no longer the default in a bare repo
When the user adds to "git rebase -i" instruction to "pick" a merge
commit, the error experience is not pleasant. Such an error is now
caught earlier in the process that parses the todo list.
* pw/rebase-i-error-message:
rebase -i: improve error message when picking merge
rebase -i: pass struct replay_opts to parse_insn_line()
The safe.directory configuration knob has been updated to
optionally allow leading path matches.
* jc/safe-directory-leading-path:
safe.directory: allow "lead/ing/path/*" match
The documentation for "git diff --name-only" has been clarified
that it is about showing the names in the post-image tree.
* jc/doc-diff-name-only:
diff: document what --name-only shows
A scheduled "git maintenance" job is expected to work on all
repositories it knows about, but it stopped at the first one that
errored out. Now it keeps going.
* js/for-each-repo-keep-going:
maintenance: running maintenance should not stop on errors
for-each-repo: optionally keep going on an error
Tom Scogland noticed that `--add-virtual-file` option uses the path
specified as its value as-is, without prepending any value given to
the `--prefix` option like `--add-file` does.
The behaviour has always been that way since the option was
introduced, but the documentation has always been wrong and said
that it would use the value of `--prefix` just like `--add-file`
does.
We could modify the behaviour to make it literally work like the
documentation said, but it would break existing scripts the users
use.
Noticed-by: Tom Scogland <scogland1@llnl.gov>
Acked-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The murmur3 implementation in bloom.c has a bug when converting series
of 4 bytes into network-order integers when char is signed (which is
controllable by a compiler option, and the default signedness of char is
platform-specific). When a string contains characters with the high bit
set, this bug causes results that, although internally consistent within
Git, does not accord with other implementations of murmur3 (thus,
the changed path filters wouldn't be readable by other off-the-shelf
implementatios of murmur3) and even with Git binaries that were compiled
with different signedness of char. This bug affects both how Git writes
changed path filters to disk and how Git interprets changed path filters
on disk.
Therefore, introduce a new version (2) of changed path filters that
corrects this problem. The existing version (1) is still supported and
is still the default, but users should migrate away from it as soon
as possible.
Because this bug only manifests with characters that have the high bit
set, it may be possible that some (or all) commits in a given repo would
have the same changed path filter both before and after this fix is
applied. However, in order to determine whether this is the case, the
changed paths would first have to be computed, at which point it is not
much more expensive to just compute a new changed path filter.
So this patch does not include any mechanism to "salvage" changed path
filters from repositories. There is also no "mixed" mode - for each
invocation of Git, reading and writing changed path filters are done
with the same version number; this version number may be explicitly
stated (typically if the user knows which version they need) or
automatically determined from the version of the existing changed path
filters in the repository.
There is a change in write_commit_graph(). graph_read_bloom_data()
makes it possible for chunk_bloom_data to be non-NULL but
bloom_filter_settings to be NULL, which causes a segfault later on. I
produced such a segfault while developing this patch, but couldn't find
a way to reproduce it neither after this complete patch (or before),
but in any case it seemed like a good thing to include that might help
future patch authors.
The value in t0095 was obtained from another murmur3 implementation
using the following Go source code:
package main
import "fmt"
import "github.com/spaolacci/murmur3"
func main() {
fmt.Printf("%x\n", murmur3.Sum32([]byte("Hello world!")))
fmt.Printf("%x\n", murmur3.Sum32([]byte{0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}))
}
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will introduce another version of the changed-path
filter in the commit graph file. In order to control which version to
write (and read), a config variable is needed.
Therefore, introduce this config variable. For forwards compatibility,
teach Git to not read commit graphs when the config variable
is set to an unsupported version. Because we teach Git this,
commitgraph.readChangedPaths is now redundant, so deprecate it and
define its behavior in terms of the config variable we introduce.
This commit does not change the behavior of writing (Git writes changed
path filters when explicitly instructed regardless of any config
variable), but a subsequent commit will restrict Git such that it will
only write when commitgraph.changedPathsVersion is a recognized value.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For over a year, setting add.interactive.useBuiltin configuration
variable did nothing but giving a "this does not do anything"
warning. Finally remove it.
* jc/add-i-retire-usebuiltin-config:
add-i: finally retire add.interactive.useBuiltin
Earlier we stopped using the tree of HEAD as the default source of
attributes in a bare repository, but failed to document it. This
has been corrected.
* jc/no-default-attr-tree-in-bare:
attr.tree: HEAD:.gitattributes is no longer the default in a bare repo
The pseudo-merge reachability bitmap to help more efficient storage
of the reachability bitmap in a repository with too many refs has
been added.
* tb/pseudo-merge-reachability-bitmap: (26 commits)
pack-bitmap.c: ensure pseudo-merge offset reads are bounded
Documentation/technical/bitmap-format.txt: add missing position table
t/perf: implement performance tests for pseudo-merge bitmaps
pseudo-merge: implement support for finding existing merges
ewah: `bitmap_equals_ewah()`
pack-bitmap: extra trace2 information
pack-bitmap.c: use pseudo-merges during traversal
t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
pack-bitmap: implement test helpers for pseudo-merge
ewah: implement `ewah_bitmap_popcount()`
pseudo-merge: implement support for reading pseudo-merge commits
pack-bitmap.c: read pseudo-merge extension
pseudo-merge: scaffolding for reads
pack-bitmap: extract `read_bitmap()` function
pack-bitmap-write.c: write pseudo-merge table
pseudo-merge: implement support for selecting pseudo-merge commits
config: introduce `git_config_double()`
pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
pack-bitmap-write: support storing pseudo-merge commits
...
The "--heads" option of "ls-remote" and "show-ref" has been been
deprecated; "--branches" replaces "--heads".
* jc/heads-are-branches:
show-ref: introduce --branches and deprecate --heads
ls-remote: introduce --branches and deprecate --heads
refs: call branches branches
The structure of the document that records longer-term project
decisions to deprecate/remove/update various behaviour has been
outlined.
* ps/document-breaking-changes:
BreakingChanges: document that we do not plan to deprecate git-checkout
BreakingChanges: document removal of grafting
BreakingChanges: document upcoming change from "sha1" to "sha256"
docs: introduce document to announce breaking changes
When the user adds to "git rebase -i" instruction to "pick" a merge
commit, the error experience is not pleasant. Such an error is now
caught earlier in the process that parses the todo list.
* pw/rebase-i-error-message:
rebase -i: improve error message when picking merge
rebase -i: pass struct replay_opts to parse_insn_line()
"git update-ref --stdin" learned to handle transactional updates of
symbolic-refs.
* kn/update-ref-symref:
update-ref: add support for 'symref-update' command
reftable: pick either 'oid' or 'target' for new updates
update-ref: add support for 'symref-create' command
update-ref: add support for 'symref-delete' command
update-ref: add support for 'symref-verify' command
refs: specify error for regular refs with `old_target`
refs: create and use `ref_update_expects_existing_old_ref()`
"git diff --exit-code --ext-diff" learned to take the exit status
of the external diff driver into account when deciding the exit
status of the overall "git diff" invocation when configured to do
so.
* rs/diff-exit-code-with-external-diff:
diff: let external diffs report that changes are uninteresting
userdiff: add and use struct external_diff
t4020: test exit code with external diffs
"git am" has a safety feature to prevent it from starting a new
session when there already is a session going. It reliably
triggers when a mbox is given on the command line, but it has to
rely on the tty-ness of the standard input. Add an explicit way to
opt out of this safety with a command line option.
* jk/am-retry:
test-terminal: drop stdin handling
am: add explicit "--retry" option
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
"make check-docs" noticed problems and reported to its output but
failed to signal its findings with its exit status, which has been
corrected.
* ps/check-docs-fix:
ci/test-documentation: work around SyntaxWarning in Python 3.12
gitlab-ci: add job to run `make check-docs`
Documentation/lint-manpages: bubble up errors
Makefile: extract script to lint missing/extraneous manpages
While investigating a benign Coverity warning on the new pseudo-merge
implementation, I was struggling to understand the (paraphrased) below:
ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t));
for (i = 0; i < index->pseudo_merges.nr; i++) {
index->pseudo_merges.v[i].at = get_be64(ofs);
ofs += sizeof(uint64_t);
}
, in pack-bitmap.c::load_bitmap_header(). Looking at the documentation,
the diagram describing the on-disk format (prior to this patch)
suggested that the optional extended lookup table immediately preceded
the trailing metadata portion.
If that were the case, that would make the above code from
load_bitmap_header() incorrect, as we'd be blindly reading into the
extended offset table.
But later on in the documentation there is a description of the
pseudo-merge position table as immediately preceding the trailing
metadata portion of the extension. And indeed, we do write the position
table in pack-bitmap-write.c:
/* write positions for all pseudo merges */
for (i = 0; i < writer->pseudo_merges_nr; i++)
hashwrite_be64(f, pseudo_merge_ofs[i]);
hashwrite_be32(f, writer->pseudo_merges_nr);
hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
hashwrite_be64(f, table_start - start);
hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
So this is purely a case of the diagram being out of sync with the
textual description and actual implementation of the format
specification.
Add the missing component back to the format diagram to avoid further
confusion in this area.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because remote.*.url is treated as a multi-valued key, there is no way
to override previous config. So for example if you have
remote.origin.url set to some wrong value, doing:
git -c remote.origin.url=right fetch
would not work. It would append "right" to the list, which means we'd
still fetch from "wrong" (since subsequent values are used only as push
urls).
Let's provide a mechanism to reset the list, like we do for other
multi-valued keys (e.g., credential.helper, http.extraheaders, and
merge.suppressDest all use this "empty string means reset" pattern).
Reported-by: Mathew George <mathewegeorge@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for these keys gives a very terse definition and
points you to the fetch/push manpages. But from reading those pages it
was not at all obvious to me that:
- these are keys that can be defined multiple times with meaningful
behavior (especially remote.*.url)
- the way that pushurl overrides url (the git-push page does mention
that "pushurl defaults to url", but it is not immediately clear what
a multi-valued url would do in that situation).
Let's try to summarize the current behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-checkout(1) command is seen by many as hard to understand
because it connects two somewhat unrelated features: switching between
branches and restoring worktree files from arbitrary revisions. In 2019,
we thus implemented two new commands git-switch(1) and git-restore(1) to
split out these separate concerns into standalone functions.
This "replacement" of git-checkout(1) has repeatedly triggered concerns
for our userbase that git-checkout(1) will eventually go away. This is
not the case though: the use of that command is still widespread, and it
is not expected that this will change anytime soon.
Document that all three commands will remain for the foreseeable future.
This decision may be revisited in case we ever figure out that most
everyone has given up on any of the commands.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The grafting mechanism for objects has been deprecated in e650d0643b
(docs: mark info/grafts as outdated, 2014-03-05), which is more than a
decade ago. The mechanism can lead to hard-to-debug issues and has a
superior replacement with replace refs.
Follow through with the deprecation and mark grafts for removal in Git
3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Starting with 8e42eb0e9a (doc: sha256 is no longer experimental,
2023-07-31), the "sha256" object format is no longer considered to be
experimental. Furthermore, the SHA-1 hash function is actively
recommended against by for example NIST and FIPS 140-2, and attacks
against it are becoming more practical both due to new weaknesses
(SHAppening, SHAttered, Shambles) and due to the ever-increasing
computing power. It is only a matter of time before it can be considered
to be broken completely.
Let's plan for this event by being active instead of waiting for it to
happend and announce that the default object format is going to change
from "sha1" to "sha256" with Git 3.0.
All major Git implementations (libgit2, JGit, go-git) support the
"sha256" object format and are thus prepared for this change. The most
important missing piece in the puzzle is support in forges. But while
GitLab recently gained experimental support for the "sha256" object
format though, to the best of my knowledge GitHub doesn't support it
yet. Ideally, announcing this upcoming change will encourage forges to
start building that support.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over time, Git has grown quite a lot. With this evolution, many ideas
that were sensible at the time they were introduced are not anymore and
are thus considered to be deprecated. And while some deprecations may be
noted in manpages, most of them are actually deprecated in the "hive
mind" of the Git community, only.
Introduce a new document that tracks such breaking changes, but also
deprecations which we are not willing to go through with, to address
this issue. This document serves multiple purposes:
- It is a way to facilitate discussion around proposed deprecations.
- It allows users to learn about deprecations and speak up in case
they have good reasons why a certain feature should not be
deprecated.
- It states intent and documents where the Git project wants to go,
both in the case where we want to deprecate, but also in the case
where we don't want to deprecate a specific feature.
The document is _not_ intended to cast every single discussion into
stone. It is supposed to be a living document that may change over time
when there are good reasons for it to change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The safe.directory configuration knob has been updated to
optionally allow leading path matches.
* jc/safe-directory-leading-path:
safe.directory: allow "lead/ing/path/*" match