Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.
To help with this usecase, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.
This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
thereon we perform a linear search to locate the desired record.
This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the current index is
smaller than the searched-for key. And unless our key exactly matches
the value at the restart counter we always end up returning a truish
value.
The consequence is that `binsearch()` essentially always returns 0,
indicacting to us that we must start searching right at the beginning of
the block. This works by chance because we now always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.
Fix this bug by returning whether the current key is smaller than the
searched key. As the current behaviour was correct it is not possible to
write a test. Furthermore it is also not really possible to demonstrate
in a benchmark that this fix speeds up seeking records.
This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When decoding records it is customary to reuse a `struct
reftable_ref_record` across calls. Thus, it may happen that the record
already holds some allocated memory. When decoding ref and log records
we handle this by releasing or reallocating held memory. But we fail to
do this for object records, which causes us to leak memory.
Fix this memory leak by releasing object records before we decode into
them. We may eventually want to reuse memory instead to avoid needless
reallocations. But for now, let's just plug the leak and be done.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update the DEF_VER in GIT-VERSION-GEN, which I forgot to do
earlier (it should have been done when we started the new cycle).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git for-each-ref" learned "--include-root-refs" option to show
even the stuff outside the 'refs/' hierarchy.
* kn/for-all-refs:
for-each-ref: add new option to include root refs
ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
refs: introduce `refs_for_each_include_root_refs()`
refs: extract out `loose_fill_ref_dir_regular_file()`
refs: introduce `is_pseudoref()` and `is_headref()`
When a merge conflicted at a submodule, merge-ort backend used to
unconditionally give a lengthy message to suggest how to resolve
it. Now the message can be squelched as an advice message.
* pb/ort-make-submodule-conflict-message-an-advice:
merge-ort: turn submodule conflict suggestions into an advice
Clarify wording in the CodingGuidelines that requires <git-compat-util.h>
to be the first header file.
* jc/doc-compat-util:
doc: clarify the wording on <git-compat-util.h> requirement
An error message from "git upload-pack", which responds to "git
fetch" requests, had a trialing NUL in it, which has been
corrected.
* sg/upload-pack-error-message-fix:
upload-pack: don't send null character in abort message to the client
Many small allocations "git name-rev" makes have been updated to
allocate from a mem-pool.
* rs/name-rev-with-mempool:
name-rev: use mem_pool_strfmt()
mem-pool: add mem_pool_strfmt()
The logic to access reflog entries by date and number had ugly
corner cases at the boundaries, which have been cleaned up.
* jk/reflog-special-cases-fix:
read_ref_at(): special-case ref@{0} for an empty reflog
get_oid_basic(): special-case ref@{n} for oldest reflog entry
Revert "refs: allow @{n} to work with n-sized reflog"
The code incorrectly attempted to use textconv cache when asked,
even when we are not running in a repository, which has been
corrected.
* jk/textconv-cache-outside-repo-fix:
userdiff: skip textconv caching when not in a repository
Practice the new mark-up rule for <placeholders> with "git add"
documentation page.
* jc/doc-add-placeholder-fix:
doc: apply the new placeholder rules to git-add documentation
The way placeholders are to be marked-up in documentation have been
specified; use "_<placeholder>_" to typeset the word inside a pair
of <angle-brakets> emphasized.
* ja/doc-placeholders-markup-rules:
doc: clarify the format of placeholders
"git reflog" learned a "list" subcommand that enumerates known reflogs.
* ps/reflog-list:
builtin/reflog: introduce subcommand to list reflogs
refs: stop resolving ref corresponding to reflogs
refs: drop unused params from the reflog iterator callback
refs: always treat iterators as ordered
refs/files: sort merged worktree and common reflogs
refs/files: sort reflogs returned by the reflog iterator
dir-iterator: support iteration in sorted order
dir-iterator: pass name to `prepare_next_entry_data()` directly
Doc update.
* ja/docfixes:
doc: end sentences with full-stop
doc: close unclosed angle-bracket of a placeholder in git-clone doc
doc: git-rev-parse: enforce command-line description syntax
"git difftool --dir-diff" learned to honor the "--trust-exit-code"
option; it used to always exit with 0 and signalled success.
* ps/difftool-dir-diff-exit-code:
git-difftool--helper: honor `--trust-exit-code` with `--dir-diff`
Credential helper based on libsecret (in contrib/) has been updated
to handle an empty password correctly.
* mh/libsecret-empty-password-fix:
libsecret: retrieve empty password
Some parts of command line completion script (in contrib/) have
been micro-optimized.
* bb/completion-no-grep-into-awk:
completion: use awk for filtering the config entries
Variants of vimdiff learned to honor mergetool.<variant>.layout settings.
* km/mergetool-vimdiff-layout-fallback:
mergetools: vimdiff: use correct tool's name when reading mergetool config
"git tag --column" failed to check the exit status of its "git
column" invocation, which has been corrected.
* rj/tag-column-fix:
tag: error when git-column fails
"git am --help" now tells readers what actions are available in
"git am --whitespace=<action>", in addition to saying that the
option is passed through to the underlying "git apply".
* jc/am-whitespace-doc:
doc: add shortcut to "am --whitespace=<action>"
The reason why we require the <git-compat-util.h> file to be the
first header file to be included is because it insulates other
header files and source files from platform differences, like which
system header files must be included in what order, and what C
preprocessor feature macros must be defined to trigger certain
features we want out of the system.
We tried to clarify the rule in the coding guidelines document, but
the wording was a bit fuzzy that can lead to misinterpretations like
you can include <xdiff/xinclude.h> only to avoid having to include
<git-compat-util.h> even if you have nothing to do with the xdiff
implementation, for example. "You do not have to include more than
one of these" was also misleading and would have been puzzling if
you _needed_ to depend on more than one of these approved headers
(answer: you are allowed to include them all if you need the
declarations in them for reasons other than that you want to avoid
including compat-util yourself).
Instead of using the phrase "approved headers", enumerate them as
exceptions, each labeled with its intended audiences, to avoid such
misinterpretations. The structure also makes it easier to add new
exceptions, so add the description of "t/unit-tests/test-lib.h"
being an exception only for the unit tests implementation as an
example.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Kyle Lippincott <spectral@google.com>
Acked-by: Elijah Newren <newren@gmail.com>
This is a minor follow-up to cb00f524df (rebase: rewrite
--(no-)autosquash documentation, 2023-11-14) to fix a typo introduced in
that commit.
Signed-off-by: Richard Macklin <code@rmacklin.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More tests that are marked as "ref-files only" have been updated to
improve test coverage of reftable backend.
* ps/ref-tests-update-even-more:
t7003: ensure filter-branch prunes reflogs with the reftable backend
t2011: exercise D/F conflicts with HEAD with the reftable backend
t1405: remove unneeded cleanup step
t1404: make D/F conflict tests compatible with reftable backend
t1400: exercise reflog with gaps with reftable backend
t0410: convert tests to use DEFAULT_REPO_FORMAT prereq
t: move tests exercising the "files" backend
Teach "git checkout -p" and friends that "@" is a synonym for
"HEAD".
* gt/at-is-synonym-for-head-in-add-patch:
add -p tests: remove PERL prerequisites
add-patch: classify '@' as a synonym for 'HEAD'
"git column" has been taught to reject negative padding value, as
it would lead to nonsense behaviour including division by zero.
* kh/column-reject-negative-padding:
column: guard against negative padding
column: disallow negative padding
Adjust use of "rev-list --missing" in an existing tests so that it
does not depend on a buggy failure mode.
* jc/t9210-lazy-fix:
t9210: do not rely on lazy fetching to fail
The code to iterate over refs with the reftable backend has seen
some optimization.
* ps/reftable-iteration-perf:
reftable/reader: add comments to `table_iter_next()`
reftable/record: don't try to reallocate ref record name
reftable/block: swap buffers instead of copying
reftable/pq: allocation-less comparison of entry keys
reftable/merged: skip comparison for records of the same subiter
reftable/merged: allocation-less dropping of shadowed records
reftable/record: introduce function to compare records by key
"git apply" on a filesystem without filemode support have learned
to take a hint from what is in the index for the path, even when
not working with the "--index" or "--cached" option, when checking
the executable bit match what is required by the preimage in the
patch.
* cp/apply-core-filemode:
apply: code simplification
apply: correctly reverse patch's pre- and post-image mode bits
apply: ignore working tree filemode when !core.filemode
Integrate the reftable code into the refs framework as a backend.
* ps/reftable-backend:
refs/reftable: fix leak when copying reflog fails
ci: add jobs to test with the reftable backend
refs: introduce reftable backend
This file has been added as part of 2232a88ab6 (attr: add builtin
objectmode values support, 2023-11-16) and most likely serves no
relevant purpose.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new advice type 'submoduleMergeConflict' for the error message
shown when a non-trivial submodule conflict is encountered, which
was added in 4057523a40 (submodule merge: update conflict error
message, 2022-08-04). That commit mentions making this message an
advice as possible future work. The message can now be disabled
with the advice mechanism.
Update the tests as the expected message now appears on stderr instead
of stdout.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit special-cased get_oid_basic()'s handling of ref@{n}
for a reflog with n entries. But its special case doesn't work for
ref@{0} in an empty reflog, because read_ref_at() dies when it notices
the empty reflog!
We can make this work by special-casing this in read_ref_at(). It's
somewhat gross, for two reasons:
1. We have no reflog entry to describe in the "msg" out-parameter. So
we have to leave it uninitialized or make something up.
2. Likewise, we have no oid to put in the "oid" out-parameter. Leaving
it untouched is actually the best thing here, as all of the callers
will have initialized it with the current ref value via
repo_dwim_log(). This is rather subtle, but it is how things worked
in 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) before we reverted it.
The key difference from 6436a20284 here is that we'll return "1" to
indicate that we _didn't_ find the requested reflog entry. Coupled with
the special-casing in get_oid_basic() in the previous commit, that's
enough to make looking up ref@{0} work, and we can flip 6436a20284's
test back to expect_success.
It also means that the call in show-branch which segfaulted with
6436a20284 (and which is now tested in t3202) remains OK. The caller
notices that we could not find any reflog entry, and so it breaks out of
its loop, showing nothing. This is different from the current behavior
of producing an error, but it's just as reasonable (and is exactly what
we'd do if you asked it to walk starting at ref@{1} but there was only 1
entry).
Thus nobody should actually look at the reflog entry info we return. But
we'll still put in some fake values just to be on the safe side, since
this is such a subtle and confusing interface. Likewise, we'll document
what's going on in a comment above the function declaration. If this
were a function with a lot of callers, the footgun would probably not be
worth it. But it has only ever had two callers in its 18-year existence,
and it seems unlikely to grow more. So let's hold our noses and let
users enjoy the convenience of a simulated ref@{0}.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) was that if we have "n" entries in a reflog, we should still
be able to resolve ref@{n} by looking at the "old" value of the oldest
entry.
Commit 6436a20284 tried to put the logic into read_ref_at() by shifting
its idea of "n" by one. But we reverted that in the previous commit,
since it led to bugs in other callers which cared about the details of
the reflog entry we found. Instead, let's put the special case into the
caller that resolves @{n}, as it cares only about the oid.
read_ref_at() is even kind enough to return the "old" value from the
final reflog; it just returns "1" to signal to us that we ran off the
end of the reflog. But we can notice in the caller that we read just
enough records for that "old" value to be the one we're looking for, and
use it.
Note that read_ref_at() could notice this case, too, and just return 0.
But we don't want to do that, because the caller must be made aware that
we only found the oid, not an actual reflog entry (and the call sites in
show-branch do care about this).
There is one complication, though. When read_ref_at() hits a truncated
reflog, it will return the "old" value of the oldest entry only if it is
not the null oid. Otherwise, it actually returns the "new" value from
that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in
oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a
ref created recently will produce the initial value as a convenience
(even though technically it did not exist 20 years ago).
But this convenience is only useful for time-based cutoffs. For
count-based cutoffs, get_oid_basic() has always simply complained about
going too far back:
$ git rev-parse HEAD@{20}
fatal: log for 'HEAD' only has 16 entries
and we should continue to do so, rather than returning a nonsense value
(there's even a test in t1508 already which covers this). So let's have
the d1a4489a56 code kick in only when doing timestamp-based cutoffs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>