We'd like to eventually support multi-byte comment prefixes, but the
comment_line_char variable is referenced in many spots, making the
transition difficult.
Let's start by storing the character in a NUL-terminated string. That
will let us switch code over incrementally to the string format, and we
can easily support the existing code with a macro wrapper (since we'll
continue to allow only a single-byte prefix, this will behave
identically).
Once all references to the "char" variable have been converted, we can
drop it and enable longer strings.
We'll still have to touch all of the spots that create or set the
variable in this patch, but there are only a few (reading the config,
and the "auto" character selector).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several comment-related strbuf functions take a comment_line_char
parameter. There's also a global comment_line_char variable, which is
closely related (most callers pass it in as this parameter). Let's avoid
shadowing the global name. This makes it more obvious that we're not
using the global value, and it will be especially helpful as we refactor
the global in future patches (in particular, any macro trickery wouldn't
work because the preprocessor doesn't respect scope).
We'll use "comment_prefix". That should be descriptive enough, and as a
bonus is more neutral with respect to the "char" type (since we'll
eventually swap it out for a string).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.commentChar is set to "auto", we check a set of candidate
characters against the proposed buffer to see which if any can be used
without ambiguity. But before we do that, we optimize for the common
case that the default "#" is fine by just seeing if it is present in the
buffer at all.
The way we do this is a bit subtle, though: we assign the candidate
character to comment_line_char preemptively, then check if it works, and
return if it does. The subtle part is that sometimes setting
comment_line_char is important (after we return, the important outcome
is the fact that we have set the variable) and sometimes it is useless
(if our optimization fails, we go on to do the more careful checks and
eventually assign something else instead).
To make it more clear what is happening (and to make further refactoring
of comment_line_char easier), let's check our candidate character
directly, and then assign as part of returning if it worked out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In strbuf_add_commented_lines(), we have to convert the single-byte
comment_line_char into a string to pass to add_lines(). We cache the
created string using a static-local variable. But this makes the
function non-reentrant, and it's doubtful that this provides any real
performance benefit given that we know the string always contains a
single character.
So let's just create it from scratch each time, and to give the compiler
the maximal opportunity to make it fast we'll ditch the over-complicated
xsnprintf() and just assign directly into the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In strbuf_add_commented_lines(), we prepare two strings with potential
prefixes: one with just the comment char, and one with an additional
space. In the add_lines() helper, we use the one without the extra space
for blank lines or lines starting with a tab.
While passing in two separate prefixes to the helper is very flexible,
it's more flexibility than we actually use (or are likely to use, since
the rules inside add_lines() only make sense if "prefix2" is a variant
of "prefix1" without the extra space). And setting up the two strings
makes refactoring in strbuf_add_commented_lines() awkward.
Instead, let's pass in a single string, and just let add_lines() add the
extra space to the result as appropriate.
We do still need to pass in a flag to trigger this behavior. The helper
is shared by strbuf_add_lines(), which passes in a NULL "prefix2" to
inhibit this extra handling.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we usually look for a comment char while parsing line-oriented
files, setting core.commentChar to a single newline can confuse our code
quite a bit. For example, using it with "git commit" causes us to fail
to recognize any of the template as comments, including it in the config
message. Which kind of makes sense, since the template content is on its
own line (so no line can "start" with a newline). In other spots I would
not be surprised if you can create more mischief (e.g., violating loop
assumptions) but I didn't dig into it.
Since comment characters are a local preference, to some degree this is
a case of "if it hurts, don't do it". But given that this would be a
silly and pointless thing to do, and that it makes it harder to reason
about code parsing comment lines, let's just forbid it.
There are other cases that are perhaps questionable (e.g., setting the
comment char to a single space), but they seem to behave reasonably (at
least a simple "git commit" will correctly identify and strip the
template lines). So I haven't worried about going on a hunt for every
stupid thing a user might do to themselves, and just focused on the most
confusing case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code simplification to one unit-test program.
* rs/t-ctype-simplify:
t-ctype: avoid duplicating class names
t-ctype: align output of i
t-ctype: simplify EOF check
t-ctype: allow NUL anywhere in the specification string
Setting the safe.bareRepository configuration variable to explicit
stops git from using a bare repository, unless the repository is
explicitly specified, either by the "--git-dir=<path>" command line
option, or by exporting $GIT_DIR environment variable. This may be
a reasonable measure to safeguard users from accidentally straying
into a bare repository in unexpected places, but often gets in the
way of users who need valid accesses to the repository.
Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit,
2024-01-20) loosened the rule such that being inside the ".git"
directory of a non-bare repository does not really count as
accessing a "bare" repository. The reason why such a loosening is
needed is because often hooks and third-party tools run from within
$GIT_DIR while working with a non-bare repository.
More importantly, the reason why this is safe is because a directory
whose contents look like that of a "bare" repository cannot be a
bare repository that came embedded within a checkout of a malicious
project, as long as its directory name is ".git", because ".git" is
not a name allowed for a directory in payload.
There are at least two other cases where tools have to work in a
bare-repository looking directory that is not an embedded bare
repository, and accesses to them are still not allowed by the recent
change.
- A secondary worktree (whose name is $name) has its $GIT_DIR
inside "worktrees/$name/" subdirectory of the $GIT_DIR of the
primary worktree of the same repository.
- A submodule worktree (whose name is $name) has its $GIT_DIR
inside "modules/$name/" subdirectory of the $GIT_DIR of its
superproject.
As long as the primary worktree or the superproject in these cases
are not bare, the pathname of these "looks like bare but not really"
directories will have "/.git/worktrees/" and "/.git/modules/" as a
substring in its leading part, and we can take advantage of the same
security guarantee allow git to work from these places.
Extend the earlier "in a directory called '.git' we are OK" logic
used for the primary worktree to also cover the secondary worktree's
and non-embedded submodule's $GIT_DIR, by moving the logic to a
helper function "is_implicit_bare_repo()". We deliberately exclude
secondary worktrees and submodules of a bare repository, as these
are exactly what safe.bareRepository=explicit setting is designed to
forbid accesses to without an explicit GIT_DIR/--git-dir=<path>
Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If several jobs in the windows-test or vs-test matrices fail, the
upload-artifact action in each job tries to upload the test directories
of the failed tests as "failed-tests-windows.zip", which fails for all
jobs except the one which finishes first with the following error:
Error: Failed to CreateArtifact: Received non-retryable error:
Failed request: (409) Conflict: an artifact with this name
already exists on the workflow run
Make the artifacts name unique by using the 'matrix.nr' token, and
disambiguate the vs-test artifacts from the windows-test ones.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new formatting rules, we use _<placeholders>_.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We also apply the formatting to urls.txt which is included.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When included into a the manpage of git-init, the param section must
not refer to the manpage.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In all cases of option description, each option is in its own
term. Use the same format here.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new doc format conventions, we use _<placeholders>_.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 24876ebf68 (commit-reach(repo_in_merge_bases_many): report missing
commits, 2024-02-28), I taught `merge_submodule()` to handle errors
reported by `repo_in_merge_bases_many()`.
However, those errors were not passed through to the callers. That was
unintentional, and this commit remedies that.
Note that `find_first_merges()` can now also return -1 (because it
passes through that return value from `repo_in_merge_bases()`), and this
commit also adds the forgotten handling for that scenario.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `merge_submodule()` function returns an integer that indicates
whether the merge was clean (returning 1) or unclean (returning 0).
Like the version in `merge-ort.c`, the version in `merge-recursive.c`
does not report any errors (such as repository corruption) by returning
-1 as of time of writing, even if the callers in `merge-ort.c` are
prepared for exactly such errors.
However, we want to teach (both variants of) the `merge_submodule()`
function that trick: to report errors by returning -1. Therefore,
prepare the caller in `merge-recursive.c` to handle that scenario.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A custom remote helper no longer cannot access the newly created
repository during "git clone", which is a regression in Git 2.44.
This has been corrected.
* ps/remote-helper-repo-initialization-fix:
builtin/clone: allow remote helpers to detect repo
The upload-pack program, when talking over v2, accepted the
packfile-uris protocol extension from the client, even if it did
not advertise the capability, which has been corrected.
* jk/upload-pack-v2-capability-cleanup:
upload-pack: only accept packfile-uris if we advertised it
upload-pack: use existing config mechanism for advertisement
upload-pack: centralize setup of sideband-all config
upload-pack: use repository struct to get config
Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.
* jk/upload-pack-bounded-resources:
upload-pack: free tree buffers after parsing
upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
upload-pack: always turn off save_commit_buffer
upload-pack: disallow object-info capability by default
upload-pack: accept only a single packfile-uri line
upload-pack: use a strmap for want-ref lines
upload-pack: use oidset for deepen_not list
upload-pack: switch deepen-not list to an oid_array
upload-pack: drop separate v2 "haves" array
Clear the fallout from a fix for 2.44 regression.
* ps/reftable-repo-init-fix:
t0610: remove unused variable assignment
refs/reftable: don't fail empty transactions in repo without HEAD
"git log --merge" learned to pay attention to CHERRY_PICK_HEAD and
other kinds of *_HEAD pseudorefs.
* ml/log-merge-with-cherry-pick-and-other-pseudo-heads:
revision: implement `git log --merge` also for rebase/cherry-pick/revert
revision: ensure MERGE_HEAD is a ref in prepare_show_merge
"git commit -v --cleanup=scissors" used to add the scissors line
twice in the log message buffer, which has been corrected.
* jt/commit-redundant-scissors-fix:
commit: unify logic to avoid multiple scissors lines when merging
commit: avoid redundant scissor line with --cleanup=scissors -v
"git merge-tree" has learned that the three trees involved in the
3-way merge only need to be trees, not necessarily commits.
* js/merge-tree-3-trees:
fill_tree_descriptor(): mark error message for translation
cache-tree: avoid an unnecessary check
Always check `parse_tree*()`'s return value
t4301: verify that merge-tree fails on missing blob objects
merge-ort: do check `parse_tree()`'s return value
merge-tree: fail with a non-zero exit code on missing tree objects
merge-tree: accept 3 trees as arguments
"git rev-list --missing=print" has learned to optionally take
"--allow-missing-tips", which allows the objects at the starting
points to be missing.
* cc/rev-list-allow-missing-tips:
revision: fix --missing=[print|allow*] for annotated tags
rev-list: allow missing tips with --missing=[print|allow*]
t6022: fix 'test' style and 'even though' typo
oidset: refactor oidset_insert_from_set()
revision: clarify a 'return NULL' in get_reference()
"git --no-lazy-fetch cmd" allows to run "cmd" while disabling lazy
fetching of objects from the promisor remote, which may be handy
for debugging.
* jc/no-lazy-fetch:
git: extend --no-lazy-fetch to work across subprocesses
git: document GIT_NO_REPLACE_OBJECTS environment variable
git: --no-lazy-fetch option
Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.
To help with this usecase, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.
This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
thereon we perform a linear search to locate the desired record.
This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the current index is
smaller than the searched-for key. And unless our key exactly matches
the value at the restart counter we always end up returning a truish
value.
The consequence is that `binsearch()` essentially always returns 0,
indicacting to us that we must start searching right at the beginning of
the block. This works by chance because we now always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.
Fix this bug by returning whether the current key is smaller than the
searched key. As the current behaviour was correct it is not possible to
write a test. Furthermore it is also not really possible to demonstrate
in a benchmark that this fix speeds up seeking records.
This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When decoding records it is customary to reuse a `struct
reftable_ref_record` across calls. Thus, it may happen that the record
already holds some allocated memory. When decoding ref and log records
we handle this by releasing or reallocating held memory. But we fail to
do this for object records, which causes us to leak memory.
Fix this memory leak by releasing object records before we decode into
them. We may eventually want to reuse memory instead to avoid needless
reallocations. But for now, let's just plug the leak and be done.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If
(a) There is a "---" divider in a commit message,
(b) At some point beyond that divider, there is a cut-line (that is,
"# ------------------------ >8 ------------------------") in the
commit message,
(c) the user does not explicitly set the "no-divider" option,
then "git interpret-trailers" will hang indefinitively.
This is because when (a) is true, find_end_of_log_message() will invoke
ignored_log_message_bytes() with a len that is intended to make it
ignore the part of the commit message beyond the divider. However,
ignored_log_message_bytes() calls wt_status_locate_end(), and that
function ignores the length restriction when it tries to locate the cut
line. If it manages to find one, the returned cutoff value is greater
than len. At this point, ignored_log_message_bytes() goes into an
infinite loop, because it won't advance the string parsing beyond len,
but the exit condition expects to reach cutoff.
Make wt_status_locate_end() honor the length parameter passed in, to
fix this issue.
In general, if wt_status_locate_end() is given a piece of the memory
that lacks NUL at all, strstr() may continue across page boundaries
and run into an unmapped page. For our current callers, this is not
a problem, as all of them except one uses a memory owned by a strbuf
(which guarantees an implicit NUL-termination after its payload),
and the one exception in trailer.c:find_end_of_log_message() uses
strlen() to compute the length before calling this function.
Signed-off-by: Florian Schmidt <flosch@nutanix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
[jc: tweaked the commit log message and the implementation a bit]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register tables resulting from stack compaction with the
tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register compacted tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register any of the locks we acquire when compacting the
reftable stack via our lockfiles interfaces. These locks will thus not
be released when Git gets killed.
Refactor the code to register locks as lockfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register new tables which we're about to add to the stack with
the tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not report to the caller when rolling back a lockfile fails, which
will be needed by the reftable compaction logic in a subsequent commit.
It also cannot really report on all errors because the function calls
`delete_tempfile()`, which doesn't return an error either.
Refactor the code so that both `delete_tempfile()` and
`rollback_lock_file()` return an error code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The formatting around "option push-option" was missing its closing
quote, leading to the output having a stray opening quote, rather than
rendering the item in italics (as we do for all of the other options in
the list).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands do not cause a set of 'def_param' events to be emitted.
This includes "git-remote-https", "git-http-fetch", and various
"query" commands, like "git --man-path".
Since all of these commands do emit a 'cmd_name' event, add code to
the "trace2_cmd_name()" function to generate the set of 'def_param'
events.
Remove explicit calls to "trace2_cmd_list_config()" and
"trace2_cmd_list_env_vars()" in git.c since they are no longer needed.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During nested alias expansion it is possible for
"trace2_cmd_list_config()" and "trace2_cmd_list_env_vars()"
to be called more than once. This causes a full set of
'def_param' events to be emitted each time. Let's avoid
that.
Add code to those two functions to only emit them once.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some Git commands fail to emit 'def_param' events for interesting
config and environment variable settings.
Add unit tests to demonstrate this.
Most commands are considered "builtin" and are based upon git.c.
These typically do emit 'def_param' events. Exceptions are some of
the "query" commands, the "run-dashed" mechanism, and alias handling.
Commands built from remote-curl.c (instead of git.c), such as
"git-remote-https", do not emit 'def_param' events.
Likewise, "git-http-fetch" is built http-fetch.c and does not emit
them.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace "test -f" and friends to use the test_path_is_file helper
function and friends from test-lib-functions.sh. These functions
perform identical operations while enhancing debugging capabilities
in case of test failures.
The original used 'test ! -f' to check if the file has been
correctly cleaned, so 'test ! -e' would have been a better choice.
Replace them with 'test_path_is_missing'.
Signed-off-by: Vincenzo Mezzela <vincenzo.mezzela@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach fsmonitor_refresh_callback() to handle case-insensitive
lookups if case-sensitive lookups fail on case-insensitive systems.
This can cause 'git status' to report stale status for files if there
are case issues/errors in the worktree.
The FSMonitor daemon sends FSEvents using the observed spelling
of each pathname. On case-insensitive file systems this may be
different than the expected case spelling.
The existing code uses index_name_pos() to find the cache-entry for
the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
that the worktree scan/index refresh will revisit and revalidate the
path.
On a case-insensitive file system, the exact match lookup may fail
to find the associated cache-entry. This causes status to think that
the cached CE flags are correct and skip over the file.
Update event handling to optionally use the name-hash and dir-name-hash
if necessary.
Also update t7527 to convert the "test_expect_failure" to "_success"
now that we have fixed the bug.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor code in the fsmonitor_refresh_callback() call chain dealing
with invalidating the CE_FSMONITOR_VALID bit and add a trace message.
During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
data from the FSMonitor daemon (so that a later phase will lstat() and
verify the true state of the file).
Create a new function to clear the bit and add some unique tracing for
it to help debug edge cases.
This is similar to the existing `mark_fsmonitor_invalid()` function,
but it also does untracked-cache invalidation and we've already
handled that in the refresh-callback handlers, so but we don't need
to repeat that.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Consolidate the directory/non-directory calls to the refresh handler
code. Log the resulting count of invalidated cache-entries.
The nr_in_cone value will be used in a later commit to decide if
we also need to try to do case-insensitive lookups.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the refresh callback helper function for unqualified FSEvents
(pathnames without a trailing slash) to return the number of
cache-entries that were invalided in response to the event.
This will be used in a later commit to help determine if the observed
pathname was (possibly) case-incorrect when (on a case-insensitive
file system).
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b0f6b6b523 (refs/reftable: don't fail empty transactions in repo
without HEAD, 2024-02-27), we have added a new test to t0610. This test
contains a useless assignment to a variable that is never actually used.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>