"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
During interactive rebase, using 'drop' on a merge commit lead to
an error, which was incorrect.
* js/rebase-i-allow-drop-on-a-merge:
rebase -i: permit 'drop' of a merge commit
As part of 9bbc981c6f (t/unit-tests: finalize migration of
reftable-related tests, 2025-07-24), the explicit list of
`UNIT_TEST_PROGRAMS` was turned into a wildcard pattern-derived list.
Let's do the same in the CMake definition.
This fixes build errors with symptoms like this:
CMake Error at CMakeLists.txt:132 (string):
string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
CMakeLists.txt:1037 (parse_makefile_for_scripts)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Edit: We are continuing to follow the existing PO file convention, which
includes filenames but strips out line numbers from the file-location
comments. This standard was set by our former lead, Jordi Mas, and we
are maintaining it for project-wide consistency.
Signed-off-by: Mikel Forcada <mikel.forcada@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
3a54f5bd5d (merge/pull: add the "--compact-summary" option, 2025-06-12)
added the option --compact-summary to both merge and pull. It takes no
no argument, but for merge it got an argument help string. Remove it,
since it is unnecessary.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dabecb9db2 (for-each-ref: introduce a '--start-after' option,
2025-07-15) added the option --start-after and referred to its argument
as "marker" in documentation and usage string, but not in the option's
short help. Use "marker" there as well for consistency and brevity.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve wording and fix typos for a couple entries part of the Git 2.51
release notes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The deflate codepath in "git archive --format=zip" had a
longstanding bug coming from misuse of zlib API, which has been
corrected.
* jt/archive-zip-deflate-fix:
archive: flush deflate stream until Z_STREAM_END
Squelch false-positive compiler warning.
* dl/squelch-maybe-uninitialized:
t/unit-tests/clar: fix -Wmaybe-uninitialized with -Og
remote: bail early from set_head() if missing remote name
When renaming a remote we also need to rename all references
accordingly. But while we only need to rename references that are
contained in the "refs/remotes/$OLDNAME/" namespace, we end up using
`refs_for_each_rawref()` that iterates through _all_ references. We know
to exit early in the callback in case we see an irrelevant reference,
but ultimately this is still a waste of compute as we knowingly iterate
through references that we won't ever care about.
Improve this by using `refs_for_each_rawref_in()`, which knows to only
iterate through (potentially broken) references in a given prefix.
The following benchmark renames a remote with a single reference in a
repository that has 100k unrelated references. This shows a sizeable
improvement with the "files" backend:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 42.6 ms ± 0.9 ms [User: 29.1 ms, System: 8.4 ms]
Range (min … max): 40.1 ms … 43.3 ms 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 31.7 ms ± 4.0 ms [User: 19.6 ms, System: 6.9 ms]
Range (min … max): 27.1 ms … 36.0 ms 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
1.35 ± 0.17 times faster than rename remote (refformat = files, revision = HEAD~)
The "reftable" backend shows roughly the same absolute improvement, but
given that it's already significantly faster than the "files" backend
this translates to a much larger relative improvement:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 18.2 ms ± 0.5 ms [User: 12.7 ms, System: 3.0 ms]
Range (min … max): 17.3 ms … 21.4 ms 110 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 8.8 ms ± 0.5 ms [User: 3.8 ms, System: 2.9 ms]
Range (min … max): 7.5 ms … 9.9 ms 167 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
2.07 ± 0.12 times faster than rename remote (refformat = reftable, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was recently reported [1] that renaming a remote that has dangling
symrefs is broken. This issue can be trivially reproduced:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git remote add origin /dev/null
$ git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
$ git remote rename origin renamed
$ git symbolic-ref refs/remotes/origin/HEAD
refs/remotes/origin/master
$ git symbolic-ref refs/remotes/renamed/HEAD
fatal: ref refs/remotes/renamed/HEAD is not a symbolic ref
As one can see, the "HEAD" reference did not get renamed but stays in
the same place. There are two issues here:
- We use `refs_resolve_ref_unsafe()` to resolve references, but we
don't pass the `RESOLVE_REF_NO_RECURSE` flag. Consequently, if the
reference does not resolve, the function will fail and we thus
ignore this branch.
- We use `refs_for_each_ref()` to iterate through the old remote's
references, but that function ignores broken references.
Both of these issues are easy to fix. But having a closer look at the
logic that renames remote references surfaces that it leaves a lot to be
desired overall.
The problem is that we're using O(|refs| + |symrefs| * 2) many reference
transactions to perform the renames. We first delete all symrefs, then
individually rename every direct reference and finally we recreate the
symrefs. On the one hand this isn't even remotely an atomic operation,
so if we hit any error we'll already have deleted all references.
But more importantly it is also extremely inefficient. The number of
transactions for symrefs doesn't really bother us too much, as there
should generally only be a single symref anyway ("HEAD"). But the
renames are very expensive:
- For the "reftable" backend we perform auto-compaction after every
single rename, which does add up.
- For the "files" backend we potentially have to rewrite the
"packed-refs" file on every single rename in case they are packed.
The consequence here is quadratic runtime performance. Renaming a
100k references takes hours to complete.
Refactor the code to use a single transaction to perform all the
reference updates atomically, which speeds up the transaction quite
significantly:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 238.770 s ± 13.857 s [User: 91.473 s, System: 143.793 s]
Range (min … max): 204.863 s … 247.699 s 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 2.103 s ± 0.036 s [User: 0.360 s, System: 1.313 s]
Range (min … max): 2.011 s … 2.141 s 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
113.53 ± 6.87 times faster than rename remote (refformat = files, revision = HEAD~)
For the "reftable" backend we see a significant speedup, as well, but
given that we don't have quadratic runtime behaviour there it's way less
extreme:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 8.604 s ± 0.539 s [User: 4.985 s, System: 2.368 s]
Range (min … max): 7.880 s … 9.556 s 10 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 1.177 s ± 0.103 s [User: 0.446 s, System: 0.270 s]
Range (min … max): 1.023 s … 1.410 s 10 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
7.31 ± 0.79 times faster than rename remote (refformat = reftable, revision = HEAD~)
There is one issue though with using atomic transactions: when nesting a
remote into itself it can happen that renamed references conflict with
the old referencse. For example, when we have a reference
"refs/remotes/origin/foo" and we rename "origin" to "origin/foo", then
we'll end up with an F/D conflict when we try to create the renamed
reference "refs/remotes/origin/foo/foo".
This situation is overall quite unlikely to happen: people tend to not
use nested remotes, and if they do they must at the same time also have
a conflicting refname. But the end result would be that the old remote
references stay intact whereas all the other parts of the repository
have been adjusted for the new remote name.
Address this by queueing and preparing the reference update before we
touch any other part of the repository. Like this we can make sure that
the reference update will go through before rewriting the configuration.
Otherwise, if the transaction fails to prepare we can gracefully abort
the whole operation without any changes having been performed in the
repository yet. Furthermore, we can detect the conflict and print some
helpful advice for how the user can resolve this situation. So overall,
the tradeoff is that:
- Reference transactions are now all-or-nothing. This is a significant
improvement over the previous state where we may have ended up with
partially-renamed references.
- Rewriting references is now significantly faster.
- We only rewrite the configuration in case we know that all
references can be updated.
- But we may refuse to rename a remote in case references conflict.
Overall this seems like an acceptable tradeoff.
While at it, fix the handling of symbolic/broken references by using
`refs_for_each_rawref()`. Add tests that cover both this reported issue
and tests that exercise nesting of remotes.
One thing to note: with this change we cannot provide a proper progress
monitor anymore as we queue the references into the transactions as we
iterate through them. Consequently, as we don't know yet how many refs
there are in total, we cannot report how many percent of the operation
is done anymore. But that's a small price to pay considering that you
now shouldn't need the progress monitor in most situations at all
anymore.
[1]: <CANrWfmQWa=RJnm7d3C7ogRX6Tth2eeuGwvwrNmzS2gr+eP0OpA@mail.gmail.com>
Reported-by: Han Jiang <jhcarl0814@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote we may have to also rename remote refs in case
the refspec changes. Pull out this computation into a separate loop.
While that seems nonsensical right now, it'll help us in a subsequent
commit where we will prepare the reference transaction before we rewrite
the configuration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix -Wsign-comparison warnings. All of the warnings we have are about
mismatches in signedness for loop counters. These are trivially fixable
by using the correct integer type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between two storage formats we have to do
so via two callback-driven functions:
- `migrate_one_reflog()` gets invoked via `refs_for_each_reflog()` to
first list all available reflogs.
- `migrate_one_reflog_entry()` gets invoked via
`refs_for_each_reflog_ent()` in `migrate_one_reflog()`.
Before the preceding commit we didn't have the refname available in
`migrate_one_reflog_entry()`, which made it necessary to have a separate
structure that we pass to the second callback so that we can propagate
the refname. Now that `refs_for_each_reflog_ent()` knows to pass the
refname to the callback though that indirection isn't necessary anymore.
There's one catch though: we do have an update index that is also stored
in the entry-specific callback data. This update index is required so
that we can tell the ref backend in which order it should persist the
reflog entries to disk.
But that purpose can be trivially achieved by just converting it into a
global counter that is used for all reflog entries, regardless of which
reference they are for. The ordering will remain the same as both the
update index and the refname is considered when sorting the entries.
Move the index into `struct migration_data` and drop the now-unused
`struct reflog_migration_data` to simplify the code a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
4c063c82e9 (rebase -i: improve error message when picking merge,
2024-05-30) added advice texts for cases when a merge commit is
passed as argument of sequencer command that cannot operate with
a merge commit. However, it forgot about the 'drop' command, so
that in this case the BUG() in the default branch is reached.
Handle 'drop' like 'merge', i.e., permit it without a message.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between different storage formats we end
up with invalid old object IDs for the migrated entries: instead of
writing the old object ID of the to-be-migrated entry, we end up with
the all-zeroes object ID.
The root cause of this issue is that we don't know to use the old object
ID provided by the caller. Instead, we manually resolve the old object
ID by resolving the current value of its matching reference. But as that
reference does not yet exist in the target ref storage we always end up
resolving it to all-zeroes.
This issue got unnoticed as there is no user-facing command that would
even show the old object ID. While `git log -g` knows to show the new
object ID, we don't have any formatting directive to show the old object
ID.
Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If
set, backends are instructed to use the old and new object IDs provided
by the caller, without doing any manual resolving. Set this flag in
`ref_transaction_update_reflog()`.
Amend our tests in t1460-refs-migrate to use our test tool to read
reflog entries. This test tool prints out both old and new object ID of
each reflog entry, which fixes the test gap. Furthermore it also prints
the full identity used to write the reflog, which provides test coverage
for the previous commit in this patch series that fixed the identity for
migrated reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
object ID set. If so, the value of that field is used to verify whether
the current state of the reference matches this expected state. It is
thus an important part of mitigating races with a concurrent process
that updates the same set of references.
When writing reflogs though we explicitly unset that flag. This is a
sensible thing to do: the old state of reflog entry updates may not
necessarily match the current on-disk state of its accompanying ref, but
it's only intended to signal what old object ID we want to write into
the new reflog entry. For example when migrating refs we end up writing
many reflog entries for a single reference, and most likely those reflog
entries will have many different old object IDs.
But unsetting this flag also removes a useful signal, namely that the
caller _did_ provide an old object ID for a given reflog entry. This
signal will become useful in a subsequent commit, where we add a new
flag that tells the transaction to use the provided old and new object
IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
signal to verify that the caller really did provide an old object ID.
Stop unsetting the flag so that we can use it as this described signal
in a subsequent commit. Skip checking the old object ID for log-only
updates so that we don't expect it to match the current on-disk state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating a reference that is being pointed to HEAD we don't only
write a reflog message for that particular reference, but also generate
one for HEAD. This logic is handled by `split_head_update()`, where we:
1. Verify that the condition actually triggered. This is done by
reading HEAD at the start of the transaction so that we can then
check whether a given reference update refers to its target.
2. Queue a new log-only update for HEAD in case it did.
But the logic is unfortunately not free of races, as we do not lock the
HEAD reference after we have read its target. This can lead to the
following two scenarios:
- HEAD gets concurrently updated to point to one of the references we
have already processed. This causes us not writing a reflog message
even though we should have done so.
- HEAD gets concurrently updated to no longer point to a reference
anymore that we have already processed. This causes us to write a
reflog message even though we should _not_ have done so.
Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that
is specific to the "files" backend. If set, we will double check that
the HEAD reference still points to the reference that we are creating
the reflog entry for after we have locked HEAD. Furthermore, instead of
manually resolving the old object ID of that entry, we now use the same
old state as for the parent update.
If we detect such a racy update we abort the transaction. This is a bit
heavy-handed: the user didn't even ask us to write a reflog update for
"HEAD", so it might be surprising if we abort the transaction. That
being said:
- Normal users wouldn't typically hit this case as we only hit the
relevant code when committing to a branch that is being pointed to
by "HEAD" directly. Commands like git-commit(1) typically commit to
"HEAD" itself though.
- Scripted users that use git-update-ref(1) and related plumbing
commands are unlikely to hit this case either, as they would have to
update the pointed-to-branch at the same as "HEAD" is being updated,
which is an exceedingly rare event.
The alternative would be to instead drop the log-only update completely,
but that would require more logic that is hard to verify without adding
infrastructure specific for such a test. So we rather do the pragmatic
thing and don't worry too much about an edge case that is very unlikely
to happen.
Unfortunately, this change only helps with the second race. We cannot
reliably plug the first race without locking the HEAD reference at the
start of the transaction. Locking HEAD unconditionally would effectively
serialize all writes though, and that doesn't seem like an option. Also,
double checking its value at the end of the transaction is not an option
either, as its target may have flip-flopped during the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between different storage formats we must
reconstruct the identity of reflog entries. This is done by passing the
committer passed to the `migrate_one_reflog_entry()` callback function
to `fmt_ident()`.
This results in an invalid identity though: `fmt_ident()` expects the
caller to provide both name and mail of the author, but we pass the full
identity as mail. This leads to an identity like:
pks <Patrick Steinhardt ps@pks.im>
Fix the bug by splitting the identity line first. This allows us to
extract both the name and mail so that we can pass them to `fmt_ident()`
separately.
This commit does not yet add any tests as there is another bug in the
reflog migration that will be fixed in a subsequent commit. Once that
bug is fixed we'll make the reflog verification in t1450 stricter, and
that will catch both this bug here and the other bug.
Note that we also add two new `name` and `mail` string buffers to the
callback structures and splice them through to the callbacks. This is
done so that we can avoid allocating a new buffer every time we compute
the committer information.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last parameter in `split_ident_line()` is the length of the line
passed in by the caller. As such, most callers pass in either the result
of `strlen()`, `struct strbuf::len` or a pointer diff, all of which
are expected to be positive numbers. Regardless of that, the function
accepts a signed integer, which is somewhat confusing.
Fix the function signature to instead accept a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we provide a couple of subcommands in git-reflog(1) to remove
reflog entries, we don't provide any to write new entries. Obviously
this is not an operation that really would be needed for many use cases
out there, or otherwise people would have complained that such a command
does not exist yet. But the introduction of the "reftable" backend
changes the picture a bit, as it is now basically impossible to manually
append a reflog entry if one wanted to do so due to the binary format.
Plug this gap by introducing a simple "write" subcommand. For now, all
this command does is to append a single new reflog entry with the given
object IDs and message to the reflog. More specifically, it is not yet
possible to:
- Write multiple reflog entries at once.
- Insert reflog entries at arbitrary indices.
- Specify the date of the reflog entry.
- Insert reflog entries that refer to nonexistent objects.
If required, those features can be added at a future point in time. For
now though, the new command aims to fulfill the most basic use cases
while being as strict as possible when it comes to verifying parameters.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>