Commit Graph

75953 Commits

Author SHA1 Message Date
Xing Xin
a77554ea09 diff-tree: fix crash when used with --remerge-diff
When using "git-diff-tree" to get the tree diff for merge commits with
the diff format set to `remerge`, a bug is triggered as shown below:

  $ git diff-tree -r --remerge-diff 363337e6eb
  363337e6eb
  BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab467a (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.

After some further digging, I find that the remerge-related diff options
were introduced in db757e8b8d (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).

Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:

  `builtin/am.c`: manually sets all flags; remerge_diff is not among them
  `sequencer.c`: manually sets all flags; remerge_diff is not among them

so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.

This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09 08:07:44 -07:00
Junio C Hamano
0d66f601a9 tests: drop use of 'tee' that hides exit status
A few tests have "| tee output" downstream of a git command, and
then inspect the contents of the file.  The net effect is that we
use an extra process, and hide the exit status from the upstream git
command.

In any of these tests, I do not see a reason why we want to hide a
possible failure from these git commands.  Replace the use of tee
with a plain simple redirection.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 18:08:10 -07:00
Junio C Hamano
25673b1c47 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:41:21 -07:00
Junio C Hamano
598422337c Merge branch 'ps/p4-tests-updates'
Perforce tests have been updated.

* ps/p4-tests-updates:
  t98xx: mark Perforce tests as memory-leak free
  ci: update Perforce version to r23.2
  t98xx: fix Perforce tests with p4d r23 and newer
2024-08-08 10:41:21 -07:00
Junio C Hamano
3e12106370 Merge branch 'dh/encoding-trace-optim'
An expensive operation to prepare tracing was done in re-encoding
code path even when the tracing was not requested, which has been
corrected.

* dh/encoding-trace-optim:
  convert: return early when not tracing
2024-08-08 10:41:20 -07:00
Junio C Hamano
536695cabe Merge branch 'ps/doc-more-c-coding-guidelines'
Some project conventions have been added to CodingGuidelines.

* ps/doc-more-c-coding-guidelines:
  Documentation: consistently use spaces inside initializers
  Documentation: document idiomatic function names
  Documentation: document naming schema for structs and their functions
  Documentation: clarify indentation style for C preprocessor directives
  clang-format: fix indentation width for preprocessor directives
2024-08-08 10:41:20 -07:00
Junio C Hamano
984ab11337 Merge branch 'rs/grep-omit-blank-lines-after-function-at-eof'
"git grep -W" omits blank lines that follow the found function at
the end of the file, just like it omits blank lines before the next
function.

* rs/grep-omit-blank-lines-after-function-at-eof:
  grep: -W: skip trailing empty lines at EOF, too
2024-08-08 10:41:19 -07:00
Junio C Hamano
028cf22904 Merge branch 'dd/notes-empty-no-edit-by-default'
"git notes add -m '' --allow-empty" and friends that take prepared
data to create notes should not invoke an editor, but it started
doing so since Git 2.42, which has been corrected.

* dd/notes-empty-no-edit-by-default:
  notes: do not trigger editor when adding an empty note
2024-08-08 10:41:19 -07:00
Junio C Hamano
c2058b2a85 Merge branch 'es/shell-check-updates'
Test script linter has been updated to catch an attempt to use
one-shot export construct "VAR=VAL func" for shell functions (which
does not work for some shells) better.

* es/shell-check-updates:
  check-non-portable-shell: improve `VAR=val shell-func` detection
  check-non-portable-shell: suggest alternative for `VAR=val shell-func`
  check-non-portable-shell: loosen one-shot assignment error message
  t4034: fix use of one-shot variable assignment with shell function
  t3430: drop unnecessary one-shot "VAR=val shell-func" invocation
2024-08-08 10:41:18 -07:00
Junio C Hamano
d70f3208bc Merge branch 'rj/add-p-pager'
A 'P' command to "git add -p" that passes the patch hunk to the
pager has been added.

* rj/add-p-pager:
  add-patch: render hunks through the pager
  pager: introduce wait_for_pager
  pager: do not close fd 2 unnecessarily
  add-patch: test for 'p' command
2024-08-08 10:41:18 -07:00
Junio C Hamano
f250b51b49 Merge branch 'ks/unit-test-comment-typofix'
Typofix.

* ks/unit-test-comment-typofix:
  unit-tests/test-lib: fix typo in check_pointer_eq() description
2024-08-08 10:41:17 -07:00
AbdAlRahman Gad
203a9bf091 t7004: make use of write_script
Use write_script which takes care of emitting the `#!/bin/sh` line
and the `chmod +x`.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:01 -07:00
AbdAlRahman Gad
2f44f11b0a t7004: use single quotes instead of double quotes
Some test bodies and test description are surrounded with double
quotes instead of single quotes, violating our coding style.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:01 -07:00
AbdAlRahman Gad
c07b695c15 t7004: begin the test body on the same line as test_expect_success
Test body should begin with a single quote right after the test
description instead of backslash followed by new line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:01 -07:00
AbdAlRahman Gad
8975df91ff t7004: description on the same line as test_expect_success
There are several tests in t7004 where the test description that
follows `test_expect_success` is on a separate line, violating our
coding style. Adapt these to be on the same line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:01 -07:00
AbdAlRahman Gad
c4e00c1c6b t7004: do not prepare things outside test_expect_success
Do not prepare expect and other things outside test_expect_success.
If such code fails for some reason, we won't necessarily hear about
it in a timely fashion (or perhaps at all). By placing all code inside
`test_expect_success` it ensures that we know immediately if it fails.

Also add '\' before EOF to avoid shell interpolation and '-' to allow
indentation of the body.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:01 -07:00
AbdAlRahman Gad
52a6674a4d t7004: use indented here-doc
Use <<-\EOF instead of <<\EOF where the latter allows us to indent
the body of the here-doc.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:00 -07:00
AbdAlRahman Gad
95fc11b6fd t7004: one command per line
One of the tests in t7004 has multiple commands on a single line,
which is discouraged. Adapt these by splitting up these into one
line per command.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:00 -07:00
AbdAlRahman Gad
ea62c4f947 t7004: remove space after redirect operators
Modernize 't7004' by removing whitespace after redirect operators.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:27:00 -07:00
Patrick Steinhardt
f234df07f6 reftable/stack: handle locked tables during auto-compaction
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.

But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.

Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.

This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:43 -07:00
Patrick Steinhardt
ed1ad6b44d reftable/stack: fix corruption on concurrent compaction
The locking employed by compaction uses the following schema:

  1. Lock "tables.list" and verify that it matches the version we have
     loaded in core.

  2. Lock each of the tables in the user-supplied range of tables that
     we are supposed to compact. These locks prohibit any concurrent
     process to compact those tables while we are doing that.

  3. Unlock "tables.list". This enables concurrent processes to add new
     tables to the stack, but also allows them to compact tables outside
     of the range of tables that we have locked.

  4. Perform the compaction.

  5. Lock "tables.list" again.

  6. Move the compacted table into place.

  7. Write the new order of tables, including the compacted table, into
     the lockfile.

  8. Commit the lockfile into place.

Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.

But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.

Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.

This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.

Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:43 -07:00
Patrick Steinhardt
128b9aa3e9 reftable/stack: use lock_file when adding table to "tables.list"
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.

Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:43 -07:00
Patrick Steinhardt
7ee307da1b reftable/stack: do not die when fsyncing lock file files
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.

Adapt accordingly and use `fsync_component()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:43 -07:00
Patrick Steinhardt
558f6fbeb1 reftable/stack: simplify tracking of table locks
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.

This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.

Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:42 -07:00
Patrick Steinhardt
5f0ed603a1 reftable/stack: update stats on failed full compaction
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.

Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:42 -07:00
Patrick Steinhardt
8030100bda reftable/stack: test compaction with already-locked tables
We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:42 -07:00
Patrick Steinhardt
9a833ca35d reftable/stack: extract function to setup stack with N tables
We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.

Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.

Furthermore, with this change we now start to disable auto compaction
when writing the tables, as otherwise we might not end up with the
expected amount of new tables added. This also slightly changes the
behaviour of these tests, but the properties we care for remain intact.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:42 -07:00
Patrick Steinhardt
ed7d2f4770 reftable/stack: refactor function to gather table sizes
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 10:14:41 -07:00
shejialuo
1c31be45b3 fsck: add ref name check for files backend
The git-fsck(1) only implicitly checks the reference, it does not fully
check refs with bad format name such as standalone "@".

However, a file ending with ".lock" should not be marked as having a bad
ref name. It is expected that concurrent writers may have such lock files.
We currently ignore this situation. But for bare ".lock" file, we will
report it as error.

In order to provide such checks, add a new fsck message id "badRefName"
with default ERROR type. Use existing "check_refname_format" to explicit
check the ref name. And add a new unit test to verify the functionality.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:53 -07:00
shejialuo
a7600b8481 files-backend: add unified interface for refs scanning
For refs and reflogs, we need to scan its corresponding directories to
check every regular file or symbolic link which shares the same pattern.
Introduce a unified interface for scanning directories for
files-backend.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:53 -07:00
shejialuo
bf061d26c7 builtin/refs: add verify subcommand
Introduce a new subcommand "verify" in git-refs(1) to allow the user to
check the reference database consistency and also this subcommand will
be used as the entry point of checking refs for "git-fsck(1)".

Add "verbose" field into "fsck_options" to indicate whether we should
print verbose messages when checking refs and objects consistency.

Remove bit-field for "strict" field, this is because we cannot take
address of a bit-field which makes it unhandy to set member variables
when parsing the command line options.

The "git-fsck(1)" declares "fsck_options" variable with "static"
identifier which avoids complaint by the leak-checker. However, in
"git-refs verify", we need to do memory clean manually. Thus add
"fsck_options_clear" function in "fsck.c" to provide memory clean
operation.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:53 -07:00
shejialuo
ab6f79d8df refs: set up ref consistency check infrastructure
The "struct ref_store" is the base class which contains the "be" pointer
which provides backend-specific functions whose interfaces are defined
in the "ref_storage_be". We could reuse this polymorphism to define only
one interface. For every backend, we need to provide its own function
pointer.

The interfaces defined in the `ref_storage_be` are carefully structured
in semantic. It's organized as the five parts:

1. The name and the initialization interfaces.
2. The ref transaction interfaces.
3. The ref internal interfaces (pack, rename and copy).
4. The ref filesystem interfaces.
5. The reflog related interfaces.

To keep consistent with the git-fsck(1), add a new interface named
"fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be
grouped into any above five categories. Explicitly add blank line to
make it different from others.

Last, implement placeholder functions for each ref backends.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:53 -07:00
shejialuo
2de307cdb2 fsck: add refs report function
Introduce a new struct "fsck_ref_report" to contain the information we
need when reporting refs-related messages.

With the new "fsck_vreport" function, add a new function
"fsck_report_ref" to report refs-related fsck error message. Unlike
"report" function uses the exact parameters, we simply pass "struct
fsck_ref_report *report" as the parameter. This is because at current we
don't know exactly how many fields we need. By passing this parameter,
we don't need to change this function prototype when we want to add more
information into "fsck_ref_report".

We have introduced "fsck_report_ref" function to report the error
message for refs. We still need to add the corresponding callback
function. Create refs-specific "error_func" callback
"fsck_refs_error_function".

Last, add "FSCK_REFS_OPTIONS_DEFAULT" macro to create default options
when checking ref consistency.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:52 -07:00
shejialuo
3473d18fad fsck: add a unified interface for reporting fsck messages
The static function "report" provided by "fsck.c" aims at checking error
type and calling the callback "error_func" to report the message. Both
refs and objects need to check the error type of the current fsck
message. In order to extract this common behavior, create a new function
"fsck_vreport". Instead of using "...", provide "va_list" to allow more
flexibility.

Instead of changing "report" prototype to be align with the
"fsck_vreport" function, we leave the "report" prototype unchanged due
to the reason that there are nearly 62 references about "report"
function. Simply change "report" function to use "fsck_vreport" to
report objects related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:52 -07:00
shejialuo
0ec5dfe8c4 fsck: make "fsck_error" callback generic
The "fsck_error" callback is designed to report the objects-related
error messages. It accepts two parameter "oid" and "object_type" which
is not generic. In order to provide a unified callback which can report
either objects or refs, remove the objects-related parameters and add
the generic parameter "void *fsck_report".

Create a new "fsck_object_report" structure which incorporates the
removed parameters "oid" and "object_type". Then change the
corresponding references to adapt to new "fsck_error" callback.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:52 -07:00
shejialuo
8cd4a447b8 fsck: rename objects-related fsck error functions
The names of objects-related fsck error functions are generic. It's OK
when there is only object database check. However, we are going to
introduce refs database check report function. To avoid ambiguity,
rename object-related fsck error functions to explicitly indicate these
functions are used to report objects-related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:52 -07:00
shejialuo
2d79aa9095 fsck: rename "skiplist" to "skip_oids"
The "skiplist" field in "fsck_options" is related to objects. Because we
are going to introduce ref consistency check, the "skiplist" name is too
general which will make the caller think "skiplist" is related to both
the refs and objects.

It may seem that for both refs and objects, we should provide a general
"skiplist" here. However, the type for "skiplist" is `struct oidset`
which is totally unsuitable for refs.

To avoid above ambiguity, rename "skiplist" to "skip_oids".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:36:52 -07:00
Patrick Steinhardt
6f1e9394e2 object: fix leaking packfiles when closing object store
When calling `raw_object_store_clear()`, we close and free several
resources associated with the object store. Part of that is to close and
free all the packfiles, which is handled by `close_object_store()`.

That function really only ends up closing the packfiles though, but it
doesn't free them. And in fact it can't, as that function is being
called via `run_command()` when `close_object_store = 1`, which is done
e.g. when we execute git-maintenance(1). At that point, other structures
may still have references on those packfiles, and thus we cannot free
them here. So while it is in fact intentional that we really only close
them, the result is a memory leak because `raw_object_store_clear()`
does not free them, either.

Fix the leak by freeing the packfiles in `raw_object_store_clear()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:22:21 -07:00
Patrick Steinhardt
fa0f27a19d submodule: fix leaking seen submodule names
We keep track of submodules we have already seen via a string map such
that we don't process the same submodule twice. We never free that map
though, causing a memory leak.

Fix this leak by clearing the map.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:22:21 -07:00
Patrick Steinhardt
1a7e5efdb0 submodule: fix leaking fetch tasks
When done with a fetch task used for parallel fetches of submodules, we
need to both call `fetch_task_release()` to release the task's contents
and `free()` to release the task itself. Most sites do this already, but
some only call `fetch_task_release()` and thus leak memory.

While we could trivially fix this by adding the two missing calls to
free(3P), the result would be that we always call both functions. Let's
thus refactor the code such that `fetch_task_release()` also frees the
structure itself. Rename it to `fetch_task_free()` accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:22:21 -07:00
Patrick Steinhardt
c369fc46d0 builtin/submodule: allow "add" to use different ref storage format
Same as with "clone", users may want to add a submodule to a repository
with a non-default ref storage format. Wire up a new `--ref-format=`
option that works the same as for `git submodule clone`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:22:21 -07:00
Patrick Steinhardt
fb99dded31 refs: fix ref storage format for submodule ref stores
When opening a submodule ref storage we accidentally use the ref storage
format of the owning repository, not of the submodule repository. As
submodules may have a different storage format than their parent repo
this can lead to bugs when trying to access the submodule ref storage
from the parent repository.

One such bug was reported when performing a recursive pull with mixed
ref stores, which fails with:

    $ git pull --recursive
    fatal: Unable to find current revision in submodule path 'path/to/sub'

The same issue occurs when adding a repository contained in the working
tree with a different ref storage format via `git submodule add`.

Fix the bug by using the submodule repository's ref storage format
instead and add some tests. Note that the test for `git submodule
status` was included as a precaution, only. The command worked alright
even without the bugfix.

Reported-by: Jeppe Øland <joland@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:22:21 -07:00
Patrick Steinhardt
69814846ab builtin/clone: propagate ref storage format to submodules
When recursively cloning a repository with a non-default ref storage
format, e.g. by passing the `--ref-format=` option, then only the
top-level repository will end up using that ref storage format, and
all recursively cloned submodules will instead use the default format.

While mixed-format constellations are expected to work alright, the
outcome still is somewhat surprising as we have essentially ignored
the user's request.

Fix this by propagating the requested ref format to cloned submodules.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:21:39 -07:00
Patrick Steinhardt
5ac781ad62 builtin/submodule: allow cloning with different ref storage format
As submodules are proper self-contained repositories, it is perfectly
valid for them to have a different ref storage format than their parent
repository. There is no obvious way for users to ask for the ref storage
format when initializing submodules though. Whether the setup of such
mixed-ref-storage-format constellations is all that useful remains to be
seen. But there is no good reason to not expose such an option, and we
will require it in a subsequent patch.

Introduce a new `--ref-format=` option for git-submodule(1) that allows
the user to pick the ref storage format. This option will also be used
in a subsequent commit, where we start to propagate the same flag from
git-clone(1) to cloning submodules with the `--recursive` switch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:20:49 -07:00
Patrick Steinhardt
d9ab8788e1 git-submodule.sh: break overly long command lines
For most of the subcommands of git-submodule(1), we end up passing a
bunch of arguments to the submodule helper. This quickly leads to overly
long lines, where it becomes hard to spot what has changed when one
needs to modify them.

Break up these lines into one argument per line, similarly to how it is
done for the "clone" subcommand already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:20:48 -07:00
Patrick Steinhardt
6ce8ffe30e transport: mark more tests leak-free
After fixing a transport leak, a few more tests have become
leak-free.  Mark them as such.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08 09:16:21 -07:00
Junio C Hamano
448d51d549 transport: fix leak with transport helper URLs
Transport URLs can be prefixed with "foo::", which would tell us that
the transport uses a remote helper called "foo". We extract the helper
name by `xstrndup()`ing the prefix before the double-colons, but never
free that string.

Fix this leak by assigning the result to a separate local variable that
we can then free upon returning.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-07 17:38:31 -07:00
Junio C Hamano
92a29c2c39 Merge branch 'ps/refs-wo-the-repository' into ps/config-wo-the-repository
* ps/refs-wo-the-repository:
  refs/reftable: stop using `the_repository`
  refs/packed: stop using `the_repository`
  refs/files: stop using `the_repository`
  refs/files: stop using `the_repository` in `parse_loose_ref_contents()`
  refs: stop using `the_repository`
2024-08-07 14:13:20 -07:00
Junio C Hamano
90b801d8ff Merge branch 'ps/leakfixes-part-3' into ps/leakfixes-part-4
* ps/leakfixes-part-3: (24 commits)
  commit-reach: fix trivial memory leak when computing reachability
  convert: fix leaking config strings
  entry: fix leaking pathnames during delayed checkout
  object-name: fix leaking commit list items
  t/test-repository: fix leaking repository
  builtin/credential-cache: fix trivial leaks
  builtin/worktree: fix leaking derived branch names
  builtin/shortlog: fix various trivial memory leaks
  builtin/rerere: fix various trivial memory leaks
  builtin/credential-store: fix leaking credential
  builtin/show-branch: fix several memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/stash: fix various trivial memory leaks
  builtin/remote: fix various trivial memory leaks
  builtin/remote: fix leaking strings in `branch_list`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/submodule--helper: fix leaking clone depth parameter
  builtin/name-rev: fix various trivial memory leaks
  builtin/describe: fix trivial memory leak when describing blob
  ...
2024-08-06 12:40:41 -07:00
Taylor Blau
fcb2205b77 midx: implement support for writing incremental MIDX chains
Now that the rest of the MIDX subsystem and relevant callers have been
updated to learn about how to read and process incremental MIDX chains,
let's finally update the implementation in `write_midx_internal()` to be
able to write incremental MIDX chains.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files when necessary (i.e. when creating an
    incremental chain with an existing non-incremental MIDX in the
    repository).

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06 12:01:39 -07:00