"git maintenance" lacked the care "git gc" had to avoid holding
onto the repository lock for too long during packing refs, which
has been remedied.
* ps/maintenance-ref-lock:
builtin/maintenance: fix locking race when handling "gc" task
builtin/gc: avoid global state in `gc_before_repack()`
usage: allow dying without writing an error message
builtin/maintenance: fix locking race with refs and reflogs tasks
builtin/maintenance: split into foreground and background tasks
builtin/maintenance: fix typedef for function pointers
builtin/maintenance: extract function to run tasks
builtin/maintenance: stop modifying global array of tasks
builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
builtin/maintenance: centralize configuration of explicit tasks
builtin/gc: drop redundant local variable
builtin/gc: use designated field initializers for maintenance tasks
"git whatchanged" that is longer to type than "git log --raw"
which is its modern rough equivalent has outlived its usefulness
more than 10 years ago. Plan to deprecate and remove it.
* jc/you-still-use-whatchanged:
whatschanged: list it in BreakingChanges document
whatchanged: remove when built with WITH_BREAKING_CHANGES
whatchanged: require --i-still-use-this
tests: prepare for a world without whatchanged
doc: prepare for a world without whatchanged
you-still-use-that??: help deprecating commands for removal
"git stash" recorded a wrong branch name when submodules are
present in the current checkout, which has been corrected.
* kj/stash-onbranch-submodule-fix:
stash: fix incorrect branch name in stash message
A memory leak on an error code path has been plugged.
* ly/submodule-update-failure-leakfix:
builtin/submodule--helper: fix leak when remote_submodule_branch() failed
"git diff --no-index dirA dirB" can limit the comparison with
pathspec at the end of the command line, just like normal "git
diff".
* jk/diff-no-index-with-pathspec:
diff --no-index: support limiting by pathspec
pathspec: add flag to indicate operation without repository
pathspec: add match_leading_pathspec variant
"git cat-file --batch" learns to understand %(objectmode) atom to
allow the caller to tell missing objects (due to repository
corruption) and submodules (whose commit objects are OK to be
missing) apart.
* vd/cat-file-objectmode-update:
cat-file.c: add batch handling for submodules
cat-file: add %(objectmode) atom
t1006: update 'run_tests' to test generic object specifiers
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.
* ds/path-walk-2:
pack-objects: allow --shallow and --path-walk
path-walk: add new 'edge_aggressive' option
pack-objects: thread the path-based compression
pack-objects: refactor path-walk delta phase
scalar: enable path-walk during push via config
pack-objects: enable --path-walk via config
repack: add --path-walk option
t5538: add tests to confirm deltas in shallow pushes
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
p5313: add performance tests for --path-walk
pack-objects: update usage to match docs
pack-objects: add --path-walk option
pack-objects: extract should_attempt_deltas()
When creating a stash, Git uses the current branch name
of the superproject to construct the stash commit message.
However, in repositories with submodules,
the message may mistakenly display the submodule branch name instead.
This is because `refs_resolve_ref_unsafe()` returns a pointer to a static buffer.
Subsequent calls to the same function overwrite the buffer,
corrupting the originally fetched `branch_name` used for the stash message.
Use `xstrdup()` to duplicate the branch name immediately after resolving it,
so that later buffer overwrites do not affect the stash message.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/submodule--helper.c:update_submodule(), the variable
remote_name is allocated in get_default_remote_submodule() but
may be leaked if remote_submodule_branch() fails. Although it is
unlikely that remote_submodule_branch() would fail after successfully
obtaining a remote ref name from get_default_remote_submodule(),
it is still possible. To prevent a potential memory leak, add a
call to free(remote_name) at the early exit point.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The support for assuming "push" when "-p" is given introduced in
9e140909f6 (stash: allow pathspecs in the no verb form, 2017-02-28) is
very narrow, neither "git stash -m <message> -p <pathspec>" nor "git
stash --patch <pathspec>" imply "push" and die instead. Relax this by
passing PARSE_OPT_STOP_AT_NON_OPTION when push is being assumed and then
setting "force_assume" if "--patch" was present. This means "git stash
<pathspec> -p" still dies so that it does not assume the user meant
"push" if they mistype a subcommand name but "git stash -m <message> -p
<pathspec>" will now succeed. The test added in the last commit is
adjusted to check that push is still assumed when "--patch" comes after
other options on the command-line.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically "git stash [<options>]" was assumed to mean "git stash save
[<options>]". Since 1ada5020b3 (stash: use stash_push for no verb form,
2017-02-28) it is assumed to mean "git stash push [<options>]". As the
push subcommand supports pathspecs, 9e140909f6 (stash: allow pathspecs
in the no verb form, 2017-02-28) allowed "git stash -p <pathspec>" to
mean "git stash push -p <pathspec>". This was broken in 8c3713cede
(stash: eliminate crude option parsing, 2020-02-17) which failed to
account for "push" being added to the start of argv in cmd_stash()
before it calls push_stash() and kept looking in argv[0] for "-p" after
moving the code to push_stash().
Fix this by regression by checking argv[1] instead of argv[0] and add a
couple of tests to prevent future regressions.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pretty.c:repo_logmsg_reencode() allocated memory should be freed with
repo_unuse_commit_buffer(). Callers sometimes forgot free it at exit
point. Add `repo_unuse_commit_buffer()` in insert_records_from_trailers
at builtin/shortlog.c and create_commit at builtin/replay.c
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit-graph.c:graph_write(), if read_one_commit() failed,
progress allocated in start_delayed_progress() will leak. Add
stop_progress() before goto cleanup.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() failed,
it returns error code without cleanup which cause memory leak. Add
cleanup label before frees in the end of cmd_fetch_pack(), and add
`goto cleanup` if finish_connect() failed.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object specification is passed to 'cat-file --batch[-check]'
referring to a submodule (e.g. 'HEAD:path/to/my/submodule'), the current
behavior of the command is to print the "missing" error message. However, it
is often valuable for callers to distinguish between paths that are actually
missing and "the submodule tree entry exists, but the object does not exist
in the repository".
To disambiguate without needing to invoke a separate Git process (e.g.
'ls-tree'), print the message "<oid> submodule" for such objects instead of
"<object> missing". In addition to the change from "missing" to "submodule",
the new message differs from the old in that it always prints the resolved
tree entry's OID, rather than the input object specification.
Note that this implementation maintains a distinction between submodules
where the commit OID is not present in the repo, and submodules where the
commit OID *is* present; the former will now print "<object> submodule", but
the latter will still print the full object content.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a formatting atom, used with the --batch-check/--batch-command options,
that prints the octal representation of the object mode if a given revision
includes that information, e.g. one that follows the format
<tree-ish>:<path>. If the mode information does not exist, an empty string
is printed instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
BUG() is not end-user facing but programmer facing, and we do not
use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")`
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "gc" task has a similar locking race as the one that we have fixed
for the "pack-refs" and "reflog-expire" tasks in preceding commits. Fix
this by splitting up the logic of the "gc" task:
- We execute `gc_before_repack()` in the foreground, which contains
the logic that git-gc(1) itself would execute in the foreground, as
well.
- We spawn git-gc(1) after detaching, but with a new hidden flag that
suppresses calling `gc_before_repack()`.
Like this we have roughly the same logic as git-gc(1) itself and know to
repack refs and reflogs before detaching, thus fixing the race.
Note that `gc_before_repack()` is renamed to `gc_foreground_tasks()` to
better reflect what this function does.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `gc_before_repack()` should only ever run once in git-gc(1), but we
may end up calling it twice when the "--detach" flag is passed. The
duplicated call is avoided though via a static flag in this function.
This pattern is somewhat unintuitive though. Refactor it to drop the
static flag and instead guard the second call of `gc_before_repack()`
via `opts.detach`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes code wants to die in a situation where it already has written
an error message. To use the same error code as `die()` we have to use
`exit(128)`, which is easy to get wrong and leaves magic numbers all
over our codebase.
Teach `die_message_builtin()` to not print any error when passed a
`NULL` pointer as error string. Like this, such users can now call
`die(NULL)` to achieve the same result without any hardcoded error
codes.
Adapt a couple of builtins to use this new pattern to demonstrate that
there is a need for such a helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in the preceding commit, git-gc(1) knows to detach only
after it has already packed references and expired reflogs. This is done
to avoid racing around their respective lockfiles.
Adapt git-maintenance(1) accordingly and run the "pack-refs" and
"reflog-expire" tasks in the foreground. Note that the "gc" task has the
same issue, but the fix is a bit more involved there and will thus be
done in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both git-gc(1) and git-maintenance(1) have logic to daemonize so that
the maintenance tasks are performed in the background. git-gc(1) has
some special logic though to not perform _all_ housekeeping tasks in the
background: both references and reflogs are still handled synchronously
in the foreground.
This split exists because otherwise it may easily happen that git-gc(1)
keeps the "packed-refs" file locked for an extended amount of time,
where the next Git command that wants to modify any reference could now
fail. This was especially important in the past, where git-gc(1) was
still executed directly as part of our automatic maintenance: git-gc(1)
was invoked via `git gc --auto --detach`, so we knew to handle most of
the maintenance tasks in the background while doing those parts that may
cause locking issues in the foreground.
We have since moved to git-maintenance(1), which is a more flexible
replacement for git-gc(1). By default this command runs git-gc(1), only,
but it can be configured to run different tasks, as well. This command
does not know about the split between maintenance tasks that should run
before and after detach though, and this has led to several bug reports
about spurious locking errors for the "packed-refs" file.
Prepare for a fix by introducing this split for maintenance tasks. Note
that this commit does not yet change any of the tasks, so there should
not (yet) be a change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The typedefs for `maintenance_task_fn` and `maintenance_auto_fn` are
somewhat confusingly not true function pointers. As such, any user of
those typedefs needs to manually add the pointer to make use of them.
Fix this by making these true function pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the function to run maintenance tasks. This function will be
reused in a subsequent commit where we introduce a split between
maintenance tasks that run before and after daemonizing the process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When configuring maintenance tasks run by git-maintenance(1) we do so by
modifying the global array of tasks directly. This is already quite bad
on its own, as global state makes for logic that is hard to follow.
Even more importantly though we use multiple different fields to track
whether or not a task should be run:
- "enabled" tracks the "maintenance.*.enabled" config key. This field
disables execution of a task, unless the user has explicitly asked
for the task.
- "selected_order" tracks the order in which jobs have been asked for
by the user via the "--task=" command line option. It overrides
everything else, but only has an effect if at least one job has been
selected.
- "schedule" tracks the schedule priority for a job, that is how often
it should run. This field only plays a role when the user has passed
the "--schedule=" command line option.
All of this makes it non-trivial to figure out which job really should
be running right now. The logic to configure these fields and the logic
that interprets them is distributed across multiple functions, making it
even harder to follow it.
Refactor the logic so that we stop modifying global state. Instead, we
now compute which jobs should be run in `initialize_task_config()`,
represented as an array of jobs to run that is stored in the options
structure. Like this, all logic becomes self-contained and any users of
this array only need to iterate through the tasks and execute them one
by one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--task=" option explicitly allows the user to say which maintenance
tasks should be run, whereas "--schedule=" only respects the maintenance
strategy configured for a specific repository. As such, it is not
sensible to accept both options at the same time.
Mark them as incompatible with one another. While at it, also convert
the existing logic that marks "--auto" and "--schedule=" as incompatible
to use `die_for_incompatible_opt2()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users of git-maintenance(1) can explicitly ask it to run specific tasks
by passing the `--task=` command line option. This option can be passed
multiple times, which causes us to execute tasks in the same order as
the tasks have been provided by the user.
The order in which tasks are run is computed in `task_option_parse()`:
every time we parse such a command line argument, we modify the global
array of tasks by seting the selected index for that specific task.
This has two downsides:
- We modify global state, which makes it hard to follow the logic.
- The configuration of tasks is split across multiple different
functions, so it is not easy to figure out the different factors
that play a role in selecting tasks.
Refactor the logic so that `task_option_parse()` does not modify global
state anymore. Instead, this function now only collects the list of
configured tasks. The logic to configure ordering of the respective
tasks is then deferred to `initialize_task_config()`.
This refactoring solves the second problem, that the configuration of
tasks is spread across multiple different locations. The first problem,
that we modify global state, will be fixed in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two different variables that track the quietness for git-gc(1):
- The local variable `quiet`, which we wire up.
- The `quiet` field of `struct maintenance_run_opts`.
This leads to confusion which of these variables should be used and what
the respective effect is.
Simplify this logic by dropping the local variable in favor of the
options field.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the array of maintenance tasks to use designated field
initializers. This makes it easier to add more fields to the struct
without having to modify all tasks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical
memory in a system. HW_PHYSMEM will not provide the correct amount
on a system with >=4GB of memory.
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the design of signature handling is still being discussed, it is
likely that the data stream produced by the code in Git 2.50 would
have to be changed in such a way that is not backward compatible.
Mark the feature as experimental and discourge its use for now.
Also flip the default on the generation side to "strip"; users of
existing versions would not have passed --signed-commits=strip and
will be broken by this change if the default is made to abort, and
will be encouraged by the error message to produce data stream with
future breakage guarantees by passing --signed-commits option.
As we tone down the default behaviour, we no longer need the
FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was
not discoverable enough.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.
* jt/receive-pack-skip-connectivity-check:
builtin/receive-pack: add option to skip connectivity check
t5410: test receive-pack connectivity check
Assorted fixes for issues found with CodeQL.
* js/misc-fixes:
sequencer: stop pretending that an assignment is a condition
bundle-uri: avoid using undefined output of `sscanf()`
commit-graph: avoid using stale stack addresses
trace2: avoid "futile conditional"
Avoid redundant conditions
fetch: avoid unnecessary work when there is no current branch
has_dir_name(): make code more obvious
upload-pack: rename `enum` to reflect the operation
commit-graph: avoid malloc'ing a local variable
fetch: carefully clear local variable's address after use
commit: simplify code
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.
* ds/sparse-apply-add-p:
p2000: add performance test for patch-mode commands
reset: integrate sparse index with --patch
git add: make -p/-i aware of sparse index
apply: integrate with the sparse index
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.
* en/merge-tree-check:
merge-tree: add a new --quiet flag
merge-ort: add a new mergeability_only option
Support to create a loose object file with unknown object type has
been dropped.
* jk/no-funny-object-types:
object-file: drop support for writing objects with unknown types
hash-object: handle --literally with OPT_NEGBIT
hash-object: merge HASH_* and INDEX_* flags
hash-object: stop allowing unknown types
t: add lib-loose.sh
t/helper: add zlib test-tool
oid_object_info(): drop type_name strbuf
fsck: stop using object_info->type_name strbuf
oid_object_info_convert(): stop using string for object type
cat-file: use type enum instead of buffer for -t option
object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
cat-file: make --allow-unknown-type a noop
object-file.h: fix typo in variable declaration
The dependency on the_repository variable has been reduced from the
code paths in "git replay".
* en/replay-wo-the-repository:
replay: replace the_repository with repo parameter passed to cmd_replay ()
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.
When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:
$ git diff-tree -r <treeish1> <treeish2>.
where the treeish could point to a subdirectory of the repository.
When invoked this way, users can limit the selected paths of the tree by
using a pathspec. Either by providing some list of paths to accept, or
by removing paths via a negative refspec.
The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
difftools have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detections, and similar.
Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.
For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.
Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.
Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.
After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.
Pass the pathspec object and the skip offsets into queue_diff, which
in-turn must pass them along to read_directory_contents.
Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.
The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.
Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.
If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic will see that a matches the leading part
of the **/ and accept this even tho c doesn't match.
To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.
Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.
Generally, this check is critical to avoid an incomplete received
packfile from corrupting a repository. Server operators may have
additional knowledge though around exactly how Git is being used on the
server-side which can be used to facilitate more efficient connectivity
computation of incoming objects.
For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.
Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.
Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jk/oidmap-cleanup:
raw_object_store: drop extra pointer to replace_map
oidmap: add size function
oidmap: rename oidmap_free() to oidmap_clear()
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit. This option allows the merge machinery to, in the
outer layer of the merge:
* exit early when a conflict is detected
* avoid writing (most) merged blobs/trees to the object store
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There does not appear to be anything particularly incompatible about the
--shallow and --path-walk options of 'git pack-objects'. If shallow
commits are to be handled differently, then it is by the revision walk
that defines the commit set and which are interesting or uninteresting.
However, before the previous change, a trivial removal of the warning
would cause a failure in t5500-fetch-pack.sh when
GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more
objects than we desired, due to some incorrect behavior of the path-walk
API, especially around walking uninteresting objects.
The recently-added tests in t5538-push-shallow.sh help to confirm this
behavior is working with the --path-walk option if
GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to
the --path-walk feature being disabled in the presence of a shallow
clone.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapting the implementation of ll_find_deltas(), create a threaded
version of the --path-walk compression step in 'git pack-objects'.
This involves adding a 'regions' member to the thread_params struct,
allowing each thread to own a section of paths. We can simplify the way
jobs are split because there is no value in extending the batch based on
name-hash the way sections of the object entry array are attempted to be
grouped. We re-use the 'list_size' and 'remaining' items for the purpose
of borrowing work in progress from other "victim" threads when a thread
has finished its batch of work more quickly.
Using the Git repository as a test repo, the p5313 performance test
shows that the resulting size of the repo is the same, but the threaded
implementation gives gains of varying degrees depending on the number of
objects being packed. (This was tested on a 16-core machine.)
Test HEAD~1 HEAD
---------------------------------------------------
5313.20: big pack 2.38 1.99 -16.4%
5313.21: big pack size 16.1M 16.0M -0.2%
5313.24: repack 107.32 45.41 -57.7%
5313.25: repack size 213.3M 213.2M -0.0%
(Test output is formatted to better fit in message.)
This ~60% reduction in 'git repack --path-walk' time is typical across
all repos I used for testing. What is interesting is to compare when the
overall time improves enough to outperform the --name-hash-version=1
case. These time improvements correlate with repositories with data
shapes that significantly improve their data size as well. The
--path-walk feature frequently takes longer than --name-hash-version=2,
trading some extra computation for some additional compression. The
natural place where this additional computation comes from is the two
compression passes that --path-walk takes, though the first pass is
naturally faster due to the path boundaries avoiding a number of delta
compression attempts.
For example, the microsoft/fluentui repo has significant size reduction
from --name-hash-version=1 to --name-hash-version=2 followed by further
improvements with --path-walk. The threaded computation makes
--path-walk more competitive in time compared to --name-hash-version=2,
though still ~31% more expensive in that metric.
Repack Method Pack Size Time
------------------------------------------
Hash v1 439.4M 87.24s
Hash v2 161.7M 21.51s
Path Walk (Before) 142.5M 81.29s
Path Walk (After) 142.5M 28.16s
Similar results hold for the Git repository:
Repack Method Pack Size Time
------------------------------------------
Hash v1 248.8M 30.44s
Hash v2 249.0M 30.15s
Path Walk (Before) 213.2M 142.50s
Path Walk (After) 213.3M 45.41s
...as well as the nodejs/node repository:
Repack Method Pack Size Time
------------------------------------------
Hash v1 739.9M 71.18s
Hash v2 764.6M 67.82s
Path Walk (Before) 698.1M 208.10s
Path Walk (After) 698.0M 75.10s
Finally, the Linux kernel repository is a good test for this repacking
time change, even though the space savings is more subtle:
Repack Method Pack Size Time
------------------------------------------
Hash v1 2.5G 554.41s
Hash v2 2.5G 549.62s
Path Walk (before) 2.2G 1562.36s
Path Walk (before) 2.2G 559.00s
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>