Files
git/Documentation/config/gc.txt
Taylor Blau 4dc16e2cb0 gc: introduce gc.recentObjectsHook
This patch introduces a new multi-valued configuration option,
`gc.recentObjectsHook` as a means to mark certain objects as recent (and
thus exempt from garbage collection), regardless of their age.

When performing a garbage collection operation on a repository with
unreachable objects, Git makes its decision on what to do with those
object(s) based on how recent the objects are or not. Generally speaking,
unreachable-but-recent objects stay in the repository, and older objects
are discarded.

However, we have no convenient way to keep certain precious, unreachable
objects around in the repository, even if they have aged out and would
be pruned. Our options today consist of:

  - Point references at the reachability tips of any objects you
    consider precious, which may be undesirable or infeasible if there
    are many such objects.

  - Track them via the reflog, which may be undesirable since the
    reflog's lifetime is limited to that of the reference it's tracking
    (and callers may want to keep those unreachable objects around for
    longer).

  - Extend the grace period, which may keep around other objects that
    the caller *does* want to discard.

  - Manually modify the mtimes of objects you want to keep. If those
    objects are already loose, this is easy enough to do (you can just
    enumerate and `touch -m` each one).

    But if they are packed, you will either end up modifying the mtimes
    of *all* objects in that pack, or be forced to write out a loose
    copy of that object, both of which may be undesirable. Even worse,
    if they are in a cruft pack, that requires modifying its `*.mtimes`
    file by hand, since there is no exposed plumbing for this.

  - Force the caller to construct the pack of objects they want
    to keep themselves, and then mark the pack as kept by adding a
    ".keep" file. This works, but is burdensome for the caller, and
    having extra packs is awkward as you roll forward your cruft pack.

This patch introduces a new option to the above list via the
`gc.recentObjectsHook` configuration, which allows the caller to
specify a program (or set of programs) whose output is treated as a set
of objects to treat as recent, regardless of their true age.

The implementation is straightforward. Git enumerates recent objects via
`add_unseen_recent_objects_to_traversal()`, which enumerates loose and
packed objects, and eventually calls add_recent_object() on any objects
for which `want_recent_object()`'s conditions are met.

This patch modifies the recency condition from simply "is the mtime of
this object more recent than the cutoff?" to "[...] or, is this object
mentioned by at least one `gc.recentObjectsHook`?".

Depending on whether or not we are generating a cruft pack, this allows
the caller to do one of two things:

  - If generating a cruft pack, the caller is able to retain additional
    objects via the cruft pack, even if they would have otherwise been
    pruned due to their age.

  - If not generating a cruft pack, the caller is likewise able to
    retain additional objects as loose.

A potential alternative here is to introduce a new mode to alter the
contents of the reachable pack instead of the cruft one. One could
imagine a new option to `pack-objects`, say `--extra-reachable-tips`
that does the same thing as above, adding the visited set of objects
along the traversal to the pack.

But this has the unfortunate side-effect of altering the reachability
closure of that pack. If parts of the unreachable object graph mentioned
by one or more of the "extra reachable tips" programs is not closed,
then the resulting pack won't be either. This makes it impossible in the
general case to write out reachability bitmaps for that pack, since
closure is a requirement there.

Instead, keep these unreachable objects in the cruft pack (or set of
unreachable, loose objects) instead, to ensure that we can continue to
have a pack containing just reachable objects, which is always safe to
write a bitmap over.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12 14:12:20 -07:00

159 lines
6.6 KiB
Plaintext

gc.aggressiveDepth::
The depth parameter used in the delta compression
algorithm used by 'git gc --aggressive'. This defaults
to 50, which is the default for the `--depth` option when
`--aggressive` isn't in use.
+
See the documentation for the `--depth` option in
linkgit:git-repack[1] for more details.
gc.aggressiveWindow::
The window size parameter used in the delta compression
algorithm used by 'git gc --aggressive'. This defaults
to 250, which is a much more aggressive window size than
the default `--window` of 10.
+
See the documentation for the `--window` option in
linkgit:git-repack[1] for more details.
gc.auto::
When there are approximately more than this many loose
objects in the repository, `git gc --auto` will pack them.
Some Porcelain commands use this command to perform a
light-weight garbage collection from time to time. The
default value is 6700.
+
Setting this to 0 disables not only automatic packing based on the
number of loose objects, but any other heuristic `git gc --auto` will
otherwise use to determine if there's work to do, such as
`gc.autoPackLimit`.
gc.autoPackLimit::
When there are more than this many packs that are not
marked with `*.keep` file in the repository, `git gc
--auto` consolidates them into one larger pack. The
default value is 50. Setting this to 0 disables it.
Setting `gc.auto` to 0 will also disable this.
+
See the `gc.bigPackThreshold` configuration variable below. When in
use, it'll affect how the auto pack limit works.
gc.autoDetach::
Make `git gc --auto` return immediately and run in background
if the system supports it. Default is true.
gc.bigPackThreshold::
If non-zero, all non-cruft packs larger than this limit are kept
when `git gc` is run. This is very similar to
`--keep-largest-pack` except that all non-cruft packs that meet
the threshold are kept, not just the largest pack. Defaults to
zero. Common unit suffixes of 'k', 'm', or 'g' are supported.
+
Note that if the number of kept packs is more than gc.autoPackLimit,
this configuration variable is ignored, all packs except the base pack
will be repacked. After this the number of packs should go below
gc.autoPackLimit and gc.bigPackThreshold should be respected again.
+
If the amount of memory estimated for `git repack` to run smoothly is
not available and `gc.bigPackThreshold` is not set, the largest pack
will also be excluded (this is the equivalent of running `git gc` with
`--keep-largest-pack`).
gc.writeCommitGraph::
If true, then gc will rewrite the commit-graph file when
linkgit:git-gc[1] is run. When using `git gc --auto`
the commit-graph will be updated if housekeeping is
required. Default is true. See linkgit:git-commit-graph[1]
for details.
gc.logExpiry::
If the file gc.log exists, then `git gc --auto` will print
its content and exit with status zero instead of running
unless that file is more than 'gc.logExpiry' old. Default is
"1.day". See `gc.pruneExpire` for more ways to specify its
value.
gc.packRefs::
Running `git pack-refs` in a repository renders it
unclonable by Git versions prior to 1.5.1.2 over dumb
transports such as HTTP. This variable determines whether
'git gc' runs `git pack-refs`. This can be set to `notbare`
to enable it within all non-bare repos or it can be set to a
boolean value. The default is `true`.
gc.cruftPacks::
Store unreachable objects in a cruft pack (see
linkgit:git-repack[1]) instead of as loose objects. The default
is `true`.
gc.pruneExpire::
When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
(and 'repack --cruft --cruft-expiration 2.weeks.ago' if using
cruft packs via `gc.cruftPacks` or `--cruft`). Override the
grace period with this config variable. The value "now" may be
used to disable this grace period and always prune unreachable
objects immediately, or "never" may be used to suppress pruning.
This feature helps prevent corruption when 'git gc' runs
concurrently with another process writing to the repository; see
the "NOTES" section of linkgit:git-gc[1].
gc.worktreePruneExpire::
When 'git gc' is run, it calls
'git worktree prune --expire 3.months.ago'.
This config variable can be used to set a different grace
period. The value "now" may be used to disable the grace
period and prune `$GIT_DIR/worktrees` immediately, or "never"
may be used to suppress pruning.
gc.reflogExpire::
gc.<pattern>.reflogExpire::
'git reflog expire' removes reflog entries older than
this time; defaults to 90 days. The value "now" expires all
entries immediately, and "never" suppresses expiration
altogether. With "<pattern>" (e.g.
"refs/stash") in the middle the setting applies only to
the refs that match the <pattern>.
gc.reflogExpireUnreachable::
gc.<pattern>.reflogExpireUnreachable::
'git reflog expire' removes reflog entries older than
this time and are not reachable from the current tip;
defaults to 30 days. The value "now" expires all entries
immediately, and "never" suppresses expiration altogether.
With "<pattern>" (e.g. "refs/stash")
in the middle, the setting applies only to the refs that
match the <pattern>.
+
These types of entries are generally created as a result of using `git
commit --amend` or `git rebase` and are the commits prior to the amend
or rebase occurring. Since these changes are not part of the current
project most users will want to expire them sooner, which is why the
default is more aggressive than `gc.reflogExpire`.
gc.recentObjectsHook::
When considering whether or not to remove an object (either when
generating a cruft pack or storing unreachable objects as
loose), use the shell to execute the specified command(s).
Interpret their output as object IDs which Git will consider as
"recent", regardless of their age. By treating their mtimes as
"now", any objects (and their descendants) mentioned in the
output will be kept regardless of their true age.
+
Output must contain exactly one hex object ID per line, and nothing
else. Objects which cannot be found in the repository are ignored.
Multiple hooks are supported, but all must exit successfully, else the
operation (either generating a cruft pack or unpacking unreachable
objects) will be halted.
gc.rerereResolved::
Records of conflicted merge you resolved earlier are
kept for this many days when 'git rerere gc' is run.
You can also use more human-readable "1.month.ago", etc.
The default is 60 days. See linkgit:git-rerere[1].
gc.rerereUnresolved::
Records of conflicted merge you have not resolved are
kept for this many days when 'git rerere gc' is run.
You can also use more human-readable "1.month.ago", etc.
The default is 15 days. See linkgit:git-rerere[1].