Taylor Blau e199290592 pack-objects: only perform verbatim reuse on the preferred pack
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects a whole eword_t at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.
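Here is a minimal C sketch of the whole-word idea. It is not Git's actual code. It assumes eword_t is a 64-bit word, as in Git's EWAH bitmaps, and uses a hypothetical helper, count_full_leading_words(). The helper counts how many words at the start of an uncompressed bitmap have every bit set. Each such word covers BITS_IN_EWORD consecutive objects, so the whole run can be copied as one block.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eword_t is a 64-bit word, as in Git's EWAH bitmaps. */
typedef uint64_t eword_t;
#define BITS_IN_EWORD (sizeof(eword_t) * 8)

/*
 * Hypothetical helper: count the leading words of an uncompressed
 * bitmap that have every bit set. The objects covered by those words
 * form one contiguous run that can be reused without inspecting
 * individual objects.
 */
static size_t count_full_leading_words(const eword_t *words, size_t nr_words)
{
	size_t i = 0;

	while (i < nr_words && words[i] == ~(eword_t)0)
		i++;
	return i;
}
```

Multiplying the result by BITS_IN_EWORD gives the number of objects in the run. Comparing a whole word against all-ones checks 64 objects with a single comparison instead of one object at a time.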

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.
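Gaps matter because an OFS_DELTA identifies its base by a backward distance in bytes. The sketch below models a source pack as a hypothetical array of object sizes stored back to back in pack order, plus a kept[] flag for each object. src_distance() and out_distance() are illustrative names, not Git functions. When every object between the base and the delta is kept, the two distances agree and the delta can be copied unchanged. Any dropped object in between shrinks the output distance, and the delta would then need patching.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes from the start of object `base` to the start of object `delta`
 * in the source pack, assuming objects are stored back to back in pack
 * order (hypothetical model).
 */
static uint64_t src_distance(const uint64_t *obj_size, size_t base, size_t delta)
{
	uint64_t dist = 0;

	for (size_t i = base; i < delta; i++)
		dist += obj_size[i];
	return dist;
}

/*
 * The same distance in the output pack, counting only objects that are
 * kept. Assumes the base itself is kept.
 */
static uint64_t out_distance(const uint64_t *obj_size, const int *kept,
			     size_t base, size_t delta)
{
	uint64_t dist = 0;

	for (size_t i = base; i < delta; i++)
		if (kept[i])
			dist += obj_size[i];
	return dist;
}
```

With sizes {100, 40, 60}, a delta at index 2 against a base at index 0 is 140 bytes away in the source. If the middle object is dropped, the same delta is only 100 bytes away in the output, so its recorded distance would be wrong.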

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, assuming that the caller does not want Y,
   resulting in us packing more objects than necessary.
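The example can be modeled in a few lines of C. In this hypothetical model, which is not Git's code, selected[i] records whether the MIDX chose pack P's copy of P's i-th object. Only selected objects receive a bit in P's region of the bitmap, so bit k corresponds to the k-th selected object rather than to object k. pack_objects_spanned() returns how many of P's objects a contiguous copy must cover to reach the first nr_bits bits of that region.

```c
#include <stddef.h>

/*
 * Hypothetical model: `selected` has one entry per object of pack P, in
 * pack order; nonzero means the MIDX chose P's copy of that object.
 * Return how many of P's objects a contiguous (verbatim) copy must span
 * to cover the first `nr_bits` bits of P's region in the bitmap.
 */
static size_t pack_objects_spanned(const int *selected, size_t nr_objects,
				   size_t nr_bits)
{
	size_t seen = 0;

	if (!nr_bits)
		return 0;
	for (size_t i = 0; i < nr_objects; i++) {
		if (selected[i] && ++seen == nr_bits)
			return i + 1;
	}
	return nr_objects; /* nr_bits exceeds P's region; clamp */
}
```

With P = {X, Y, Z} and Y selected from another pack, two bitmap bits (X and Z) already span three of P's objects, so a verbatim copy would drag P's copy of Y along. When every object is selected, as with the preferred pack (the MIDX breaks ties in its favor), bitmap bits and pack objects line up one to one.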

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.
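Inspecting objects individually amounts to visiting each set bit on its own instead of a word at a time. Here is a minimal sketch that assumes a plain, uncompressed array of 64-bit words. collect_set_bits() is a hypothetical name. Unset bits are skipped, so a caller that processes the returned positions one by one can account for each gap, for example by adjusting OFS_DELTA distances as it writes objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eword_t is a 64-bit word, as in Git's EWAH bitmaps. */
typedef uint64_t eword_t;
#define BITS_IN_EWORD (sizeof(eword_t) * 8)

/*
 * Hypothetical per-object walk: store in `out` the position of every set
 * bit in [start, end) and return how many were found. `words` must hold
 * at least (end + BITS_IN_EWORD - 1) / BITS_IN_EWORD words, and `out`
 * must have room for (end - start) entries.
 */
static size_t collect_set_bits(const eword_t *words, size_t start, size_t end,
			       size_t *out)
{
	size_t n = 0;

	for (size_t pos = start; pos < end; pos++) {
		eword_t w = words[pos / BITS_IN_EWORD];

		/* An unset bit is an object we are not reusing: a gap. */
		if (w & ((eword_t)1 << (pos % BITS_IN_EWORD)))
			out[n++] = pos;
	}
	return n;
}
```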

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap->word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.
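A simplified verbatim step combining the two changes might look like the following. This is a sketch, not the code from this commit. struct reuse_pack and verbatim_objects() are hypothetical. The pack is assumed to start at bit 0 of the bitmap, which the message notes holds for the preferred pack.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eword_t is a 64-bit word, as in Git's EWAH bitmaps. */
typedef uint64_t eword_t;
#define BITS_IN_EWORD (sizeof(eword_t) * 8)

/*
 * Hypothetical stand-in for a reuse pack: whether it is the MIDX's
 * preferred pack, and how many objects it contributes to the bitmap,
 * starting at bit 0.
 */
struct reuse_pack {
	int preferred;
	size_t nr_objects;
};

/*
 * Return how many objects may be copied verbatim, a whole word at a
 * time. Non-preferred packs get 0, making the call a no-op.
 */
static size_t verbatim_objects(const struct reuse_pack *pack,
			       const eword_t *words, size_t word_alloc)
{
	size_t limit, i = 0;

	if (!pack->preferred)
		return 0;

	/* Only look at words whose bits all belong to this pack. */
	limit = pack->nr_objects / BITS_IN_EWORD;
	if (limit > word_alloc)
		limit = word_alloc;

	while (i < limit && words[i] == ~(eword_t)0)
		i++;
	return i * BITS_IN_EWORD;
}
```

Because the limit is rounded down to whole words, in this sketch any objects in a partially filled final word are left for per-object handling rather than being copied verbatim.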

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks