Jeff King bf972896d7 cat-file: use packed_object_info() for --batch-all-objects
When "cat-file --batch-all-objects" iterates over each object, it knows
where to find each one. But when we look up details of the object, we
don't use that information at all.

This patch teaches it to use the pack/offset pair when we're iterating
over objects in a pack. This yields a measurable speed improvement
(timings on a fully packed clone of linux.git):

  Benchmark #1: ./git.old cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)"
    Time (mean ± σ):      8.128 s ±  0.118 s    [User: 7.968 s, System: 0.156 s]
    Range (min … max):    8.007 s …  8.301 s    10 runs

  Benchmark #2: ./git.new cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)"
    Time (mean ± σ):      4.294 s ±  0.064 s    [User: 4.167 s, System: 0.125 s]
    Range (min … max):    4.227 s …  4.457 s    10 runs

  Summary
    './git.new cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)"' ran
      1.89 ± 0.04 times faster than './git.old cat-file --batch-all-objects --unordered --batch-check="%(objecttype) %(objectname)"

The implementation is pretty simple: we just call packed_object_info()
instead of oid_object_info_extended() when we can. Most of the changes
are just plumbing the pack/offset pair through the callstack. There is
one subtlety: replace lookups are not handled by packed_object_info().
But since those are disabled for --batch-all-objects, and since we'll
only have pack info when that option is in effect, we don't have to
worry about that.

There are a few limitations to this optimization which we could address
with further work:

 - I didn't bother recording when we found an object loose. Technically
   this could save us doing a fruitless lookup in the pack index. But
   opening and mmap-ing a loose object is so expensive in the first
   place that this doesn't matter much. And if your repository is large
   enough to care about per-object performance, most objects are going
   to be packed anyway.

 - This works only in --unordered mode. For the sorted mode, we'd have
   to record the pack/offset pair as part of our oid-collection. That's
   more code, plus at least 16 extra bytes of heap per object. It would
   probably still be a net win in runtime, but we'd need to measure.

 - For --batch, this still helps us with getting the object metadata,
   but we still do a from-scratch lookup for the object contents. This
   probably doesn't matter that much, because the lookup cost will be
   much smaller relative to the cost of actually unpacking and printing
   the objects.

   For small objects, we could probably swap out read_object_file() for
   using packed_object_info() with a "object_info.contentp" to get the
   contents. But we'd still need to deal with streaming for larger
   objects. A better path forward here is to teach the initial
   oid_object_info_extended() / packed_object_info() calls to retrieve
   the contents of smaller objects while they are already being
   accessed. That would save the extra lookup entirely. But it's a
   non-trivial feature to add to the object_info code, so I left it for
   now.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
2021-03-13 16:00:09 -08:00
2021-03-13 16:00:09 -08:00
2021-07-22 13:05:54 -07:00
2021-07-13 16:52:50 -07:00
2021-05-10 16:59:47 +09:00
2021-03-08 09:56:34 -08:00
2021-01-21 15:50:00 -08:00
2021-03-13 16:00:09 -08:00
2021-06-14 10:16:06 +09:00
2021-03-13 16:00:09 -08:00
2021-04-27 16:31:39 +09:00
2021-05-07 12:47:41 +09:00
2021-01-23 17:14:07 -08:00
2021-08-02 14:06:42 -07:00
2021-07-13 16:52:50 -07:00
2021-03-13 16:00:09 -08:00
2021-07-16 17:42:53 -07:00
2021-07-16 17:42:53 -07:00
2021-07-16 17:42:53 -07:00
2021-03-13 16:00:09 -08:00
2021-08-06 12:52:15 -07:00
2021-08-04 13:28:52 -07:00
2021-06-14 13:33:27 +09:00
2021-07-30 12:14:27 -07:00
2021-07-28 13:17:57 -07:00
2021-08-16 12:15:44 -07:00
2021-07-13 16:52:53 -07:00
2021-03-13 16:00:09 -08:00
2021-03-13 16:00:09 -08:00
2021-02-22 12:07:40 -08:00
2021-07-28 13:17:59 -07:00
2021-03-13 16:00:09 -08:00
2021-03-13 16:00:09 -08:00
2021-02-17 17:21:40 -08:00
2021-08-04 13:28:55 -07:00
2021-07-30 12:14:27 -07:00
2021-07-28 13:17:59 -07:00
2021-03-13 16:00:09 -08:00
2021-04-27 16:31:39 +09:00
2021-07-28 13:17:58 -07:00
2021-03-13 16:00:09 -08:00
2021-04-08 13:23:25 -07:00
2021-04-08 13:23:25 -07:00
2021-04-08 13:23:25 -07:00
2021-03-13 16:00:09 -08:00
2021-05-04 11:52:02 +09:00
2021-03-13 16:00:09 -08:00
2021-07-13 16:52:50 -07:00
2021-08-04 13:28:52 -07:00
2021-08-04 13:28:52 -07:00
2021-03-13 16:00:09 -08:00
2021-02-05 16:40:45 -08:00
2020-12-14 10:21:36 -08:00
2020-10-27 15:09:50 -07:00
2021-06-10 12:04:27 +09:00
2021-06-14 10:16:06 +09:00
2021-02-17 17:21:40 -08:00
2021-04-14 13:47:54 -07:00
2021-07-28 13:17:59 -07:00
2021-05-04 12:09:43 +09:00
2021-07-16 17:42:53 -07:00
2021-05-04 11:52:02 +09:00
2021-05-04 11:52:02 +09:00
2021-02-16 09:41:32 -08:00

Build status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks
Description
No description provided
Readme 279 MiB
Languages
C 50.5%
Shell 38.7%
Perl 4.5%
Tcl 3.2%
Python 0.8%
Other 2.1%