Derrick Stolee 410334ed52 read-cache: use hashfile instead of git_hash_ctx
The do_write_index() method in read-cache.c has its own hashing logic
and buffering mechanism. Specifically, the ce_write() method was
introduced by 4990aadc (Speed up index file writing by chunking it
nicely, 2005-04-20) and similar mechanisms were introduced a few months
later in c38138cd (git-pack-objects: write the pack files with a SHA1
csum, 2005-06-26). Since both date from the early days of the Git
codebase, I suspect that these roughly equivalent code paths were never
unified simply because the idea got lost in the shuffle. The hashfile API has
since been used extensively in other file formats, such as pack-indexes,
multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to
unify the index writing code to use the same mechanism.

I discovered this disparity while trying to create a new index format
that uses the chunk-format API. That API uses a hashfile as its base, so
it is incompatible with the custom code in read-cache.c.

This rewrite is rather straightforward. It replaces all writes to the
temporary file with writes to the hashfile struct. This takes care of
many of the direct interactions with the_hash_algo.

There are still some git_hash_ctx uses remaining: the extension headers
are hashed for use in the End of Index Entries (EOIE) extension. This
use of the git_hash_ctx is left as-is. There are multiple reasons not to
use a hashfile here, including the fact that the data is not actually
written to a file; it only feeds a hash computation. These hashes do not
block our adoption of the chunk-format API in a future change to the
index, so leave them as-is.

The internals of the algorithms are mostly identical. Previously, the
hashfile API used a smaller 8KB buffer instead of the 128KB buffer from
read-cache.c. The previous change already unified these sizes.
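
To illustrate the mechanism described above, here is a minimal Python sketch of a writer that buffers data, hashes exactly the bytes it writes, and appends the digest as a trailer. It is not Git's C hashfile API: the class name `HashingWriter`, its methods, and the flush-on-threshold behavior are assumptions made for illustration, and SHA-1 is assumed as the hash.

```python
import hashlib
import io


class HashingWriter:
    """Hypothetical stand-in for a hashfile-style writer (not Git's C API)."""

    def __init__(self, out, buffer_size=128 * 1024):
        self.out = out                  # any object with a write(bytes) method
        self.buffer_size = buffer_size  # 128KB mirrors the size named above
        self.buffer = bytearray()
        self.ctx = hashlib.sha1()       # assumption: SHA-1 as the hash

    def write(self, data):
        # Collect small writes; flush once the threshold is reached.
        self.buffer += data
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Hash and write the same bytes together so they cannot diverge.
        if self.buffer:
            self.ctx.update(self.buffer)
            self.out.write(self.buffer)
            self.buffer.clear()

    def finalize(self):
        # Flush what is left, then append the digest as a trailer.
        self.flush()
        digest = self.ctx.digest()
        self.out.write(digest)
        return digest


sink = io.BytesIO()
writer = HashingWriter(sink, buffer_size=8)  # tiny buffer to force a flush
writer.write(b"DIRC")
writer.write(b"example payload")
trailer = writer.finalize()
print(len(sink.getvalue()), trailer.hex())
```

Because every caller goes through `write()`, no call site has to touch the hash context directly, which is the kind of simplification the rewrite aims for.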

There is one subtle point: we do not pass the CSUM_FSYNC to the
finalize_hashfile() method, which differs from most consumers of the
hashfile API. The extra fsync() call indicated by this flag causes a
significant performance degradation that is noticeable for quick
commands that write the index, such as "git add". Other consumers can
absorb this cost with their more complicated data structure
organization. Further, writing structures such as pack-files and
commit-graphs is rarely in the critical path for common user
interactions.
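
The trade-off can be sketched in Python as an optional sync step at the end of a write. This is only an analogy for what the flag controls: the function name `write_payload` and its `sync` parameter are hypothetical and are not part of Git.

```python
import os
import tempfile


def write_payload(path, payload, sync=False):
    """Write payload to path; optionally force it to stable storage."""
    with open(path, "wb") as f:
        f.write(payload)
        if sync:
            f.flush()              # move Python's buffer into the OS
            os.fsync(f.fileno())   # ask the OS to flush to the device


with tempfile.TemporaryDirectory() as d:
    # Fast path: no fsync, comparable to omitting the flag.
    write_payload(os.path.join(d, "fast"), b"index bytes")
    # Durable path: pays for an extra fsync() on every write.
    write_payload(os.path.join(d, "durable"), b"index bytes", sync=True)
```

The file contents are identical either way; only the durability guarantee and the cost of the call differ.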

Some static methods become orphaned in this diff, so I marked them as
MAYBE_UNUSED. The diff is much harder to read if they are deleted during
this change. Instead, they will be deleted in the following change.

In addition to the test suite passing, I computed indexes using the
previous binaries and the binaries compiled after this change, and found
the index data to be exactly equal. Finally, I did extensive performance
testing of "git update-index --force-write" on repos of various sizes,
including one with over 2 million paths at HEAD. These tests
demonstrated less than 1% difference in behavior. As expected, the
performance should be considered unchanged. The previous changes to
increase the hashfile buffer size from 8K to 128K ensured this change
would not create a performance regression.
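
A byte-for-byte comparison like the one described above could be scripted roughly as follows. This is a sketch, not the procedure actually used: the helper names are hypothetical, and it assumes a SHA-1 repository whose index file ends with a 20-byte SHA-1 of all preceding bytes.

```python
import hashlib

HASH_LEN = 20  # SHA-1 digest size; an assumption for SHA-1 repositories


def trailer_matches(data):
    """Return True if the last HASH_LEN bytes are the SHA-1 of the rest."""
    if len(data) < HASH_LEN:
        return False
    return hashlib.sha1(data[:-HASH_LEN]).digest() == data[-HASH_LEN:]


def same_index(old, new):
    """Return True if both files are identical and the trailer is valid."""
    return old == new and trailer_matches(new)


# Synthetic stand-ins for index files written by the old and new binaries.
body = b"DIRC" + b"\x00" * 8
old_bytes = body + hashlib.sha1(body).digest()
new_bytes = bytes(old_bytes)
print(same_index(old_bytes, new_bytes))
```

For a real check, `old_bytes` and `new_bytes` would be read in binary mode from the index files produced by each build.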

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 16:41:21 +09:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks