9a414486d9f0325c3663210471e4673ed0cd862c
The lookup_object function is backed by a hash table of all
objects we have seen in the program. We manage collisions
with a linear walk over the colliding entries, checking each
with hashcmp(). The main cost of lookup is in these
hashcmp() calls; finding our item in the first slot is
cheaper than finding it in the second slot, which is cheaper
than the third, and so on.
If we assume that there is some locality to the object
lookups (e.g., if X and Y collide, and we have just looked
up X, the next lookup is more likely to be for X than for
Y), then we can improve our average lookup speed by checking
X before Y.
This patch does so by swapping a found item to the front of
the collision chain. The p0001 perf test reveals that this
does indeed exploit locality in the case of "rev-list --all
--objects":
Test origin this tree
-------------------------------------------------------------------------
0001.1: rev-list --all 0.40(0.38+0.02) 0.40(0.36+0.03) +0.0%
0001.2: rev-list --all --objects 2.24(2.17+0.05) 1.86(1.79+0.05) -17.0%
This is not surprising, as the full object traversal will
hit the same tree entries over and over (e.g., for every
commit that doesn't change "Documentation/", we will have to
look up the same sha1 just to find out that we already
processed it).
The reason why this technique works (and does not violate
any properties of the hash table) is subtle and bears some
explanation. Let's imagine we get a lookup for sha1 `X`, and
it hashes to bucket `i` in our table. That stretch of the
table may look like:
index | i-1 | i | i+1 | i+2 |
-----------------------------------
entry ... | A | B | C | X | ...
-----------------------------------
We start our probe at i, see that B does not match, nor does
C, and finally find X. There may be multiple C's in the
middle, but we know that there are no empty slots (or else
we would not find X at all).
We do not know the original index of B; it may be `i`, or it
may be less than i (e.g., if it were `i-1`, it would collide
with A and spill over into the `i` bucket). So it is
acceptable for us to move it to the right of a contiguous
stretch of entries (because we will find it from a linear
walk starting anywhere at `i` or before), but never to the
left (if we moved it to `i-1`, we would miss it when
starting our walk at `i`).
We do know the original index of X; it is `i`, so it is safe
to place it anywhere in the contiguous stretch between `i`
and where we found it (`i+2` in the this case).
This patch does a pure swap; after finding X in the
situation above, we would end with:
index | i-1 | i | i+1 | i+2 |
-----------------------------------
entry ... | A | X | C | B | ...
-----------------------------------
We could instead bump X into the `i` slot, and then shift
the whole contiguous chain down by one, resulting in:
index | i-1 | i | i+1 | i+2 |
-----------------------------------
entry ... | A | X | B | C | ...
-----------------------------------
That puts our chain in true most-recently-used order.
However, experiments show that it is not any faster (and in
fact, is slightly slower due to the extra manipulation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
////////////////////////////////////////////////////////////////
GIT - the stupid content tracker
////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.
- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
- "global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks
Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.
Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
See Documentation/gittutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands, and
Documentation/git-commandname.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with "man gittutorial" or "git help tutorial", and the
documentation of each command with "man git-commandname" or "git help
commandname".
CVS users may also want to read Documentation/gitcvs-migration.txt
("man gitcvs-migration" or "git help cvs-migration" if git is
installed).
Many Git online resources are accessible from http://git-scm.com/
including full documentation and Git related tools.
The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org (read
Documentation/SubmittingPatches for instructions on patch submission).
To subscribe to the list, send an email with just "subscribe git" in
the body to majordomo@vger.kernel.org. The mailing list archives are
available at http://marc.theaimsgroup.com/?l=git and other archival
sites.
The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.
Description
Languages
C
50.5%
Shell
38.7%
Perl
4.5%
Tcl
3.2%
Python
0.8%
Other
2.1%