SZEDER Gábor 74e8addfaa split-index: add tests to demonstrate the racy split index problem
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:

  1. Split the index while adding a racily clean file:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file    # size stays the same

     This case already works properly.  Even though the cache entry's
     stat data matches with the modifid file in the worktree,
     subsequent git commands will notice that the (split) index and
     the file have the same mtime, and then will go on to check the
     file's content and notice its dirtiness.

  2. Add a racily clean file to an already split index:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file

     This case already works properly.  After the second 'git
     update-index' writes the newly added file's cache entry to the
     new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet splitted index)
     contains a racily clean cache entry, i.e. an entry whose cached
     stat data matches with the corresponding file in the worktree and
     the cached mtime matches that of the index:

       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --split-index --add other-file

     This case already works properly.  The shared index is written by
     do_write_index(), i.e. the same function that is responsible for
     writing "regular" and split indexes as well.  This function
     cleverly notices the racily clean cache entry, and writes the
     entry to the new shared index with smudged stat data, i.e. file
     size set to 0.  When subsequent git commands read the index, they
     will notice that the smudged stat data doesn't match with the
     file in the worktree, and then go on to check the file's content
     and notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
     entry:

       git update-index --split-index
       echo "cached content" >file
       git update-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case already works properly.  After the second 'git
     update-index' the newly added file's cache entry is only stored
     in the split index.  If a cache entry is present in the split
     index (even if it is a replacement of an outdated entry in the
     shared index), then it will always be included in the new split
     index on subsequent split index updates (until the file is
     removed or a new shared index is written), independently from
     whether the entry is racily clean or not.  When do_write_index()
     writes the new split index, it notices the racily clean cache
     entry, and smudges its stat date.  Subsequent git commands
     reading the index will notice the smudged stat data and then go
     on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
     only in the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git update-index --add other-file

     This case fails due to the racy split index problem.  In the
     second 'git update-index' prepare_to_write_split_index() decides,
     among other things, which cache entries stored only in the shared
     index should be replaced in the new split index.  Alas, this
     function never looks out for racily clean cache entries, and
     since the file's stat data in the worktree hasn't changed since
     the shared index was written, the entry won't be replaced in the
     new split index.  Consequently, do_write_index() doesn't even get
     this racily clean cache entry, and can't smudge its stat data.
     Subsequent git commands will then see that the index has more
     recent mtime than the file and that the (not smudged) cached stat
     data still matches with the file in the worktree, and,
     ultimately, will erroneously consider the file clean.

  6. Update the split index after unpack_trees() copied a racily clean
     cache entry from the shared index:

       echo "cached content" >file
       git update-index --split-index --add file
       echo "dirty worktree" >file
       # ... wait ...
       git read-tree -m HEAD

     This case fails due to the racy split index problem.  This
     basically fails for the same reason as case #5 above, but there
     is one important difference, which warrants the dedicated test.
     While that second 'git update-index' in case #5 updates
     index_state in place, in this case 'git read-tree -m' calls
     unpack_trees(), which throws out the entire index, and constructs
     a new one from the (potentially updated) copies of the original's
     cache entries.  Consequently, when prepare_to_write_split_index()
     gets to work on this reconstructed index, it takes a different
     code path than in case #5 when deciding which cache entries in
     the shared index should be replaced.  The result is the same,
     though: the racily clean cache entry goes unnoticed, it isn't
     added to the split index with smudged stat data, and subsequent
     git commands will then erroneously consider the file clean.

Note that in the last two 'test_expect_failure' cases I omitted the
'#' (as in nr. of trial) from the tests' description on purpose for
now, as it breakes the TAP output [2]; it will be added at the end of
the series, when those two tests will be flipped to
'test_expect_success'.

[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
    branch 'nd/split-index', 2014-07-16).

[2] In the TAP output a '#' should separate the test's description
    from the TODO directive emitted by 'test_expect_failure'.  The
    additional '#' in "#$trial" interferes with this, the test harness
    won't recognize the TODO directive, and will report that those
    tests failed unexpectedly.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-12 07:23:29 +09:00
2018-08-20 11:33:53 -07:00
2018-09-10 10:41:56 -07:00
2018-08-15 15:08:25 -07:00
2018-08-02 13:54:58 -07:00
2018-08-20 11:33:53 -07:00
2018-08-20 12:41:32 -07:00
2018-08-20 11:33:53 -07:00
2018-08-20 11:33:53 -07:00
2018-08-13 14:14:43 -07:00
2018-08-20 12:41:32 -07:00
2018-08-20 12:41:32 -07:00
2018-07-26 10:12:51 -07:00
2018-07-18 12:20:28 -07:00
2018-08-23 06:20:58 -07:00
2018-03-30 12:49:57 -07:00
2018-03-30 12:49:57 -07:00
2018-08-02 15:30:42 -07:00
2018-05-08 15:59:17 +09:00
2018-08-20 12:41:32 -07:00
2018-08-20 11:33:53 -07:00
2018-08-20 12:41:32 -07:00
2017-12-08 09:16:27 -08:00
2017-12-08 09:16:27 -08:00
2018-08-20 11:33:53 -07:00
2018-08-20 11:33:53 -07:00
2018-08-20 12:41:32 -07:00
2018-08-20 11:33:53 -07:00
2018-05-08 15:59:17 +09:00
2018-08-02 15:30:43 -07:00
2018-08-02 15:30:42 -07:00
2018-05-08 15:59:34 +09:00
2018-08-20 11:33:50 -07:00
2018-08-01 13:37:18 -07:00
2018-09-10 10:41:56 -07:00
2018-08-20 11:33:53 -07:00
2018-08-20 11:33:50 -07:00
2018-08-02 15:30:44 -07:00
2018-08-02 15:30:44 -07:00
2018-07-16 14:27:39 -07:00
2018-04-11 13:09:55 +09:00
2018-05-30 21:51:28 +09:00
2017-06-24 14:28:41 -07:00
2017-09-06 17:19:54 +09:00
2018-08-02 15:30:45 -07:00
2018-08-15 15:08:23 -07:00
2017-12-27 12:28:06 -08:00
2018-06-21 12:22:48 -07:00
2018-02-02 11:28:41 -08:00
2018-06-01 15:06:37 +09:00
2018-07-18 12:20:28 -07:00
2018-08-15 15:08:25 -07:00
2017-12-19 11:33:55 -08:00
2018-01-16 12:16:54 -08:00
2018-08-20 11:33:53 -07:00
2018-08-15 15:08:23 -07:00
2018-06-25 13:22:27 -07:00
2018-08-20 11:33:50 -07:00
2018-08-15 15:08:23 -07:00
2018-08-20 12:41:32 -07:00
2018-03-15 12:01:08 -07:00
2018-08-20 11:33:53 -07:00
2018-06-01 15:06:37 +09:00
2018-08-15 15:08:25 -07:00
2018-03-30 12:49:57 -07:00
2018-03-30 12:49:57 -07:00
2018-08-02 15:30:39 -07:00
2018-08-15 11:52:09 -07:00
2018-05-30 21:51:28 +09:00
2018-08-15 15:08:25 -07:00
2018-08-20 12:41:32 -07:00
2018-07-18 12:20:28 -07:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks
Description
No description provided
Readme 279 MiB
Languages
C 50.5%
Shell 38.7%
Perl 4.5%
Tcl 3.2%
Python 0.8%
Other 2.1%