The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As as side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.
Extract the core of this routine to a separate function (that operates
over a `const unsigned char *`, a `size_t` and a `size_t *` pointer)
instead of a `struct bitmap_index *` pointer.
This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseduo-merge.ch.
Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack-bitmap writer machinery understands how to select and
store pseudo-merge commits, teach it how to write the new optional
pseudo-merge .bitmap extension.
No readers yet exist for this new extension to the .bitmap format. The
following commits will take any preparatory step(s) necessary before
then implementing the routines necessary to read this new table.
In the meantime, the new `write_pseudo_merges()` function implements
writing this new format as described by a previous commit in
Documentation/technical/bitmap-format.txt.
Writing this table is fairly straightforward and consists of a few
sub-components:
- a pair of bitmaps for each pseudo-merge (one for the pseudo-merge
"parents", and another for the objects reachable from those parents)
- for each commit, the offset of either (a) the pseudo-merge it
belongs to, or (b) an extended lookup table if it belongs to >1
pseudo-merge groups
- if there are any commits belonging to >1 pseudo-merge group, the
extended lookup tables (which each consist of the number of
pseudo-merge groups a commit appears in, and then that many 4-byte
unsigned )
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.
Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Future commits will want to parse a double-precision floating point
value from configuration, but we have no way to parse such a value prior
to this patch.
The core of the routine is implemented in git_parse_double(). Unlike
git_parse_unsigned() and git_parse_signed(), however, the function
implemented here only works on type "double", and not related types like
"float", or "long double".
This is because "float" and "long double" use different functions to
convert from ASCII strings to floating point values (strtof() and
strtold(), respectively). Likewise, there is no pointer type that can
assign to any of these values (except for "void *"), so the only way to
define this trio of functions would be with a macro expansion that is
parameterized over the floating point type and conversion function.
That is all doable, but likely to be overkill given our current needs,
which is only to parse double-precision floats.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pseudo-merge selection code will be added in a subsequent commit,
and will need a way to push the allocated commit structures into the
bitmap writer from a separate compilation unit.
Make the `bitmap_writer_push_bitmapped_commit()` function part of the
pack-bitmap.h header in order to make this possible.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to implement pseudo-merge bitmap selection by implementing a
necessary new function, `bitmap_writer_has_bitmapped_object_id()`.
This function returns whether or not the bitmap_writer selected the
given object ID for bitmapping. This will allow the pseudo-merge
machinery to reject candidates for pseudo-merges if they have already
been selected as an ordinary bitmap tip.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.
In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.
Mechanically speaking, there are two parts of this change:
- The bitmapped_commit struct gets a new bit indicating whether it is
a pseudo-merge, or an ordinary commit selected for bitmaps.
- A handful of changes to only write out the non-pseudo-merge commits
when enumerating through the selected array (see the new
`bitmap_writer_selected_nr()` function). Pseudo-merge commits appear
after all non-pseudo-merge commits, so it is safe to enumerate
through the selected array like so:
for (i = 0; i < bitmap_writer_selected_nr(); i++)
if (writer.selected[i].pseudo_merge)
BUG("unexpected pseudo-merge");
without encountering the BUG().
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new (empty) header file to contain the implementation for
selecting, reading, and applying pseudo-merge bitmaps.
For now this header and its corresponding implementation are left
empty, but they will evolve over the course of subsequent commit(s).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-bitmap-writer machinery uses a oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.)
This map was initialized at the end of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).
Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus storing it in a pseudo-merge would be redundant).
The changes are as follows:
- Introduce a new `bitmap_writer_init()` function which initializes
the `writer.bitmaps` field (instead of waiting until the end of
`bitmap_writer_build()`).
- Add map entries in `push_bitmapped_commit()` (which is called via
`bitmap_writer_select_commits()`) with OID keys and NULL values to
track whether or not we *expect* to write a bitmap for some given
commit.
- Validate that a NULL entry is found matching the given key when we
store a selected bitmap.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to know whether a given pseudo-merge (comprised of a "parents"
and "objects" bitmaps) is "satisfied" and can be OR'd into the bitmap
result, we need to be able to quickly determine whether the "parents"
bitmap is a subset of the current set of objects reachable on either
side of a traversal.
Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).
This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.
This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.
The format is described in detail in the patch contents below, but the
high-level description is as follows:
- An array of pseudo-merge bitmaps, each containing a pair of EWAH
bitmaps: one describing the set of pseudo-merge "parents", and
another describing the set of object(s) reachable from those
parents.
- A lookup table to determine which pseudo-merge(s) a given commit
appears in. An optional extended lookup table follows when there is
at least one commit which appears in multiple pseudo-merge groups.
- Trailing metadata, including the number of pseudo-merge(s), number
of unique parents, the offset within the .bitmap file for the
pseudo-merge commit lookup table, and the size of the optional
extension itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some details to the gitpacking(7) manual page which motivate and
describe pseudo-merge bitmaps.
The exact on-disk format and many of the configuration knobs will be
described in subsequent commits.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new manual page, gitpacking(7) to collect useful information
about advanced packing concepts in Git.
In future commits in this series, this manual page will expand to
describe the new pseudo-merge bitmaps feature, as well as include
examples, relevant configuration bits, use-cases, and so on.
Outside of this series, this manual page may absorb similar pieces from
other parts of Git's documentation about packing.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that there is clearer memory ownership around the bitmap_writer
structure, introduce a bitmap_writer_free() function that callers may
use to free any memory associated with their instance of the
bitmap_writer structure.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to free() memory associated with bitmapped_commit structs by
zero'ing the 'write_as' field.
In ideal cases, it is fine to do something like:
for (i = 0; i < writer->selected_nr; i++) {
struct bitmapped_commit *bc = &writer->selected[i];
if (bc->write_as != bc->bitmap)
ewah_free(bc->write_as);
ewah_free(bc->bitmap);
}
but if not all of the 'write_as' fields were populated (e.g., because
the packing_data given does not form a reachability closure), then we
may attempt to free uninitialized memory.
Guard against this by preemptively zero'ing this field just in case.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.
When that patch was merged via 0f9e62e084 (Merge branch
'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
passed a value of "-1" for `max_bitmaps`, indicating no limit.
Since then, the only other caller (in midx.c, added via c528e17966
(pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
of "-1" for `max_bitmaps`.
Since no callers have needed a finite limit for the `max_bitmaps`
parameter in the nearly decade that has passed since 0f9e62e084, let's
remove the parameter and any dead pieces of code connected to it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-bitmap machinery uses a structure called 'bitmap_writer' to
collect the data necessary to write out .bitmap files. Since its
introduction in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), there has been a single static bitmap_writer structure,
which is responsible for all bitmap writing-related operations.
In practice, this is OK, since we are only ever writing a single .bitmap
file in a single process (e.g., `git multi-pack-index write --bitmap`,
`git pack-objects --write-bitmap-index`, `git repack -b`, etc.).
However, having a single static variable makes issues like data
ownership unclear, when to free variables, what has/hasn't been
initialized unclear.
Refactor this code to be written in terms of a given bitmap_writer
structure instead of relying on a static global.
Note that this exposes the structure definition of the bitmap_writer at
the pack-bitmap.h level. We could work around this by, e.g., forcing
callers to declare their writers as:
struct bitmap_writer *writer;
bitmap_writer_init(&bitmap_writer);
and then declaring `bitmap_writer_init()` as taking in a double-pointer
like so:
void bitmap_writer_init(struct bitmap_writer **writer);
which would avoid us having to expose the definition of the structure
itself. This patch takes a different approach, since future patches
(like for the ongoing pseudo-merge bitmaps work) will want to modify the
innards of this structure (in the previous example, via pseudo-merge.c).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7cc8f97108 (pack-objects: implement bitmap writing, 2013-12-21), the
bitmapped_commit struct was introduced, including the 'commit_pos'
field, which has been unused ever since its introduction more than a
decade ago.
Instead, we have used the nearby `commit_positions` array leaving the
bitmapped_commit struct with an unused 4-byte field.
We could drop the `commit_pos` field as unused, and continue to store
the values in the auxiliary array. But we could also drop the array and
store the data for each bitmapped_commit struct inside of the structure
itself, which is what this patch does.
In any spot that we previously read `commit_positions[i]`, we can now
instead read `writer.selected[i].commit_pos`. There are a few spots that
need changing as a result:
- write_selected_commits_v1() is a simple transformation, since we're
just reading the field. As a result, the function no longer needs an
explicit argument to pass the commit_positions array.
- write_lookup_table() also no longer needs the explicit
commit_positions array passed in as an argument. But it still needs
to sort an array of indices into the writer.selected array to read
them in commit_pos order, so table_cmp() is adjusted accordingly.
- bitmap_writer_finish() no longer needs to allocate, populate, and
free the commit_positions table. Instead, we can just write the data
directly into each struct bitmapped_commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21) the NEEDS_BITMAP flag was introduced into pack-bitmap.h, but
no object flags allocation table existed at the time.
In 208acbfb82 (object.h: centralize object flag allocation, 2014-03-25)
when that table was first introduced, we never added the flags from
7cc8f97108, which has remained the case since.
Rectify this by including the flag bit used by pack-bitmap.h into the
centralized table in object.h.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leakfix.
* rj/add-i-leak-fix:
add: plug a leak on interactive_add
add-patch: plug a leak handling the '/' command
add-interactive: plug a leak in get_untracked_files
apply: plug a leak in apply_data
Demote a BUG() to an die() when the failure from vsnprintf() may
not be due to a programmer error.
* rs/vsnprintf-failure-is-not-a-bug:
don't report vsnprintf(3) error as bug
The "receive-pack" program (which responds to "git push") was not
converted to run "git maintenance --auto" when other codepaths that
used to run "git gc --auto" were updated, which has been corrected.
* ps/run-auto-maintenance-in-receive-pack:
builtin/receive-pack: convert to use git-maintenance(1)
run-command: introduce function to prepare auto-maintenance process
When "git bisect" reports the commit it determined to be the
culprit, we used to show it in a format that does not honor common
UI tweaks, like log.date and log.decorate. The code has been
taught to use "git show" to follow more customizations.
* pk/bisect-use-show:
bisect: report the found commit with "show"
The filename used for rejected hunks "git apply --reject" creates
was limited to PATH_MAX, which has been lifted.
* rs/apply-reject-long-name:
apply: avoid using fixed-size buffer in write_out_one_reject()
When .git/rr-cache/ rerere database gets corrupted or rerere is fed to
work on a file with conflicted hunks resolved incompletely, the rerere
machinery got confused and segfaulted, which has been corrected.
* mr/rerere-crash-fix:
rerere: fix crashes due to unmatched opening conflict markers