last-modified: use Bloom filters when available

Our 'git last-modified' performs a revision walk, and computes a diff at
each point in the walk to figure out whether a given revision changed
any of the paths it considers interesting.

When changed-path Bloom filters are available, we can avoid computing
many such diffs. Before computing a diff, we first check if any of the
remaining paths of interest were possibly changed at a given commit by
consulting its Bloom filter. If any of them are, we are resigned to
compute the diff.

If none of those queries returned "maybe", we know that the given commit
doesn't contain any changed paths which are interesting to us. So, we
can avoid computing it in this case.

Comparing the perf test results on git.git:

    Test                                        HEAD~             HEAD
    ------------------------------------------------------------------------------------
    8020.1: top-level last-modified             4.49(4.34+0.11)   2.22(2.05+0.09) -50.6%
    8020.2: top-level recursive last-modified   5.64(5.45+0.11)   5.62(5.30+0.11) -0.4%
    8020.3: subdir last-modified                0.11(0.06+0.04)   0.07(0.03+0.04) -36.4%

Based-on-patch-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Toon Claes
2025-08-05 11:33:58 +02:00
committed by Junio C Hamano
parent 97d5301c54
commit 8d9a7cdfda
2 changed files with 52 additions and 3 deletions

View File

@@ -820,7 +820,12 @@ int corrected_commit_dates_enabled(struct repository *r)
struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
{
struct commit_graph *g = r->objects->commit_graph;
struct commit_graph *g;
if (!prepare_commit_graph(r))
return NULL;
g = r->objects->commit_graph;
while (g) {
if (g->bloom_filter_settings)
return g->bloom_filter_settings;