diff: halt tree-diff early after max_changes

When computing the changed-paths bloom filters for the commit-graph,
we limit the size of the filter by restricting the number of paths
in the diff. Instead of computing a large diff and then ignoring the
result, it is better to halt the diff computation early.

Create a new "max_changes" option in struct diff_options. If non-zero,
then halt the diff computation after discovering strictly more changed
paths. This includes paths corresponding to trees that change.

Use this max_changes option in the bloom filter calculations. This
reduces the time taken to compute the filters for the Linux kernel
repo from 2m50s to 2m35s. On a large internal repository with ~500
commits that perform tree-wide changes, the time reduced from
6m15s to 3m48s.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Derrick Stolee
2020-03-30 00:31:27 +00:00
committed by Junio C Hamano
parent ed591febb4
commit e369698016
3 changed files with 14 additions and 1 deletions

View File

@@ -133,6 +133,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS;
int i;
struct diff_options diffopt;
int max_changes = 512;
if (bloom_filters.slab_size == 0)
return NULL;
@@ -141,6 +142,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
repo_diff_setup(r, &diffopt);
diffopt.flags.recursive = 1;
diffopt.max_changes = max_changes;
diff_setup_done(&diffopt);
if (c->parents)
@@ -149,7 +151,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
diff_tree_oid(NULL, &c->object.oid, "", &diffopt);
diffcore_std(&diffopt);
if (diff_queued_diff.nr <= 512) {
if (diff_queued_diff.nr <= max_changes) {
struct hashmap pathmap;
struct pathmap_hash_entry *e;
struct hashmap_iter iter;