admin/git

Author	SHA1	Message	Date
Derrick Stolee	b08ff1fee0	maintenance: add --schedule option and config Maintenance currently triggers when certain data-size thresholds are met, such as number of pack-files or loose objects. Users may want to run certain maintenance tasks based on frequency instead. For example, a user may want to perform a 'prefetch' task every hour, or 'gc' task every day. To help these users, update the 'git maintenance run' command to include a '--schedule=<frequency>' option. The allowed frequencies are 'hourly', 'daily', and 'weekly'. These values are also allowed in a new config value 'maintenance.<task>.schedule'. The 'git maintenance run --schedule=<frequency>' checks the '.schedule' config value for each enabled task to see if the configured frequency is at least as frequent as the frequency from the '--schedule' argument. We use the following order, for full clarity: 'hourly' > 'daily' > 'weekly' Use new 'enum schedule_priority' to track these values numerically. The following cron table would run the scheduled tasks with the correct frequencies: 0 1-23 * * git -C <repo> maintenance run --schedule=hourly 0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily 0 0 * * 0 git -C <repo> maintenance run --schedule=weekly This cron schedule will run --schedule=hourly every hour except at midnight. This avoids a concurrent run with the --schedule=daily that runs at midnight every day except the first day of the week. This avoids a concurrent run with the --schedule=weekly that runs at midnight on the first day of the week. Since --schedule=daily also runs the 'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily' tasks, we will still see all tasks run with the proper frequencies. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	e841a79a13	maintenance: add incremental-repack auto condition The incremental-repack task updates the multi-pack-index by deleting pack- files that have been replaced with new packs, then repacking a batch of small pack-files into a larger pack-file. This incremental repack is faster than rewriting all object data, but is slower than some other maintenance activities. The 'maintenance.incremental-repack.auto' config option specifies how many pack-files should exist outside of the multi-pack-index before running the step. These pack-files could be created by 'git fetch' commands or by the loose-objects task. The default value is 10. Setting the option to zero disables the task with the '--auto' option, and a negative value makes the task run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	a13e3d0ec8	maintenance: auto-size incremental-repack batch When repacking during the 'incremental-repack' task, we use the --batch-size option in 'git multi-pack-index repack'. The initial setting used --batch-size=0 to repack everything into a single pack-file. This is not sustainable for a large repository. The amount of work required is also likely to use too many system resources for a background job. Update the 'incremental-repack' task by dynamically computing a --batch-size option based on the current pack-file structure. The dynamic default size is computed with this idea in mind for a client repository that was cloned from a very large remote: there is likely one "big" pack-file that was created at clone time. Thus, do not try repacking it as it is likely packed efficiently by the server. Instead, we select the second-largest pack-file, and create a batch size that is one larger than that pack-file. If there are three or more pack-files, then this guarantees that at least two will be combined into a new pack-file. Of course, this means that the second-largest pack-file size is likely to grow over time and may eventually surpass the initially-cloned pack-file. Recall that the pack-file batch is selected in a greedy manner: the packs are considered from oldest to newest and are selected if they have size smaller than the batch size until the total selected size is larger than the batch size. Thus, that oldest "clone" pack will be first to repack after the new data creates a pack larger than that. We also want to place some limits on how large these pack-files become, in order to bound the amount of time spent repacking. A maximum batch-size of two gigabytes means that large repositories will never be packed into a single pack-file using this job, but also that repack is rather expensive. This is a trade-off that is valuable to have if the maintenance is being run automatically or in the background. Users who truly want to optimize for space and performance (and are willing to pay the upfront cost of a full repack) can use the 'gc' task to do so. Create a test for this two gigabyte limit by creating an EXPENSIVE test that generates two pack-files of roughly 2.5 gigabytes in size, then performs an incremental repack. Check that the --batch-size argument in the subcommand uses the hard-coded maximum. Helped-by: Chris Torek <chris.torek@gmail.com> Reported-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	52fe41ff1c	maintenance: add incremental-repack task The previous change cleaned up loose objects using the 'loose-objects' that can be run safely in the background. Add a similar job that performs similar cleanups for pack-files. One issue with running 'git repack' is that it is designed to repack all pack-files into a single pack-file. While this is the most space-efficient way to store object data, it is not time or memory efficient. This becomes extremely important if the repo is so large that a user struggles to store two copies of the pack on their disk. Instead, perform an "incremental" repack by collecting a few small pack-files into a new pack-file. The multi-pack-index facilitates this process ever since 'git multi-pack-index expire' was added in `19575c7` (multi-pack-index: implement 'expire' subcommand, 2019-06-10) and 'git multi-pack-index repack' was added in `ce1e4a1` (midx: implement midx_repack(), 2019-06-10). The 'incremental-repack' task runs the following steps: 1. 'git multi-pack-index write' creates a multi-pack-index file if one did not exist, and otherwise will update the multi-pack-index with any new pack-files that appeared since the last write. This is particularly relevant with the background fetch job. When the multi-pack-index sees two copies of the same object, it stores the offset data into the newer pack-file. This means that some old pack-files could become "unreferenced" which I will use to mean "a pack-file that is in the pack-file list of the multi-pack-index but none of the objects in the multi-pack-index reference a location inside that pack-file." 2. 'git multi-pack-index expire' deletes any unreferenced pack-files and updaes the multi-pack-index to drop those pack-files from the list. This is safe to do as concurrent Git processes will see the multi-pack-index and not open those packs when looking for object contents. (Similar to the 'loose-objects' job, there are some Git commands that open pack-files regardless of the multi-pack-index, but they are rarely used. Further, a user that self-selects to use background operations would likely refrain from using those commands.) 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set of pack-files that are listed in the multi-pack-index and creates a new pack-file containing the objects whose offsets are listed by the multi-pack-index to be in those objects. The set of pack- files is selected greedily by sorting the pack-files by modified time and adding a pack-file to the set if its "expected size" is smaller than the batch size until the total expected size of the selected pack-files is at least the batch size. The "expected size" is calculated by taking the size of the pack-file divided by the number of objects in the pack-file and multiplied by the number of objects from the multi-pack-index with offset in that pack-file. The expected size approximates how much data from that pack-file will contribute to the resulting pack-file size. The intention is that the resulting pack-file will be close in size to the provided batch size. The next run of the incremental-repack task will delete these repacked pack-files during the 'expire' step. In this version, the batch size is set to "0" which ignores the size restrictions when selecting the pack-files. It instead selects all pack-files and repacks all packed objects into a single pack-file. This will be updated in the next change, but it requires doing some calculations that are better isolated to a separate change. These steps are based on a similar background maintenance step in Scalar (and VFS for Git) [1]. This was incredibly effective for users of the Windows OS repository. After using the same VFS for Git repository for over a year, some users had _thousands_ of pack-files that combined to up to 250 GB of data. We noticed a few users were running into the open file descriptor limits (due in part to a bug in the multi-pack-index fixed by `af96fe3` (midx: add packs to packed_git linked list, 2019-04-29). These pack-files were mostly small since they contained the commits and trees that were pushed to the origin in a given hour. The GVFS protocol includes a "prefetch" step that asks for pre-computed pack- files containing commits and trees by timestamp. These pack-files were grouped into "daily" pack-files once a day for up to 30 days. If a user did not request prefetch packs for over 30 days, then they would get the entire history of commits and trees in a new, large pack-file. This led to a large number of pack-files that had poor delta compression. By running this pack-file maintenance step once per day, these repos with thousands of packs spanning 200+ GB dropped to dozens of pack- files spanning 30-50 GB. This was done all without removing objects from the system and using a constant batch size of two gigabytes. Once the work was done to reduce the pack-files to small sizes, the batch size of two gigabytes means that not every run triggers a repack operation, so the following run will not expire a pack-file. This has kept these repos in a "clean" state. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	3e220e6069	maintenance: create auto condition for loose-objects The loose-objects task deletes loose objects that already exist in a pack-file, then place the remaining loose objects into a new pack-file. If this step runs all the time, then we risk creating pack-files with very few objects with every 'git commit' process. To prevent overwhelming the packs directory with small pack-files, place a minimum number of objects to justify the task. The 'maintenance.loose-objects.auto' config option specifies a minimum number of loose objects to justify the task to run under the '--auto' option. This defaults to 100 loose objects. Setting the value to zero will prevent the step from running under '--auto' while a negative value will force it to run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	252cfb7cb8	maintenance: add loose-objects task One goal of background maintenance jobs is to allow a user to disable auto-gc (gc.auto=0) but keep their repository in a clean state. Without any cleanup, loose objects will clutter the object database and slow operations. In addition, the loose objects will take up extra space because they are not stored with deltas against similar objects. Create a 'loose-objects' task for the 'git maintenance run' command. This helps clean up loose objects without disrupting concurrent Git commands using the following sequence of events: 1. Run 'git prune-packed' to delete any loose objects that exist in a pack-file. Concurrent commands will prefer the packed version of the object to the loose version. (Of course, there are exceptions for commands that specifically care about the location of an object. These are rare for a user to run on purpose, and we hope a user that has selected background maintenance will not be trying to do foreground maintenance.) 2. Run 'git pack-objects' on a batch of loose objects. These objects are grouped by scanning the loose object directories in lexicographic order until listing all loose objects -or- reaching 50,000 objects. This is more than enough if the loose objects are created only by a user doing normal development. We noticed users with _millions_ of loose objects because VFS for Git downloads blobs on-demand when a file read operation requires populating a virtual file. This step is based on a similar step in Scalar [1] and VFS for Git. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	28cb5e66dd	maintenance: add prefetch task When working with very large repositories, an incremental 'git fetch' command can download a large amount of data. If there are many other users pushing to a common repo, then this data can rival the initial pack-file size of a 'git clone' of a medium-size repo. Users may want to keep the data on their local repos as close as possible to the data on the remote repos by fetching periodically in the background. This can break up a large daily fetch into several smaller hourly fetches. The task is called "prefetch" because it is work done in advance of a foreground fetch to make that 'git fetch' command much faster. However, if we simply ran 'git fetch <remote>' in the background, then the user running a foreground 'git fetch <remote>' would lose some important feedback when a new branch appears or an existing branch updates. This is especially true if a remote branch is force-updated and this isn't noticed by the user because it occurred in the background. Further, the functionality of 'git push --force-with-lease' becomes suspect. When running 'git fetch <remote> <options>' in the background, use the following options for careful updating: 1. --no-tags prevents getting a new tag when a user wants to see the new tags appear in their foreground fetches. 2. --refmap= removes the configured refspec which usually updates refs/remotes/<remote>/* with the refs advertised by the remote. While this looks confusing, this was documented and tested by `b40a50264a` (fetch: document and test --refmap="", 2020-01-21), including this sentence in the documentation: Providing an empty `<refspec>` to the `--refmap` option causes Git to ignore the configured refspecs and rely entirely on the refspecs supplied as command-line arguments. 3. By adding a new refspec "+refs/heads/:refs/prefetch/<remote>/" we can ensure that we actually load the new values somewhere in our refspace while not updating refs/heads or refs/remotes. By storing these refs here, the commit-graph job will update the commit-graph with the commits from these hidden refs. 4. --prune will delete the refs/prefetch/<remote> refs that no longer appear on the remote. 5. --no-write-fetch-head prevents updating FETCH_HEAD. We've been using this step as a critical background job in Scalar [1] (and VFS for Git). This solved a pain point that was showing up in user reports: fetching was a pain! Users do not like waiting to download the data that was created while they were away from their machines. After implementing background fetch, the foreground fetch commands sped up significantly because they mostly just update refs and download a small amount of new data. The effect is especially dramatic when paried with --no-show-forced-udpates (through fetch.showForcedUpdates=false). [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Jeff King	45d93eb824	shortlog: change "author" variables to "ident" We already match "committer", and we're about to start matching more things. Let's use a more neutral variable to avoid confusion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:47:50 -07:00
Christian Couder	73c6de06af	bisect: don't use invalid oid as rev when starting In `06f5608c14` (bisect--helper: `bisect_start` shell function partially in C, 2019-01-02), we changed the following shell code: - rev=$(git rev-parse -q --verify "$arg^{commit}") \|\| { - test $has_double_dash -eq 1 && - die "$(eval_gettext "'\$arg' does not appear to be a valid revision")" - break - } - revs="$revs $rev" into: + char *commit_id = xstrfmt("%s^{commit}", arg); + if (get_oid(commit_id, &oid) && has_double_dash) + die(_("'%s' does not appear to be a valid " + "revision"), arg); + + string_list_append(&revs, oid_to_hex(&oid)); + free(commit_id); In case of an invalid "arg" when "has_double_dash" is false, the old code would "break" out of the argument loop. In the new C code though, `oid_to_hex(&oid)` is unconditonally appended to "revs". This is wrong first because "oid" is junk as `get_oid(commit_id, &oid)` failed and second because it doesn't break out of the argument loop. Not breaking out of the argument loop means that "arg" is then not treated as a path restriction (which is wrong). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 09:57:48 -07:00
Alex Henrie	54200cef86	pull: don't warn if pull.ff has been set A user who understands enough to set pull.ff does not need additional instructions. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 23:04:27 -07:00
Junio C Hamano	610e2b9240	blame: validate and peel the object names on the ignore list The command reads list of object names to place on the ignore list either from the command line or from a file, but they are not checked with their object type (those read from the file are not even checked for object existence). Extend the oidset_parse_file() API and allow it to take a callback that can be used to die (e.g. when an inappropriate input is read) or modify the object name read (e.g. when a tag pointing at a commit is read, and the caller wants a commit object name), and use it in the code that handles ignore list. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 22:20:58 -07:00
Jeff King	176380fd11	Revert "fast-export: use local array to store anonymized oid" This reverts commit `f39ad38410`. That commit was trying to silence a type-punning warning on older versions of gcc. However, its analysis was all wrong. I didn't notice that we _were_ in fact type-punning because there are two versions of put_be32(): one that uses casts and unaligned loads, and another that uses bitshifts. I looked at the latter, but on my platform we were defaulting to the former. However, as of the previous commit, we'll always use the bitshift version. So we can drop this hackery to avoid the warning, making the code slightly cleaner. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:30:11 -07:00
Pranit Bauva	517ecb3161	bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions in C and add the subcommands to `git bisect--helper` to call them from git-bisect.sh . bisect_auto_next() function returns an enum bisect_error type as whole `git bisect` can exit with an error code when bisect_next() does. Return an error when `bisect_next()` fails, that fix a bug on shell script version. Using `--bisect-next` and `--bisect-auto-next` subcommands is a temporary measure to port shell function to C so as to use the existing test suite. As more functions are ported, `--bisect-auto-next` subcommand will be retired and will be called by some other methods. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:06:30 -07:00
Pranit Bauva	09535f056b	bisect--helper: reimplement `bisect_autostart` shell function in C Reimplement the `bisect_autostart()` shell function in C and add the C implementation from `bisect_next()` which was previously left uncovered. Add `--bisect-autostart` subcommand to be called from git-bisect.sh. Using `--bisect-autostart` subcommand is a temporary measure to port the shell function to C so as to use the existing test suite. As more functions are ported, this subcommand will be retired and bisect_autostart() will be called directly by `bisect_state()`. Change behavior of shell script that returned success when user aborted the bisection. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:06:30 -07:00
Junio C Hamano	6854689e65	Merge branch 'ar/fetch-ipversion-in-all' "git fetch --all --ipv4/--ipv6" forgot to pass the protocol options to instances of the "git fetch" that talk to individual remotes, which has been corrected. * ar/fetch-ipversion-in-all: fetch: pass --ipv4 and --ipv6 options to sub-fetches	2020-09-22 12:36:34 -07:00
Junio C Hamano	39149df364	Merge branch 'cs/don-t-pretend-a-failed-remote-set-head-succeeded' "git remote set-head" that failed still said something that hints the operation went through, which was misleading. * cs/don-t-pretend-a-failed-remote-set-head-succeeded: remote: don't show success message when set-head fails	2020-09-22 12:36:32 -07:00
Junio C Hamano	26a3728bed	Merge branch 'al/ref-filter-merged-and-no-merged' "git for-each-ref" and friends that list refs used to allow only one --merged or --no-merged to filter them; they learned to take combination of both kind of filtering. * al/ref-filter-merged-and-no-merged: Doc: prefer more specific file name ref-filter: make internal reachable-filter API more precise ref-filter: allow merged and no-merged filters Doc: cover multiple contains/no-contains filters t3201: test multiple branch filter combinations	2020-09-22 12:36:31 -07:00
Junio C Hamano	4aff18a3f0	Merge branch 'ls/mergetool-meld-auto-merge' The 'meld' backend of the "git mergetool" learned to give the underlying 'meld' the '--auto-merge' option, which would help reduce the amount of text that requires manual merging. * ls/mergetool-meld-auto-merge: mergetool: allow auto-merge for meld to follow the vim-diff behavior	2020-09-22 12:36:29 -07:00
Junio C Hamano	b7e65b51e5	Merge branch 'jt/threaded-index-pack' "git index-pack" learned to resolve deltified objects with greater parallelism. * jt/threaded-index-pack: index-pack: make quantum of work smaller index-pack: make resolve_delta() assume base data index-pack: calculate {ref,ofs}_{first,last} early index-pack: remove redundant child field index-pack: unify threaded and unthreaded code index-pack: remove redundant parameter Documentation: deltaBaseCacheLimit is per-thread	2020-09-22 12:36:28 -07:00
Junio C Hamano	634e0084fa	Merge branch 'es/format-patch-interdiff-cleanup' "format-patch --range-diff=<prev> <origin>..HEAD" has been taught not to ignore <origin> when <prev> is a single version. * es/format-patch-interdiff-cleanup: format-patch: use 'origin' as start of current-series-range when known diff-lib: tighten show_interdiff()'s interface diff: move show_interdiff() from its own file to diff-lib	2020-09-22 12:36:28 -07:00
Junio C Hamano	bcb68bff80	Merge branch 'os/fetch-submodule-optim' Optimization around submodule handling. * os/fetch-submodule-optim: fetch: do not look for submodule changes in unchanged refs	2020-09-22 12:36:28 -07:00
brian m. carlson	47ac970309	builtin/clone: avoid failure with GIT_DEFAULT_HASH If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to "sha256", then we can end up with a repository where the repository format version is 0 but the extensions.objectformat key is set to "sha256". This is both wrong (the user has a SHA-1 repository) and nonfunctional (because the extension cannot be used in a v0 repository). This happens because in a clone, we initially set up the repository, and then change its algorithm based on what the remote side tells us it's using. We've initially set up the repository as SHA-256 in this case, and then later on reset the repository version without clearing the extension. We could just always set the extension in this case, but that would mean that our SHA-1 repositories weren't compatible with older Git versions, even though there's no reason why they shouldn't be. And we also don't want to initialize the repository as SHA-1 initially, since that means if we're cloning an empty repository, we'll have failed to honor the GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a SHA-256 repository. Neither of those are appealing, so let's tell the repository initialization code if we're doing a reinit like this, and if so, to clear the extension if we're using SHA-1. This makes sure we produce a valid and functional repository and doesn't break any of our other use cases. Reported-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-22 09:22:32 -07:00
Johannes Schindelin	5a0c32bd4b	fast-export: avoid using unnecessary language in a code comment In an ongoing effort to avoid non-inclusive language, let's avoid using the branch name "master" in a code comment. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-21 15:19:27 -07:00
Denton Liu	3d09c22869	builtin/diff-tree: learn --merge-base The previous commit introduced ---merge-base a way to take the diff between the working tree or index and the merge base between an arbitrary commit and HEAD. It makes sense to extend this option to support the case where two commits are given too and behave in a manner identical to `git diff A...B`. Introduce the --merge-base flag as an alternative to triple-dot notation. Thus, we would be able to write the above as `git diff --merge-base A B`. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-21 13:37:03 -07:00
Denton Liu	0f5a1d449b	builtin/diff-index: learn --merge-base There is currently no easy way to take the diff between the working tree or index and the merge base between an arbitrary commit and HEAD. Even diff's `...` notation doesn't allow this because it only works between commits. However, the ability to do this would be desirable to a user who would like to see all the changes they've made on a branch plus uncommitted changes without taking into account changes made in the upstream branch. Teach diff-index and diff (with one commit) the --merge-base option which allows a user to use the merge base of a commit and HEAD as the "before" side. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-20 21:30:26 -07:00
Denton Liu	4c3fe82ef1	diff-lib: accept option flags in run_diff_index() In a future commit, we will teach run_diff_index() to accept more options via flag bits. For now, change `cached` into a flag in the `option` bitfield. The behaviour should remain exactly the same. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-20 21:30:26 -07:00
Junio C Hamano	80cacaec41	Merge branch 'mt/config-fail-nongit-early' Unlike "git config --local", "git config --worktree" did not fail early and cleanly when started outside a git repository. * mt/config-fail-nongit-early: config: complain about --worktree outside of a git repo	2020-09-18 17:58:06 -07:00
Junio C Hamano	9d4e7ec4d9	Merge branch 'jc/quote-path-cleanup' "git status --short" quoted a path with SP in it when tracked, but not those that are untracked, ignored or unmerged. They are all shown quoted consistently. * jc/quote-path-cleanup: quote: turn 'nodq' parameter into a set of flags quote: rename misnamed sq_lookup[] to cq_lookup[] wt-status: consistently quote paths in "status --short" output quote_path: code clarification quote_path: optionally allow quoting a path with SP in it quote_path: give flags parameter to quote_path() quote_path: rename quote_path_relative() to quote_path()	2020-09-18 17:58:04 -07:00
Junio C Hamano	45f462b5c5	Merge branch 'es/wt-add-detach' "git worktree add" learns that the "-d" is a synonym to "--detach" option to create a new worktree without being on a branch. * es/wt-add-detach: git-worktree.txt: discuss branch-based vs. throwaway worktrees worktree: teach `add` to recognize -d as shorthand for --detach git-checkout.txt: document -d short option for --detach	2020-09-18 17:58:04 -07:00
Junio C Hamano	e96b271d18	Merge branch 'jc/add-i-use-builtin-experimental' The "add -i/-p" machinery has been written in C but it is not used by default yet. It is made default to those who are participating in feature.experimental experiment. * jc/add-i-use-builtin-experimental: add -i: use the built-in version when feature.experimental is set	2020-09-18 17:58:02 -07:00
Junio C Hamano	21de7e9c50	Merge branch 'rs/refspec-leakfix' Leakfix. * rs/refspec-leakfix: refspec: add and use refspec_appendf() push: release strbufs used for refspec formatting	2020-09-18 17:58:00 -07:00
Junio C Hamano	9b8074427b	Merge branch 'rs/misc-cleanups' Misc cleanups. * rs/misc-cleanups: pack-bitmap-write: use hashwrite_be32() in write_hash_cache() midx: use hashwrite_u8() in write_midx_header() fast-import: use write_pack_header()	2020-09-18 17:58:00 -07:00
Taylor Blau	d356d5debe	commit-graph: introduce 'commitGraph.maxNewFilters' Introduce a configuration variable to specify a default value for the recently-introduce '--max-new-filters' option of 'git commit-graph write'. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-18 10:39:22 -07:00
Taylor Blau	809e0327f5	builtin/commit-graph.c: introduce '--max-new-filters=<n>' Introduce a command-line flag to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-18 10:35:39 -07:00
Taylor Blau	98bb796191	commit-graph: rename 'split_commit_graph_opts' In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Likewise, rename the 'flags' member to instead be 'split_flags' to clarify that it only has to do with the behavior implied by '--split'. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 21:55:50 -07:00
Christian Schlack	5a07c6c3c2	remote: don't show success message when set-head fails Suppress the message 'origin/HEAD set to master' in case of an error. $ git remote set-head origin -a error: Not a valid ref: refs/remotes/origin/master origin/HEAD set to master Signed-off-by: Christian Schlack <christian@backhub.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:40:17 -07:00
Derrick Stolee	25914c4fde	maintenance: add trace2 regions for task execution Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	4ddc79b2da	maintenance: add auto condition for commit-graph task Instead of writing a new commit-graph in every 'git maintenance run --auto' process (when maintenance.commit-graph.enalbed is configured to be true), only write when there are "enough" commits not in a commit-graph file. This count is controlled by the maintenance.commit-graph.auto config option. To compute the count, use a depth-first search starting at each ref, and leaving markers using the SEEN flag. If this count reaches the limit, then terminate early and start the task. Otherwise, this operation will peel every ref and parse the commit it points to. If these are all in the commit-graph, then this is typically a very fast operation. Users with many refs might feel a slow-down, and hence could consider updating their limit to be very small. A negative value will force the step to run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	916d0626c2	maintenance: use pointers to check --auto The 'git maintenance run' command has an '--auto' option. This is used by other Git commands such as 'git commit' or 'git fetch' to check if maintenance should be run after adding data to the repository. Previously, this --auto option was only used to add the argument to the 'git gc' command as part of the 'gc' task. We will be expanding the other tasks to perform a check to see if they should do work as part of the --auto flag, when they are enabled by config. First, update the 'gc' task to perform the auto check inside the maintenance process. This prevents running an extra 'git gc --auto' command when not needed. It also shows a model for other tasks. Second, use the 'auto_condition' function pointer as a signal for whether we enable the maintenance task under '--auto'. For instance, we do not want to enable the 'fetch' task in '--auto' mode, so that function pointer will remain NULL. Now that we are not automatically calling 'git gc', a test in t5514-fetch-multiple.sh must be changed to watch for 'git maintenance' instead. We continue to pass the '--auto' option to the 'git gc' command when necessary, because of the gc.autoDetach config option changes behavior. Likely, we will want to absorb the daemonizing behavior implied by gc.autoDetach as a maintenance.autoDetach config option. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	65d655b52d	maintenance: create maintenance.<task>.enabled config Currently, a normal run of "git maintenance run" will only run the 'gc' task, as it is the only one enabled. This is mostly for backwards- compatible reasons since "git maintenance run --auto" commands replaced previous "git gc --auto" commands after some Git processes. Users could manually run specific maintenance tasks by calling "git maintenance run --task=<task>" directly. Allow users to customize which steps are run automatically using config. The 'maintenance.<task>.enabled' option then can turn on these other tasks (or turn off the 'gc' task). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	d7514f6ed5	maintenance: take a lock on the objects directory Performing maintenance on a Git repository involves writing data to the .git directory, which is not safe to do with multiple writers attempting the same operation. Ensure that only one 'git maintenance' process is running at a time by holding a file-based lock. Simply the presence of the .git/maintenance.lock file will prevent future maintenance. This lock is never committed, since it does not represent meaningful data. Instead, it is only a placeholder. If the lock file already exists, then no maintenance tasks are attempted. This will become very important later when we implement the 'prefetch' task, as this is our stop-gap from creating a recursive process loop between 'git fetch' and 'git maintenance run --auto'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	090511bc0b	maintenance: add --task option A user may want to only run certain maintenance tasks in a certain order. Add the --task=<task> option, which allows a user to specify an ordered list of tasks to run. These cannot be run multiple times, however. Here is where our array of maintenance_task pointers becomes critical. We can sort the array of pointers based on the task order, but we do not want to move the struct data itself in order to preserve the hashmap references. We use the hashmap to match the --task=<task> arguments into the task struct data. Keep in mind that the 'enabled' member of the maintenance_task struct is a placeholder for a future 'maintenance.<task>.enabled' config option. Thus, we use the 'enabled' member to specify which tasks are run when the user does not specify any --task=<task> arguments. The 'enabled' member should be ignored if --task=<task> appears. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	663b2b1b90	maintenance: add commit-graph task The first new task in the 'git maintenance' builtin is the 'commit-graph' task. This updates the commit-graph file incrementally with the command git commit-graph write --reachable --split By writing an incremental commit-graph file using the "--split" option we minimize the disruption from this operation. The default behavior is to merge layers until the new "top" layer is less than half the size of the layer below. This provides quick writes most of the time, with the longer writes following a power law distribution. Most importantly, concurrent Git processes only look at the commit-graph-chain file for a very short amount of time, so they will verly likely not be holding a handle to the file when we try to replace it. (This only matters on Windows.) If a concurrent process reads the old commit-graph-chain file, but our job expires some of the .graph files before they can be read, then those processes will see a warning message (but not fail). This could be avoided by a future update to use the --expire-time argument when writing the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	3103e9848f	maintenance: initialize task array In anticipation of implementing multiple maintenance tasks inside the 'maintenance' builtin, use a list of structs to describe the work to be done. The struct maintenance_task stores the name of the task (as given by a future command-line argument) along with a function pointer to its implementation and a boolean for whether the step is enabled. A list these structs are initialized with the full list of implemented tasks along with a default order. For now, this list only contains the "gc" task. This task is also the only task enabled by default. The run subcommand will return a nonzero exit code if any task fails. However, it will attempt all tasks in its loop before returning with the failure. Also each failed task will print an error message. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	a95ce12430	maintenance: replace run_auto_gc() The run_auto_gc() method is used in several places to trigger a check for repo maintenance after some Git commands, such as 'git commit' or 'git fetch'. To allow for extra customization of this maintenance activity, replace the 'git gc --auto [--quiet]' call with one to 'git maintenance run --auto [--quiet]'. As we extend the maintenance builtin with other steps, users will be able to select different maintenance activities. Rename run_auto_gc() to run_auto_maintenance() to be clearer what is happening on this call, and to expose all callers in the current diff. Rewrite the method to use a struct child_process to simplify the calls slightly. Since 'git fetch' already allows disabling the 'git gc --auto' subprocess, add an equivalent option with a different name to be more descriptive of the new behavior: '--[no-]maintenance'. Update the documentation to include these options at the same time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	3ddaad0e06	maintenance: add --quiet option Maintenance activities are commonly used as steps in larger scripts. Providing a '--quiet' option allows those scripts to be less noisy when run on a terminal window. Turn this mode on by default when stderr is not a terminal. Pipe the option to the 'git gc' child process. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	2057d75038	maintenance: create basic maintenance runner The 'gc' builtin is our current entrypoint for automatically maintaining a repository. This one tool does many operations, such as repacking the repository, packing refs, and rewriting the commit-graph file. The name implies it performs "garbage collection" which means several different things, and some users may not want to use this operation that rewrites the entire object database. Create a new 'maintenance' builtin that will become a more general- purpose command. To start, it will only support the 'run' subcommand, but will later expand to add subcommands for scheduling maintenance in the background. For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin. In fact, the only option is the '--auto' toggle, which is handed directly to the 'gc' builtin. The current change is isolated to this simple operation to prevent more interesting logic from being lost in all of the boilerplate of adding a new builtin. Use existing builtin/gc.c file because we want to share code between the two builtins. It is possible that we will have 'maintenance' replace the 'gc' builtin entirely at some point, leaving 'git gc' as an alias for some specific arguments to 'git maintenance run'. Create a new test_subcommand helper that allows us to test if a certain subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a file. A negation mode is available that will be used in later tests. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:04 -07:00
Lin Sun	dbd8c09bfe	mergetool: allow auto-merge for meld to follow the vim-diff behavior Make the mergetool used with "meld" backend behave similarly to "vimdiff" by telling it to auto-merge non-conflicting parts and highlight the conflicting parts when `mergetool.meld.useAutoMerge` is configured with `true`, or `auto` for detecting the `--auto-merge` option automatically. Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Helped-by: David Aguilar <davvid@gmail.com> Signed-off-by: Lin Sun <lin.sun@zoom.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-16 17:11:20 -07:00
Aaron Lipman	21bf933928	ref-filter: allow merged and no-merged filters Enable ref-filter to process multiple merged and no-merged filters, and extend functionality to git branch, git tag and git for-each-ref. This provides an easy way to check for branches that are "graduation candidates:" $ git branch --no-merged master --merged next If passed more than one merged (or more than one no-merged) filter, refs must be reachable from any one of the merged commits, and reachable from none of the no-merged commits. Signed-off-by: Aaron Lipman <alipman88@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-16 12:38:10 -07:00
Alex Riesen	4e735c1326	fetch: pass --ipv4 and --ipv6 options to sub-fetches The options indicate user intent for the whole fetch operation, and ignoring them in sub-fetches (i.e. "--all" and recursive fetching of submodules) is quite unexpected when, for instance, it is intended to limit all of the communication to a specific transport protocol for some reason. Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-15 14:15:05 -07:00

... 10 11 12 13 14 ...

9307 Commits