admin/git

Author	SHA1	Message	Date
Patrick Steinhardt	99cf4d6d35	replace-object: use hash algorithm from passed-in repository In `register_replace_ref()`, we pass in a repository but then use `get_oid_hex()` to parse passed-in object IDs, which implicitly uses `the_repository`. Fix this by using the hash algorithm from the passed-in repository instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	58650befd9	protocol-caps: use hash algorithm from passed-in repository In `send_info()`, we pass in a repository but then use `get_oid_hex()` to parse passed-in object IDs, which implicitly uses `the_repository`. Fix this by using the hash algorithm from the passed-in repository instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	f2c32a66f5	oidset: pass hash algorithm when parsing file The `oidset_parse_file_carefully()` function implicitly depends on `the_repository` when parsing object IDs. Fix this by having callers pass in the hash algorithm to use. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	afa2c6ddc8	http-fetch: don't crash when parsing packfile without a repo The git-http-fetch(1) command accepts a `--packfile=` option, which allows the user to specify that it shall fetch a specific packfile, only. The parameter here is the hash of the packfile, which is specific to the object hash used by the repository. This requirement is implicit though via our use of `parse_oid_hex()`, which internally uses `the_repository`. The git-http-fetch(1) command allows for there to be no repository though, which only exists such that we can show usage via the "-h" option. In that case though, starting with `c8aed5e8da` (repository: stop setting SHA1 as the default object hash, 2024-05-07), `the_repository` does not have its object hash initialized anymore and thus we would crash when trying to parse the object ID outside of a repository. Fix this issue by dying immediately when we see a "--packfile=" parameter when outside a Git repository. This is not a functional regression as we would die later on with the same error anyway. Add a test to detect the segfault. We use the "nongit" function to do so, which we need to allow-list in `test_must_fail ()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	8a676bdc5c	hash-ll: merge with "hash.h" The "hash-ll.h" header was introduced via `d1cbe1e6d8` (hash-ll.h: split out of hash.h to remove dependency on repository.h, 2023-04-22) to make explicit the split between hash-related functions that rely on the global `the_repository`, and those that don't. This split is no longer necessary now that we we have removed the reliance on `the_repository`. Merge "hash-ll.h" back into "hash.h". This causes some code units to not include "repository.h" anymore, which requires us to add some forward declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	36026a0f30	refs: avoid include cycle with "repository.h" There is an include cycle between "refs.h" and "repository.h" via "commit.h", "object.h" and "hash.h". This has the effect that several definitions of structs and enums will not be visible once we merge "hash-ll.h" back into "hash.h" in the next commit. The only reason that "repository.h" includes "refs.h" is the definition of `enum ref_storage_format`. Move it into "repository.h" and have "refs.h" include "repository.h" instead to fix the cycle. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	e7da938570	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	7abbca0e74	hash: require hash algorithm in `empty_tree_oid_hex()` The `empty_tree_oid_hex()` function use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. While at it, remove the unused `empty_blob_oid_hex()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	9c34eb93fb	hash: require hash algorithm in `is_empty_{blob,tree}_oid()` Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	861e8c76f6	hash: make `is_null_oid()` independent of `the_repository` The function `is_null_oid()` uses `oideq(oid, null_oid())` to check whether a given object ID is the all-zero object ID. `null_oid()` implicitly relies on `the_repository` though to return the correct null object ID. Get rid of this dependency by always comparing the complete hash array for being all-zeroes. This is possible due to the refactoring of object IDs so that their hash arrays are always fully initialized. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	d4d364b2c7	hash: convert `oidcmp()` and `oideq()` to compare whole hash With the preceding commit, the hash array of object IDs is now fully zero-padded even when the hash algorithm's output is smaller than the array length. With that, we can now adapt both `oidcmp()` and `oideq()` to unconditionally memcmp(3P) the whole array instead of depending on the hash size. While it may feel inefficient to compare unused bytes for e.g. SHA-1, in practice the compiler should now be able to produce code that is better optimized both because we have no branch anymore, but also because the size to compare is now known at compile time. Goldbolt spits out the following assembly on an x86_64 platform with GCC 14.1 for the old and new implementations of `oidcmp()`: oidcmp_old: movsx rax, DWORD PTR [rdi+32] test eax, eax jne .L2 mov rax, QWORD PTR the_repository[rip] cmp QWORD PTR [rax+16], 32 je .L6 .L4: mov edx, 20 jmp memcmp .L2: lea rdx, [rax+rax2] lea rax, [rax+rdx4] lea rax, hash_algos[0+rax8] cmp QWORD PTR [rax+16], 32 jne .L4 .L6: mov edx, 32 jmp memcmp oidcmp_new: mov edx, 32 jmp memcmp The new implementation gets ridi of all the branches and effectively only ends setting up `edx` for `memcmp()` and then calling it. And for `oideq()`: oideq_old: movsx rcx, DWORD PTR [rdi+32] mov rax, rdi mov rdx, rsi test ecx, ecx jne .L2 mov rcx, QWORD PTR the_repository[rip] cmp QWORD PTR [rcx+16], 32 mov rcx, QWORD PTR [rax] je .L12 .L4: mov rsi, QWORD PTR [rax+8] xor rcx, QWORD PTR [rdx] xor rsi, QWORD PTR [rdx+8] or rcx, rsi je .L13 .L8: mov eax, 1 test eax, eax sete al movzx eax, al ret .L2: lea rsi, [rcx+rcx2] lea rcx, [rcx+rsi4] lea rcx, hash_algos[0+rcx8] cmp QWORD PTR [rcx+16], 32 mov rcx, QWORD PTR [rax] jne .L4 .L12: mov rsi, QWORD PTR [rax+8] xor rcx, QWORD PTR [rdx] xor rsi, QWORD PTR [rdx+8] or rcx, rsi jne .L8 mov rcx, QWORD PTR [rax+16] mov rax, QWORD PTR [rax+24] xor rcx, QWORD PTR [rdx+16] xor rax, QWORD PTR [rdx+24] or rcx, rax jne .L8 xor eax, eax .L14: test eax, eax sete al movzx eax, al ret .L13: mov edi, DWORD PTR [rdx+16] cmp DWORD PTR [rax+16], edi jne .L8 xor eax, eax jmp .L14 oideq_new: mov rax, QWORD PTR [rdi] mov rdx, QWORD PTR [rdi+8] xor rax, QWORD PTR [rsi] xor rdx, QWORD PTR [rsi+8] or rax, rdx je .L5 .L2: mov eax, 1 xor eax, 1 ret .L5: mov rax, QWORD PTR [rdi+16] mov rdx, QWORD PTR [rdi+24] xor rax, QWORD PTR [rsi+16] xor rdx, QWORD PTR [rsi+24] or rax, rdx jne .L2 xor eax, eax xor eax, 1 ret Interestingly, the compiler decides to split the comparisons into two so that it first compares the lower half of the object ID for equality and then the upper half. If the first check shows a difference, then we wouldn't even end up comparing the second half. In both cases, the new generated code is significantly shorter and has way less branches. While I didn't benchmark the change, I'd be surprised if the new code was slower. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	c98d762ed9	global: ensure that object IDs are always padded The `oidcmp()` and `oideq()` functions only compare the prefix length as specified by the given hash algorithm. This mandates that the object IDs have a valid hash algorithm set, or otherwise we wouldn't be able to figure out that prefix. As we do not have a hash algorithm in many cases, for example when handling null object IDs, this assumption cannot always be fulfilled. We thus have a fallback in place that instead uses `the_repository` to derive the hash function. This implicit dependency is hidden away from callers and can be quite surprising, especially in contexts where there may be no repository. In theory, we can adapt those functions to always memcmp(3P) the whole length of their hash arrays. But there exist a couple of sites where we populate `struct object_id`s such that only the prefix of its hash that is actually used by the hash algorithm is populated. The remaining bytes are left uninitialized. The fact that those bytes are uninitialized also leads to warnings under Valgrind in some places where we copy those bytes. Refactor callsites where we populate object IDs to always initialize all bytes. This also allows us to get rid of `oidcpy_with_padding()`, for one because the input is now fully initialized, and because `oidcpy()` will now always copy the whole hash array. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	9da95bda74	hash: require hash algorithm in `oidread()` and `oidclr()` Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	f4836570a7	hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	129cb1b99d	hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except for a single callsite in "read-cache.c". Most callsites have long since been converted to use the equivalents that accept a `struct object_id` instead of a string. Adapt the remaining callsite and drop those functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Jeff King	aecd794fca	remote: drop checks for zero-url case Now that the previous commit removed the possibility that a "struct remote" will ever have zero url fields, we can drop a number of redundant checks and untriggerable code paths. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:39 -07:00
Jeff King	ffce821880	remote: always require at least one url in a remote When we return a struct from remote_get(), the result _almost_ always has at least one url. In remotes_remote_get_1(), we do this: if (name_given && !valid_remote(ret)) add_url_alias(remote_state, ret, name); if (!valid_remote(ret)) return NULL; So if the remote doesn't have a url, we give it one based on the name (this is how unconfigured urls are used as remotes). And if that doesn't work, we return NULL. But there's a catch: valid_remote() checks that we have at least one url _unless_ the remote.*.vcs field is set. This comes from `c578f51d52` (Add a config option for remotes to specify a foreign vcs, 2009-11-18), and the whole idea was to support remote helpers that don't have their own url. However, that mode has been broken since `25d5cc488a` (Pass unknown protocols to external protocol handlers, 2009-12-09)! That commit unconditionally looks at the url in get_helper(), causing a segfault with something like: git -c remote.foo.vcs=bar fetch foo We could fix that now, of course. But given that it has been broken for almost 15 years and nobody noticed, there's a better option. This weird "there might not be a url" special case requires checks all over the code base, and it's not clear if there are other similar segfaults lurking. It would be nice if we could drop that special case. So instead, let's let the "the remote name is the url" code kick in. If you have "remote.foo.vcs", then your url (unless otherwise configured) is "foo". This does have a visible effect compared to what `25d5cc488a` was trying to do. The idea back then is that for a remote without a url, we'd run: # only one command-line option! git-remote-bar foo whereas with our default url, now we'll run: git-remote-bar foo foo Again, in practice nobody can be relying on this because it has been segfaulting for 15 years. We should consider just removing this "vcs" config option entirely, but that would be a user-visible breakage. So by fixing it this way, we can keep things working that have been working, and simplify away one special case inside our code. This fixes the segfault from `25d5cc488a` (demonstrated by the test), and we can build further cleanups on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	7384e75618	t5801: test remote..vcs config The usual way to trigger a remote helper is to use the "::" syntax from: `87422439d1` (Allow specifying the remote helper in the url, 2009-11-18). Doing: git config remote.origin.url hg::https://example.com/repo will run "git-remote-hg origin https://example.com/repo". Or you can use the fallback handling from `25d5cc488a` (Pass unknown protocols to external protocol handlers, 2009-12-09): git config remote.origin.url "foo://bar" which will run "git-remote-foo origin foo://bar". But there's a third way, from `c578f51d52` (Add a config option for remotes to specify a foreign vcs, 2009-11-18): git config remote.origin.vcs foo git config remote.origin.url bar which will run "git-remote-foo origin bar". This is mostly redundant with the other methods, except that it is supposed to allow you to run without a URL at all. So: git config remote.origin.vcs foo would run "git-remote-foo origin" with no extra URL parameter (under the assumption that the helper somehow knows how to access the remote repo). However, this mode has been broken since `25d5cc488a`, shortly after it was added! That commit taught the transport code to always look at the URL string to parse off the "foo::" bits, meaning it would always segfault in the no-url case. You can see that with: git -c remote.foo.vcs=bar fetch foo Nobody seems to have noticed in the almost 15 years since, so presumably it's not a well-used feature. And without that, arguably the whole remote..vcs feature could be removed entirely, as it isn't offering anything you couldn't do with the "helper::" syntax. But it _does_ work if you have a URL, and it has been advertised in the documentation for all that time. So we shouldn't just remove it without warning. Likewise, even if we were going to deprecate it, we should avoid breaking it in the meantime. Since there are no tests for it at all, let's add a few basic ones: - this syntax doesn't work well with "git clone" (another point against it versus "helper::"). But we can use "clone -c" to set up the config manually, passing the URL as usual to clone. This does work, though note that I had to use --no-local in the test to avoid broken interactions between the local code and the helper. In the real world this would be a non-issue, since the remote URL would generally not also be a local Git repo! - likewise, we should be able to set up the config manually and fetch into a repository. This also works. - we can simulate a vcs that has no URL support by stuffing the remote path into another environment variable. This should work, but doesn't (it hits the segfault mentioned above). In the first two cases, I took the extra step of checking GIT_TRACE output to confirm that we actually ran the helper (since the URL is a valid Git repo, the clone/fetch would appear to work even if we didn't use the helper at all!). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	e2269a2b59	t5801: make remote-testgit GIT_DIR setup more robust Our tests use a fake helper that just imports from an existing Git repository. We're fed the path to that repo on the command line, and derive the GIT_DIR by tacking on "/.git". This is wrong if the path is a bare repository, but that's OK since this is just a limited test. But it's also wrong if the transport code feeds us the actual .git directory itself (i.e., we expect "/path/to/repo" but it gives us "/path/to/repo/.git"). None of the current tests do that, but let's future-proof ourselves against adding a test that does. We can instead ask "rev-parse" to set our GIT_DIR. Note that we have to first unset other git variables from our environment. Coming into this script, we'll have GIT_DIR set to the fetching repository, and we need to "switch" to the remote one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	9badf97c42	remote: allow resetting url list Because remote.*.url is treated as a multi-valued key, there is no way to override previous config. So for example if you have remote.origin.url set to some wrong value, doing: git -c remote.origin.url=right fetch would not work. It would append "right" to the list, which means we'd still fetch from "wrong" (since subsequent values are used only as push urls). Let's provide a mechanism to reset the list, like we do for other multi-valued keys (e.g., credential.helper, http.extraheaders, and merge.suppressDest all use this "empty string means reset" pattern). Reported-by: Mathew George <mathewegeorge@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	bd1b88dc7a	config: document remote..url/pushurl interaction The documentation for these keys gives a very terse definition and points you to the fetch/push manpages. But from reading those pages it was not at all obvious to me that: - these are keys that can be defined multiple times with meaningful behavior (especially remote..url) - the way that pushurl overrides url (the git-push page does mention that "pushurl defaults to url", but it is not immediately clear what a multi-valued url would do in that situation). Let's try to summarize the current behavior. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	b68118d2e8	remote: simplify url/pushurl selection When we want to know the push urls for a remote, there is some simple logic: - if the user configured any remote..pushurl keys, then those make the complete set of push urls - otherwise we push to all urls in remote..url Many spots implement this with a level of indirection, assigning to a local url/url_nr pair. But since both arrays are now strvecs, we can just use a pointer to select the appropriate strvec, shortening the code a bit. Even though this is now a one-liner, since it is application logic that is present in so many places, it's worth abstracting a helper function. In fact, we already have such a function, but it's local to builtin/push.c. So we'll just make it available everywhere via remote.h. There are two spots to pay special attention to here: 1. in builtin/remote.c's get_url(), we are selecting first based on push_mode and then falling back to "url" when we're in push_mode but no pushurl is defined. The updated code makes that much more clear, compared to the original which had an "else" fall-through. 2. likewise in that file's set_url(), we _only_ respect push_mode, sine the point is that we are adding to pushurl in that case (whether it is empty or not). And thus it does not use our helper function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	8e804415fd	remote: use strvecs to store remote url/pushurl Now that the url/pushurl fields of "struct remote" own their strings, we can switch from bare arrays to strvecs. This has a few advantages: - push/clear are now one-liners - likewise the free+assigns in alias_all_urls() can use strvec_replace() - we now use size_t for storage, avoiding possible overflow - this will enable some further cleanups in future patches There's quite a bit of fallout in the code that reads these fields, as it tends to access these arrays directly. But it's mostly a mechanical replacement of "url_nr" with "url.nr", and "url[i]" with "url.v[i]", with a few variations (e.g. "url" could become "url.v", but I used "url.v[0]" for consistency). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	52595c155a	remote: transfer ownership of memory in add_url(), etc Many of the internal functions in remote.c take const strings and store them forever in instances of "struct remote". Since the functions are internal and callers are aware of the convention, this seems to mostly work and not cause leaks. But there are some issues: - it's impossible to clear any of the arrays, because the data dependencies between them are too muddled (if you free() a string, it might also be referenced from another array, causing a user-after-free; but if you don't, that might be the last reference, causing a leak). This is mostly of interest for further refactoring and features, but there's at least one spot that's already a problem. In alias_all_urls(), we replace elements of remote->url and remote->pushurl with their aliased forms, dropping references to the original. - sometimes strings from outside callers make their way in. For example, calling remote_get("foo") when there is no configured "foo" remote will create a remote struct with the single url "foo". But we'll do so by holding on to the string passed to remote_get() forever. In practice I think this works out because we'd usually pass in a string that lasts the length of the program (a string literal, or argv reference, or other data structure allocated in the main function). But it's a rather subtle requirement. Instead, let's have remote->url and remote->pushurl own their string memory. They'll copy the const strings that are passed in, and callers can stop making their own copies. Likewise, when we overwrite an entry, we can free the memory it points to, fixing the leak mentioned above. We'll leave the struct members as "const" since they are visible to the outside world, and shouldn't usually be touched. This requires casting on free() for now, but we'll clean that further in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:37 -07:00
Jeff King	aa0595fbd6	remote: refactor alias_url() memory ownership The alias_url() function may return either a newly allocated string (which the caller must take ownership of), or the original const "url" parameter that was passed in. This often works OK because callers are generally passing in a "url" that they expect to retain ownership of anyway. So whether we got back the original or a new string, we're always interested in storing it forever. But I suspect there are some possible leaks here (e.g., add_url_alias() may end up discarding the original "url"). Whether there are active leaks or not, this is a confusing setup that makes further refactoring of memory ownership harder. So instead of returning the original string, return NULL, forcing callers to decide what to do with it explicitly. We can then build further cleanups on top of that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:37 -07:00
Jeff King	0295ce7cbf	archive: fix check for missing url Running "git archive --remote" checks that we have at least one url for the remote. It does so by looking at remote.url[0], but that won't work; if we have no url at all, then remote.url will be NULL, and we'll segfault. Check url_nr instead, which is a more direct way of asking what we want. You can trigger the segfault like this: git -c remote.foo.vcs=bar archive --remote=foo but I didn't bother adding a test. This is the tip of the iceberg for no-url remotes, and a later patch will improve that situation. I just wanted to clean up this bug so it didn't make further refactoring of this code more confusing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:37 -07:00
Patrick Steinhardt	028bb23a61	BreakingChanges: document that we do not plan to deprecate git-checkout The git-checkout(1) command is seen by many as hard to understand because it connects two somewhat unrelated features: switching between branches and restoring worktree files from arbitrary revisions. In 2019, we thus implemented two new commands git-switch(1) and git-restore(1) to split out these separate concerns into standalone functions. This "replacement" of git-checkout(1) has repeatedly triggered concerns for our userbase that git-checkout(1) will eventually go away. This is not the case though: the use of that command is still widespread, and it is not expected that this will change anytime soon. Document that all three commands will remain for the foreseeable future. This decision may be revisited in case we ever figure out that most everyone has given up on any of the commands. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:08:52 -07:00
Patrick Steinhardt	fcf0f4801d	BreakingChanges: document removal of grafting The grafting mechanism for objects has been deprecated in `e650d0643b` (docs: mark info/grafts as outdated, 2014-03-05), which is more than a decade ago. The mechanism can lead to hard-to-debug issues and has a superior replacement with replace refs. Follow through with the deprecation and mark grafts for removal in Git 3.0. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:08:52 -07:00
Patrick Steinhardt	6ccf041d1d	BreakingChanges: document upcoming change from "sha1" to "sha256" Starting with `8e42eb0e9a` (doc: sha256 is no longer experimental, 2023-07-31), the "sha256" object format is no longer considered to be experimental. Furthermore, the SHA-1 hash function is actively recommended against by for example NIST and FIPS 140-2, and attacks against it are becoming more practical both due to new weaknesses (SHAppening, SHAttered, Shambles) and due to the ever-increasing computing power. It is only a matter of time before it can be considered to be broken completely. Let's plan for this event by being active instead of waiting for it to happend and announce that the default object format is going to change from "sha1" to "sha256" with Git 3.0. All major Git implementations (libgit2, JGit, go-git) support the "sha256" object format and are thus prepared for this change. The most important missing piece in the puzzle is support in forges. But while GitLab recently gained experimental support for the "sha256" object format though, to the best of my knowledge GitHub doesn't support it yet. Ideally, announcing this upcoming change will encourage forges to start building that support. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:08:52 -07:00
Patrick Steinhardt	57ec9254eb	docs: introduce document to announce breaking changes Over time, Git has grown quite a lot. With this evolution, many ideas that were sensible at the time they were introduced are not anymore and are thus considered to be deprecated. And while some deprecations may be noted in manpages, most of them are actually deprecated in the "hive mind" of the Git community, only. Introduce a new document that tracks such breaking changes, but also deprecations which we are not willing to go through with, to address this issue. This document serves multiple purposes: - It is a way to facilitate discussion around proposed deprecations. - It allows users to learn about deprecations and speak up in case they have good reasons why a certain feature should not be deprecated. - It states intent and documents where the Git project wants to go, both in the case where we want to deprecate, but also in the case where we don't want to deprecate a specific feature. The document is _not_ intended to cast every single discussion into stone. It is supposed to be a living document that may change over time when there are good reasons for it to change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:08:52 -07:00
Junio C Hamano	10aa7c74a2	Merge branch 'gt/unit-test-oidtree' into ps/use-the-repository * gt/unit-test-oidtree: t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c	2024-06-13 09:39:46 -07:00
Junio C Hamano	092b33da2b	Merge branch 'ps/ref-storage-migration' into ps/use-the-repository * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version	2024-06-13 09:39:08 -07:00
Derrick Stolee	f1160393c1	commit-graph: increment progress indicator This fixes a bug that was introduced by `368d19b0b7` (commit-graph: refactor compute_topological_levels(), 2023-03-20): Previously, the progress indicator was updated from `i + 1` where `i` is the loop variable of the enclosing `for` loop. After this patch, the update used `info->progress_cnt + 1` instead, however, unlike `i`, the `progress_cnt` attribute was not incremented. Let's increment it. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> [jc: squashed in a test update from Patrick Steinhardt] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 13:52:14 -07:00
Junio C Hamano	d63586cb31	The thirteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 13:37:18 -07:00
Junio C Hamano	2a061a62e2	Merge branch 'gt/decorate-unit-test' A test helper that essentially is unit tests on the "decorate" logic has been rewritten using the unit-tests framework. * gt/decorate-unit-test: t/: migrate helper/test-example-decorate to the unit testing framework	2024-06-12 13:37:18 -07:00
Junio C Hamano	51ea70c18a	Merge branch 'jk/sparse-leakfix' Many memory leaks in the sparse-checkout code paths have been plugged. * jk/sparse-leakfix: sparse-checkout: free duplicate hashmap entries sparse-checkout: free string list after displaying sparse-checkout: free pattern list in sparse_checkout_list() sparse-checkout: free sparse_filename after use sparse-checkout: refactor temporary sparse_checkout_patterns sparse-checkout: always free "line" strbuf after reading input sparse-checkout: reuse --stdin buffer when reading patterns dir.c: always copy input to add_pattern() dir.c: free removed sparse-pattern hashmap entries sparse-checkout: clear patterns when init() sees existing sparse file dir.c: free strings in sparse cone pattern hashmaps sparse-checkout: pass string literals directly to add_pattern() sparse-checkout: free string list in write_cone_to_file()	2024-06-12 13:37:17 -07:00
Junio C Hamano	c2f79440ac	Merge branch 'jk/cap-exclude-file-size' An overly large ".gitignore" files are now rejected silently. * jk/cap-exclude-file-size: dir.c: reduce max pattern file size to 100MB dir.c: skip .gitignore, etc larger than INT_MAX	2024-06-12 13:37:17 -07:00
Junio C Hamano	b8bdb2f283	Merge branch 'jc/safe-directory-leading-path' The safe.directory configuration knob has been updated to optionally allow leading path matches. * jc/safe-directory-leading-path: safe.directory: allow "lead/ing/path/*" match	2024-06-12 13:37:16 -07:00
Junio C Hamano	22cf18fd9e	Merge branch 'gt/t-hash-unit-test' A pair of test helpers that essentially are unit tests on hash algorithms have been rewritten using the unit-tests framework. * gt/t-hash-unit-test: t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash strbuf: introduce strbuf_addstrings() to repeatedly add a string	2024-06-12 13:37:15 -07:00
Junio C Hamano	56346ba24e	Merge branch 'cp/reftable-unit-test' Basic unit tests for reftable have been reimplemented under the unit test framework. * cp/reftable-unit-test: t: improve the test-case for parse_names() t: add test for put_be16() t: move tests from reftable/record_test.c to the new unit test t: move tests from reftable/stack_test.c to the new unit test t: move reftable/basics_test.c to the unit testing framework	2024-06-12 13:37:14 -07:00
Junio C Hamano	a39e28ace7	Merge branch 'jc/t1517-more' A new test was added to ensure git commands that are designed to run outside repositories do work. * jc/t1517-more: imap-send: minimum leakfix t1517: more coverage for commands that work without repository	2024-06-12 13:37:14 -07:00
Ghanshyam Thakkar	ed54840872	t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c helper/test-oidtree.c along with t0069-oidtree.sh test the oidtree.h library, which is a wrapper around crit-bit tree. Migrate them to the unit testing framework for better debugging and runtime performance. Along with the migration, add an extra check for oidtree_each() test, which showcases how multiple expected matches can be given to check_each() helper. To achieve this, introduce a new library called 'lib-oid.h' exclusively for the unit tests to use. It currently mainly includes utility to generate object_id from an arbitrary hex string (i.e. '12a' -> '12a0000000000000000000000000000000000000'). This also handles the hash algo selection based on GIT_TEST_DEFAULT_HASH. This library will also be helpful when we port other unit tests such as oid-array, oidset etc. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> [jc: small fixlets squashed in] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 13:33:20 -07:00
Patrick Steinhardt	037df60013	object-name: don't try to abbreviate to lengths greater than hexsz When given a length that equals the current hash algorithm's hex size, then `repo_find_unique_abbrev_r()` exits early without trying to find an abbreviation. This is only sensible because there is nothing to abbreviate in the first place, so searching through objects to find a unique prefix would be a waste of compute. What we don't handle though is the case where the user passes a length greater than the hash length. This is fine in practice as we still compute the correct result. But at the very least, this is a waste of resources as we try to abbreviate a value that cannot be abbreviated, which causes us to hit the object database. Start to explicitly handle values larger than hexsz to avoid this performance penalty, which leads to a measureable speedup. The following benchmark has been executed in linux.git: Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~) Time (mean ± σ): 12.812 s ± 0.040 s [User: 12.225 s, System: 0.554 s] Range (min … max): 12.723 s … 12.857 s 10 runs Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD) Time (mean ± σ): 11.095 s ± 0.029 s [User: 10.546 s, System: 0.521 s] Range (min … max): 11.037 s … 11.122 s 10 runs Summary git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran 1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 12:57:18 -07:00
Patrick Steinhardt	59ff92c516	parse-options-cb: stop clamping "--abbrev=" to hash length The `OPT__ABBREV()` option allows the user to specify the length that object hashes shall be abbreviated to. This length needs to be in the range of `(MIN_ABBREV, the_hash_algo->hexsz)`, which is why we clamp the value as required. While this makes sense in the case of `MIN_ABBREV`, it is unnecessary for the upper boundary as the value is eventually passed down to `repo_find_unnique_abbrev_r()`, which handles values larger than the current hash length just fine. In the preceding commit, we have changed parsing of the "core.abbrev" config to stop clamping to the upper boundary. Let's do the same here so that the code becomes simpler, we are consistent with how we treat the "core.abbrev" config and so that we stop depending on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 12:57:18 -07:00
Patrick Steinhardt	524c0183c9	config: fix segfault when parsing "core.abbrev" without repo The "core.abbrev" config allows the user to specify the minimum length when abbreviating object hashes. Next to the values "auto" and "no", this config also accepts a concrete length that needs to be bigger or equal to the minimum length and smaller or equal to the hash algorithm's hex length. While the former condition is trivial, the latter depends on the object format used by the current repository. It is thus a variable upper boundary that may either be 40 (SHA-1) or 64 (SHA-256). This has two major downsides. First, the user that specifies this config must be aware of the object hashes that its repository use. If they want to configure the value globally, then they cannot pick any value in the range `[41, 64]` if they have any repository that uses SHA-1. If they did, Git would error out when parsing the config. Second, and more importantly, parsing "core.abbrev" crashes when outside of a Git repository because we dereference `the_hash_algo` to figure out its hex length. Starting with `c8aed5e8da` (repository: stop setting SHA1 as the default object hash, 2024-05-07) though, we stopped initializing `the_hash_algo` outside of Git repositories. Fix both of these issues by not making it an error anymore when the given length exceeds the hash length. Instead, leave the abbreviated length intact. `repo_find_unique_abbrev_r()` handles this just fine except for a performance penalty which we will fix in a subsequent commit. Reported-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-12 12:57:18 -07:00
Patrick Steinhardt	1d969afb78	Makefile: add ability to append to CFLAGS and LDFLAGS There are some usecases where we may want to append CFLAGS to the default CFLAGS set by Git. This could for example be to enable or disable specific compiler warnings or to change the optimization level that code is compiled with. This cannot be done without overriding the complete CFLAGS value though and thus requires the user to redeclare the complete defaults used by Git. Introduce a new variable `CFLAGS_APPEND` that gets appended to the default value of `CFLAGS`. As compiler options are last-one-wins, this fulfills both of the usecases mentioned above. It's also common practice across many other projects to have such a variable. While at it, also introduce a matching `LDFLAGS_APPEND` variable. While there isn't really any need for this variable as there are no default `LDFLAGS`, users may expect this variable to exist, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-11 16:11:43 -07:00
Taylor Blau	e162aed591	pack-revindex.c: guard against out-of-bounds pack lookups The function midx_key_to_pack_pos() is a helper function used by midx_to_pack_pos() and midx_pair_to_pack_pos() to translate a (pack, offset) tuple into a position into the MIDX pseudo-pack order. Ensure that the pack ID given to midx_pair_to_pack_pos() is bounded by the number of packs within the MIDX to prevent, for instance, uninitialized memory from being used as a pack ID. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-11 16:08:28 -07:00
Taylor Blau	ed4a1d6ae1	pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse When performing multi-pack reuse, reuse_partial_packfile_from_bitmap() is responsible for generating an array of bitmapped_pack structs from which to perform reuse. In the multi-pack case, we loop over the MIDXs packs and copy the result of calling `nth_bitmapped_pack()` to construct the list of reusable paths. But we may also want to do pack-reuse over a single pack, either because we only had one pack to perform reuse over (in the case of single-pack bitmaps), or because we explicitly asked to do single pack reuse even with a MIDX[^1]. When this is the case, the array we generate of reusable packs contains only a single element, which is either (a) the pack attached to the single-pack bitmap, or (b) the MIDX's preferred pack. In `795006fff4` (pack-bitmap: gracefully handle missing BTMP chunks, 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap() function and stopped assigning the pack_int_id field when reusing only the MIDX's preferred pack. This results in an uninitialized read down in try_partial_reuse() like so: ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8 #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8 #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3 #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3 #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27 #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3 #6 0x55c5ccc8fac8 in run_builtin git.c:474:11 which happens when try_partial_reuse() tries to call midx_pair_to_pack_pos() when it tries to reject cross-pack deltas. Avoid the uninitialized read by ensuring that the pack_int_id field is set in the single-pack reuse case by setting it to either the MIDX preferred pack's pack_int_id, or '-1', in the case of single-pack bitmaps. In the latter case, we never read the pack_int_id field, so the choice of '-1' is intentional as a "garbage in, garbage out" measure. Guard against further regressions in this area by adding a test which ensures that we do not throw out deltas from the preferred pack as "cross-pack" due to an uninitialized pack_int_id. [^1]: This can happen for a couple of reasons, either because the repository is configured with 'pack.allowPackReuse=(true\|single)', or because the MIDX was generated prior to the introduction of the BTMP chunk, which contains information necessary to perform multi-pack reuse. Reported-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-11 16:08:28 -07:00
Taylor Blau	0c5a62f14b	midx-write.c: do not read existing MIDX with `packs_to_include` Commit `d6a8c58675` (midx-write.c: support reading an existing MIDX with `packs_to_include`, 2024-05-29) changed the MIDX generation machinery to support reading from an existing MIDX when writing a new one. Unfortunately, the rest of the MIDX generation machinery is not prepared to deal with such a change. For instance, the function responsible for adding to the object ID fanout table from a MIDX source (midx_fanout_add_midx_fanout()) will gladly add objects from an existing MIDX for some fanout level regardless of whether or not those objects came from packs that are to be included in the subsequent MIDX write. This results in broken pseudo-pack object order (leading to incorrect object traversal results) and segmentation faults, like so (generated by running the added test prior to the changes in midx-write.c): #0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590 #1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects", packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15) at midx-write.c:1171 #2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects", packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15) at midx-write.c:1274 [...] In stack frame #0, the code on midx-write.c:590 is using the new pack ID corresponding to some object which was added from the existing MIDX. Importantly, the pack from which that object was selected in the existing MIDX does not appear in the new MIDX as it was excluded via `--stdin-packs`. In this instance, the pack in question had pack ID "1" in the existing MIDX, but since it was excluded from the new MIDX, we never filled in that entry in the pack_perm table, resulting in: (gdb) p ctx->pack_perm@2 $1 = {0, 1515870810} Which is what causes the segfault above when we try and read: struct pack_info pack = &ctx->info[ctx->pack_perm[i]]; if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) pack->bitmap_pos = 0; Fundamentally, we should be able to read information from an existing MIDX when generating a new one. But in practice the midx-write.c code assumes that we won't run into issues like the above with incongruent pack IDs, and often makes those assumptions in extremely subtle and fragile ways. Instead, let's avoid reading from an existing MIDX altogether, and stick with the pre-d6a8c58675 implementation. Harden against any regressions in this area by adding a test which demonstrates these issues. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-11 16:08:28 -07:00
Patrick Steinhardt	fbf7a46d88	builtin/blame: fix leaking ignore revs files When parsing the blame configuration we add "blame.ignoreRevsFile" configs to a string list. This string list is declared as with `NODUP`, and thus we hand over the allocated string to that list. We eventually end up calling `string_list_clear()` on that list, but due to it being declared as `NODUP` we will not release the associated strings and thus leak memory. Fix this issue by setting up the list as `DUP` instead and free the config string after insertion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-11 13:15:08 -07:00

... 42 43 44 45 46 ...

75953 Commits