"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.
* ds/sparse-apply-add-p:
p2000: add performance test for patch-mode commands
reset: integrate sparse index with --patch
git add: make -p/-i aware of sparse index
apply: integrate with the sparse index
Updates to meson-based build procedure.
* rj/build-tweaks-part2:
configure.ac: upgrade to a compilation check for sysinfo
meson.build: correct setting of GIT_EXEC_PATH
meson: correct path to system config/attribute files
meson: correct install location of YAML.pm
meson.build: quote the GITWEBDIR build configuration
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.
* en/merge-tree-check:
merge-tree: add a new --quiet flag
merge-ort: add a new mergeability_only option
Support to create a loose object file with unknown object type has
been dropped.
* jk/no-funny-object-types:
object-file: drop support for writing objects with unknown types
hash-object: handle --literally with OPT_NEGBIT
hash-object: merge HASH_* and INDEX_* flags
hash-object: stop allowing unknown types
t: add lib-loose.sh
t/helper: add zlib test-tool
oid_object_info(): drop type_name strbuf
fsck: stop using object_info->type_name strbuf
oid_object_info_convert(): stop using string for object type
cat-file: use type enum instead of buffer for -t option
object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
cat-file: make --allow-unknown-type a noop
object-file.h: fix typo in variable declaration
The userdiff pattern for shell scripts has been updated to cope
with more bash-isms.
* md/userdiff-bash-shell-function:
userdiff: extend Bash pattern to cover more shell function forms
The dependency on the_repository variable has been reduced from the
code paths in "git replay".
* en/replay-wo-the-repository:
replay: replace the_repository with repo parameter passed to cmd_replay ()
Teach "git send-email" to also consult `hostname -f` for mail
domain to compute the identity given to SMTP servers.
* ag/send-email-hostname-f:
send-email: try to get fqdn by running hostname -f on Linux and macOS
CI settings at GitLab has been updated to run MSVC based Meson job
automatically (as opposed to be done only upon manual request).
* ps/ci-gitlab-enable-msvc-meson-job:
gitlab-ci: always run MSVC-based Meson job
Two "scalar" subcommands that adds a repository that hasn't been
under "scalar"'s control are taught an option not to enable the
scheduled maintenance on it.
* ds/scalar-no-maintenance:
scalar reconfigure: improve --maintenance docs
scalar reconfigure: add --maintenance=<mode> option
scalar clone: add --no-maintenance option
scalar register: add --no-maintenance option
scalar: customize register_dir()'s behavior
win+Meson CI pipeline, unlike other pipelines for Windows,
used to build artifacts in develper mode, which has been changed to
build them in release mode for consistency.
* js/ci-build-win-in-release-mode:
ci(win+Meson): build in Release mode
Performance regression in not-yet-released code has been corrected.
* ps/reftable-read-block-perffix:
reftable: fix perf regression when reading blocks of unwanted type
Code cleanup.
* jk/oidmap-cleanup:
raw_object_store: drop extra pointer to replace_map
oidmap: add size function
oidmap: rename oidmap_free() to oidmap_clear()
The `send-email` documentation has been updated with OAuth2.0
related examples.
* ag/doc-send-email:
docs: add credential helper for outlook and gmail in OAuth list of helpers
docs: improve send-email documentation
send-mail: improve checks for valid_fqdn
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.
* sc/bundle-uri-use-all-refs-in-bundle:
bundle-uri: add test for bundle-uri clones with tags
bundle-uri: copy all bundle references ino the refs/bundle space
Commit f5e3c6c57d ("meson: do a full usage-based compile check for
sysinfo", 2025-04-25) updated the 'sysinfo()' check, as part of the
meson build, due to the failure of the check on Solaris. Prior to
that commit, the meson build only checked the availability of the
'<sys/sysinfo.h>' header file. On Solaris, both the header and the
'sysinfo()' function exist, but are completely unrelated to the same
function on Linux (and cygwin).
Commit 50dec7c566 ("config.mak.uname: add sysinfo() configuration for
cygwin", 2025-04-17) added a similar 'sysinfo()' check to the autoconf
build. This check looked for the 'sysinfo()' function itself, rather
than just the header, but it will fail (incorrectly set HAVE_SYSINFO)
for the same reason.
In order to correctly identify the 'sysinfo()' function we require as
part of 'git-gc' (used in the 'total_ram() function), we also upgrade
to a compilation check, in a similar way to the meson commit. Note that
since commit c9a51775a3 ("builtin/gc.c: correct RAM calculation when
using sysinfo", 2025-04-17) both the 'totalram' and 'mem_unit' fields
of the 'struct sysinfo' are used, so the new check includes both of
those fields in the compile check.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the non-'runtime prefix' case, the meson build sets the GIT_EXEC_PATH
build variable to an absolute path equivalent to <prefix>/libexec/git-core.
In comparison, the default make build sets it to a relative path equivalent
to 'libexec/git-core'. Indeed, the make build requires the use of some
means outside of the Makefile (eg. config.mak[.*] or the command-line)
to set GIT_EXEC_PATH to anything other than 'libexec/git-core'.
For example, the make invocation:
$ make gitexecdir=/some/other/bin all install
will build git with GIT_EXEC_PATH set to '/some/other/bin' and install
the 'library' executables to that location. However, without setting the
'gitexecdir' make variable, irrespective of the 'runtime prefix' setting,
the GIT_EXEC_PATH is always set to 'libexec/git-core'.
The meson built-in 'libexecdir' option can be used to provide a similar
configurability. The default value for the option is 'libexec'. Attempting
to set the option to '' on the command-line, will reset it to the '.'
string, presumably to ensure a relative path value.
This commit allows the meson build, similar to the above, to configure the
project like:
$ meson setup --buildtype=debugoptimized -Dprefix=$HOME -Dpcre2=disabled \
-Dlibexecdir=/some/other/bin build
so that the GIT_EXEC_PATH is set to '/some/other/bin'. Absent the
-Dlibexecdir argument, the GIT_EXEC_PATH is set to 'libexec/git-core'.
In order to correct the value of GIT_EXEC_PATH, default the value to the
static string value 'libexec/git-core', and only override if the value
of the 'libexecdir' option has a value different to 'libexec' or '.'.
Also, like the Makefile, add a check for an absolute path when the
runtime prefix option is true (and if so, error out).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The path to the system-wide config and attributes files are not being
set correctly in the meson build. Unless explicitly overridden on the
command line during setup, the 'gitconfig' and 'gitattributes' options
are defaulting to absolute paths in the '/etc' system directory. This
is only appropriate if the <prefix> is set specifically to '/usr'.
The directory in which these files are placed is generally referred to
as the 'system configuration directory' or 'sysconfdir' for short. When
the prefix is '/usr' then the sysconfdir is usually set to '/etc', but
any other value for prefix results in the relative directory value 'etc'
instead. (eg if prefix is '/usr/local', then the 'etc' relative value
results in a system configuration directory of '/usr/local/etc'). When
setting the 'sysconfdir' builtin option value, the meson system uses
exactly this algorithm, so we can use get_option('sysconfdir') directly
when setting the (non-overridden) build variables.
In order to allow for overriding from the command line, remove the
default values specified for the 'gitconfig' and 'gitattributes' options
in the 'meson_options.txt' file. This allows the user to specify any
pathname for those options, while being able to test for the unset
(empty) value. An absolute pathname will be used unchanged and a relative
pathname will be appended to '<prefix>/'. These values are then used to
set the 'ETC_GITCONFIG' and 'ETC_GITATTRIBUTES' build variables which are,
in turn, passed to the compiler as '-D' arguments.
When the 'gitconfig' or 'gitattributes' options are not used, then use
the built-in 'sysconfdir' and set the ETC_GITCONFIG build variable to
the string "<sysconfdir>/gitconfig". Similarly, set ETC_ATTRIBUTES to
"<sysconfdir>/gitattributes".
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing an 'meson install' the YAML.pm file is incorrectly
placed in the <prefix>/share/perl5/Git/SVN directory. The YAML.pm
file should be placed in a 'Memoize' subdirectory instead. In order
to correct the location, update the 'install_dir' of the relevant
target in the 'perl/Git/SVN/Memoize/meson.build' file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build configuration options with (non-empty) values, for example
filesystem paths potentially containing spaces, have been set using
the '.set_quoted()' method. However, the GITWEBDIR value has been
set using the '.set()' method instead. In order to correctly quote
the GITWEBDIR value, replace the '.set()' method with '.set_quoted()'.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit. This option allows the merge machinery to, in the
outer layer of the merge:
* exit early when a conflict is detected
* avoid writing (most) merged blobs/trees to the object store
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new mergeability_only option. This
option allows the merge machinery to, in the "outer layer" of the merge:
* exit upon first[-ish] conflict
* avoid (not prevent) writing merged blobs/trees to the object store
I have a number of qualifiers there, so let me explain each:
"outer layer":
Note that since the recursive merge of merge bases (corresponding to
call_depth > 0) can conflict without the outer final merge
(corresponding to call_depth == 0) conflicting, we can't short-circuit
nor avoid writing merged blobs/trees to the object store during those
inner merges.
"first-ish conflict":
The current patch only exits early from process_entries() on the first
conflict it detects, but conflicts could have been detected in a
previous function call, namely detect_and_process_renames(). However:
* conflicts detected by detect_and_process_renames() are quite rare
conflict types
* the detection would still come after regular rename detection
(which is the expensive part of detect_and_process_renames()), so
it is not saving us much in computation time given that
process_entries() directly follows detect_and_process_renames()
* [this overlaps with the next bullet point] process_entries() is the
place where virtually all object writing occurs (object writing is
sometimes more of a concern for Forges than computation time), so
exiting early here isn't saving us much in object writes either
* the code changes needed to handle an earlier exit are slightly
more invasive in detect_and_process_renames() than for
process_entries().
Given the rareness of the even earlier conflicts, the limited savings
we'd get from exiting even earlier, and in an attempt to keep this
patch simpler, we don't guarantee that we actually exit on the first
conflict detected. We can always revisit this decision later if we
decide that a further micro-optimization to exit slightly earlier in
rare cases is worthwhile.
"avoid (not prevent) writing objects":
The detect_and_process_renames() call can also write objects to the
object store, when rename/rename conflicts involve one (or more) files
that have also been modified on both sides. Because of this alternate
call path leading to handle_content_merges(), our "early exit" does not
prevent writing objects entirely, even within the "outer layer"
(i.e. even within call_depth == 0). I figure that's fine though, since
we're already writing objects for the inner merges (i.e. for call_depth
> 0), which are likely going to represent vastly more objects than files
involved in rename/rename+modify/modify cases in the outer merge, on
average.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).
In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.
Test HEAD~4 HEAD~3 HEAD~2 HEAD~1
-------------------------------------------------------------------------------------
2000.118: ... git add -p (full-v3) 0.79 0.79 +0.0% 0.82 +3.8% 0.82 +3.8%
2000.119: ... git add -p (full-v4) 0.74 0.76 +2.7% 0.74 +0.0% 0.76 +2.7%
2000.120: ... git add -p (sparse-v3) 1.94 1.28 -34.0% 0.07 -96.4% 0.07 -96.4%
2000.121: ... git add -p (sparse-v4) 1.93 1.28 -33.7% 0.06 -96.9% 0.06 -96.9%
2000.122: ... git checkout -p (full-v3) 1.18 1.18 +0.0% 1.18 +0.0% 1.19 +0.8%
2000.123: ... git checkout -p (full-v4) 1.10 1.12 +1.8% 1.11 +0.9% 1.11 +0.9%
2000.124: ... git checkout -p (sparse-v3) 1.31 0.11 -91.6% 0.11 -91.6% 0.11 -91.6%
2000.125: ... git checkout -p (sparse-v4) 1.29 0.11 -91.5% 0.11 -91.5% 0.11 -91.5%
2000.126: ... git reset -p (full-v3) 0.81 0.80 -1.2% 0.83 +2.5% 0.83 +2.5%
2000.127: ... git reset -p (full-v4) 0.78 0.77 -1.3% 0.77 -1.3% 0.78 +0.0%
2000.128: ... git reset -p (sparse-v3) 1.58 0.92 -41.8% 0.91 -42.4% 0.07 -95.6%
2000.129: ... git reset -p (sparse-v4) 1.58 0.92 -41.8% 0.92 -41.8% 0.07 -95.6%
It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.
Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.
Move this check earlier within cmd_reset() to improve performance here.
Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.
It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.
Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.
Add test cases that confirm that these interactive add options work with
the sparse index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.
This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.
The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.
The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- It failed to capture valid functions where `{` was on the next line
due to line continuation (`\`).
- It did not recognize functions with single command body, such as
`x () echo hello`.
Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.
Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since "hash-object --literally" no longer supports objects with unknown
types, there are now no callers of write_object_file_literally() and its
helpers. Let's drop them to simplify the code.
In particular, this gets rid of some ugly copy-and-paste code from
write_object_file_literally(), which is a parallel implementation of
write_object_file(). When the split was originally made, the two weren't
that long, but commits like 63a6745a07 (object-file: update the loose
object map when writing loose objects, 2023-10-01) ended up having to
duplicate some tricky code.
This patch drops all of that duplication and should make things less
error-prone going forward.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we recently removed the hash_literally() function, the hash-object
--literally option has been simplified to just removing the
INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool,
we can just have the option parser remove the bit from the set of flags
directly. This simplifies the helper functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hash-object command has its own custom flag bits that it sets based
on command-line options. But since we dropped hash_literally() in the
previous commit, the only thing we do with those flag bits is convert
them directly into "index_flags" to pass to index_fd().
This extra layer of indirection makes the code harder to read and reason
about. Let's just use the INDEX_* flags directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).
Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.
We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").
The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).
We'll be able to clean up these code paths a bit more in subsequent
patches.
The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.
We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's occasionally useful when testing or debugging to be able to do raw
zlib inflate/deflate operations (e.g., to check the bytes of a specific
loose or packed object).
Even though zlib's deflate algorithm is used by many other programs,
this is surprisingly hard to do in a portable way. E.g., gzip can do
this if you manually munge some header bytes. But the result is somewhat
arcane, and we don't assume gzip is available anyway. Likewise, pigz
will handle raw zlib, but we can't assume it is available.
So let's introduce a short test helper for just doing zlib operations.
We'll use it in subsequent patches to add some new tests, but it would
also have come in handy a few times in the past:
- The hard-coded pack data from 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) could probably be generated on
the fly.
- Likewise we could avoid the hard-coded data from 0b1493c2d4
(git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT,
2025-02-25). Though note this would require support for more zlib
options.
- It would have helped with the debugging documented in 41dfbb2dbe
(howto: add article on recovering a corrupted object, 2013-10-25).
I'll leave refactoring existing tests for another day, but I hope the
examples above show the general utility.
I aimed for simplicity in the code. In particular, it will read all
input into a memory buffer, rather than streaming. That makes the zlib
loops harder to get wrong (which has been a source of subtle bugs in the
past).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide a mechanism for callers to get the object type as a raw
string, rather than an object_type enum. This was in theory useful for
returning types that are not representable in the enum, but we consider
any such type to be an error, and there are no callers that use the
strbuf anymore.
Let's drop support to simplify the code a bit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck-ing a loose object, we use object_info's type_name strbuf to
record the parsed object type as a string. For most objects this is
redundant with the object_type enum, but it does let us report the
string when we encounter an object with an unknown type (for which there
is no matching enum value).
There are a few downsides, though:
1. The code to report these cases is not actually robust. Since we did
not pass a strbuf to unpack_loose_header(), we only retrieved types
from headers up to 32 bytes. In longer cases, we'd simply say
"object corrupt or missing".
2. This is the last caller that uses object_info's type_name strbuf
support. It would be nice to refactor it so that we can simplify
that code.
3. Likewise, we'll check the hash of the object using its unknown type
(again, as long as that type is short enough). That depends on the
hash_object_file_literally() code, which we'd eventually like to
get rid of.
So we can simplify things by bailing immediately in read_loose_object()
when we encounter an unknown type. This has a few user-visible effects:
a. Instead of producing a single line of error output like this:
error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
we'll now issue two lines (the first from read_loose_object() when
we see the unparsable header, and the second from the fsck code,
since we couldn't read the object):
error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
This is a little more verbose, but this sort of error should be
rare (such objects are almost impossible to work with, and cannot
be transferred between repositories as they are not representable
in packfiles). And as a bonus, reporting the broken header in full
could help with debugging other cases (e.g., a header like "blob
xyzzy\0" would fail in parsing the size, but previously we'd not
have showed the offending bytes).
b. An object with an unknown type will be reported as corrupt, without
actually doing a hash check. Again, I think this is unlikely to
matter in practice since such objects are totally unusable.
We'll update one fsck test to match the new error strings. And we can
remove another test that covered the case of an object with an unknown
type _and_ a hash corruption. Since we'll skip the hash check now in
this case, the test is no longer interesting.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In oid_object_info_convert(), we convert objects between their sha1 and
sha256 variants. To do this, we naturally need to know the type, which
we get from oid_object_info_extended() using its type_name strbuf
option.
But getting the value as a string (versus an object_type enum) is not
helpful. Since we do not allow unknown types, the regular enum is
sufficient. And the resulting code is a bit simpler, as we no longer
have to manage the extra allocation nor convert the string to an enum
ourselves.
Note that at first glance, it might seem like we should retain the error
check for "type == -1" to catch bogus types found by the underlying
parser. But we don't need it, as an unknown type would have yielded an
error from the call to oid_object_info_extended(), which would already
have caused us to return an error.
In fact, I suspect this was always impossible to trigger. Even when we
were converting the string to a type enum ourselves, an invalid type
would never have escaped oid_object_info_extended(), since we never
passed the (now removed) OBJECT_INFO_ALLOW_UNKNOWN_TYPE option.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is
no need to pass a strbuf into oid_object_info_extended() to record the
type. The regular object_type enum is sufficient to capture all of the
types we will allow.
This simplifies the code a bit, and will eventually let us drop
object_info's type_name strbuf support.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cat-file dropped its "--allow-unknown-type" option in the previous
commit, there are no more uses of the internal flag that implemented it.
Let's drop it.
That in turn lets us drop the strbuf parameter of unpack_loose_header(),
which now is always NULL. And without that, we can drop all of the
additional code to inflate larger headers into the strbuf.
Arguably we could drop ULHR_TOO_LONG, as no callers really care about
the distinction from ULHR_BAD. But it's easy enough to retain, and it
does let us produce a slightly more specific message in one instance.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cat-file command has some minor support for handling objects with
"unknown" types. I.e., strings that are not "blob", "commit", "tree", or
"tag".
In theory this could be used for debugging or experimenting with
extensions to Git. But in practice this support is not very useful:
1. You can get the type and size of such objects, but nothing else.
Not even the contents!
2. Only loose objects are supported, since packfiles use numeric ids
for the types, rather than strings.
3. Likewise you cannot ever transfer objects between repositories,
because they cannot be represented in the packfiles used for the
on-the-wire protocol.
The support for these unknown types complicates the object-parsing code,
and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix
infinite loop on broken zlib input, 2025-02-25). So let's drop it.
The first step is to remove the user-facing parts, which are accessible
only via cat-file. This is technically backwards-incompatible, but given
the limitations listed above, these objects couldn't possibly be useful
in any workflow.
However, we can't just rip out the option entirely. That would hurt a
caller who ran:
git cat-file -t --allow-unknown-object <oid>
and fed it normal, well-formed objects. There --allow-unknown-type was
doing nothing, but we wouldn't want to start bailing with an error. So
to protect any such callers, we'll retain --allow-unknown-type as a
noop.
The code change is fairly small (but we'll able to clean up more code in
follow-on patches). The test updates drop any use of the option. We
still retain tests that feed the broken objects to cat-file without
--allow-unknown-type, as we should continue to confirm that those
objects are rejected. Note that in one spot we can drop a layer of loop,
re-indenting the body; viewing the diff with "-w" helps there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>