"git prune" may try to iterate over .git/objects/pack for trash
files to remove in it, and loudly fail when the directory is
missing, which is not necessary. The command has been taught to
ignore such a failure.
* ew/prune-with-missing-objects-pack:
prune: quiet ENOENT on missing directories
Assorted fixes of parsing end-user input as integers.
* pw/config-int-parse-fixes:
git_parse_signed(): avoid integer overflow
config: require at least one digit when parsing numbers
git_parse_unsigned: reject negative values
`parse_object()` hardening when checking for the existence of a
suspected blob object.
* jk/parse-object-type-mismatch:
parse_object(): simplify blob conditional
parse_object(): check on-disk type of suspected blob
parse_object(): drop extra "has" check before checking object type
"git receive-pack" used to use all the local refs as the boundary for
checking connectivity of the data "git push" sent, but now it uses
only the refs that it advertised to the pusher. In a repository with
the .hideRefs configuration, this reduces the resources needed to
perform the check.
cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com>
cf. <xmqqr0yrizqm.fsf@gitster.g>
* ps/receive-use-only-advertised:
receive-pack: only use visible refs for connectivity check
rev-parse: add `--exclude-hidden=` option
revision: add new parameter to exclude hidden refs
revision: introduce struct to handle exclusions
revision: move together exclusion-related functions
refs: get rid of global list of hidden refs
refs: fix memory leak when parsing hideRefs config
Fix an issue where core.fsmonitor on macOS would not notice created
or modified symbolic links.
* sz/macos-fsmonitor-symlinks:
fsmonitor--daemon: on macOS support symlink
A pair of bugfixes to the Documentation/howto/maintain-git.txt guide.
* tb/howto-maintain-git-fixes:
Documentation: build redo-seen.sh from jch..seen
Documentation: build redo-jch.sh from master..jch
Teach chainlint.pl to show corresponding line numbers when printing
the source of a test.
* es/chainlint-lineno:
chainlint: prefix annotated test definition with line numbers
chainlint: latch line numbers at which each token starts and ends
chainlint: sidestep impoverished macOS "terminfo"
Fix a source of flakiness in CI when compiling with SANITIZE=leak.
* ab/t7610-timeout:
t7610: use "file:///dev/null", not "/dev/null", fixes MinGW
t7610: fix flaky timeout issue, don't clone from example.com
'git maintenance register' is taught to write configuration to an
arbitrary path, and 'git for-each-repo' is taught to expand tilde
characters in paths.
* rp/maintenance-qol:
builtin/gc.c: fix use-after-free in maintenance_unregister()
maintenance --unregister: fix uninit'd data use & -Wdeclaration-after-statement
maintenance: add option to register in a specific config
for-each-repo: interpolate repo path arguments
Correct an error where `git rebase` would mistakenly use a branch or
tag named "refs/rewritten/xyz" when missing a rebase label.
* pw/strict-label-lookups:
sequencer: tighten label lookups
sequencer: unify label lookup
Redact headers from cURL's h2h3 module in GIT_CURL_VERBOSE and
others.
* gc/redact-h2h3-headers:
http: redact curl h2h3 headers in info
t: run t5551 tests with both HTTP and HTTP/2
"make coccicheck" is time consuming. It has been made to run more
incrementally.
* ab/coccicheck-incremental:
Makefile: don't create a ".build/.build/" for cocci, fix output
spatchcache: add a ccache-alike for "spatch"
cocci: run against a generated ALL.cocci
cocci rules: remove <id>'s from rules that don't need them
Makefile: copy contrib/coccinelle/*.cocci to build/
cocci: optimistically use COMPUTE_HEADER_DEPENDENCIES
cocci: make "coccicheck" rule incremental
cocci: split off "--all-includes" from SPATCH_FLAGS
cocci: split off include-less "tests" from SPATCH_FLAGS
Makefile: split off SPATCH_BATCH_SIZE comment from "cocci" heading
Makefile: have "coccicheck" re-run if flags change
Makefile: add ability to TAB-complete cocci *.patch rules
cocci rules: remove unused "F" metavariable from pending rule
Makefile + shared.mak: rename and indent $(QUIET_SPATCH_T)
Teach chainlint.pl to annotate the original test definition instead
of the token stream.
* es/chainlint-output:
chainlint: annotate original test definition rather than token stream
chainlint: latch start/end position of each token
chainlint: tighten accuracy when consuming input stream
chainlint: add explanatory comments
'scalar reconfigure -a' is taught to automatically remove
scalar.repo entires which no longer exist.
* js/remove-stale-scalar-repos:
tests(scalar): tighten the stale `scalar.repo` test some
scalar reconfigure -a: remove stale `scalar.repo` entries
Fix a regression in the bisect-helper which mistakenly treats
arguments to the command given to 'git bisect run' as arguments to
the helper.
* dd/bisect-helper-subcommand:
bisect--helper: parse subcommand with OPT_SUBCOMMAND
bisect--helper: move all subcommands into their own functions
bisect--helper: remove unused options
Preparation to remove git-submodule.sh and replace it with a builtin.
* ab/submodule-helper-prep-only:
submodule--helper: use OPT_SUBCOMMAND() API
submodule--helper: drop "update --prefix <pfx>" for "-C <pfx> update"
submodule--helper: remove --prefix from "absorbgitdirs"
submodule API & "absorbgitdirs": remove "----recursive" option
submodule.c: refactor recursive block out of absorb function
submodule tests: test for a "foreach" blind-spot
submodule--helper: fix a memory leak in "status"
submodule tests: add tests for top-level flag output
submodule--helper: move "config" to a test-tool
Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob,
2022-11-17) simplified the conditional for checking if we might have a
blob. But we can simplify it further. In:
!obj || (obj && obj->type == OBJ_BLOB)
the short-circuit "OR" means "obj" will always be true on the right-hand
side. The compiler almost certainly optimized that out anyway, but
dropping it makes the conditional easier to understand for humans.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test case "push with config push.useBitmap" of t5516 was introduced
in commit 82f67ee13f (send-pack.c: add config push.useBitmaps,
2022-06-17). It won't work in verbose mode, e.g.:
$ sh t5516-fetch-push.sh --run='1,115' -v
This is because "git-push" will run in a tty in this case, and the
subcommand "git pack-objects" will contain an argument "--progress"
instead of "-q". Adding a specific option "--quiet" to "git push" will
get a stable result for t5516.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
filter_combine__init() allocates a struct combine_filter_data object and
assigns it to the filter_data member of struct filter_options. Release
it in the complementing filter_combine__free().
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories. Quiet down prune in cases where either
$GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
but emit the system error in other cases to help users diagnose
permissions problems or resource constraints.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid calling 'cache_tree_update()' when doing so would be redundant.
* vd/skip-cache-tree-update:
rebase: use 'skip_cache_tree_update' option
read-tree: use 'skip_cache_tree_update' option
reset: use 'skip_cache_tree_update' option
unpack-trees: add 'skip_cache_tree_update' option
cache-tree: add perf test comparing update and prime
Update the credential-cache documentation to provide a more realistic
example.
* mh/increase-credential-cache-timeout:
Documentation: increase example cache timeout to 1 hour
`git rebase --update-refs` would delete references when all `update-ref`
commands in the sequencer were removed, which has been corrected.
* vd/update-refs-delete:
rebase --update-refs: avoid unintended ref deletion
"git repack" learns to send cruft objects out of the way into
packfiles outside the repository.
* tb/repack-expire-to:
builtin/repack.c: implement `--expire-to` for storing pruned objects
builtin/repack.c: write cruft packs to arbitrary locations
builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
builtin/repack.c: pass "out" to `prepare_pack_objects`
Makefile comments updates and reordering to clarify knobs used to
choose SHA implementations.
* ab/sha-makefile-doc:
Makefile: discuss SHAttered in *_SHA{1,256} discussion
Makefile: document default SHA-1 backend on OSX
Makefile & test-tool: replace "DC_SHA1" variable with a "define"
Makefile: document SHA-1 and SHA-256 default and selection order
Makefile: document default SHA-256 backend
Makefile: rephrase the discussion of *_SHA1 knobs
Makefile: create and use sections for "define" flag listing
Makefile: correct DC_SHA1 documentation
INSTALL: remove discussion of SHA-1 backends
Makefile: always (re)set DC_SHA1 on fallback
Various test updates.
* ab/misc-hook-submodule-run-command:
run-command tests: test stdout of run_command_parallel()
submodule tests: reset "trace.out" between "grep" invocations
hook tests: fix redirection logic error in 96e7225b31
On my use case involving 771 islands of Linux on kernel.org,
this reduces memory usage by around 25MB. The bulk of that
comes from free_remote_islands, since free_config_regexes only
saves around 40k.
This memory is saved early in the memory-intensive pack process,
making it available for the remainder of the long process.
Signed-off-by: Eric Wong <e@80x24.org>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In parse_object(), we try to handle blobs by streaming rather than
loading them entirely into memory. The most common case here will be
that we haven't seen the object yet and check oid_object_info(), which
tells us we have a blob.
But we trigger this code on one other case: when we have an in-memory
object struct with type OBJ_BLOB (and without its "parsed" flag set,
since otherwise we'd return early from the function). This indicates
that some other part of the code suspected we have a blob (e.g., it was
mentioned by a tree or tag) but we haven't yet looked at the on-disk
copy.
In this case before hitting the streaming path, we check if we have the
object on-disk at all. This is mostly pointless extra work, as the
streaming path would complain if it couldn't open the object (albeit
with the message "hash mismatch", which is a little misleading).
But it's also insufficient to catch all problems. The streaming code
will only tell us "yes, the on-disk object matches the oid". But it
doesn't actually confirm that what we found was indeed a blob, and
neither does repo_has_object_file().
One way to improve this would be to teach stream_object_signature() to
check the type (either by returning it to us to check, or taking an
"expected" type). But there's an even simpler fix here: if we suspect
the object is a blob, just call oid_object_info() to confirm that we
have it on-disk, and that it really is a blob.
This is slightly less efficient than teaching stream_object_signature()
to do it (since it has to open the object already). But this case very
rarely comes up. In practice, we usually don't have any clue what the
type is, in which case we already call oid_object_info(). This
"suspected" case happens only when some other code created an object
struct but didn't actually parse the blob, which is actually tricky to
trigger at all (see the discussion of the test below).
I reworked the conditional a bit so that instead of:
if ((suspected_blob && oid_object_info() == OBJ_BLOB)
(no_clue && oid_object_info() == OBJ_BLOB)
we have the simpler:
if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB)
This is shorter, but also reflects what we really want say, which is
"have we ruled out this being a blob; if not, check it on-disk".
In either case, if oid_object_info() fails to tell us it's a blob, we'll
skip the streaming code path and call repo_read_object_file(), just as
before. And if we really do have a mismatch with the existing object
struct, we'll eventually call lookup_commit(), etc, via
parse_object_buffer(), which will complain that it doesn't match our
existing obj->type.
So this fixes one of the lingering expect_failure cases from 0616617c7e
(t: introduce tests for unexpected object types, 2019-04-09). That test
works by peeling a tag that claims to point to a blob (triggering us to
create the struct), but really points to something else, which we later
discover when we call parse_object() as part of the actual traversal).
Prior to this commit, we'd quietly check the sha1 and mark the blob as
"parsed". Now we correctly complain about the mismatch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When parsing an object of unknown type, we check to see if it's a blob,
so we can use our streaming code path. This uses oid_object_info() to
check the type, but before doing so we call repo_has_object_file(). This
latter is pointless, as oid_object_info() will already fail if the
object is missing. Checking it ahead of time just complicates the code
and is a waste of resources (albeit small).
Let's drop the redundant check.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.
Marking all preexisting references as uninteresting can be a problem
when it comes to performance. Git forges tend to do internal bookkeeping
to keep alive sets of objects for internal use or make them easy to find
via certain references. These references are typically hidden away from
the user so that they are neither advertised nor writeable. At GitLab,
we have one particular repository that contains a total of 7 million
references, of which 6.8 million are indeed internal references. With
the current connectivity check we are forced to load all these
references in order to mark them as uninteresting, and this alone takes
around 15 seconds to compute.
We can optimize this by only taking into account the set of visible refs
when marking objects as uninteresting. This means that we may now walk
more objects until we hit any object that is marked as uninteresting.
But it is rather unlikely that clients send objects that make large
parts of objects reachable that have previously only ever been hidden,
whereas the common case is to push incremental changes that build on top
of the visible object graph.
This provides a huge boost to performance in the mentioned repository,
where the vast majority of its refs hidden. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
as it is configured in Gitaly leads to a 4.5-fold speedup:
Benchmark 1: main
Time (mean ± σ): 30.977 s ± 0.157 s [User: 30.226 s, System: 1.083 s]
Range (min … max): 30.796 s … 31.071 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 6.799 s ± 0.063 s [User: 6.803 s, System: 0.354 s]
Range (min … max): 6.729 s … 6.850 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
4.56 ± 0.05 times faster than 'main'
As we mostly go through the same codepaths even in the case where there
are no hidden refs at all compared to the code before there is no change
in performance when no refs are hidden:
Benchmark 1: main
Time (mean ± σ): 48.188 s ± 0.432 s [User: 49.326 s, System: 5.009 s]
Range (min … max): 47.706 s … 48.539 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 48.027 s ± 0.500 s [User: 48.934 s, System: 5.025 s]
Range (min … max): 47.504 s … 48.500 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
1.00 ± 0.01 times faster than 'main'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Add a new `--exclude-hidden=` option that is similar to the one we just
added to git-rev-list(1). Given a section name `uploadpack` or `receive`
as argument, it causes us to exclude all references that would be hidden
by the respective `$section.hideRefs` configuration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Users can optionally hide refs from remote users in git-upload-pack(1),
git-receive-pack(1) and others via the `transfer.hideRefs`, but there is
not an easy way to obtain the list of all visible or hidden refs right
now. We'll require just that though for a performance improvement in our
connectivity check.
Add a new option `--exclude-hidden=` that excludes any hidden refs from
the next pseudo-ref like `--all` or `--branches`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The functions that handle exclusion of refs work on a single string
list. We're about to add a second mechanism for excluding refs though,
and it makes sense to reuse much of the same architecture for both kinds
of exclusion.
Introduce a new `struct ref_exclusions` that encapsulates all the logic
related to excluding refs and move the `struct string_list` that holds
all wildmatch patterns of excluded refs into it. Rename functions that
operate on this struct to match its name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Move together the definitions of functions that handle exclusions of
refs so that related functionality sits in a single place, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're about to add a new argument to git-rev-list(1) that allows it to
add all references that are visible when taking `transfer.hideRefs` et
al into account. This will require us to potentially parse multiple sets
of hidden refs, which is not easily possible right now as there is only
a single, global instance of the list of parsed hidden refs.
Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both
take the list of hidden references as input and adjust callers to keep a
local list, instead. This allows us to easily use multiple hidden-ref
lists. Furthermore, it allows us to properly free this list before we
exit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When parsing the hideRefs configuration, we first duplicate the config
value so that we can modify it. We then subsequently append it to the
`hide_refs` string list, which is initialized with `strdup_strings`
enabled. As a consequence we again reallocate the string, but never
free the first duplicate and thus have a memory leak.
While we never clean up the static `hide_refs` variable anyway, this is
no excuse to make the leak worse by leaking every value twice. We are
also about to change the way this variable will be handled so that we do
indeed start to clean it up. So let's fix the memory leak by using the
`string_list_append_nodup()` so that we pass ownership of the allocated
string to `hide_refs`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When `git notes` prepares the template it adds an empty newline between
the comment header and the content:
>
> #
> # Write/edit the notes for the following object:
>
> # commit 0f3c55d4c2b7864bffb2d92278eff08d0b2e083f
> # etc
This is wrong structurally because that newline is part of the comment,
too, and thus should be commented. Also, it throws off some positioning
strategies of editors and plugins, and it differs from how we do commit
templates.
Change this to follow the standard set by `git commit`:
>
> #
> # Write/edit the notes for the following object:
> #
> # commit 0f3c55d4c2b7864bffb2d92278eff08d0b2e083f
>
Tests pass unchanged after this code change.
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
On MinGW the "/dev/null" is translated to "nul" on command-lines, even
though as in this case it'll never end up referring to an actual file.
So on Windows the fix for the previous "example.com" timeout issue in
8354cf752e (t7610: fix flaky timeout issue, don't clone from
example.com, 2022-11-05) would yield:
fatal: repo URL: 'nul' must be absolute or begin with ./|../
Let's evade this yet again by prefixing this with "file://", which
makes this pass in the Windows CI.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>