commit b0a5a12a60 ("unpack-trees: allow check_updates() to work on a
different index", 2020-03-27) allowed check_updates() to work on a
different index, but it called get_progress() which was hardcoded to
work on o->result much like check_updates() had been. Update it to also
accept an index parameter and have check_updates() pass that parameter
along so that both are working on the same index.
Noticed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to skip unmerged paths in the index when sparse checkout
is in use would have made out-of-bound access of the in-core index
when the last path was unmerged, which has been corrected.
* ds/sparse-updates-oob-access-fix:
unpack-trees: avoid array out-of-bounds error
The sparse-checkout patterns have been forbidden from excluding all
paths, leaving an empty working tree, for a long time. This
limitation has been lifted.
* ds/sparse-allow-empty-working-tree:
sparse-checkout: stop blocking empty workdirs
The loop in warn_conflicted_path() that checks for the count of
entries with the same path uses "i+count" for the array
entry. However, the loop only verifies that the value of count is
below the array size. Fix this by adding i to the condition.
I hit this condition during a test of the in-tree sparse-checkout
feature, so it is exercised by the end of the series.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
[jc: readability fix]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the error condition when updating the sparse-checkout leaves
an empty working directory.
This behavior was added in 9e1afb167 (sparse checkout: inhibit empty
worktree, 2009-08-20). The comment was added in a7bc906f2 (Add
explanation why we do not allow to sparse checkout to empty working
tree, 2011-09-22) in response to a "dubious" comment in 84563a624
(unpack-trees.c: cosmetic fix, 2010-12-22).
With the recent "cone mode" and "git sparse-checkout init [--cone]"
command, it is common to set a reasonable sparse-checkout pattern
set of
/*
!/*/
which matches only files at root. If the repository has no such files,
then their "git sparse-checkout init" command will fail.
Now that we expect this to be a common pattern, we should not have the
commands fail on an empty working directory. If it is a confusing
result, then the user can recover with "git sparse-checkout disable"
or "git sparse-checkout set". This is especially simple when using cone
mode.
Reported-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"sparse-checkout" UI improvements.
* en/sparse-checkout:
sparse-checkout: provide a new reapply subcommand
unpack-trees: failure to set SKIP_WORKTREE bits always just a warning
unpack-trees: provide warnings on sparse updates for unmerged paths too
unpack-trees: make sparse path messages sound like warnings
unpack-trees: split display_error_msgs() into two
unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*
unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier
sparse-checkout: use improved unpack_trees porcelain messages
sparse-checkout: use new update_sparsity() function
unpack-trees: add a new update_sparsity() function
unpack-trees: pull sparse-checkout pattern reading into a new function
unpack-trees: do not mark a dirty path with SKIP_WORKTREE
unpack-trees: allow check_updates() to work on a different index
t1091: make some tests a little more defensive against failures
unpack-trees: simplify pattern_list freeing
unpack-trees: simplify verify_absent_sparse()
unpack-trees: remove unused error type
unpack-trees: fix minor typo in comment
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.
* jt/avoid-prefetch-when-able-in-diff:
diff: restrict when prefetching occurs
diff: refactor object read
diff: make diff_populate_filespec_options struct
promisor-remote: accept 0 as oid_nr in function
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting and clearing of the SKIP_WORKTREE bit is not only done when
users run 'sparse-checkout'; other commands such as 'checkout' also run
through unpack_trees() which has logic for handling this special bit.
As such, we need to consider how they handle special cases. A couple
comparison points should help explain the rationale for changing how
unpack_trees() handles these bits:
Ignoring sparse checkouts for a moment, if you are switching
branches and have dirty changes, it is only considered an error that
will prevent the branch switching from being successful if the dirty
file happens to be one of the paths with different contents.
SKIP_WORKTREE has always been considered advisory; for example, if
rebase or merge need or even want to materialize a path as part of
their work, they have always been allowed to do so regardless of the
SKIP_WORKTREE setting. This has been used for unmerged paths, but
it was often used for paths it wasn't needed just because it made
the code simpler. It was a best-effort consideration, and when it
materialized paths contrary to the SKIP_WORKTREE setting, it was
never required to even print a warning message.
In the past if you trying to run e.g. 'git checkout' and:
1) you had a path that was materialized and had some dirty changes
2) the path was listed in $GITDIR/info/sparse-checkout
3) this path did not different between the current and target branches
then despite the comparison points above, the inability to set
SKIP_WORKTREE was treated as a *hard* error that would abort the
checkout operation. This is completely inconsistent with how
SKIP_WORKTREE is handled elsewhere, and rather annoying for users as
leaving the paths materialized in the working copy (with a simple
warning) should present no problem at all.
Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a
warning and allow the operations to continue.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sparse-checkout runs to update the list of sparsity patterns, it
gives warnings if it can't remove paths from the working tree because
those files have dirty changes. Add a similar warning for unmerged
paths as well.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The messages for problems with sparse paths are phrased as errors that
cause the operation to abort, even though we are not making the
operation abort. Reword the messages to make sense in their new
context.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
display_error_msgs() is never called to show messages of both ERROR_*
and WARNING_* types at the same time; it is instead called multiple
times, separately for each type. Since we want to display these types
differently, make two slightly different versions of this function.
A subsequent commit will further modify unpack_trees() and how it calls
the new display_warning_msgs().
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to treat issues with setting the SKIP_WORKTREE bit as a warning
rather than an error; rename the enum values to reflect this intent as
a simple step towards that goal.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A minor change, but we want to convert the sparse messages to warnings
and this allows us to group warnings and errors.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the only way to update the SKIP_WORKTREE bits for various
paths was invoking `git read-tree -mu HEAD` or calling the same code
that this codepath invoked. This however had a number of problems if
the index or working directory were not clean. First, let's consider
the case:
Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files)
If the working tree was clean this was fine, but if there were files or
directories or symlinks or whatever already present at the given path
then the operation would abort with an error. Let's label this case
for later discussion:
A) There is an untracked path in the way
Now let's consider the opposite case:
Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files)
If the index and working tree was clean this was fine, but if there were
any unclean paths we would run into problems. There are three different
cases to consider:
B) The path is unmerged
C) The path has unstaged changes
D) The path has staged changes (differs from HEAD)
If any path fell into case B or C, then the whole operation would be
aborted with an error. With sparse-checkout, the whole operation would
be aborted for case D as well, but for its predecessor of using `git
read-tree -mu HEAD` directly, any paths that fell into case D would be
removed from the working copy and the index entry for that path would be
reset to match HEAD -- which looks and feels like data loss to users
(only a few are even aware to ask whether it can be recovered, and even
then it requires walking through loose objects trying to match up the
right ones).
Refusing to remove files that have unsaved user changes is good, but
refusing to work on any other paths is very problematic for users. If
the user is in the middle of a rebase or has made modifications to files
that bring in more dependencies, then for their build to work they need
to update the sparse paths. This logic has been preventing them from
doing so. Sometimes in response, the user will stage the files and
re-try, to no avail with sparse-checkout or to the horror of losing
their changes if they are using its predecessor of `git read-tree -mu
HEAD`.
Add a new update_sparsity() function which will not error out in any of
these cases but behaves as follows for the special cases:
A) Leave the file in the working copy alone, clear the SKIP_WORKTREE
bit, and print a warning (thus leaving the path in a state where
status will report the file as modified, which seems logical).
B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged.
C) Do NOT mark this path as SKIP_WORKTREE and print a warning about
the dirty path.
D) Mark the path as SKIP_WORKTREE, but do not revert the version
stored in the index to match HEAD; leave the contents alone.
I tried a different behavior for A (leave the SKIP_WORKTREE bit set),
but found it very surprising and counter-intuitive (e.g. the user sees
it is present along with all the other files in that directory, tries to
stage it, but git add ignores it since the SKIP_WORKTREE bit is set). A
& C seem like optimal behavior to me. B may be as well, though I wonder
if printing a warning would be an improvement. Some might be slightly
surprised by D at first, but given that it does the right thing with
`git commit` and even `git commit -a` (`git add` ignores entries that
are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a`
is similar), it seems logical to me.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a populate_from_existing_patterns() function for reading the
path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use
it elsewhere.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a path is dirty, removing from the working tree risks losing data.
As such, we want to make sure any such path is not marked with
SKIP_WORKTREE. While the current callers of this code detect this case
and re-populate with a previous set of sparsity patterns, we want to
allow some paths to be marked with SKIP_WORKTREE while others are left
unmarked without it being considered an error. The reason this
shouldn't be considered an error is that SKIP_WORKTREE has always been
an advisory-only setting; merge and rebase for example were free to
materialize paths and clear the SKIP_WORKTREE bit in order to accomplish
their work even though they kept the SKIP_WORKTREE bit set for other
paths. Leaving dirty working files in the working tree is thus a
natural extension of what we have already been doing.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
check_updates() previously assumed it was working on o->result. We want
to use this function in combination with a different index_state, so
take the intended index_state as a parameter.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit e091228e17 ("sparse-checkout: update working directory
in-process", 2019-11-21) allowed passing a pre-defined set of patterns
to unpack_trees(). However, if o->pl was NULL, it would still read the
existing patterns and use those. If those patterns were read into a
data structure that was allocated, naturally they needed to be free'd.
However, despite the same function being responsible for knowing about
both the allocation and the free'ing, the logic for tracking whether to
free the pattern_list was hoisted to an outer function with an
additional flag in unpack_trees_options. Put the logic back in the
relevant function and discard the now unnecessary flag.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify_absent_sparse() was introduced in commit 08402b0409
("merge-recursive: distinguish "removed" and "overwritten" messages",
2010-08-11), and has always had exactly one caller which always passes
error_type == ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN. This function
then checks whether error_type is this value, and if so, sets it instead
to ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN. It has been nearly a decade
and no other caller has been created, and no other value has ever been
passed, so just pass the expected value to begin with.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 08402b0409 ("merge-recursive: distinguish "removed" and
"overwritten" messages", 2010-08-11) split
ERROR_WOULD_LOSE_UNTRACKED
into both
ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN
ERROR_WOULD_LOSE_UNTRACKED_REMOVED
and also split
ERROR_WOULD_LOSE_ORPHANED
into both
ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN
ERROR_WOULD_LOSE_ORPHANED_REMOVED
However, despite the split only three of these four types were used.
ERROR_WOULD_LOSE_ORPHANED_REMOVED was not put into use when it was
introduced and nothing else has used it in the intervening decade
either. Remove it.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide more information (e.g. the object of the tree-ish in which
the blob being converted appears, in addition to its path, which
has already been given) to smudge/clean conversion filters.
* bc/filter-process:
t0021: test filter metadata for additional cases
builtin/reset: compute checkout metadata for reset
builtin/rebase: compute checkout metadata for rebases
builtin/clone: compute checkout metadata for clones
builtin/checkout: compute checkout metadata for checkouts
convert: provide additional metadata to filters
convert: permit passing additional metadata to filter processes
builtin/checkout: pass branch info down to checkout_worktree
Fix "git checkout --recurse-submodules" of a nested submodule
hierarchy.
* pb/recurse-submodules-fix:
t/lib-submodule-update: add test removing nested submodules
unpack-trees: check for missing submodule directory in merged_entry
unpack-trees: remove outdated description for verify_clean_submodule
t/lib-submodule-update: move a test to the right section
t/lib-submodule-update: remove outdated test description
t7112: remove mention of KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED
Provide commit metadata for checkout code paths that use unpack_trees
and friends. When we're checking out a commit, use the commit
information, but don't provide commit information if we're checking out
from the index, since there need not be any particular commit associated
with the index, and even if there is one, we can't know what it is.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using `git checkout --recurse-submodules` to switch between a
branch with no submodules and a branch with initialized nested
submodules currently causes a fatal error:
$ git checkout --recurse-submodules branch-with-nested-submodules
fatal: exec '--super-prefix=submodule/nested/': cd to 'nested'
failed: No such file or directory
error: Submodule 'nested' could not be updated.
error: Submodule 'submodule/nested' cannot checkout new HEAD.
error: Submodule 'submodule' could not be updated.
M submodule
Switched to branch 'branch-with-nested-submodules'
The checkout succeeds but the worktree and index of the first level
submodule are left empty:
$ cd submodule
$ git -c status.submoduleSummary=1 status
HEAD detached at b3ce885
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
deleted: .gitmodules
deleted: first.t
deleted: nested
fatal: not a git repository: 'nested/.git'
Submodule changes to be committed:
* nested 1e96f59...0000000:
$ git ls-files -s
$ # empty
$ ls -A
.git
The reason for the fatal error during the checkout is that a child git
process tries to cd into the yet unexisting nested submodule directory.
The sequence is the following:
1. The main git process (the one running in the superproject) eventually
reaches write_entry() in entry.c, which creates the first level
submodule directory and then calls submodule_move_head() in submodule.c,
which spawns `git read-tree` in the submodule directory.
2. The first child git process (the one in the submodule of the
superproject) eventually calls check_submodule_move_head() at
unpack_trees.c:2021, which calls submodule_move_head in dry-run mode,
which spawns `git read-tree` in the nested submodule directory.
3. The second child git process tries to chdir() in the yet unexisting
nested submodule directory in start_command() at run-command.c:829 and
dies before exec'ing.
The reason why check_submodule_move_head() is reached in the first child
and not in the main process is that it is inside an
if(submodule_from_ce()) construct, and submodule_from_ce() returns a
valid struct submodule pointer, whereas it returns a null pointer in the
main git process.
The reason why submodule_from_ce() returns a null pointer in the main
git process is because the call to cache_lookup_path() in config_from()
(called from submodule_from_path() in submodule_from_ce()) returns a
null pointer since the hashmap "for_path" in the submodule_cache of
the_repository is not yet populated. It is not populated because both
repo_get_oid(repo, GITMODULES_INDEX, &oid) and repo_get_oid(repo,
GITMODULES_HEAD, &oid) in config_from_gitmodules() at
submodule-config.c:639-640 return -1, as at this stage of the operation,
neither the HEAD of the superproject nor its index contain any
.gitmodules file.
In contrast, in the first child the hashmap is populated because
repo_get_oid(repo, GITMODULES_HEAD, &oid) returns 0 as the HEAD of the
first level submodule, i.e. .git/modules/submodule/HEAD, points to a
commit where .gitmodules is present and records 'nested' as a submodule.
Fix this bug by checking that the submodule directory exists before
calling check_submodule_move_head() in merged_entry() in the `if(!old)`
branch, i.e. if going from a commit with no submodule to a commit with a
submodule present.
Also protect the other call to check_submodule_move_head() in
merged_entry() the same way as it is safer, even though the `else if
(!(old->ce_flags & CE_CONFLICTED))` branch of the code is not at play in
the present bug.
The other calls to check_submodule_move_head() in other functions in
unpack_trees.c are all already protected by calls to lstat() somewhere
in
the program flow so we don't need additional protection for them.
All commands in the unpack_trees machinery are affected, i.e. checkout,
reset and read-tree when called with the --recurse-submodules flag.
This bug was first reported in [1].
[1]
https://lore.kernel.org/git/7437BB59-4605-48EC-B05E-E2BDB2D9DABC@gmail.com/
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Reported-by: Damien Robert <damien.olivier.robert@gmail.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function verify_clean_submodule() learned to verify if a submodule
working tree is clean in a7bc845a9a (unpack-trees: check if we can
perform the operation for submodules, 2017-03-14), but the commented
description above it was not updated to reflect that, such that this
description has been outdated since then.
Since Git has now learned to optionnally recursively check out
submodules during a superproject checkout, remove this outdated
description.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some rough edges in the sparse-checkout feature, especially around
the cone mode, have been cleaned up.
* ds/sparse-checkout-harden:
sparse-checkout: fix cone mode behavior mismatch
sparse-checkout: improve docs around 'set' in cone mode
sparse-checkout: escape all glob characters on write
sparse-checkout: use C-style quotes in 'list' subcommand
sparse-checkout: unquote C-style strings over --stdin
sparse-checkout: write escaped patterns in cone mode
sparse-checkout: properly match escaped characters
sparse-checkout: warn on globs in cone patterns
sparse-checkout: detect short patterns
sparse-checkout: cone mode does not recognize "**"
sparse-checkout: fix documentation typo for core.sparseCheckoutCone
clone: fix --sparse option with URLs
sparse-checkout: create leading directories
t1091: improve here-docs
t1091: use check_files to reduce boilerplate
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe. This limitation is getting lifted.
* mt/threaded-grep-in-object-store:
grep: use no. of cores as the default no. of threads
grep: move driver pre-load out of critical section
grep: re-enable threads in non-worktree case
grep: protect packed_git [re-]initialization
grep: allow submodule functions to run in parallel
submodule-config: add skip_if_read option to repo_read_gitmodules()
grep: replace grep_read_mutex by internal obj read lock
object-store: allow threaded access to object reading
replace-object: make replace operations thread-safe
grep: fix racy calls in grep_objects()
grep: fix race conditions at grep_submodule()
grep: fix race conditions on userdiff calls
The intention of the special "cone mode" in the sparse-checkout
feature is to always match the same patterns that are matched by the
same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode,
then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the
MATCHED_RECURSIVE response, and hence these were left out of the
matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED
and add a test that prevents regression.
Reported-by: Finn Bryant <finnbryant@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code recently added in this release to move to the entry beyond
the ones in the same directory in the index in the sparse-cone mode
did not count the number of entries to skip over incorrectly, which
has been corrected.
* ds/sparse-cone:
.mailmap: fix GGG authoship screwup
unpack-trees: correctly compute result count
The Undefined Behavior Sanitizer in clang-11 seems to have learned a new
trick: it complains about computing offsets from a NULL pointer, even if
that offset is 0. This causes numerous test failures. For example, from
t1090:
unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer
...
not ok 6 - in partial clone, sparse checkout only fetches needed blobs
The code in question looks like this:
struct cache_entry **cache_end = cache + nr;
...
while (cache != cache_end)
and we sometimes pass in a NULL and 0 for "cache" and "nr". This is
conceptually fine, as "cache_end" would be equal to "cache" in this
case, and we wouldn't enter the loop at all. But computing even a zero
offset violates the C standard. And given the fact that UBSan is
noticing this behavior, this might be a potential problem spot if the
compiler starts making unexpected assumptions based on undefined
behavior.
So let's just avoid it, which is pretty easy. In some cases we can just
switch to iterating with a numeric index (as we do in sequencer.c here).
In other cases (like the cache_end one) the use of an end pointer is
more natural; we can keep that by just explicitly checking for the
NULL/0 case when assigning the end pointer.
Note that there are two ways you can write this latter case, checking
for the pointer:
cache_end = cache ? cache + nr : cache;
or the size:
cache_end = nr ? cache + nr : cache;
For the case of a NULL/0 ptr/len combo, they are equivalent. But writing
it the second way (as this patch does) has the property that if somebody
were to incorrectly pass a NULL pointer with a non-zero length, we'd
continue to notice and segfault, rather than silently pretending the
length was zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that tries to skip over the entries for the paths in a
single directory using the cache-tree was not careful enough
against corrupt index file.
* es/unpack-trees-oob-fix:
unpack-trees: watch for out-of-range index position
Currently, submodule-config.c doesn't have an externally accessible
function to read gitmodules only if it wasn't already read. But this
exact behavior is internally implemented by gitmodules_read_check(), to
perform a lazy load. Let's merge this function with
repo_read_gitmodules() adding a 'skip_if_read' which allows both
internal and external callers to access this functionality. This
simplifies a little the code. The added option will also be used in
the following patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The clear_ce_flags_dir() method processes the cache entries within
a common directory. The returned int is the number of cache entries
processed by that directory. When using the sparse-checkout feature
in cone mode, we can skip the pattern matching for entries in the
directories that are entirely included or entirely excluded.
eb42feca (unpack-trees: hash less in cone mode, 2019-11-21)
introduced this performance feature. The old mechanism relied on
the counts returned by calling clear_ce_flags_1(), but the new
mechanism calculated the number of rows by subtracting "cache_end"
from "cache" to find the size of the range. However, the equation
is wrong because it divides by sizeof(struct cache_entry *). This
is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0
release found this issue with the warning, "Pointer differences,
such as cache_end - cache, are automatically scaled down by the
size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is
extraneous and should be eliminated." This warning is correct.
This leaves us with the question "how did this even work?" The
problem that occurs with this incorrect pointer arithmetic is
a performance-only bug, and a very slight one at that. Since
the entry count returned by clear_ce_flags_dir() is reduced by
a factor of 8, the loop in clear_ce_flags_1() will re-process
entries from those directories.
By inserting global counters into unpack-tree.c and tracing
them with trace2_data_intmax() (in a private change, for
testing), I was able to see count how many times the loop inside
clear_ce_flags_1() processed an entry and how many times
clear_ce_flags_dir() was called. Each of these are reduced by at
least a factor of 8 with the current change. A factor larger
than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and
in the LICENSES directory. Here are the measured counts:
clear_ce_flags_1 loop blocks:
Before: 11,520
After: 1,621
clear_ce_flags_dir calls:
Before: 7,048
After: 606
While these are dramatic counts, the time spent in
clear_ce_flags_1() is under one millisecond in each case, so
the improvement is not measurable as an end-to-end time.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's possible in a case where the index file contains a tree extension
but no blobs within that tree exist for index_pos_by_traverse_info() to
segfault. If the name_entry passed into index_pos_by_traverse_info() has
no blobs inside, AND is alphabetically later than all blobs currently in
the index file, index_pos_by_traverse_info() will segfault. For example,
an index file which looks something like this:
aaa#0
bbb/aaa#0
[Extensions]
TREE: zzz
In this example, 'index_name_pos(..., "zzz/", ...)' will return '-4',
indicating that "zzz/" could be inserted at position 3. However, when
the checks which ensure that the insertion position of "zzz/" look for a
blob at that position beginning with "zzz/", the index cache is accessed
out of range, causing a segfault.
This kind of index state is not typically generated during user
operations, and is in fact an edge case of the state being checked for
in the conditional where it was added. However, since the entry for the
BUG() line is ambiguous, tell some additional context to help Git
developers debug the failure later. When we know the name of the dir we
were trying to look up, it becomes possible to examine the index file
in a hex util to determine what went wrong; the position gives a hint
about where to start looking.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
check_updates() has a lot of code that repeatedly checks whether
o->update or o->dry_run are set. (Note that o->dry_run is a
near-synonym for !o->update, but not quite as per commit 2c9078d05b
("unpack-trees: add the dry_run flag to unpack_trees_options",
2011-05-25).) In fact, this function almost turns into a no-op whenever
the condition
!o->update || o->dry_run
is met. Simplify the code by checking this condition at the beginning
of the function, and when it is true, do the few things that are
relevant and return early.
There are a few things that make the conversion not quite obvious:
* The fact that check_updates() does not actually turn into a no-op
when updates are not wanted may be slightly surprising. However,
commit 33ecf7eb61 (Discard "deleted" cache entries after using them
to update the working tree, 2008-02-07) put the discarding of
unused cache entries in check_updates() so we still need to keep
the call to remove_marked_cache_entries(). It's possible this
call belongs in another function, but it is certainly needed as
tests will fail if it is removed.
* The original called remove_scheduled_dirs() unconditionally.
Technically, commit 7847892716 (unlink_entry(): introduce
schedule_dir_for_removal(), 2009-02-09) should have made that call
conditional, but it didn't matter in practice because
remove_scheduled_dirs() becomes a no-op when all the calls to
unlink_entry() are skipped. As such, we do not need to call it.
* When (o->dry_run && o->update), the original would have two calls
to git_attr_set_direction() surrounding a bunch of skipped updates.
These two calls to git_attr_set_direction() cancel each other out
and thus can be omitted when o->dry_run is true just as they
already are when !o->update.
* The code would previously call setup_collided_checkout_detection()
and report_collided_checkout() even when o->dry_run. However, this
was just an expensive no-op because
setup_collided_checkout_detection() merely cleared the CE_MATCHED
flag for each cache entry, and report_collided_checkout() reported
which ones had it set. Since a dry-run would skip all the
checkout_entry() calls, CE_MATCHED would never get set and thus
no collisions would be reported. Since we can't detect the
collisions anyway without doing updates, skipping the collisions
detection setup and reporting is an optimization.
* The code previously would call get_progress() and
display_progress() even when (!o->update || o->dry_run). This
served to show how long it took to skip all the updates, which is
somewhat useless. Since we are skipping the updates, we can skip
showing how long it takes to skip them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.
* ds/sparse-cone: (21 commits)
sparse-checkout: improve OS ls compatibility
sparse-checkout: respect core.ignoreCase in cone mode
sparse-checkout: check for dirty status
sparse-checkout: update working directory in-process for 'init'
sparse-checkout: cone mode should not interact with .gitignore
sparse-checkout: write using lockfile
sparse-checkout: use in-process update for disable subcommand
sparse-checkout: update working directory in-process
sparse-checkout: sanitize for nested folders
unpack-trees: add progress to clear_ce_flags()
unpack-trees: hash less in cone mode
sparse-checkout: init and set in cone mode
sparse-checkout: use hashmaps for cone patterns
sparse-checkout: add 'cone' mode
trace2: add region in clear_ce_flags
sparse-checkout: create 'disable' subcommand
sparse-checkout: add '--stdin' option to set subcommand
sparse-checkout: 'set' subcommand
clone: add --sparse mode
sparse-checkout: create 'init' subcommand
...
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...