These are all the commits from the sparse-checkout feature that have made it to `next` as of this week.
Taking them early provides the following benefits:
1. We have some extra hardening for strange directory names.
2. It fixes the `git clone --sparse` bug. (This unfortunately did not make v2.25.1)
3. It adds the `git sparse-checkout add` subcommand.
4. It allows Windows-style paths on Windows filesystems (`git sparse-checkout add dir1\dir2`)
There is one merge commit that resolves the conflict from reverting the "needs a clean status" commit. The rest will be included in `v2.26.0` so will not be part of that rebase.
When using Windows, a user may run 'git sparse-checkout set A\B\C'
to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we
add the string 'A/B/C' to the recursive hashset.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the sparse-checkout feature, a user may want to incrementally
grow their sparse-checkout pattern set. Allow adding patterns using a
new 'add' subcommand. This is not much different from the 'set'
subcommand, because we still want to allow the '--stdin' option and
interpret inputs as directories when in cone mode and patterns
otherwise.
When in cone mode, we are growing the cone. This may actually reduce the
set of patterns when adding directory A when A/B is already a directory
in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be
appended to the sparse-checkout file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of adding "add" and "remove" subcommands to the
sparse-checkout builtin, extract a modify_pattern_list() method from the
sparse_checkout_set() method. This command will read input from the
command-line or stdin to construct a set of patterns, then modify the
existing sparse-checkout patterns after a successful update of the
working directory.
Currently, the only way to modify the patterns is to replace all of the
patterns. This will be extended in a later update.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of extending the sparse-checkout builtin with "add"
and "remove" subcommands, extract the code that fills a pattern list
based on the input values. The input changes depending on the
presence of "--stdin" or the value of core.sparseCheckoutCone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent update in the Linux VM images used by Azure Pipelines surfaced
a new problem in the "Documentation" job. Apparently, this warning
appears 396 times on `stderr` when running `make doc`:
/usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb:10: warning: constant Gem::ConfigMap is deprecated
This problem was already reported to the `rubygems` project via
https://github.com/rubygems/rubygems/issues/3068.
As there is nothing Git can do about this warning, and as the
"Documentation" job reports this warning as a failure, let's just
silence it and move on.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
If a user runs git update-git-for-windows, then they will upgrade to a
version that does not support microsoft/vfsforgit or microsoft/scalar.
Therefore, let's prevent this.
This addresses https://github.com/microsoft/git/issues/241
The `gvfs-helper` is supposed to avoid calling `git fetch-pack` by downloading objects through the GVFS protocol instead. For some reason, some `git fetch` calls still end up calling `git fetch-pack` which gets a complaint from the remote because it does not support that kind of fetch.
Put a hard stop in the `fetch_git()` method to prevent this process run.
When using the GVFS protocol, we should _never_ call "git fetch-pack"
to attempt downloading a pack-file via the regular Git protocol. It
appears that the mechanism that prevented this in the VFS for Git
world is due to the read-object hook populating the commits at the
new ref tips in a different way than the gvfs-helper does.
By acting as if the fetch-pack succeeds here in remote-curl, we
prevent a failed fetch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The intention of the special "cone mode" in the sparse-checkout
feature is to always match the same patterns that are matched by the
same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode,
then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the
MATCHED_RECURSIVE response, and hence these were left out of the
matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED
and add a test that prevents regression.
Reported-by: Finn Bryant <finnbryant@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing documentation does not clarify how the 'set' subcommand
changes when core.sparseCheckoutCone is enabled. Correct this by
changing some language around the "A/B/C" example. Also include a
description of the input format matching the output of 'git ls-tree
--name-only'.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-checkout patterns allow special globs according to
fnmatch(3). When writing cone-mode patterns for paths containing
these characters, they must be escaped.
Use is_glob_special() to check which characters must be escaped
this way, and add a path to the tests that contains all glob
characters at once. Note that ']' is not special, since the
initial bracket '[' is escaped.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
Even more specifically, the goal is to always allow the following from
the root of a repo:
git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.
Use unquote_c_style() when parsing lines from stdin. Command-line
arguments will be parsed as-is, assuming the user can do the correct
level of escaping from their environment to match the exact directory
names.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cone mode, the sparse-checkout feature uses hashset containment
queries to match paths. Make this algorithm respect escaped asterisk
(*) and backslash (\) characters.
Create dup_and_filter_pattern() method to convert a pattern by
removing escape characters and dropping an optional "/*" at the end.
This method is available in dir.h as we will use it in
builtin/sparse-checkout.c in a later change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cone mode, the sparse-checkout commmand will write patterns that
allow faster pattern matching. This matching only works if the patterns
in the sparse-checkout file are those written by that command. Users
can edit the sparse-checkout file and create patterns that cause the
cone mode matching to fail.
The cone mode patterns may end in "/*" but otherwise an un-escaped
asterisk or other glob character is invalid. Add checks to disable
cone mode when seeing these values.
A later change will properly handle escaped globs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cone mode, the shortest pattern the sparse-checkout command will
write into the sparse-checkout file is "/*". This is handled carefully
in add_pattern_to_hashsets(), so warn if any other pattern is this
short. This will assist future pattern checks by allowing us to assume
there are at least three characters in the pattern.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
command creates a restricted set of possible patterns that are used
by a custom algorithm to quickly match those patterns.
If a user manually edits the sparse-checkout file, then they could
create patterns that do not match these expectations. The cone-mode
matching algorithm can return incorrect results. The solution is to
detect these incorrect patterns, warn that we do not recognize them,
and revert to the standard algorithm.
Check each pattern for the "**" substring, and revert to the old
logic if seen. While technically a "/<dir>/**" pattern matches
the meaning of "/<dir>/", it is not one that would be written by
the sparse-checkout builtin in cone mode. Attempting to accept that
pattern change complicates the logic and instead we punt and do
not accept any instance of "**".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --sparse option was added to the clone builtin in d89f09c (clone:
add --sparse mode, 2019-11-21) and was tested with a local path clone
in t1091-sparse-checkout-builtin.sh. However, due to a difference in
how local paths are handled versus URLs, this mechanism does not work
with URLs.
Modify the test to use a "file://" URL, which would output this error
before the code change:
Cloning into 'clone'...
fatal: cannot change to 'file://.../repo': No such file or directory
error: failed to initialize sparse-checkout
These errors are due to using a "-C <path>" option to call 'git -C
<path> sparse-checkout init' but the URL is being given instead of
the target directory.
Update that target directory to evaluate this correctly. I have also
manually tested that https:// URLs are handled correctly as well.
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git init' command creates the ".git/info" directory and fills it
with some default files. However, 'git worktree add' does not create
the info directory for that worktree. This causes a problem when running
"git sparse-checkout init" inside a worktree. While care was taken to
allow the sparse-checkout config to be specific to a worktree, this
initialization was untested.
Safely create the leading directories for the sparse-checkout file. This
is the safest thing to do even without worktrees, as a user could delete
their ".git/info" directory and expect Git to recover safely.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1091-sparse-checkout-builtin.sh uses here-docs to populate the
expected contents of the sparse-checkout file. These do not use
shell interpolation, so use "-\EOF" instead of "-EOF". Also use
proper tabbing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When testing the sparse-checkout feature, we need to compare the
contents of the working-directory against some expected output.
Using here-docs was useful in the beginning, but became repetetive
as the test script grew.
Create a check_files helper to make the tests simpler and easier
to extend. It also reduces instances of bad here-doc whitespace.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the upgrade, the library names changed from libeay32/ssleay32 to
libcrypto/libssl.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This reverts commit 777b420347.
This is a temporary fix while the regression here gets fixed upstream.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
This reverts commit cff4e9138d.
This is temporary until we fix this behavior upstream. For now,
we need to allow the sparse-checkout command to run when the
status is not clean.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When we create temp files for downloading packs, we use a name
based on the current timestamp. There is no randomness in the
name, so we can have collisions in the same second.
Retry the temp pack names using a new "-<retry>" suffix to the
name before the ".temp".
This is a follow-up to #229.
When we create temp files for downloading packs, we use a name
based on the current timestamp. There is no randomness in the
name, so we can have collisions in the same second.
Retry the temp pack names using a new "-<retry>" suffix to the
name before the ".temp".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
This is a follow-up to #227.
1. When a new flag is added to our Git config, we can run `gvfs-helper prefetch` inside of our `git fetch` calls. This will help ensure we have updated commits and trees even if the background prefetches have fallen behind (or are not running).
2. With a new `--no-update-remote-refs` we can avoid updating the `refs/remotes` namespace. This will allow us to run `git fetch --all --no-update-remote-refs +refs/heads/*:refs/hidden/*` and we will get the new refs into a local folder (that doesn't appear anywhere). The most important thing is that users will still see when their remote refs update.
Teach gvfs-helper to better support the concurrent fetching of the
same packfile by multiple instances.
If 2 instances of gvfs-helper did a POST and requested the same set of
OIDs, they might receive the exact same packfile (same checksum SHA).
Both processes would then race to install their copy of the .pack and
.idx files into the ODB/pack directory.
This is not a problem on Unix (because of filesystem semantics).
On Windows, this can cause an EBUSY/EPERM problem for the loser while
the winner is holding a handle to the target files. (The existing
packfile code already handled simple the existence and/or replacement
case.)
The solution presented here is to silently let the loser claim
victory IIF the .pack and .idx are already present in the ODB.
(We can't check this in advance because we don't know the packfile
SHA checksum until after we receive it and run index-pack.)
We avoid using a per-packfile lockfile (or a single lockfile for
the `vfs-` prefix) to avoid the usual issues with stale lockfiles.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
This replaces #223. There was a strangely-subtle issue about reading
the trailing hash from the downloaded packs that caused issues when
reading from the origin remote.
Add `gvfs-helper prefetch` command line option
and `objects.prefetch` mode in `gvfs-helper server`.
Sorry, but this contains a major refactor of the packfile and loose file handling
to let me share it with the prefetch code. As a side benefit, I collapsed the
tempfile creation before the request goes out and merged the install_ code
after the result is returned.
I also changed packfile code to use the packfile-checksum rather than a
timestamp so that we look more like normal Git.
More details are in the commit message.
To allow running "git fetch" in the background, we want to ensure
that users still see when the remote refs update in their user-
facing fetch commands. Use a new --no-update-remote-refs option
to prevent storing these updates to refs/remotes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The gvfs-helper allows us to download prefetch packs using a simple
subprocess call. The gvfs-helper-client.h method will automatically
compute the timestamp if passing 0, and passing NULL for the number
of downloaded packs is valid.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach gvfs-helper to support "/gvfs/prefetch" REST API.
This includes a new `gvfs-helper prefetch --since=<t>` command line option.
And a new `objects.prefetch` verb in `gvfs-helper server` mode.
If `since` argument is omitted, `gvfs-helper` will search the local
shared-cache for the most recent prefetch packfile and start from
there.
The <t> is usually a seconds-since-epoch, but may also be a "friendly"
date -- such as "midnight", "yesterday" and etc. using the existing
date selection mechanism.
Add `gh_client__prefetch()` API to allow `git.exe` to easily call
prefetch (and using the same long-running process as immediate and
queued object fetches).
Expanded t5799 unit tests to include prefetch tests. Test setup now
also builds some commits-and-trees packfiles for testing purposes with
well-known timestamps.
Expanded t/helper/test-gvfs-protocol.exe to support "/gvfs/prefetch"
REST API.
Massive refactor of existing packfile handling in gvfs-helper.c to
reuse more code between "/gvfs/objects POST" and "/gvfs/prefetch".
With this we now properly name packfiles with the checksum SHA1
rather than a date string.
Refactor also addresses some of the confusing tempfile setup and
install_<result> code processing (introduced to handle the ambiguity
of how POST works with commit objects).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The fsmonitor script that can be used for running all the git tests
using watchman was causing some of the tests to fail because it wrote
to stderr and created some files for debugging purposes.
Add a new debug script to use with debugging and modify the other script
to remove the code that would cause tests to fail.
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
If our POST request includes a commit ID, then the the remote will
send a pack-file containing the commit and all trees reachable from
its root tree. With the current implementation, this causes a
failure since we call install_loose() when asking for one object.
Modify the condition to check for install_pack() when the response
type changes.