During a run of the Scalar functional tests, we hit a case where the
inexact rename detection of a 'git cherry-pick' command slowed to the
point of writing its delayed progress, failing the test because stderr
differed from the control case. Showing progress like this when stderr
is not a terminal is non-standard for Git, so inject an isatty(2) check when
initializing the progress option in sequencer.c.
Unfortunately, 'git cherry-pick' does not currently have a '--quiet'
option wired up. This could be considered in the future, and the
isatty(2) could be moved to that position. This would also be needed for
commands like 'git rebase', so we leave that for another time.
Reported-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Test cases specific to handling untracked files in `git stash` a) ensure
that files outside the sparse-checkout definition are handled as expected
and b) document the index expansion inside `git stash -u`. Note that, in b),
it is not the full repository index that is expanded - it is the temporary,
standalone index containing the stashed untracked files only.
Signed-off-by: Victoria Dye <vdye@github.com>
This branch is exactly #410, but with one more commit: enabling the sparse index by default in d59110a81b42fe80cae0b788869c28c3b1eb8785.
Having this in the `vfs-2.33.0` branch helps build confidence that the sparse index is doing what it should be doing by running in the Scalar functional tests and in our test branches.
If we want to cut a new `microsoft/git` release without enabling the sparse index, we can simply revert this commit.
This verifies that `diff` and `diff --staged` behave the same in sparse
index repositories in the following partially-staged scenarios (i.e. the
index, HEAD, and working directory differ at a given path):
1. Path is within sparse-checkout cone.
2. Path is outside sparse-checkout cone.
3. A merge conflict exists for paths outside sparse-checkout cone.
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
```
6e74958f59 p2000: add 'git checkout -' test and decrease depth
3e1d03c41b p2000: compress repo names
cd94f82005 commit: integrate with sparse-index
65e79b8037 sparse-index: recompute cache-tree
e9a9981477 checkout: stop expanding sparse indexes
4b801c854f t1092: document bad 'git checkout' behavior
71e301501c unpack-trees: resolve sparse-directory/file conflicts
5e96df4df5 t1092: test merge conflicts outside cone
defab1b86d add: allow operating on a sparse-only index
9fc4313c88 pathspec: stop calling ensure_full_index
0ec03ab021 add: ignore outside the sparse-checkout in refresh()
adf5b15ac3 add: remove ensure_full_index() with --renormalize
```
These commits are equivalent to those already in `next` via gitgitgadget/git#999.
```
80b8d6c56b Merge branch 'sparse-index/add' into stolee/sparse-index/add
```
This merge resolves conflicts with some work that happened in parallel, but is already in upstream `master`.
```
c407b2cb34 t7519: rewrite sparse index test
9dad0d2a3d sparse-index: silently return when not using cone-mode patterns
29749209c6 sparse-index: silently return when cache tree fails
e7cdaa0141 unpack-trees: fix nested sparse-dir search
347410c5d5 sparse-checkout: create helper methods
4537233f04 attr: be careful about sparse directories
5282a8614f sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
3a2f3164fc sparse-checkout: clear tracked sparse dirs
fb47b56719 sparse-checkout: add config to disable deleting dirs
```
These commits are the ones under review as of gitgitgadget/git#1009. Recent review made this less stable. It's a slightly different and more robust version of #396.
> Note: I'm still not done with the feedback for upstream, but the remaining feedback is "can we add tests that cover these tricky technical bits?" and in `microsoft/git` these are already covered by the Scalar functional tests (since that's how they were found).
```
080b02c7b6 diff: ignore sparse paths in diffstat
d91a647494 merge: make sparse-aware with ORT
df49b5f030 merge-ort: expand only for out-of-cone conflicts
cdecb85d77 t1092: add cherry-pick, rebase tests
0c1ecfbdc6 sequencer: ensure full index if not ORT strategy
406dfbede9 sparse-index: integrate with cherry-pick and rebase
```
These commits integrate with `git merge`, `git cherry-pick`, `git revert`, and `git rebase` as of gitgitgadget/git#1019. Review feedback changed how the tests work, making them more robust, and led to a new commit (0c1ecfbdc6).
```
cbb0ab3b06 Merge branch 'sparse-index/merge' into vfs-2.33.0
acb8623f22 t7524: test no longer fails
```
Finally, the commits are merged into `vfs-2.33.0`, and we also include a fix to a `microsoft/git` test that is no longer broken.
There is some strangeness when expanding a sparse-index that exists
within a submodule. We will need to resolve that later, but for now,
let's do a better job of explicitly disabling the sparse-index when
requested, and do so in t7817.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Upstream, a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08) modified how 'git add <pathspec>' works with cache entries
marked with the SKIP_WORKTREE bit. The intention is to prevent a user
from accidentally adding a path that is outside their sparse-checkout
definition but somehow matches an existing index entry.
This breaks when using the virtual filesystem in VFS for Git. It is
rare, but we could be in a scenario where the user has staged a change
and then the file is projected away. If the user re-adds the file, then
this warning causes the command to fail with the advice message.
Disable this logic when core_virtualfilesystem is enabled.
This should allow the VFS for Git functional tests to pass (at least
the ones in the default run). I'll create a `-pr` installer build to
check before merging this.
The diff_populate_filespec() method is used to describe the diff after a
merge operation is complete, especially when a conflict appears. In
order to avoid expanding a sparse index, the reuse_worktree_file() needs
to be adapted to ignore files that are outside of the sparse-checkout
cone. The file names and OIDs used for this check come from the merged
tree in the case of the ORT strategy, not the index, hence the ability
to look into these paths without having already expanded the index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Upstream, a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08) modified how 'git add <pathspec>' works with cache entries
marked with the SKIP_WORKTREE bit. The intention is to prevent a user
from accidentally adding a path that is outside their sparse-checkout
definition but somehow matches an existing index entry.
A similar change for 'git rm' happened in d5f4b82 (rm: honor sparse
checkout patterns, 2021-04-08).
This breaks when using the virtual filesystem in VFS for Git. It is
rare, but we could be in a scenario where the user has staged a change
and then the file is projected away. If the user re-adds the file, then
this warning causes the command to fail with the advice message.
Disable this logic when core_virtualfilesystem is enabled.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The clean_tracked_sparse_directories() method deletes the tracked
directories that go out of scope when the sparse-checkout cone changes,
at least in cone mode. This is new behavior, but is recommended based on
our understanding of how users are interacting with the feature in most
cases.
It is possible that some users will object to the new behavior, so
create a new configuration option 'index.deleteSparseDirectories' that
can be set to 'false' to make clean_tracked_sparse_directories() do
nothing. This will keep all untracked files in the working tree and
cause performance problems with the sparse index, but those trade-offs
are for the user to decide.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Microsoft's fork of Git is not quite Git for Windows, therefore we want
to tell the keen reader all about it. :-)
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
We have long inherited the pull request template from
git-for-windows/git, but we should probably do a better job of
specifying the need for why a PR in microsoft/git exists instead of an
upstream contribution.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
We have been using the default issue template from git-for-windows/git,
but we should ask different questions than Git for Windows. Update the
issue template to ask these helpful questions.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The steps to update the microsoft-git cask are:
1. brew update
2. brew upgrade --cask microsoft-git
This is adapted from the UpgradeVerb within microsoft/scalar. There is
one important simplification: Scalar needed to check 'brew list --cask'
to find out if the 'scalar' cask or the 'scalar-azrepos' cask was
installed (which determined if the 'microsoft-git' cask was a necessary
dependency). We do not need that here, since we are already in the
microsoft-git cask.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
On Windows, we have the 'git update-git-for-windows' command. It is
poorly named within the microsoft/git fork, because the script has been
updated to look at the GitHub releases of microsoft/git, not
git-for-windows/git.
Still, it handles all the complicated details about downloading,
verifying, and running the installer.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Just do the boilerplate stuff of making a new builtin, including
documentation and integration with git.c.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Add a GitHub workflow that is triggered on the `release` event to
automatically update the `microsoft-git` Homebrew Cask on the
`microsoft/git` Tap.
A secret `HOMEBREW_TOKEN` with push permissions to the
`microsoft/homebrew-git` repository must exist. For now, a pull
request is created to allow for last-minute manual verification.
Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
When the virtualfilesystem is enabled the previous implementation of
clear_ce_flags would iterate all of the cache entries and query whether
each one is in the virtual filesystem to determine whether to clear one
of the SKIP_WORKTREE bits. For each cache entry, we would do a hash
lookup for each parent directory in the is_included_in_virtualfilesystem
function.
This approach is slow for a typical Windows OS enlistment with
3 million files where only a small percentage is in the virtual
filesystem. The cost is
O(n_index_entries * n_chars_per_path * n_parent_directories_per_path).
In this change, we use the same approach as apply_virtualfilesystem,
which iterates the set of entries in the virtualfilesystem and searches
in the cache for the corresponding entries in order to clear their
flags. This approach has a cost of
O(n_virtual_filesystem_entries * n_chars_per_path * log(n_index_entries)).
The apply_virtualfilesystem code was refactored a bit and modified to
clear flags for all names that 'alias' a given virtual filesystem name
when ignore_case is set.
n_virtual_filesystem_entries is typically much less than
n_index_entries, in which case the new approach is much faster. We wind
up building the name hash for the index, but this occurs quickly thanks
to the multi-threading.
Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>
For Scalar and VFS for Git, we use an alternate as a shared object
cache. We need to enable the maintenance builtin to work on that
shared object cache, especially in the background.
'scalar run <task>' would set GIT_OBJECT_DIRECTORY to handle this.
We set GIT_OBJECT_DIRECTORY based on the gvfs.sharedCache config,
but we also need the checks in pack_loose() to look at that object
directory instead of the current ODB's.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Test t5516-fetch-push.sh has a test 'deny fetch unreachable SHA1,
allowtipsha1inwant=true' that checks stderr for a specific error
string from the remote. In some build environments the error sent
over the remote connection gets mingled with the error from the
die() statement. Since both messages are being written to the same
file descriptor (but from parent and child processes), the output
we are matching with grep gets split.
To reduce the risk of this failure, follow this process instead:
1. Write an error message to stderr.
2. Write an error message across the connection.
3. exit(1).
This reorders the events so the error is written entirely before
the client receives a message from the remote, removing the race
condition.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
`sparse` complains with an error message like this:
gvfs-helper.c:2912:17: error: expression using sizeof on a
function
The culprit is this line:
curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, fwrite);
Similar lines exist in `http-push.c` and other files that are in
upstream Git, and to avoid these bogus warnings, they are already
exempted from `sparse`'s tender, loving care. We simply add
`gvfs-helper.c` to that list.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When we create temp files for downloading packs, we use a name
based on the current timestamp. There is no randomness in the
name, so we can have collisions in the same second.
Retry the temp pack names using a new "-<retry>" suffix to the
name before the ".temp".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When using the GVFS protocol, we should _never_ call "git fetch-pack"
to attempt downloading a pack-file via the regular Git protocol. It
appears that the mechanism that prevented this in the VFS for Git
world relied on the read-object hook populating the commits at the
new ref tips in a different way than the gvfs-helper does.
By acting as if the fetch-pack succeeds here in remote-curl, we
prevent a failed fetch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach gvfs-helper to better support the concurrent fetching of the
same packfile by multiple instances.
If 2 instances of gvfs-helper did a POST and requested the same set of
OIDs, they might receive the exact same packfile (same checksum SHA).
Both processes would then race to install their copy of the .pack and
.idx files into the ODB/pack directory.
This is not a problem on Unix (because of filesystem semantics).
On Windows, this can cause an EBUSY/EPERM problem for the loser while
the winner is holding a handle to the target files. (The existing
packfile code already handled the simple existence and/or replacement
case.)
The solution presented here is to silently let the loser claim
victory iff the .pack and .idx are already present in the ODB.
(We can't check this in advance because we don't know the packfile
SHA checksum until after we receive it and run index-pack.)
We avoid using a per-packfile lockfile (or a single lockfile for
the `vfs-` prefix) to avoid the usual issues with stale lockfiles.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The gvfs-helper allows us to download prefetch packs using a simple
subprocess call. The gvfs-helper-client.h method automatically
computes the timestamp when 0 is passed, and passing NULL for the
number of downloaded packs is valid.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach gvfs-helper to support "/gvfs/prefetch" REST API.
This includes a new `gvfs-helper prefetch --since=<t>` command line
option and a new `objects.prefetch` verb in `gvfs-helper server` mode.
If `since` argument is omitted, `gvfs-helper` will search the local
shared-cache for the most recent prefetch packfile and start from
there.
The <t> is usually a seconds-since-epoch, but may also be a "friendly"
date -- such as "midnight" or "yesterday" -- using the existing
date selection mechanism.
Add `gh_client__prefetch()` API to allow `git.exe` to easily call
prefetch (and using the same long-running process as immediate and
queued object fetches).
Expanded t5799 unit tests to include prefetch tests. Test setup now
also builds some commits-and-trees packfiles for testing purposes with
well-known timestamps.
Expanded t/helper/test-gvfs-protocol.exe to support "/gvfs/prefetch"
REST API.
Massive refactor of existing packfile handling in gvfs-helper.c to
reuse more code between "/gvfs/objects POST" and "/gvfs/prefetch".
With this we now properly name packfiles with the checksum SHA1
rather than a date string.
Refactor also addresses some of the confusing tempfile setup and
install_<result> code processing (introduced to handle the ambiguity
of how POST works with commit objects).
Update 2023-05-22 (v2.41.0): add '--no-rev-index' to 'index-pack' to avoid
writing the extra (unused) file.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach helper/test-gvfs-protocol to be able to send corrupted
loose blobs.
Add unit test for gvfs-helper to detect receipt of a corrupted loose blob.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
It is possible that a loose object that is written from a GVFS protocol
"get object" request does not match the expected hash. Error out in this
case.
2021-10-30: The prototype for read_loose_object() changed in 31deb28 (fsck:
don't hard die on invalid object types, 2021-10-01) and 96e41f5 (fsck:
report invalid object type-path combinations, 2021-10-01).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Earlier versions of the test always returned a packfile in response to a POST.
Now we look at the number of objects in the POST request.
If > 1, always send a packfile.
If = 1 and it is a commit, send a packfile.
Otherwise, send a loose object.
This is to better model the behavior of the GVFS server/protocol which
treats commits differently.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
gvfs-helper prints "loose <oid>" or "packfile <name>" messages after
objects are received to help invokers update their in-memory caches.
Move the code to accumulate these messages in the result_list into
the install_* functions rather than waiting until the end.
POST requests containing 1 object may return a loose object or a packfile
depending on whether the object is a commit or non-commit. Delaying the
message generation just complicated the caller.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create t/helper/test-gvfs-protocol.c and t/t5799-gvfs-helper.sh
to test gvfs-helper.
Create t/helper/test-gvfs-protocol.c as a stand-alone web server that
speaks the GVFS Protocol [1] and serves loose objects and packfiles
to clients. It borrows heavily from the code in daemon.c.
It includes a "mayhem" mode to cause various network and HTTP errors
to test the retry/recovery ability of gvfs-helper.
Create t/t5799-gvfs-helper.sh to test gvfs-helper.
[1] https://github.com/microsoft/VFSForGit/blob/master/Protocol.md
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
If our POST request includes a commit ID, then the remote will
send a pack-file containing the commit and all trees reachable from
its root tree. With the current implementation, this causes a
failure since we call install_loose() when asking for one object.
Modify the condition to check for install_pack() when the response
type changes.
Also, create a tempfile for the pack-file download or else we will
have problems!
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
During development, it was very helpful to see the gvfs-helper do its
work to request a pack-file or download a loose object. When these
messages appear during normal use, it leads to a very noisy terminal
output.
Remove all progress indicators when downloading loose objects. We know
that these can be numbered in the thousands in certain kinds of history
calls, and would litter the terminal output with noise. This happens
during 'git fetch' or 'git pull' as well when the tip commits are
checked for the new refs.
Remove the "Requesting packfile with %ld objects" message, as this
operation is very fast. We quickly follow up with the more valuable
"Receiving packfile %ld%ld with %ld objects". When a large "git
checkout" causes many pack-file downloads, it is good to know that Git
is asking for data from the server.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Expose the differences in the semantics of GET and POST for
the "gvfs/objects" API:
HTTP GET: fetches a single loose object over the network.
When a commit object is requested, it just returns
the single object.
HTTP POST: fetches a batch of objects over the network.
When the oid-set contains a commit object, all
referenced trees are also included in the response.
gvfs-helper is updated to take "get" and "post" command line options.
the gvfs-helper "server" mode is updated to take "objects.get" and
"objects.post" verbs.
For convenience, the "get" option and the "objects.get" verb
do allow more than one object to be requested. gvfs-helper will
automatically issue a series of (single object) HTTP GET requests
and create a series of loose objects.
The "post" option and the "objects.post" verb will perform bulk
object fetching using the batch-size chunking. Individual HTTP
POST requests containing more than one object will be created
as a packfile. An HTTP POST for a single object will create a
loose object.
This commit also contains some refactoring to eliminate the
assumption that POST is always associated with packfiles.
In gvfs-helper-client.c, gh_client__get_immediate() now uses the
"objects.get" verb and ignores any currently queued objects.
In gvfs-helper-client.c, the OIDSET built by gh_client__queue_oid()
is only processed when gh_client__drain_queue() is called. The queue
is processed using the "objects.post" verb.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add robust-retry mechanism to automatically retry a request after network
errors. This includes retry after:
- transient network problems reported by CURL.
- http 429 throttling (with associated Retry-After)
- http 503 server unavailable (with associated Retry-After)
Add voluntary throttling using Azure X-RateLimit-* hints to avoid being
soft-throttled (tarpitted) or hard-throttled (429) on later requests.
Add global (outside of a single request) azure-throttle data to track the
rate limit hints from the cache-server and main Git server independently.
Add exponential retry backoff. This is used for transient network problems
when we don't have a Retry-After hint.
Move the call to index-pack earlier in the response/error handling sequence
so that if we receive a 200 but the packfile is truncated/corrupted, we
can use the regular retry logic to get it again.
Refactor the way we create tempfiles for packfiles to use
<odb>/pack/tempPacks/ rather than working directly in the <odb>/pack/
directory.
Move the code to create a new tempfile to the start of a single request
attempt (initial and retry attempts), rather than at the overall start
of a request. This gives us a fresh tempfile for each network request
attempt. This simplifies the retry mechanism and isolates us from the file
ownership issues hidden within the tempfile class. And avoids the need to
truncate previous incomplete results. This was necessary because index-pack
was pulled into the retry loop.
Minor: Add support for logging X-VSS-E2EID to telemetry on network errors.
Minor: rename variable:
params.b_no_cache_server --> params.b_permit_cache_server_if_defined.
This variable is used to indicate whether we should try to use the
cache-server when it is defined. Got rid of double-negative logic.
Minor: rename variable:
params.label --> params.tr2_label
Clarify that this variable is only used with trace2 logging.
Minor: Move the code to automatically map cache-server 400 responses
to normal 401 response earlier in the response/error handling sequence
to simplify later retry logic.
Minor: Decorate trace2 messages with "(cs)" or "(main)" to identify the
server in log messages. Add params->server_type to simplify this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Fix parsing of the "loose <oid>" response from `gvfs-helper` and
use the actually parsed OID when updating the loose oid cache.
Previously, an uninitialized "struct oid" was used to update
the cache. This did not cause any corruption, but could cause
extra fetches for objects visited multiple times.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 message for CURL and HTTP errors.
Fix typo reporting network error code back to gvfs-helper-client.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The config variable `gvfs.sharedCache` contains the pathname to an alternate
<odb> that will be used by `gvfs-helper` to store dynamically-fetched missing
objects. If this directory does not exist on disk, `prepare_alt_odb()` omits
this directory from the in-memory list of alternates. This causes `git`
commands (and `gvfs-helper` in particular) to fall-back to `.git/objects` for
storage of these objects. This disables the shared-cache and leads to poorer
performance.
Teach `alt_obj_usable()` and `prepare_alt_odb()` to match up the
directory named in `gvfs.sharedCache` with an entry in
`.git/objects/info/alternates` and to force-create the `<odb>` root
directory (and the associated `<odb>/pack` directory) if necessary.
If the value of `gvfs.sharedCache` refers to a directory that is NOT listed
as an alternate, create an in-memory alternate entry in the odb-list. (This
is similar to how GIT_ALTERNATE_OBJECT_DIRECTORIES works.)
This work happens the first time that `prepare_alt_odb()` is called.
Furthermore, teach the `--shared-cache=<odb>` command line option in
`gvfs-helper` (which runs after the first call to `prepare_alt_odb()`)
to override the inherited shared-cache (and again, create the ODB directory
if necessary).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>