We already have the `config` command that accesses the `gvfs/config`
endpoint.
To implement `scalar`, we also need to be able to access the `vsts/info`
endpoint. Let's add a command to do precisely that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The GVFS cache server can return multiple pairs of (.pack, .idx)
files. If both are provided, `gvfs-helper` assumes that they are
valid without any validation. This might cause problems if the
.pack file is corrupt inside the data stream. (This might happen
if the cache server sends extra unexpected STDERR data or if the
.pack file is corrupt on the cache server's disk.)
All of the .pack file verification logic is already contained
within `git index-pack`, so let's ignore the .idx from the data
stream and force `git index-pack` to recompute it.
This defeats the purpose of some of the data caching on the cache
server, but safety is more important.
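A minimal sketch of that approach (helper and variable names here are
hypothetical; the real code lives in gvfs-helper.c):

    #include "run-command.h"
    #include "strvec.h"

    /*
     * Sketch: discard any .idx received from the cache server and let
     * `git index-pack` recompute (and thereby verify) it.  Here
     * `temp_pack_path` stands in for the just-downloaded .pack tempfile.
     */
    static int recompute_idx(const char *temp_pack_path)
    {
        struct child_process ip = CHILD_PROCESS_INIT;

        ip.git_cmd = 1;
        strvec_pushl(&ip.args, "index-pack", temp_pack_path, NULL);

        return run_command(&ip);
    }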
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
When we create temp files for downloading packs, we use a name
based on the current timestamp. There is no randomness in the
name, so we can have collisions in the same second.
Retry creating the temp pack files with a new "-<retry>" suffix in
the name, just before the ".temp" extension.
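A sketch of the naming scheme (the "vfs-" prefix and exact field
layout here are illustrative):

    #include "git-compat-util.h"
    #include "strbuf.h"

    /*
     * Sketch: build temp pack names, inserting a "-<retry>" suffix
     * before ".temp" when the plain name collides.
     */
    static void fmt_temp_pack_name(struct strbuf *buf, time_t now, int retry)
    {
        strbuf_reset(buf);
        if (retry > 0)
            strbuf_addf(buf, "vfs-%"PRIuMAX"-%d.temp",
                        (uintmax_t)now, retry);
        else
            strbuf_addf(buf, "vfs-%"PRIuMAX".temp", (uintmax_t)now);
    }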
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach gvfs-helper to better support the concurrent fetching of the
same packfile by multiple instances.
If 2 instances of gvfs-helper did a POST and requested the same set of
OIDs, they might receive the exact same packfile (same checksum SHA).
Both processes would then race to install their copy of the .pack and
.idx files into the ODB/pack directory.
This is not a problem on Unix (because of filesystem semantics).
On Windows, this can cause an EBUSY/EPERM problem for the loser while
the winner is holding a handle to the target files. (The existing
packfile code already handled the simple existence and/or replacement
case.)
The solution presented here is to silently let the loser claim
victory iff the .pack and .idx are already present in the ODB.
(We can't check this in advance because we don't know the packfile
SHA checksum until after we receive it and run index-pack.)
We avoid using a per-packfile lockfile (or a single lockfile for
the `vfs-` prefix) to avoid the usual issues with stale lockfiles.
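A sketch of the loser-claims-victory check (function and parameter
names are hypothetical):

    #include "git-compat-util.h"
    #include "dir.h"

    /*
     * Sketch: if installing our .pack/.idx fails because the winner
     * already installed identical files (same packfile checksum) and
     * still holds a handle to them, quietly treat that as success.
     */
    static int install_pack_and_idx(const char *tmp_pack, const char *final_pack,
                                    const char *tmp_idx, const char *final_idx)
    {
        if (!rename(tmp_pack, final_pack) && !rename(tmp_idx, final_idx))
            return 0;

        /* EBUSY/EPERM on Windows while the winner holds the files open */
        if (file_exists(final_pack) && file_exists(final_idx))
            return 0; /* loser silently claims victory */

        return error_errno(_("could not install packfile '%s'"), final_pack);
    }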
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach gvfs-helper to support "/gvfs/prefetch" REST API.
This includes a new `gvfs-helper prefetch --since=<t>` command line
option and a new `objects.prefetch` verb in `gvfs-helper server` mode.
If the `--since` argument is omitted, `gvfs-helper` will search the local
shared-cache for the most recent prefetch packfile and start from
there.
The <t> is usually a seconds-since-epoch value, but may also be a
"friendly" date -- such as "midnight" or "yesterday" -- using the
existing date-selection mechanism.
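These friendly forms go through git's existing approxidate machinery;
for example (assuming the standard date.h API):

    #include "date.h"

    /* Resolve "midnight", "yesterday", "2.hours.ago", etc. */
    timestamp_t since = approxidate("midnight");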
Add `gh_client__prefetch()` API to allow `git.exe` to easily call
prefetch (using the same long-running process as immediate and
queued object fetches).
Expanded t5799 unit tests to include prefetch tests. Test setup now
also builds some commits-and-trees packfiles for testing purposes with
well-known timestamps.
Expanded t/helper/test-gvfs-protocol.exe to support "/gvfs/prefetch"
REST API.
Massive refactor of existing packfile handling in gvfs-helper.c to
reuse more code between "/gvfs/objects POST" and "/gvfs/prefetch".
With this we now properly name packfiles with the SHA-1 checksum
rather than a date string.
Refactor also addresses some of the confusing tempfile setup and
install_<result> code processing (introduced to handle the ambiguity
of how POST works with commit objects).
Update 2023-05-22 (v2.41.0): add '--no-rev-index' to 'index-pack' to avoid
writing the extra (unused) file.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
It is possible that a loose object that is written from a GVFS protocol
"get object" request does not match the expected hash. Error out in this
case.
2021-10-30: The prototype for read_loose_object() changed in 31deb28 (fsck:
don't hard die on invalid object types, 2021-10-01) and 96e41f5 (fsck:
report invalid object type-path combinations, 2021-10-01).
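A sketch of the check (helper name is hypothetical; the exact hashing
signatures vary across git versions):

    #include "hash.h"
    #include "object-store.h"

    /*
     * Sketch: re-hash the object we received and compare against the
     * OID we asked for; error out instead of installing a bogus
     * loose object.
     */
    static int verify_received_loose(const struct object_id *expected,
                                     const void *buf, unsigned long len,
                                     enum object_type type)
    {
        struct object_id actual;

        hash_object_file(the_hash_algo, buf, len, type, &actual);
        if (!oideq(expected, &actual))
            return error(_("hash mismatch for object '%s'"),
                         oid_to_hex(expected));
        return 0;
    }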
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
gvfs-helper prints "loose <oid>" or "packfile <name>" messages after
objects are received to help invokers update their in-memory caches.
Move the code to accumulate these messages in the result_list into
the install_* functions rather than waiting until the end.
POST requests containing 1 object may return a loose object or a packfile
depending on whether the object is a commit or non-commit. Delaying the
message generation just complicated the caller.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create t/helper/test-gvfs-protocol.c and t/t5799-gvfs-helper.sh
to test gvfs-helper.
Create t/helper/test-gvfs-protocol.c as a stand-alone web server that
speaks the GVFS Protocol [1] and serves loose objects and packfiles
to clients. It borrows heavily from the code in daemon.c.
It includes a "mayhem" mode to cause various network and HTTP errors
to test the retry/recovery ability of gvfs-helper.
Create t/t5799-gvfs-helper.sh to test gvfs-helper.
[1] https://github.com/microsoft/VFSForGit/blob/master/Protocol.md
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
If our POST request includes a commit ID, then the remote will
send a pack-file containing the commit and all trees reachable from
its root tree. With the current implementation, this causes a
failure since we call install_loose() when asking for one object.
Modify the condition to use install_pack() when the response type
indicates a packfile.
Also, create a tempfile for the pack-file download so that a failed
or partial download cannot leave a broken packfile in place.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
During development, it was very helpful to see the gvfs-helper do its
work to request a pack-file or download a loose object. When these
messages appear during normal use, they lead to very noisy terminal
output.
Remove all progress indicators when downloading loose objects. We know
that these can number in the thousands during certain kinds of history
operations, and they would litter the terminal output with noise. This happens
during 'git fetch' or 'git pull' as well when the tip commits are
checked for the new refs.
Remove the "Requesting packfile with %ld objects" message, as this
operation is very fast. We quickly follow up with the more valuable
"Receiving packfile %ld%ld with %ld objects". When a large "git
checkout" causes many pack-file downloads, it is good to know that Git
is asking for data from the server.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Expose the differences in the semantics of GET and POST for
the "gvfs/objects" API:
HTTP GET: fetches a single loose object over the network.
When a commit object is requested, it just returns
the single object.
HTTP POST: fetches a batch of objects over the network.
When the oid-set contains a commit object, all
referenced trees are also included in the response.
gvfs-helper is updated to take "get" and "post" command line options.
The gvfs-helper "server" mode is updated to take "objects.get" and
"objects.post" verbs.
For convenience, the "get" option and the "objects.get" verb
do allow more than one object to be requested. gvfs-helper will
automatically issue a series of (single-object) HTTP GET requests
and create a series of loose objects.
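A sketch of that fan-out (do_get_one() is a hypothetical stand-in for
the per-object GET):

    #include "oid-array.h"

    /*
     * Sketch: the "get" path issues one HTTP GET per requested OID,
     * each of which is written as a loose object.
     */
    static int do_get_series(const struct oid_array *oids)
    {
        size_t i;

        for (i = 0; i < oids->nr; i++) {
            int err = do_get_one(&oids->oid[i]); /* hypothetical */
            if (err)
                return err;
        }
        return 0;
    }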
The "post" option and the "objects.post" verb will perform bulk
object fetching using the batch-size chunking. The result of an HTTP
POST containing more than one object will be stored as a packfile; an
HTTP POST for a single object will create a loose object.
This commit also contains some refactoring to eliminate the
assumption that POST is always associated with packfiles.
In gvfs-helper-client.c, gh_client__get_immediate() now uses the
"objects.get" verb and ignores any currently queued objects.
In gvfs-helper-client.c, the OIDSET built by gh_client__queue_oid()
is only processed when gh_client__drain_queue() is called. The queue
is processed using the "objects.post" verb.
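As a usage sketch (argument lists are illustrative; the real
declarations live in gvfs-helper-client.h):

    /* Queue OIDs as they are discovered during the current operation. */
    gh_client__queue_oid(&oid);

    /* Later, send everything queued in one "objects.post" batch. */
    gh_client__drain_queue();

    /* A must-have-now object bypasses the queue via "objects.get". */
    gh_client__get_immediate(&oid);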
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add robust-retry mechanism to automatically retry a request after network
errors. This includes retry after:
[] transient network problems reported by CURL,
[] HTTP 429 throttling (with the associated Retry-After),
[] HTTP 503 server unavailable (with the associated Retry-After).
Add voluntary throttling using Azure X-RateLimit-* hints to avoid being
soft-throttled (tarpitted) or hard-throttled (429) on later requests.
Add global (outside of a single request) azure-throttle data to track the
rate limit hints from the cache-server and main Git server independently.
Add exponential retry backoff. This is used for transient network problems
when we don't have a Retry-After hint.
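A sketch of the backoff calculation (constants are illustrative, not
the tuned values):

    /*
     * Sketch: exponential backoff for transient network errors; a
     * server-provided Retry-After hint always wins.
     */
    static int backoff_seconds(int attempt, int retry_after_hint)
    {
        int delay = 1 << attempt; /* 1, 2, 4, 8, ... seconds */

        if (retry_after_hint > 0)
            return retry_after_hint;
        return delay < 60 ? delay : 60; /* cap the wait */
    }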
Move the call to index-pack earlier in the response/error handling sequence
so that if we receive a 200 but the packfile is truncated or corrupted, we
can use the regular retry logic to get it again.
Refactor the way we create tempfiles for packfiles to use
<odb>/pack/tempPacks/ rather than working directly in the <odb>/pack/
directory.
Move the code to create a new tempfile to the start of a single request
attempt (initial and retry attempts), rather than at the overall start
of a request. This gives us a fresh tempfile for each network request
attempt. This simplifies the retry mechanism, isolates us from the file
ownership issues hidden within the tempfile class, and avoids the need
to truncate previous incomplete results. This was necessary because
index-pack was pulled into the retry loop.
Minor: Add support for logging X-VSS-E2EID to telemetry on network errors.
Minor: rename variable:
params.b_no_cache_server --> params.b_permit_cache_server_if_defined.
This variable is used to indicate whether we should try to use the
cache-server when it is defined. Got rid of double-negative logic.
Minor: rename variable:
params.label --> params.tr2_label
Clarify that this variable is only used with trace2 logging.
Minor: Move the code to automatically map cache-server 400 responses
to normal 401 response earlier in the response/error handling sequence
to simplify later retry logic.
Minor: Decorate trace2 messages with "(cs)" or "(main)" to identify the
server in log messages. Add params->server_type to simplify this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 message for CURL and HTTP errors.
Fix a typo when reporting the network error code back to gvfs-helper-client.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The config variable `gvfs.sharedCache` contains the pathname to an alternate
<odb> that will be used by `gvfs-helper` to store dynamically-fetched missing
objects. If this directory does not exist on disk, `prepare_alt_odb()` omits
this directory from the in-memory list of alternates. This causes `git`
commands (and `gvfs-helper` in particular) to fall back to `.git/objects` for
storage of these objects. This disables the shared-cache and leads to poorer
performance.
Teach `alt_obj_usable()` and `prepare_alt_odb()` to match up the directory
named in `gvfs.sharedCache` with an entry in `.git/objects/info/alternates`
and to force-create the `<odb>` root directory (and the associated
`<odb>/pack` directory) if necessary.
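A sketch of the force-create step (error handling trimmed; helper name
is hypothetical):

    #include "git-compat-util.h"
    #include "strbuf.h"

    /*
     * Sketch: make sure the shared-cache <odb> and <odb>/pack
     * directories exist before the alternates list is scanned.
     */
    static int ensure_shared_cache_dirs(const char *odb_path)
    {
        struct strbuf pack = STRBUF_INIT;
        int ret = 0;

        if (mkdir(odb_path, 0777) && errno != EEXIST)
            return error_errno(_("could not create '%s'"), odb_path);

        strbuf_addf(&pack, "%s/pack", odb_path);
        if (mkdir(pack.buf, 0777) && errno != EEXIST)
            ret = error_errno(_("could not create '%s'"), pack.buf);

        strbuf_release(&pack);
        return ret;
    }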
If the value of `gvfs.sharedCache` refers to a directory that is NOT listed
as an alternate, create an in-memory alternate entry in the odb-list. (This
is similar to how GIT_ALTERNATE_OBJECT_DIRECTORIES works.)
This work happens the first time that `prepare_alt_odb()` is called.
Furthermore, teach the `--shared-cache=<odb>` command line option in
`gvfs-helper` (which runs after the first call to `prepare_alt_odb()`)
to override the inherited shared-cache (and again, create the ODB directory
if necessary).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create gvfs-helper. This is a helper tool to use the GVFS Protocol
REST API to fetch objects and configuration data from a GVFS cache-server
or Git server. It uses libcurl to send object requests to either
server and creates loose objects and/or packfiles.
Create gvfs-helper-client. This code resides within git proper and
uses the sub-process API to manage gvfs-helper as a long-running background
process.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>