Preparation for SHA-256 upgrade continues.
* bc/object-id-part17: (26 commits)
midx: switch to using the_hash_algo
builtin/show-index: replace sha1_to_hex
rerere: replace sha1_to_hex
builtin/receive-pack: replace sha1_to_hex
builtin/index-pack: replace sha1_to_hex
packfile: replace sha1_to_hex
wt-status: convert struct wt_status to object_id
cache: remove null_sha1
builtin/worktree: switch null_sha1 to null_oid
builtin/repack: write object IDs of the proper length
pack-write: use hash_to_hex when writing checksums
sequencer: convert to use the_hash_algo
bisect: switch to using the_hash_algo
sha1-lookup: switch hard-coded constants to the_hash_algo
config: use the_hash_algo in abbrev comparison
combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
bundle: switch to use the_hash_algo
connected: switch GIT_SHA1_HEXSZ to the_hash_algo
show-index: switch hard-coded constants to the_hash_algo
blame: remove needless comparison with GIT_SHA1_HEXSZ
...
Add trace2 regions to fetch-pack.c and builtins/fetch.c to better track
time spent in the various phases of a fetch:
* listing refs
* negotiation for protocol versions v0-v2
* fetching refs
* consuming refs
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.
* md/list-objects-filter-combo:
list-objects-filter-options: make parser void
list-objects-filter-options: clean up use of ALLOC_GROW
list-objects-filter-options: allow mult. --filter
strbuf: give URL-encoding API a char predicate fn
list-objects-filter-options: make filter_spec a string_list
list-objects-filter-options: move error check up
list-objects-filter: implement composite filters
list-objects-filter-options: always supply *errbuf
list-objects-filter: put omits set in filter struct
list-objects-filter: encapsulate filter components
Instead of hard-coding constants, use parse_oid_hex to compute a pointer
and use it in further parsing operations.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'feature.experimental' setting includes config options that are
not committed to become defaults, but could use additional testing.
Update the following config settings to take new defaults, and to
use the repo_settings struct if not already using them:
* 'pack.useSparse=true'
* 'fetch.negotiationAlgorithm=skipping'
In the case of fetch.negotiationAlgorithm, the existing logic
would load the config option only when about to use the setting,
so had a die() statement on an unknown string value. This is
removed as now the config is parsed under prepare_repo_settings().
In general, this die() is probably misplaced and not valuable.
A test was removed that checked this die() statement executed.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol capabilities that go over wire should never be translated,
but it was incorrectly marked for translation, which has been
corrected. The output of protocol capabilities for debugging has
been tweaked a bit.
* nd/fetch-capability-tweak:
fetch-pack: print server version at the top in -v -v
fetch-pack: print all relevant supported capabilities with -v -v
fetch-pack: move capability names out of i18n strings
Make the filter_spec string a string_list rather than a raw C string.
The list of strings must be concatted together to make a complete
filter_spec. A future patch will use this capability to build "combine:"
filter specs gradually.
A strbuf would seem to be a more natural choice for this object, but it
unfortunately requires initialization besides just zero'ing out the
memory. This results in all container structs, and all containers of
those structs, etc., to also require initialization. Initializing them
all would be more cumbersome that simply using a string_list, which
behaves properly when its contents are zero'd.
For the purposes of code simplification, change behavior in how filter
specs are conveyed over the protocol: do not normalize the tree:<depth>
filter specs since there should be no server in existence that supports
tree:# but not tree:#k etc.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before the previous patch, the server version is printed after all the
"Server supports" lines. The previous one puts the version in the middle
of "Server supports" group.
Instead of moving it to the bottom, I move it to the top. Version may
stand out more at the top as we will have even more debug out after
capabilities.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check if some capability is supported, we do print something in
verbose mode. Some capabilities are not printed though (and it made me
think it's not supported; I was more used to GIT_TRACE_PACKET) so let's
print them all.
It's a bit more code. And one could argue for printing all supported
capabilities the server sends us. But I think it's still valuable this
way because we see the capabilities that the client cares about.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reduces the work on translators since they only have one string to
translate (and I think it's still enough context to translate). It also
makes sure no capability name is translated by accident.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no callers left of lookup_object() that aren't just passing us
the "hash" member of a "struct object_id". Let's take the whole struct,
which gets us closer to removing all raw sha1 variables. It also
matches the existing conversions of lookup_blob(), etc.
The conversions of callers were done by hand, but they're all mechanical
one-liners.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, if any server options are specified during a protocol v2
fetch, server options will be sent before "command=fetch". Write server
options to the request buffer in send_fetch_request() so that the
components of the request are sent in the correct order.
The protocol documentation states that the command must come first. The
Git server implementation in serve.c (see process_request() in that
file) tolerates any order of command and capability, which is perhaps
why we haven't noticed this. This was noticed when testing against a
JGit server implementation, which follows the documentation in this
regard.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up and a fix for "git fetch" by an explicit object name
(as opposed to fetching refs by name).
* jk/fetch-reachability-error-fix:
fetch: do not consider peeled tags as advertised tips
remote.c: make singular free_ref() public
fetch: use free_refs()
pkt-line: prepare buffer before handling ERR packets
upload-pack: send ERR packet for non-tip objects
t5530: check protocol response for "not our ref"
t5516: drop ok=sigpipe from unreachable-want tests
Fix for protocol v2 support in "git fetch-pack" of shallow clones.
* jt/fetch-no-update-shallow-in-proto-v2:
fetch-pack: respect --no-update-shallow in v2
fetch-pack: call prepare_shallow_info only if v0
Our filter_refs() function accidentally considers the target of a peeled
tag to be advertised by the server, even though upload-pack on the
server side does not consider it so. This can result in the client
making a bogus fetch to the server, which will end with the server
complaining "not our ref". Whereas the correct behavior is for the
client to notice that the server will not allow the request and error
out immediately.
So as bugs go, this is not very serious (the outcome is the same either
way -- the fetch fails). But it's worth making the logic here correct
and consistent with other related cases (e.g., fetching an oid that the
server did not mention at all).
The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow
fetching of literal SHA1s, 2017-05-15). After that, the strategy of
filter_refs() is basically:
- for each advertised ref, try to match it with a "sought" ref
provided by the user. Skip any malformed refs (which includes
peeled values like "refs/tags/foo^{}"), and place any unmatched
items onto the unmatched list.
- if there are unmatched sought refs, then put all of the advertised
tips into an oidset, including the unmatched ones.
- for each sought ref, see if it's in the oidset, in which case it's
legal for us to ask the server for it
The problem is in the second step. Our list of unmatched refs includes
the peeled refs, even though upload-pack does not allow them to be
directly fetched. So the simplest fix would be to exclude them during
that step.
However, we can observe that the unmatched list isn't used for anything
else, and is freed at the end. We can just free those malformed refs
immediately. That saves us having to check each ref a second time to see
if it's malformed.
Note that this code only kicks in when "strict" is in effect. I.e., if
we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is
not in effect. With v2, all oids are allowed, and we do not bother
creating or consulting the oidset at all. To future-proof our test
against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark
it as a v0-only test.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need for us to write this loop manually when a helper
function can already do it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is
not subsequently reordered (within the function). Therefore,
receive_wanted_refs() can assume that "sought" is sorted, and can thus
use a binary search when storing wanted-refs retrieved from the server.
Replace the existing linear search with a binary search. This improves
performance significantly when mirror cloning a repository with more
than 1 million refs.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.
Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.
This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fetch_pack(), be clearer that there is no shallow information before
the fetch when v2 is used - memset the struct shallow_info to 0 instead
of calling prepare_shallow_info().
This patch is in preparation for a future patch in which a v2 fetch
might call prepare_shallow_info() after shallow info has been retrieved
during the fetch, so I needed to ensure that prepare_shallow_info() is
not called before the fetch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need the caller of fetch_pack() to pass in "dest", which is the
remote URL. Since ba227857d2 (Reduce the number of connects when
fetching, 2008-02-04), the caller is responsible for calling
git_connect() itself, and our "dest" parameter is unused.
That commit also started passing us the resulting "conn" child_process
from git_connect(). But likewise, we do not need do anything with it.
The descriptors in "fd" are enough for us, and the caller is responsible
for cleaning up "conn".
We can just drop both parameters.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On platforms where "git fetch" is killed with SIGPIPE (e.g. OSX),
the upload-pack that runs on the other end that hangs up after
detecting an error could cause "git fetch" to die with a signal,
which led to a flakey test. "git fetch" now ignores SIGPIPE during
the network portion of its operation (this is not a problem as we
check the return status from our write(2)s).
* jk/no-sigpipe-during-network-transport:
fetch: ignore SIGPIPE during network operation
fetch: avoid calling write_or_die()
The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.
Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.
Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" over protocol v2 that needs to make a second connection
to backfill tags did not clear a variable that holds shallow
repository information correctly, leading to an access of freed
piece of memory.
* bc/fetch-pack-clear-alternate-shallow:
fetch-pack: clear alternate shallow in one more place
fetch-pack: clear alternate shallow when complete
The previous one did not clear the variable in one codepath,
but we should aim to be complete.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
[jc: made a reroll into incremental, as the previous one already is
in the next branch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.
* jt/fetch-v2-sideband:
tests: define GIT_TEST_SIDEBAND_ALL
{fetch,upload}-pack: sideband v2 fetch response
sideband: reverse its dependency on pkt-line
pkt-line: introduce struct packet_writer
pack-protocol.txt: accept error packets in any context
Use packet_reader instead of packet_read_line
Update the protocol message specification to allow only the limited
use of scaled quantities. This is ensure potential compatibility
issues will not go out of hand.
* js/filter-options-should-use-plain-int:
filter-options: expand scaled numbers
tree:<depth>: skip some trees even when collecting omits
list-objects-filter: teach tree:# how to handle >0
When we write an alternate shallow file in update_shallow, we write it
into the lock file. The string stored in alternate_shallow_file is
copied from the lock file path, but it is freed the moment that the lock
file is closed, since we call strbuf_release to free that path.
This used to work, since we did not invoke git index-pack more than
once, but now that we do, we reuse the freed memory. Ensure we reset the
value to NULL to avoid using freed memory. git index-pack will read the
repository's shallow file, which will have been updated with the correct
information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used
from tests. When set to true, this overrides uploadpack.allowsidebandall
to true, allowing the entire test suite to be run as if this
configuration is in place for all repositories.
As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset
or set to 1.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.
Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.
If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".
This will be tested in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When communicating with a remote server or a subprocess, use
expanded numbers rather than numbers with scaling suffix in the
object filter spec (e.g. "limit:blob=1k" becomes
"limit:blob=1024").
Update the protocol docs to note that clients should always perform this
expansion, to allow for more compatibility between server
implementations.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 685fbd3291 ("fetch-pack: perform a fetch using v2", 2018-03-15)
attempted to teach Git deepen-relative in protocol v2 (among other
things), but it didn't work:
(1) fetch-pack.c needs to emit "deepen-relative".
(2) upload-pack.c needs to ensure that the correct deepen_relative
variable is passed to deepen() (there are two - the static variable
and the one in struct upload_pack_data).
(3) Before deepen() computes the list of reachable shallows, it first
needs to mark all "our" refs as OUR_REF. v2 currently does not do
this, because unlike v0, it is not needed otherwise.
Fix all this and include a test demonstrating that it works now. For
(2), the static variable deepen_relative is also eliminated, removing a
source of confusion.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching using protocol v2, the remote may send a "shallow-info"
section if the client is shallow. If so, Git as the client currently
takes the shallow file lock, even if the "shallow-info" section is
empty.
This is not a problem except that Git does not support taking the
shallow file lock after modifying the shallow file, because
is_repository_shallow() stores information that is never cleared. And
this take-after-modify occurs when Git does a tag-following fetch from a
shallow repository on a transport that does not support tag following
(since in this case, 2 fetches are performed).
To solve this issue, take the shallow file lock (and perform all other
shallow processing) only if the "shallow-info" section is non-empty;
otherwise, behave as if it were empty.
A full solution (probably, ensuring that any action of committing
shallow file locks also includes clearing the information stored by
is_repository_shallow()) would solve the issue without need for this
patch, but this patch is independently useful (as an optimization to
prevent writing a file in an unnecessary case), hence why I wrote it. I
have included a NEEDSWORK outlining the full solution.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up with optimization for the codepath that checks
(non-)existence of loose objects.
* jk/loose-object-cache:
odb_load_loose_cache: fix strbuf leak
fetch-pack: drop custom loose object cache
sha1-file: use loose object cache for quick existence check
object-store: provide helpers for loose_objects_cache
sha1-file: use an object_directory for the main object dir
handle alternates paths the same as the main object dir
sha1_file_name(): overwrite buffer instead of appending
rename "alternate_object_database" to "object_directory"
submodule--helper: prefer strip_suffix() to ends_with()
fsck: do not reuse child_process structs
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
loose objects we don't have.
Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
custom solution.
Note that this might perform slightly differently, as the original code
stopped calling readdir() when we saw more loose objects than there were
refs. So:
1. The old code might have spent work on readdir() to fill the cache,
but then decided there were too many loose objects, wasting that
effort.
2. The new code might spend a lot of time on readdir() if you have a
lot of loose objects, even though there are very few objects to
ask about.
In practice it probably won't matter either way; see the previous commit
for some discussion of the tradeoff.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each section in a protocol v2 response is followed by either a DELIM
packet (indicating more sections to follow) or a FLUSH packet
(indicating none to follow). But when parsing the "acknowledgments"
section, do_fetch_pack_v2() is liberal in accepting both, but determines
whether to continue reading or not based solely on the contents of the
"acknowledgments" section, not on whether DELIM or FLUSH was read.
There is no issue with a protocol-compliant server, but can result in
confusing error messages when communicating with a server that
serves unexpected additional sections. Consider a server that sends
"new-section" after "acknowledgments":
- client writes request
- client reads the "acknowledgments" section which contains no "ready",
then DELIM
- since there was no "ready", client needs to continue negotiation, and
writes request
- client reads "new-section", and reports to the end user "expected
'acknowledgments', received 'new-section'"
For the person debugging the involved Git implementation(s), the error
message is confusing in that "new-section" was not received in response
to the latest request, but to the first one.
One solution is to always continue reading after DELIM, but in this
case, we can do better. We know from the protocol that "ready" means at
least the packfile section is coming (hence, DELIM) and that no "ready"
means that no sections are to follow (hence, FLUSH). So teach
process_acks() to enforce this.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing into a repository that borrows its objects from an
alternate object store, "git receive-pack" that responds to the
push request on the other side lists the tips of refs in the
alternate to reduce the amount of objects transferred. This
sometimes is detrimental when the number of refs in the alternate
is absurdly large, in which case the bandwidth saved in potentially
fewer objects transferred is wasted in excessively large ref
advertisement. The alternate refs that are advertised are now
configurable with a pair of configuration variables.
* tb/filter-alternate-refs:
transport.c: introduce core.alternateRefsPrefixes
transport.c: introduce core.alternateRefsCommand
transport.c: extract 'fill_alternate_refs_command'
transport: drop refnames from for_each_alternate_ref
Over some transports, fetching objects with an exact commit object
name can be done without first seeing the ref advertisements. The
code has been optimized to exploit this.
* jt/avoid-ls-refs:
fetch: do not list refs if fetching only hashes
transport: list refs before fetch if necessary
transport: do not list refs if possible
transport: allow skipping of ref listing
A partial clone that is configured to lazily fetch missing objects
will on-demand issue a "git fetch" request to the originating
repository to fill not-yet-obtained objects. The request has been
optimized for requesting a tree object (and not the leaf blob
objects contained in it) by telling the originating repository that
no blobs are needed.
* jt/non-blob-lazy-fetch:
fetch-pack: exclude blobs when lazy-fetching trees
fetch-pack: avoid object flags if no_dependents
None of the current callers use the refname parameter we pass to their
callbacks. In theory somebody _could_ do so, but it's actually quite
weird if you think about it: it's a ref in somebody else's repository.
So the name has no meaning locally, and in fact there may be duplicates
if there are multiple alternates.
The users of this interface really only care about seeing some ref tips,
since that promises that the alternate has the full commit graph
reachable from there. So let's keep the information we pass back to the
bare minimum.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When all refs to be fetched are exact OIDs, it is possible to perform a
fetch without requiring the remote to list refs if protocol v2 is used.
Teach Git to do this.
This currently has an effect only for lazy fetches done from partial
clones. The change necessary to likewise optimize "git fetch <remote>
<sha-1>" will be done in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tip_oids_contain() lazily loads refs into an oidset at its first call.
It abuses the internal (sub)member .map.tablesize of that oidset to
check if it has done that already.
Determine if the oidset needs to be populated upfront and then do that
instead. This duplicates a loop, but simplifies the existing one by
separating concerns between the two.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code to determine if a request is unmatched to its own little
helper. This allows us to reuse it in a subsequent patch.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A partial clone with missing trees can be obtained using "git clone
--filter=tree:none <repo>". In such a repository, when a tree needs to
be lazily fetched, any tree or blob it directly or indirectly references
is fetched as well, regardless of whether the original command required
those objects, or if the local repository already had some of them.
This is because the fetch protocol, which the lazy fetch uses, does not
allow clients to request that only the wanted objects be sent, which
would be the ideal solution. This patch implements a partial solution:
specify the "blob:none" filter, somewhat reducing the fetch payload.
This change has no effect when lazily fetching blobs (due to how filters
work). And if lazily fetching a commit (such repositories are difficult
to construct and is not a use case we support very well, but it is
possible), referenced commits and trees are still fetched - only the
blobs are not fetched.
The necessary code change is done in fetch_pack() instead of somewhere
closer to where the "filter" instruction is written to the wire so that
only one part of the code needs to be changed in order for users of all
protocol versions to benefit from this optimization.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetch_pack() is invoked as part of another Git command (due to a
lazy fetch from a partial clone, for example), it uses object flags that
may already be used by the outer Git command.
The commit that introduced the lazy fetch feature (88e2f9ed8e
("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried
to avoid this overlap, but it did not avoid it totally. It was
successful in avoiding writing COMPLETE, but did not avoid reading
COMPLETE, and did not avoid writing and reading ALTERNATE.
Ensure that no flags are written or read by fetch_pack() in the case
where it is used to perform a lazy fetch. To do this, it is sufficient
to avoid checking completeness of wanted refs (unnecessary in the case
of lazy fetches), and to avoid negotiation-related work (in the current
implementation, already, no negotiation is performed). After that was
done, the lack of overlap was verified by checking all direct and
indirect usages of COMPLETE and ALTERNATE - that they are read or
written only if no_dependents is false.
There are other possible solutions to this issue:
(1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using
part, and whenever no_dependents is set, only use the
non-flag-using part.
(2) Make fetch_pack() be able to be used with arbitrary repository
objects. fetch_pack() should then create its own repository object
based on the given repository object, with its own object
hashtable, so that the flags do not conflict.
(1) is possible but invasive - some functions would need to be split;
and such invasiveness would potentially be unnecessary if we ever were
to need (2) anyway. (2) would be useful if we were to support, say,
submodules that were partial clones themselves, but I don't know when or
if the Git project plans to support those.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>