When "fetch --depth=N" where N exceeds the longest chain of history in
the source repo, usually we just send an "unshallow" line to the
client so full history is obtained.
When the source repo is shallow we need to make sure to "unshallow"
the current shallow point _and_ "shallow" again when the commit
reaches its shallow bottom in the source repo.
This should fix both cases: large <N> and --unshallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If either receive-pack or upload-pack is called on a shallow
repository, shallow commits (*) will be sent after the ref
advertisement (but before the packet flush), so that the receiver has
the full "shape" of the sender's commit graph. This will be needed for
the receiver to update its .git/shallow if necessary.
This breaks the protocol for all clients trying to push to a shallow
repo, or fetch from one. Which is basically the same end result as
today's "is_repository_shallow() && die()" in receive-pack and
upload-pack. New clients will be made aware of shallow upstream and
can make use of this information.
The sender must send all shallow commits that are sent in the
following pack. It may send more shallow commits than necessary.
upload-pack for example may choose to advertise no shallow commits if
it knows in advance that the pack it's going to send contains no
shallow commits. But upload-pack is the server, so we choose the
cheaper way, send full .git/shallow and let the client deal with it.
Smart HTTP is not affected by this patch. Shallow support on
smart-http comes later separately.
(*) A shallow commit is a commit that terminates the revision
walker. It is usually put in .git/shallow in order to keep the
revision walker from going out of bound because there is no
guarantee that objects behind this commit is available.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for recent regression while talking to upload-pack
in a repository with many symbolic refs.
* maint:
Revert "upload-pack: send non-HEAD symbolic refs"
This reverts commit 5e7dcad771cb873e278a0571b46910d7c32e2f6c; there
may be unbounded number of symbolic refs in the repository, but the
capability header line in the on-wire protocol has a rather low
length limit.
One long-standing flaw in the pack transfer protocol used by "git
clone" was that there was no way to tell the other end which branch
"HEAD" points at, and the receiving end needed to guess. A new
capability has been defined in the pack protocol to convey this
information so that cloning from a repository with more than one
branches pointing at the same commit where the HEAD is at now
reliably sets the initial branch in the resulting repository.
* jc/upload-pack-send-symref:
t5570: Update for clone-progress-to-stderr branch
t5570: Update for symref capability
clone: test the new HEAD detection logic
connect: annotate refs with their symref information in get_remote_head()
connect.c: make parse_feature_value() static
upload-pack: send non-HEAD symbolic refs
upload-pack: send symbolic ref information as capability
upload-pack.c: do not pass confusing cb_data to mark_our_ref()
t5505: fix "set-head --auto with ambiguous HEAD" test
One long-standing flaw in the pack transfer protocol used by "git
clone" was that there was no way to tell the other end which branch
"HEAD" points at, and the receiving end needed to guess. A new
capability has been defined in the pack protocol to convey this
information so that cloning from a repository with more than one
branches pointing at the same commit where the HEAD is at now
reliably sets the initial branch in the resulting repository.
* jc/upload-pack-send-symref:
t5570: Update for clone-progress-to-stderr branch
t5570: Update for symref capability
clone: test the new HEAD detection logic
connect: annotate refs with their symref information in get_remote_head()
connect.c: make parse_feature_value() static
upload-pack: send non-HEAD symbolic refs
upload-pack: send symbolic ref information as capability
upload-pack.c: do not pass confusing cb_data to mark_our_ref()
t5505: fix "set-head --auto with ambiguous HEAD" test
When there is no sufficient overlap between old and new history
during a "git fetch" into a shallow repository, objects that the
sending side knows the receiving end has were unnecessarily sent.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
When running "fetch -q", a long silence while the sender side
computes the set of objects to send can be mistaken by proxies as
dropped connection. The server side has been taught to send a small
empty messages to keep the connection alive.
* jk/upload-pack-keepalive:
upload-pack: bump keepalive default to 5 seconds
upload-pack: send keepalive packets during pack computation
When there is no sufficient overlap between old and new history
during a fetch into a shallow repository, we unnecessarily sent
objects the sending side knows the receiving end has.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
With the same mechanism as used to tell where "HEAD" points at to
the other end, we can tell the target of other symbolic refs as
well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One long-standing flaw in the pack transfer protocol was that there
was no way to tell the other end which branch "HEAD" points at.
With a capability "symref=HEAD:refs/heads/master", let the sender to
tell the receiver what symbolic ref points at what ref.
This capability can be repeated more than once to represent symbolic
refs other than HEAD, such as "refs/remotes/origin/HEAD").
Add an infrastructure to collect symbolic refs, format them as extra
capabilities and put it on the wire. For now, just send information
on the "HEAD" and nothing else.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callee does not use cb_data, and the caller is an intermediate
function in a callchain that later wants to use the cb_data for its
own use. Clarify the code by breaking the dataflow explicitly by
not passing cb_data down to mark_our_ref().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason not to turn on keepalives by default.
They take very little bandwidth, and significantly less than
the progress reporting they are replacing. And in the case
that progress reporting is on, we should never need to send
a keepalive anyway, as we will constantly be showing
progress and resetting the keepalive timer.
We do not necessarily know what the client's idea of a
reasonable timeout is, so let's keep this on the low side of
5 seconds. That is high enough that we will always prefer
our normal 1-second progress reports to sending a keepalive
packet, but low enough that no sane client should consider
the connection hung.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack has started pack-objects, there may be a quiet
period while pack-objects prepares the pack (i.e., counting objects
and delta compression). Normally we would see (and send to the
client) progress information, but if "--quiet" is in effect,
pack-objects will produce nothing at all until the pack data is
ready. On a large repository, this can take tens of seconds (or even
minutes if the system is loaded or the repository is badly packed).
Clients or intermediate proxies can sometimes give up in this
situation, assuming that the server or connection has hung.
This patch introduces a "keepalive" option; if upload-pack sees no
data from pack-objects for a certain number of seconds, it will send
an empty sideband data packet to let the other side know that we are
still working on it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack has a special revision walking code for shallow
recipients. It works almost like the similar code in pack-objects
except:
1. in upload-pack, graft points could be added for deepening;
2. also when the repository is deepened, the shallow point will be
moved further away from the tip, but the old shallow point will be
marked as edge to produce more efficient packs. See 6523078 (make
shallow repository deepening more network efficient - 2009-09-03).
Pass the file to pack-objects via --shallow-file. This will override
$GIT_DIR/shallow and give pack-objects the exact repository shape
that upload-pack has.
mark edge commits by revision command arguments. Even if old shallow
points are passed as "--not" revisions as in this patch, they will not
be picked up by mark_edges_uninteresting() because this function looks
up to parents for edges, while in this case the edge is the children,
in the opposite direction. This will be fixed in an later patch when
all given uninteresting commits are marked as edges.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client sends a 'shallow' line for an object that the server does
not have, the server currently dies with the error: "did not find object
for shallow <obj-id>". The client may have truncated the history at
the commit by fetching shallowly from a different server, or the commit
may have been garbage collected by the server. In either case, this
unknown commit is not relevant for calculating the pack that is to be
sent and can be safely ignored, and it is not used when recomputing where
the updated history of the client is cauterised.
The documentation in technical/pack-protocol.txt has been updated to
remove the restriction that "Clients MUST NOT mention an obj-id which it
does not know exists on the server". This requirement is not realistic
because clients cannot know whether an object has been garbage collected
by the server.
Signed-off-by: Michael Heemskerk <mheemskerk@atlassian.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up pkt-line API, implementation and its callers to make them
more robust.
* jk/pkt-line-cleanup:
do not use GIT_TRACE_PACKET=3 in tests
remote-curl: always parse incoming refs
remote-curl: move ref-parsing code up in file
remote-curl: pass buffer straight to get_remote_heads
teach get_remote_heads to read from a memory buffer
pkt-line: share buffer/descriptor reading implementation
pkt-line: provide a LARGE_PACKET_MAX static buffer
pkt-line: move LARGE_PACKET_MAX definition from sideband
pkt-line: teach packet_read_line to chomp newlines
pkt-line: provide a generic reading function with options
pkt-line: drop safe_write function
pkt-line: move a misplaced comment
write_or_die: raise SIGPIPE when we get EPIPE
upload-archive: use argv_array to store client arguments
upload-archive: do not copy repo name
send-pack: prefer prefixcmp over memcmp in receive_status
fetch-pack: fix out-of-bounds buffer offset in get_ack
upload-pack: remove packet debugging harness
upload-pack: do not add duplicate objects to shallow list
upload-pack: use get_sha1_hex to parse "shallow" lines
Recent optimization broke shallow clones.
* jk/peel-ref:
upload-pack: load non-tip "want" objects from disk
upload-pack: make sure "want" objects are parsed
upload-pack: drop lookup-before-parse optimization
Allows requests to fetch objects at any tip of refs (including
hidden ones). It seems that there may be use cases even outside
Gerrit (e.g. $gmane/215701).
* jc/fetch-raw-sha1:
fetch: fetch objects by their exact SHA-1 object names
upload-pack: optionally allow fetching from the tips of hidden refs
fetch: use struct ref to represent refs to be fetched
parse_fetch_refspec(): clarify the codeflow a bit
It is a long-time security feature that upload-pack will not
serve any "want" lines that do not correspond to the tip of
one of our refs. Traditionally, this was enforced by
checking the objects in the in-memory hash; they should have
been loaded and received the OUR_REF flag during the
advertisement.
The stateless-rpc mode, however, has a race condition here:
one process advertises, and another receives the want lines,
so the refs may have changed in the interim. To address
this, commit 051e400 added a new verification mode; if the
object is not OUR_REF, we set a "has_non_tip" flag, and then
later verify that the requested objects are reachable from
our current tips.
However, we still die immediately when the object is not in
our in-memory hash, and at this point we should only have
loaded our tip objects. So the check_non_tip code path does
not ever actually trigger, as any non-tip objects would
have already caused us to die.
We can fix that by using parse_object instead of
lookup_object, which will load the object from disk if it
has not already been loaded.
We still need to check that parse_object does not return
NULL, though, as it is possible we do not have the object
at all. A more appropriate error message would be "no such
object" rather than "not our ref"; however, we do not want
to leak information about what objects are or are not in
the object database, so we continue to use the same "not
our ref" message that would be produced by an unreachable
object.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack receives a "want" line from the client, it
adds it to an object array. We call lookup_object to find
the actual object, which will only check for objects already
in memory. This works because we are expecting to find
objects that we already loaded during the ref advertisement.
We use the resulting object structs for a variety of
purposes. Some of them care only about the object flags, but
others care about the type of the object (e.g.,
ok_to_give_up), or even feed them to the revision parser
(when --depth is used), which assumes that objects it
receives are fully parsed.
Once upon a time, this was OK; any object we loaded into
memory would also have been parsed. But since 435c833
(upload-pack: use peel_ref for ref advertisements,
2012-10-04), we try to avoid parsing objects during the ref
advertisement. This means that lookup_object may return an
object with a type of OBJ_NONE. The resulting mess depends
on the exact set of objects, but can include the revision
parser barfing, or the shallow code sending the wrong set of
objects.
This patch teaches upload-pack to parse each "want" object
as we receive it. We do not replace the lookup_object call
with parse_object, as the current code is careful not to let
just any object appear on a "want" line, but rather only one
we have previously advertised (whereas parse_object would
actually load any arbitrary object from disk).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a "have" line from the client, we want to
load the object pointed to by the sha1. However, we are
careful to do:
o = lookup_object(sha1);
if (!o || !o->parsed)
o = parse_object(sha1);
to avoid loading the object from disk if we have already
seen it. However, since ccdc603 (parse_object: try internal
cache before reading object db), parse_object already does
this optimization internally. We can just call parse_object
directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just write_or_die by another name. The one
distinction is that write_or_die will treat EPIPE specially
by suppressing error messages. That's fine, as we die by
SIGPIPE anyway (and in the off chance that it is disabled,
write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you set the GIT_DEBUG_SEND_PACK environment variable,
upload-pack will dump lines it receives in the receive_needs
phase to a descriptor. This debugging harness is a strict
subset of what GIT_TRACE_PACKET can do. Let's just drop it
in favor of that.
A few tests used GIT_DEBUG_SEND_PACK to confirm which
objects get sent; we have to adapt them to the new output
format.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client tells us it has a shallow object via
"shallow <sha1>", we make sure we have the object, mark it
with a flag, then add it to a dynamic array of shallow
objects. This means that a client can get us to allocate
arbitrary amounts of memory just by flooding us with shallow
lines (whether they have the objects or not). You can
demonstrate it easily with:
yes '0035shallow e83c5163316f89bfbde7d9ab23ca2e25604af290' |
git-upload-pack git.git
We already protect against duplicates in want lines by
checking if our flag is already set; let's do the same thing
here. Note that a client can still get us to allocate some
amount of memory by marking every object in the repo as
"shallow" (or "want"). But this at least bounds it with the
number of objects in the repository, which is not under the
control of an upload-pack client.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a line like "shallow <sha1>" from the
client, we feed the <sha1> part to get_sha1. This is a
mistake, as the argument on a shallow line is defined by
Documentation/technical/pack-protocol.txt to contain an
"obj-id". This is never defined in the BNF, but it is clear
from the text and from the other uses that it is meant to be
a hex sha1, not an arbitrary identifier (and that is what
fetch-pack has always sent).
We should be using get_sha1_hex instead, which doesn't allow
the client to request arbitrary junk like "HEAD@{yesterday}".
Because this is just marking shallow objects, the client
couldn't actually do anything interesting (like fetching
objects from unreachable reflog entries), but we should keep
our parsing tight to be on the safe side.
Because get_sha1 is for the most part a superset of
get_sha1_hex, in theory the only behavior change should be
disallowing non-hex object references. However, there is
one interesting exception: get_sha1 will only parse
a 40-character hex sha1 if the string has exactly 40
characters, whereas get_sha1_hex will just eat the first 40
characters, leaving the rest. That means that current
versions of git-upload-pack will not accept a "shallow"
packet that has a trailing newline, even though the protocol
documentation is clear that newlines are allowed (even
encouraged) in non-binary parts of the protocol.
This never mattered in practice, though, because fetch-pack,
contrary to the protocol documentation, does not include a
newline in its shallow lines. JGit follows its lead (though
it correctly is strict on the parsing end about wanting a
hex object id).
We do not adjust fetch-pack to send newlines here, as it
would break communication with older versions of git (and
there is no actual benefit to doing so, except for
consistency with other parts of the protocol).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the server side to redact the refs/ namespace it shows to the
client.
Will merge to 'master'.
* jc/hidden-refs:
upload/receive-pack: allow hiding ref hierarchies
upload-pack: simplify request validation
upload-pack: share more code
With uploadpack.allowtipsha1inwant configuration option set, future
versions of "git fetch" that allow an exact object name (likely to
have been obtained out of band) on the LHS of the fetch refspec can
make a request with a "want" line that names an object that may not
have been advertised due to transfer.hiderefs configuration.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed on the value of these
variable is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and rejects the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch --depth" was broken in at least three ways. The
resulting history was deeper than specified by one commit, it was
unclear how to wipe the shallowness of the repository with the
command, and documentation was misleading.
* nd/fetch-depth-is-broken:
fetch: elaborate --depth action
upload-pack: fix off-by-one depth calculation in shallow clone
fetch: add --unshallow for turning shallow repo into complete one
Long time ago, we used to punt on a large (read: asking for more
than 256 refs) fetch request and instead sent a full pack, because
we couldn't fit many refs on the command line of rev-list we run
internally to enumerate the objects to be sent. To fix this,
565ebbf (upload-pack: tighten request validation., 2005-10-24),
added a check to count the number of refs in the request and matched
with the number of refs we advertised, and changed the invocation of
rev-list to pass "--all" to it, still keeping us under the command
line argument limit.
However, these days we feed the list of objects requested and the
list of objects the other end is known to have via standard input,
so there is no longer a valid reason to special case a full clone
request. Remove the code associated with "create_full_pack" to
simplify the logic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We mark the objects pointed at our refs with "OUR_REF" flag in two
functions (mark_our_ref() and send_ref()), but we can just use the
former as a helper for the latter.
Update the way mark_our_ref() prepares in-core object to use
lookup_unknown_object() to delay reading the actual object data,
just like we did in 435c833 (upload-pack: use peel_ref for ref
advertisements, 2012-10-04).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A minor consistency check patch that does not have much relevance
to the real world.
* nd/upload-pack-shallow-must-be-commit:
upload-pack: only accept commits from "shallow" line
The user can do --depth=2147483647 (*) for restoring complete repo
now. But it's hard to remember. Any other numbers larger than the
longest commit chain in the repository would also do, but some
guessing may be involved. Make easy-to-remember --unshallow an alias
for --depth=2147483647.
Make upload-pack recognize this special number as infinite depth. The
effect is essentially the same as before, except that upload-pack is
more efficient because it does not have to traverse to the bottom
anymore.
The chance of a user actually wanting exactly 2147483647 commits
depth, not infinite, on a repository with a history that long, is
probably too small to consider. The client can learn to add or
subtract one commit to avoid the special treatment when that actually
happens.
(*) This is the largest positive number a 32-bit signed integer can
contain. JGit and older C Git store depth as "int" so both are OK
with this number. Dulwich does not support shallow clone.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only allow cuts at commits, not arbitrary objects. upload-pack will
fail eventually in register_shallow if a non-commit is given with a
generic error "Object %s is a %s, not a commit". Check it early and
give a more accurate error.
This should never show up in an ordinary session. It's for buggy
clients, or when the user manually edits .git/shallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack advertises refs, we attempt to peel tags
and advertise the peeled version. We currently hand-roll the
tag dereferencing, and use as many optimizations as we can
to avoid loading non-tag objects into memory.
Not only has peel_ref recently learned these optimizations,
too, but it also contains an even more important one: it
has access to the "peeled" data from the pack-refs file.
That means we can avoid not only loading annotated tags
entirely, but also avoid doing any kind of object lookup at
all.
This cut the CPU time to advertise refs by 50% in the
linux-2.6 repo, as measured by:
echo 0000 | git-upload-pack . >/dev/null
best-of-five, warm cache, objects and refs fully packed:
[before] [after]
real 0m0.026s real 0m0.013s
user 0m0.024s user 0m0.008s
sys 0m0.000s sys 0m0.000s
Those numbers are irrelevantly small compared to an actual
fetch. Here's a larger repo (400K refs, of which 12K are
unique, and of which only 107 are unique annotated tags):
[before] [after]
real 0m0.704s real 0m0.596s
user 0m0.600s user 0m0.496s
sys 0m0.096s sys 0m0.092s
This shows only a 15% speedup (mostly because it has fewer
actual tags to parse), but a larger absolute value (100ms,
which isn't a lot compared to a real fetch, but this
advertisement happens on every fetch, even if the client is
just finding out they are completely up to date).
In truly pathological cases, where you have a large number
of unique annotated tags, it can make an even bigger
difference. Here are the numbers for a linux-2.6 repository
that has had every seventh commit tagged (so about 50K
tags):
[before] [after]
real 0m0.443s real 0m0.097s
user 0m0.416s user 0m0.080s
sys 0m0.024s sys 0m0.012s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of having the client advertise a particular version
number in the git protocol, we have managed extensions and
backwards compatibility by having clients and servers
advertise capabilities that they support. This is far more
robust than having each side consult a table of
known versions, and provides sufficient information for the
protocol interaction to complete.
However, it does not allow servers to keep statistics on
which client versions are being used. This information is
not necessary to complete the network request (the
capabilities provide enough information for that), but it
may be helpful to conduct a general survey of client
versions in use.
We already send the client version in the user-agent header
for http requests; adding it here allows us to gather
similar statistics for non-http requests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have been carefully choosing feature names used in the protocol
extensions so that the vocabulary does not contain a word that is a
substring of another word, so it is not a real problem, but we have
recently added "quiet" feature word, which would mean we cannot later
add some other word with "quiet" (e.g. "quiet-push"), which is awkward.
Let's make sure that we can eventually be able to do so by teaching the
clients and servers that feature words consist of non whitespace
letters. This parser also allows us to later add features with parameters
e.g. "feature=1.5" (parameter values need to be quoted for whitespaces,
but we will worry about the detauls when we do introduce them).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack advertises refs, it dereferences any tags
it sees, and shows the resulting sha1 to the client. It does
this by calling deref_tag. That function must load and parse
each tag object to find the sha1 of the tagged object.
However, it also ends up parsing the tagged object itself,
which is not strictly necessary for upload-pack's use.
Each tag produces two object loads (assuming it is not a
recursive tag), when it could get away with only a single
one. Dropping the second load halves the effort we spend.
The downside is that we are no longer verifying the
resulting object by loading it. In particular:
1. We never cross-check the "type" field given in the tag
object with the type of the pointed-to object. If the
tag says it points to a tag but doesn't, then we will
keep peeling and realize the error. If the tag says it
points to a non-tag but actually points to a tag, we
will stop peeling and just advertise the pointed-to
tag.
2. If we are missing the pointed-to object, we will not
realize (because we never even look it up in the object
db).
However, both of these are errors in the object database,
and both will be detected if a client actually requests the
broken objects in question. So we are simply pushing the
verification away from the advertising stage, and down to
the actual fetching stage.
On my test repo with 120K refs, this drops the time to
advertise the refs from ~3.2s to ~2.0s.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we advertise a ref, the first thing we do is parse the
pointed-to object. This gives us two things:
1. a "struct object" we can use to store flags
2. the type of the object, so we know whether we need to
dereference it as a tag
Instead, we can just use lookup_unknown_object to get an
object struct, and then fill in just the type field using
sha1_object_info (which, in the case of packed files, can
find the information without actually inflating the object
data).
This can save time if you have a large number of refs, and
the client isn't actually going to request those refs (e.g.,
because most of them are already up-to-date).
The downside is that we are no longer verifying objects that
we advertise by fully parsing them (however, we do still
know we actually have them, because sha1_object_info must
find them to get the type). While we might fail to detect a
corrupt object here, if the client actually fetches the
object, we will parse (and verify) it then.
On a repository with 120K refs, the advertisement portion of
upload-pack goes from ~3.4s to 3.2s (the failure to speed up
more is largely due to the fact that most of these refs are
tags, which need dereferenced to find the tag destination
anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fetch-verify:
fetch: verify we have everything we need before updating our ref
rev-list --verify-object
list-objects: pass callback data to show_objects()
The traverse_commit_list() API takes two callback functions, one to show
commit objects, and the other to show other kinds of objects. Even though
the former has a callback data parameter, so that the callback does not
have to rely on global state, the latter does not.
Give the show_objects() callback the same callback data parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>