A recent update broke the "is this object available to us?" check for
well-known objects like the empty tree (which should yield "yes",
even when there is no on-disk object for an empty tree), which has
been corrected.
* jk/virtual-objects-do-exist:
rev-list: allow cached objects in existence check
This fixes a regression in 7c0fe330d5 (rev-list: handle missing tree
objects properly, 2018-10-05) where rev-list will now complain about the
empty tree when it doesn't physically exist on disk.
Before that commit, we relied on the traversal code in list-objects.c to
walk through the trees. Since it uses parse_tree(), we'd do a normal
object lookup that includes looking in the set of "cached" objects
(which is where our magic internal empty-tree kicks in).
After that commit, we instead tell list-objects.c not to die on any
missing trees, and we check them ourselves using has_object_file(). But
that function uses OBJECT_INFO_SKIP_CACHED, which means we won't use our
internal empty tree.
This normally wouldn't come up. For most operations, Git will try to
write out the empty tree object as it would any other object. And
pack-objects in a push or fetch will send the empty tree (even if it's
virtual on the sending side). However, there are cases where this can
matter. One I found in the wild:
1. The root tree of a commit became empty by deleting all files,
without using an index. In this case it was done using libgit2's
tree builder API, but as the included test shows, it can easily be
done with regular git using hash-object (a rough sketch follows
below).
The resulting repo works OK, as we'd avoid walking over our own
reachable commits for a connectivity check.
2. Cloning with --reference pointing to the repository from (1) can
trigger the problem, because we tell the other side we already have
that commit (and hence the empty tree), but then walk over it
during the connectivity check (where we complain about it missing).
Arguably the workflow in step (1) should be more careful about writing
the empty tree object if we're referencing it. But this workflow did
work prior to 7c0fe330d5, so let's restore it.
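For illustration only, the two steps above can be reproduced roughly
like this (repository names here are made up; the test included in
the patch is the authoritative version):

    # (1) create a commit whose root tree is the empty tree, without
    #     ever writing that tree object to disk (no "-w" on hash-object)
    git init empty-root
    (
        cd empty-root &&
        tree=$(git hash-object -t tree /dev/null) &&
        commit=$(git commit-tree $tree -m "empty root") &&
        git update-ref HEAD $commit
    )

    # (2) borrow from that repository via --reference; before this fix,
    #     the connectivity check could complain that the empty tree is
    #     missing
    git clone --no-local --reference=empty-root empty-root cloned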
This patch makes the minimal fix, which is to replace the call to
has_object_file() with a direct call to oid_object_info_extended(),
minus the SKIP_CACHED flag. That is all has_object_file() is doing
under the hood anyway. And there's little danger of unrelated fallout
from other unexpected "cached" objects, since there's only one call
site that creates such a cached object, and it's in git-blame.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test added in 51054177b3 ("index-pack: detect local
corruption in collision check", 2017-04-01) so that the repository
isn't left dirty at the end.
Due to the caveats explained in 720dae5a19 ("config doc: elaborate on
fetch.fsckObjects security", 2018-07-27), even a "fetch" that fails
will write to the local object store, so let's copy the bit-error test
directory before running this test.
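In test-suite terms the idea is roughly the following (the directory
and remote names are made up for illustration, not taken from the
actual test):

    cp -R bit_error bit_error_cp &&
    test_when_finished "rm -rf bit_error_cp" &&
    (
        cd bit_error_cp &&
        test_must_fail git fetch ../some-remote.git
    )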
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().
If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).
Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).
Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.
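For illustration, forcing the receiver to use index-pack looks
something like this (repository names are made up, and "src" and
"dst.git" are assumed to be sibling directories):

    git -C dst.git config transfer.unpackLimit 1
    git -C src push ../dst.git HEAD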
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller. However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.
This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).
Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.
The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.
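A rough way to see the symptom in a throwaway repository (this
assumes the SHA-1 loose-object layout; with the fix, the final
command errors out instead of printing a meaningless number):

    blob=$(echo corrupt-me | git hash-object -w --stdin)
    path=.git/objects/$(echo $blob | sed 's|^..|&/|')
    chmod +w "$path"
    echo garbage >"$path"    # corrupt the loose object in place
    git cat-file -s $blob    # reports an error, not a bogus size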
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we manage to clone a remote repository but run into an
error in the checkout, it is probably sane to leave the repo
directory in place. That lets the user examine the situation
without spending time to re-clone from the remote (which may
be a lengthy process).
Rather than try to convert each die() from the checkout code
path into an error(), we simply set a flag that tells the
"remove_junk" atexit function to print a helpful message and
leave the repo in place.
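In test-suite terms the resulting behavior looks roughly like this
($corrupt_url is a placeholder for any repository whose checkout will
fail, not something from the patch itself):

    test_must_fail git clone "$corrupt_url" dst &&
    test_path_is_dir dst/.git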
Note that the test added in this patch actually passes
without the code change. The reason is that the cleanup code
is buggy; we chdir into the working tree for the checkout,
but still may use relative paths to remove the directories
(which means if you cloned into "foo", we would accidentally
remove "foo" from the working tree!). There's no point in
fixing it now, since this patch means we will never try to
remove anything after the chdir, anyway.
[jc: replaced the message with a more succinct version from
Jonathan]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fetch from a remote, we do a revision walk to make
sure that what we received is connected to our existing
history. We do not do the same check for clone, which should
be able to check that we received an intact history graph.
The upside of this patch is that it will make clone more
resilient against propagating repository corruption. The
downside is that we will now traverse "rev-list --objects
--all" down to the roots, which may take some time (it is
especially noticeable for a "--local --bare" clone).
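Conceptually, the new check amounts to running something like the
following in the freshly created clone (the real check runs
internally as part of clone):

    git -C cloned rev-list --objects --all >/dev/null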
Note that we need to adjust t5710, which tries to make such
a bogus clone. Rather than checking after the fact that our
clone is bogus, we can simplify it to just make sure "git
clone" reports failure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When clone is populating the working tree, it ignores the
return status from unpack_trees; this means we may report a
successful clone, even when the checkout fails.
When checkout fails, we may want to leave the $GIT_DIR in
place, as it might be possible to recover the data through
further use of "git checkout" (e.g., if the checkout failed
due to a transient error, disk full, etc). However, we
already die on a number of other checkout-related errors, so
this patch follows that pattern.
In addition to marking a now-passing test, we need to adjust
t5710, which blindly assumed it could make bogus clones of
very deep alternates hierarchies. By using "--bare", we can
avoid it actually touching any objects.
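For illustration, the adjusted test can rely on a bare clone, which
skips the working-tree checkout entirely (names are made up):

    git clone --bare deep-alternates-repo bogus.git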
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We try not to let corruption pass unnoticed over fetches and
clones. For the most part, this works, but there are some
broken corner cases, including:
1. We do not detect missing objects over git-aware
transports. This is a little hard to test, because the
sending side will actually complain about the missing
object.
To fool it, we corrupt a repository such that we have a
"misnamed" object: it claims to be sha1 X, but is
really Y. This lets the sender blindly transmit it, but
it is the receiver's responsibility to verify that what
it got is sane (and it does not). A rough sketch of this trick
follows below.
2. We do not detect missing or misnamed blobs during the
checkout phase of clone.
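A sketch of the "misnamed" object trick from (1), purely for
illustration ($claimed and $actual are placeholder object ids, and
the SHA-1 loose-object layout is assumed; the actual tests are more
careful than this):

    # put the loose file of one object at the path of a different id
    loose_path () {
        echo ".git/objects/$(echo "$1" | sed 's|^..|&/|')"
    }
    mkdir -p "$(dirname "$(loose_path "$claimed")")"
    cp "$(loose_path "$actual")" "$(loose_path "$claimed")"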
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.
We can fix this by using bitwise-OR to accumulate errors in
our result variable.
While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have many tests for handling corrupt objects. This
new test at least checks that we detect a byte error in a
corrupt blob object while streaming it out with cat-file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>