rev-list: disable commit graph with --verify-objects

Since the point of --verify-objects is to actually load and checksum the
bytes of each object, optimizing out reads using the commit graph runs
contrary to our goal.

The most targeted way to implement this would be for the revision
traversal code to check revs->verify_objects and avoid using the commit
graph. But it's difficult to be sure we've hit all of the correct spots.
For instance, I started this patch by writing the first of the included
test cases, where the corrupted commit is directly on rev-list's command
line. And that is easy to fix by teaching get_reference() to check
revs->verify_objects before calling lookup_commit_in_graph().

But that doesn't cover the second test case: when we traverse to a
corrupted commit, we'd parse the parent in process_parents(). So we'd
need to check there, too. And it keeps going. In handle_commit() we
sometimes parse commits, too, though I couldn't figure out a way to
trigger it that did not already parse via get_reference() or tag
peeling. And try_to_simplify_commit() has its own parse call, and so on.

So it seems like the safest thing is to just disable the commit graph
for the whole process when we see the --verify-objects option. We can do
that either in builtin/rev-list.c, where we use the option, or in
revision.c, where we parse it. There are some subtleties:

  - putting it in rev-list.c is less surprising in some ways, because
    there we know we are just doing a single traversal. In a command
    which does multiple traversals in a single process, it's rather
    unexpected to globally disable the commit graph.

  - putting it in revision.c is less surprising in some ways, because
    the caller does not have to remember to disable the graph
    themselves. But this is already tricky! The verify_objects flag in
    rev_info doesn't do anything by itself. The caller has to provide an
    object callback which does the right thing.

  - for that reason, in practice nobody but rev-list uses this option in
    the first place. So the distinction is probably not important either
    way. Arguably it should just be an option of rev-list, and not the
    general revision machinery; right now you can run "git log
    --verify-objects", but it does not actually do anything useful.

  - checking for a parsed revs.verify_objects flag in rev-list.c is too
    late. By that time we've already passed the arguments to
    setup_revisions(), which will have parsed the commits using the
    graph.

So this commit disables the graph as soon as we see the option in
revision.c. That's a pretty broad hammer, but it does what we want, and
in practice nobody but rev-list is using this flag anyway.

The tests cover both the "tip" and "parent" cases. Obviously our hammer
hits them both in this case, but it's good to check both in case
somebody later tries the more focused approach.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King 2022-09-06 17:04:35 -04:00, committed by Junio C Hamano
Parent d6045294a9
Commit b27ccae34b
2 changed files: 29 additions and 0 deletions


@@ -2398,6 +2398,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->tree_objects = 1;
 		revs->blob_objects = 1;
 		revs->verify_objects = 1;
+		disable_commit_graph(revs->repo);
 	} else if (!strcmp(arg, "--unpacked")) {
 		revs->unpacked = 1;
 	} else if (starts_with(arg, "--unpacked=")) {


@@ -507,6 +507,34 @@ test_expect_success 'rev-list --verify-objects with bad sha1' '
 	test_i18ngrep -q "error: hash mismatch $(dirname $new)$(test_oid ff_2)" out
 '
 
+test_expect_success 'set up repository with commit-graph' '
+	git init corrupt-graph &&
+	(
+		cd corrupt-graph &&
+		test_commit one &&
+		test_commit two &&
+		git commit-graph write --reachable
+	)
+'
+
+corrupt_graph_obj () {
+	oid=$(git -C corrupt-graph rev-parse "$1") &&
+	obj=corrupt-graph/.git/objects/$(test_oid_to_path $oid) &&
+	test_when_finished 'mv backup $obj' &&
+	mv $obj backup &&
+	echo garbage >$obj
+}
+
+test_expect_success 'rev-list --verify-objects with commit graph (tip)' '
+	corrupt_graph_obj HEAD &&
+	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
+'
+
+test_expect_success 'rev-list --verify-objects with commit graph (parent)' '
+	corrupt_graph_obj HEAD^ &&
+	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
+'
+
 test_expect_success 'force fsck to ignore double author' '
 	git cat-file commit HEAD >basis &&
 	sed "s/^author .*/&,&/" <basis | tr , \\n >multiple-authors &&