#!/bin/sh

test_description='commit graph'
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-chunk.sh

GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0

test_expect_success 'usage' '
	test_expect_code 129 git commit-graph write blah 2>err &&
	test_expect_code 129 git commit-graph write verify
'

test_expect_success 'usage shown without sub-command' '
	test_expect_code 129 git commit-graph 2>err &&
	grep usage: err
'

test_expect_success 'usage shown with an error on unknown sub-command' '
	cat >expect <<-\EOF &&
	error: unknown subcommand: `unknown'\''
	EOF
	test_expect_code 129 git commit-graph unknown 2>stderr &&
	grep error stderr >actual &&
	test_cmp expect actual
'

objdir=".git/objects"

test_expect_success 'setup full repo' '
	git init full
'

test_expect_success POSIXPERM 'tweak umask for modebit tests' '
	umask 022
'

test_expect_success 'verify graph with no graph file' '
	git -C full commit-graph verify
'

test_expect_success 'write graph with no packs' '
	git -C full commit-graph write --object-dir $objdir &&
	test_path_is_missing full/$objdir/info/commit-graph
'

test_expect_success 'exit with correct error on bad input to --stdin-packs' '
	echo doesnotexist >in &&
	test_expect_code 1 git -C full commit-graph write --stdin-packs \
		<in 2>stderr &&
	test_grep "error adding pack" stderr
'

test_expect_success 'create commits and repack' '
	for i in $(test_seq 3)
	do
		test_commit -C full $i &&
		git -C full branch commits/$i || return 1
	done &&
	git -C full repack
'

. "$TEST_DIRECTORY"/lib-commit-graph.sh

graph_git_behavior 'no graph' full commits/3 commits/1

# Commit message behind the OID checks in the next test (2020-05-14):
#
# commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
#
# Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
# 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
# receiving non-commit OIDs as input to '--stdin-commits'.
#
# This behavior can be cumbersome to work around in, say, the case of
# piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
# the caller does not want to cull out non-commits themselves. In this
# situation, it would be ideal if 'git commit-graph write' wrote the graph
# containing the inputs that did pertain to commits, and silently ignored
# the remainder of the input.
#
# Some options have been proposed to the effect of '--[no-]check-oids'
# which would allow callers to have the commit-graph builtin do just that.
# After some discussion, it is difficult to imagine a caller who wouldn't
# want to pass '--no-check-oids', suggesting that we should get rid of the
# behavior of complaining about non-commit inputs altogether.
#
# If callers do wish to retain this behavior, they can easily work around
# this change by doing the following:
#
#     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
#     awk '
#       !/commit/ { print "not-a-commit:"$1 }
#        /commit/ { print $1 }
#     ' |
#     git commit-graph write --stdin-commits
#
# To make it so that valid OIDs that refer to non-existent objects are
# indeed an error after loosening the error handling, perform an extra
# lookup to make sure that object indeed exists before sending it to the
# commit-graph internals.
#
# Helped-by: Jeff King <peff@peff.net>
# Signed-off-by: Taylor Blau <me@ttaylorr.com>
# Signed-off-by: Junio C Hamano <gitster@pobox.com>

test_expect_success 'exit with correct error on bad input to --stdin-commits' '
	# invalid, non-hex OID
	echo HEAD | test_expect_code 1 git -C full commit-graph write \
		--stdin-commits 2>stderr &&
	test_grep "unexpected non-hex object ID: HEAD" stderr &&
	# non-existent OID
	echo $ZERO_OID | test_expect_code 1 git -C full commit-graph write \
		--stdin-commits 2>stderr &&
	test_grep "invalid object" stderr &&
	# valid commit and tree OID
	git -C full rev-parse HEAD HEAD^{tree} >in &&
	git -C full commit-graph write --stdin-commits <in &&
	graph_read_expect -C full 3 generation_data
'

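# The awk workaround quoted in the commit message above can be tried
# without a repository. This is only a sketch: classify_oids is a
# hypothetical helper name and the OIDs in the sample input are made up.
# Commit lines pass through as bare OIDs; everything else is prefixed so
# that "git commit-graph write --stdin-commits" would reject it as non-hex.

```shell
#!/bin/sh
# classify_oids: hypothetical helper (not part of git or its test suite).
# Reads "<oid> <objecttype> <*objecttype>" lines on stdin and prints
# commit OIDs unchanged, tagging every other line as "not-a-commit:<oid>".
classify_oids () {
	awk '
		!/commit/ { print "not-a-commit:" $1 }
		/commit/  { print $1 }
	'
}

# Made-up sample: a branch, an annotated tag that peels to a commit,
# and a tag that points at a tree.
classify_oids <<\EOF
1111111111111111111111111111111111111111 commit
2222222222222222222222222222222222222222 tag commit
3333333333333333333333333333333333333333 tag tree
EOF
```

# Only the third line is tagged, because neither its type nor its peeled
# type mentions "commit".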
test_expect_success 'write graph' '
	git -C full commit-graph write &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 3 generation_data
'

test_expect_success POSIXPERM 'write graph has correct permissions' '
	test_path_is_file full/$objdir/info/commit-graph &&
	echo "-r--r--r--" >expect &&
	test_modebits full/$objdir/info/commit-graph >actual &&
	test_cmp expect actual
'

graph_git_behavior 'graph exists' full commits/3 commits/1

test_expect_success 'Add more commits' '
	git -C full reset --hard commits/1 &&
	for i in $(test_seq 4 5)
	do
		test_commit -C full $i &&
		git -C full branch commits/$i || return 1
	done &&
	git -C full reset --hard commits/2 &&
	for i in $(test_seq 6 7)
	do
		test_commit -C full $i &&
		git -C full branch commits/$i || return 1
	done &&
	git -C full reset --hard commits/2 &&
	git -C full merge commits/4 &&
	git -C full branch merge/1 &&
	git -C full reset --hard commits/4 &&
	git -C full merge commits/6 &&
	git -C full branch merge/2 &&
	git -C full reset --hard commits/3 &&
	git -C full merge commits/5 commits/7 &&
	git -C full branch merge/3 &&
	git -C full repack
'

test_expect_success 'commit-graph write progress off for redirected stderr' '
	git -C full commit-graph write 2>err &&
	test_must_be_empty err
'

test_expect_success 'commit-graph write force progress on for stderr' '
	GIT_PROGRESS_DELAY=0 git -C full commit-graph write --progress 2>err &&
	test_file_not_empty err
'

test_expect_success 'commit-graph write with the --no-progress option' '
	git -C full commit-graph write --no-progress 2>err &&
	test_must_be_empty err
'

test_expect_success 'commit-graph write --stdin-commits progress off for redirected stderr' '
	git -C full rev-parse commits/5 >in &&
	git -C full commit-graph write --stdin-commits <in 2>err &&
	test_must_be_empty err
'

test_expect_success 'commit-graph write --stdin-commits force progress on for stderr' '
	git -C full rev-parse commits/5 >in &&
	GIT_PROGRESS_DELAY=0 git -C full commit-graph write --stdin-commits \
		--progress <in 2>err &&
	test_grep "Collecting commits from input" err
'

test_expect_success 'commit-graph write --stdin-commits with the --no-progress option' '
	git -C full rev-parse commits/5 >in &&
	git -C full commit-graph write --stdin-commits --no-progress <in 2>err &&
	test_must_be_empty err
'

test_expect_success 'commit-graph verify progress off for redirected stderr' '
	git -C full commit-graph verify 2>err &&
	test_must_be_empty err
'

test_expect_success 'commit-graph verify force progress on for stderr' '
	GIT_PROGRESS_DELAY=0 git -C full commit-graph verify --progress 2>err &&
	test_file_not_empty err
'

test_expect_success 'commit-graph verify with the --no-progress option' '
	git -C full commit-graph verify --no-progress 2>err &&
	test_must_be_empty err
'

# Current graph structure:
#
#   __M3___
#  /   |   \
# 3 M1 5 M2 7
# |/  \|/  \|
# 2    4    6
# |___/____/
# 1

test_expect_success 'write graph with merges' '
	git -C full commit-graph write &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 10 "generation_data extra_edges"
'

graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3

test_expect_success 'Add one more commit' '
	test_commit -C full 8 &&
	git -C full branch commits/8 &&
	ls full/$objdir/pack | grep idx >existing-idx &&
	git -C full repack &&
	ls full/$objdir/pack | grep idx | grep -v -f existing-idx >new-idx
'

# Current graph structure:
#
#      8
#      |
#   __M3___
#  /   |   \
# 3 M1 5 M2 7
# |/  \|/  \|
# 2    4    6
# |___/____/
# 1

graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'write graph with new commit' '
	git -C full commit-graph write &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 11 "generation_data extra_edges"
'

graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'write graph with nothing new' '
	git -C full commit-graph write &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 11 "generation_data extra_edges"
'

graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'build graph from latest pack with closure' '
	git -C full commit-graph write --stdin-packs <new-idx &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 9 "generation_data extra_edges"
'

graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'build graph from commits with closure' '
	git -C full tag -a -m "merge" tag/merge merge/2 &&
	git -C full rev-parse tag/merge >commits-in &&
	git -C full rev-parse merge/1 >>commits-in &&
	git -C full commit-graph write --stdin-commits <commits-in &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 6 "generation_data"
'

graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'build graph from commits with append' '
	git -C full rev-parse merge/3 >in &&
	git -C full commit-graph write --stdin-commits --append <in &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 10 "generation_data extra_edges"
'

graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'build graph using --reachable' '
	git -C full commit-graph write --reachable &&
	test_path_is_file full/$objdir/info/commit-graph &&
	graph_read_expect -C full 11 "generation_data extra_edges"
'

graph_git_behavior 'reachable graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'reachable graph, commit 8 vs merge 2' full commits/8 merge/2

test_expect_success 'setup bare repo' '
	git clone --bare --no-local full bare
'

graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2

test_expect_success 'write graph in bare repo' '
	git -C bare commit-graph write &&
	test_path_is_file bare/objects/info/commit-graph &&
	graph_read_expect -C bare 11 "generation_data extra_edges"
'

graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2

test_expect_success 'perform fast-forward merge in full repo' '
	git -C full checkout -b merge-5-to-8 commits/5 &&
	git -C full merge commits/8 &&
	git -C full show-ref -s merge-5-to-8 >output &&
	git -C full show-ref -s commits/8 >expect &&
	test_cmp expect output
'

test_expect_success 'check that gc computes commit-graph' '
	test_commit -C full --no-tag blank &&
	git -C full commit-graph write --reachable &&
	cp full/$objdir/info/commit-graph commit-graph-before-gc &&
	git -C full reset --hard HEAD~1 &&
	test_config -C full gc.writeCommitGraph true &&
	git -C full gc &&
	cp full/$objdir/info/commit-graph commit-graph-after-gc &&
	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
	git -C full commit-graph write --reachable &&
	test_cmp_bin commit-graph-after-gc full/$objdir/info/commit-graph
'

test_expect_success 'replace-objects invalidates commit-graph' '
	test_when_finished rm -rf replace &&
	git clone full replace &&
	(
		cd replace &&
		git commit-graph write --reachable &&
		test_path_is_file .git/objects/info/commit-graph &&
		git replace HEAD~1 HEAD~2 &&
		graph_git_two_modes "commit-graph verify" &&
		git -c core.commitGraph=false log >expect &&
		git -c core.commitGraph=true log >actual &&
		test_cmp expect actual &&
		git commit-graph write --reachable &&
		git -c core.commitGraph=false --no-replace-objects log >expect &&
		git -c core.commitGraph=true --no-replace-objects log >actual &&
		test_cmp expect actual &&
		rm -rf .git/objects/info/commit-graph &&
		git commit-graph write --reachable &&
		test_path_is_file .git/objects/info/commit-graph
	)
'

test_expect_success 'commit grafts invalidate commit-graph' '
	test_when_finished rm -rf graft &&
	git clone --template= full graft &&
	(
		cd graft &&
		git commit-graph write --reachable &&
		test_path_is_file .git/objects/info/commit-graph &&
		H1=$(git rev-parse --verify HEAD~1) &&
		H3=$(git rev-parse --verify HEAD~3) &&
		mkdir .git/info &&
		echo "$H1 $H3" >.git/info/grafts &&
		git -c core.commitGraph=false log >expect &&
		git -c core.commitGraph=true log >actual &&
		test_cmp expect actual &&
		git commit-graph write --reachable &&
		git -c core.commitGraph=false --no-replace-objects log >expect &&
		git -c core.commitGraph=true --no-replace-objects log >actual &&
		test_cmp expect actual &&
		rm -rf .git/objects/info/commit-graph &&
		git commit-graph write --reachable &&
		test_path_is_missing .git/objects/info/commit-graph
	)
'

test_expect_success 'shallow clones invalidate commit-graph' '
	test_when_finished rm -rf shallow &&
	git clone --depth 2 "file://$TRASH_DIRECTORY/full" shallow &&
	(
		cd shallow &&
		git commit-graph write --reachable &&
		test_path_is_missing .git/objects/info/commit-graph &&
		git fetch origin --unshallow &&
		git commit-graph write --reachable &&
		test_path_is_file .git/objects/info/commit-graph
	)
'

test_expect_success 'warn on improper hash version' '
	git init --object-format=sha1 sha1 &&
	(
		cd sha1 &&
		test_commit 1 &&
		git commit-graph write --reachable &&
		mv .git/objects/info/commit-graph ../cg-sha1
	) &&
	git init --object-format=sha256 sha256 &&
	(
		cd sha256 &&
		test_commit 1 &&
		git commit-graph write --reachable &&
		mv .git/objects/info/commit-graph ../cg-sha256
	) &&
	(
		cd sha1 &&
		mv ../cg-sha256 .git/objects/info/commit-graph &&
		git log -1 2>err &&
		test_grep "commit-graph hash version 2 does not match version 1" err
	) &&
	(
		cd sha256 &&
		mv ../cg-sha1 .git/objects/info/commit-graph &&
		git log -1 2>err &&
		test_grep "commit-graph hash version 1 does not match version 2" err
	)
'

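# The hash-version mismatch checked above can be illustrated by reading
# one header byte. This is a sketch under stated assumptions: the helper
# name graph_hash_version is hypothetical, the header bytes are synthetic,
# and the byte position (5) is the one this script later records as
# GRAPH_BYTE_HASH. Per the warnings above, 1 stands for SHA-1 and 2 for
# SHA-256.

```shell
#!/bin/sh
# graph_hash_version: hypothetical helper (not part of git).
# Prints the decimal value of the byte at offset 5 of the file $1.
graph_hash_version () {
	dd if="$1" bs=1 skip=5 count=1 2>/dev/null |
	od -An -tu1 |
	tr -d " \n"
}

# Synthetic header prefix, not a real graph: "CGPH" signature,
# file version 1, hash version 2, chunk count 5.
header=$(mktemp) &&
printf "CGPH\001\002\005" >"$header" &&
graph_hash_version "$header" &&
echo &&
rm -f "$header"
```

# A reader expecting hash version 1 would see 2 here, which is the
# situation the "does not match" warnings describe.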
test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
	UNIX_EPOCH_ZERO="@0 +0000" &&
	FUTURE_DATE="@4147483646 +0000" &&
	rm -f full/.git/objects/info/commit-graph &&
	test_commit -C full --date "$FUTURE_DATE" future-1 &&
	test_commit -C full --date "$UNIX_EPOCH_ZERO" old-1 &&
	git -C full commit-graph write --reachable &&
	test_commit -C full --date "$FUTURE_DATE" future-2 &&
	test_commit -C full --date "$UNIX_EPOCH_ZERO" old-2 &&
	git -C full commit-graph write --reachable --split=no-merge &&
	test_commit -C full extra &&
	git -C full commit-graph write --reachable --split=no-merge &&
	git -C full commit-graph write --reachable &&
	graph_read_expect -C full 16 \
		"generation_data generation_data_overflow extra_edges" &&
	mv full/.git/objects/info/commit-graph commit-graph-upgraded &&
	git -C full commit-graph write --reachable &&
	graph_read_expect -C full 16 \
		"generation_data generation_data_overflow extra_edges" &&
	test_cmp full/.git/objects/info/commit-graph commit-graph-upgraded
'

# the verify tests below expect the commit-graph to contain
# exactly the commits reachable from the commits/8 branch.
# If the file changes the set of commits in the list, then the
# offsets into the binary file will result in different edits
# and the tests will likely break.

test_expect_success 'git commit-graph verify' '
	git -C full rev-parse commits/8 >in &&
	git -C full -c commitGraph.generationVersion=1 commit-graph write \
		--stdin-commits <in &&
	git -C full commit-graph verify >output &&
	graph_read_expect -C full 9 extra_edges 1
'

NUM_COMMITS=9
NUM_OCTOPUS_EDGES=2
HASH_LEN="$(test_oid rawsz)"
GRAPH_BYTE_VERSION=4
GRAPH_BYTE_HASH=5
GRAPH_BYTE_CHUNK_COUNT=6
GRAPH_CHUNK_LOOKUP_OFFSET=8
GRAPH_CHUNK_LOOKUP_WIDTH=12
GRAPH_CHUNK_LOOKUP_ROWS=5
GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
	1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
	2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
	$GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
GRAPH_BYTE_COMMIT_GENERATION_LAST=$(($GRAPH_BYTE_COMMIT_GENERATION + $(($NUM_COMMITS - 1)) * $GRAPH_COMMIT_DATA_WIDTH))
GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
	$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))

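# The offset arithmetic above depends on HASH_LEN, so a worked instance
# may help when a corruption test lands on the wrong byte. The sketch
# below repeats the same formulas in a hypothetical helper, graph_offsets,
# which is not part of the test suite. It assumes the layout these
# variables encode: a lookup table of 5 rows starting at byte 8, followed
# by the fanout, OID lookup, commit data and octopus chunks in that order.

```shell
#!/bin/sh
# graph_offsets: hypothetical helper mirroring the GRAPH_* arithmetic.
# Arguments: raw hash length in bytes, number of commits, number of
# octopus edges. Prints the derived offsets as NAME=VALUE lines.
graph_offsets () {
	hash_len=$1
	num_commits=$2
	num_octopus_edges=$3

	lookup_offset=8		# GRAPH_CHUNK_LOOKUP_OFFSET
	lookup_width=12		# GRAPH_CHUNK_LOOKUP_WIDTH
	lookup_rows=5		# GRAPH_CHUNK_LOOKUP_ROWS

	fanout=$((lookup_offset + lookup_width * lookup_rows))
	oid_lookup=$((fanout + 4 * 256))
	commit_data=$((oid_lookup + hash_len * num_commits))
	commit_data_width=$((hash_len + 16))
	octopus=$((commit_data + commit_data_width * num_commits))
	footer=$((octopus + 4 * num_octopus_edges))

	# printf reuses the format for each NAME/VALUE pair.
	printf "%s=%s\n" \
		GRAPH_FANOUT_OFFSET "$fanout" \
		GRAPH_OID_LOOKUP_OFFSET "$oid_lookup" \
		GRAPH_COMMIT_DATA_OFFSET "$commit_data" \
		GRAPH_OCTOPUS_DATA_OFFSET "$octopus" \
		GRAPH_BYTE_FOOTER "$footer"
}

# SHA-1 raw size (20 bytes) with the 9 commits and 2 octopus edges
# used by the verify tests.
graph_offsets 20 9 2
```

# With these inputs the footer lands at byte 1604; with a 32-byte hash
# the same call gives 1820, so every later offset moves with HASH_LEN.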
corrupt_graph_setup() {
	test_when_finished mv commit-graph-backup full/$objdir/info/commit-graph &&
	cp full/$objdir/info/commit-graph commit-graph-backup &&
	chmod u+w full/$objdir/info/commit-graph
}

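# The setup above backs up the graph and makes it writable so that single
# bytes can be edited in place. How the suite performs the edit is not
# shown in this part of the script; the sketch below is one common way to
# do it with dd, using a hypothetical helper name (overwrite_byte) and a
# throwaway file instead of a real commit-graph.

```shell
#!/bin/sh
# overwrite_byte: hypothetical helper (not from the test suite).
# Writes the string $3 into file $1 starting at byte offset $2.
# conv=notrunc keeps the bytes after the edit, so the size is unchanged.
overwrite_byte () {
	printf "%s" "$3" |
	dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}

demo=$(mktemp) &&
printf "ABCDEFGH" >"$demo" &&
# Keep a pristine copy first, as corrupt_graph_setup does.
cp "$demo" "$demo.backup" &&
overwrite_byte "$demo" 3 "Z" &&
# Show the edited contents: only the byte at offset 3 differs.
cat "$demo" &&
echo &&
# Restore the original and clean up.
mv "$demo.backup" "$demo" &&
rm -f "$demo"
```

# Offsets are zero-based, so seek=3 replaces the fourth byte.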
corrupt_graph_verify() {
	grepstr=$1
	test_must_fail git -C full commit-graph verify 2>test_err &&
	grep -v "^+" test_err >err &&
	test_grep "$grepstr" err &&
	# Commit message behind the write check below (2019-03-25):
	#
	# commit-graph write: don't die if the existing graph is corrupt
	#
	# When the commit-graph is written we end up calling
	# parse_commit(). This will in turn invoke code that'll consult the
	# existing commit-graph about the commit, if the graph is corrupted we
	# die.
	#
	# We thus get into a state where a failing "commit-graph verify" can't
	# be followed-up with a "commit-graph write" if core.commitGraph=true is
	# set, the graph either needs to be manually removed to proceed, or
	# core.commitGraph needs to be set to "false".
	#
	# Change the "commit-graph write" codepath to use a new
	# parse_commit_no_graph() helper instead of parse_commit() to avoid
	# this. The latter will call repo_parse_commit_internal() with
	# use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
	# graph with commit parsing", 2018-04-10).
	#
	# Not using the old graph at all slows down the writing of the new graph
	# by some small amount, but is a sensible way to prevent an error in the
	# existing commit-graph from spreading.
	#
	# Just fixing the current issue would be likely to result in code that's
	# inadvertently broken in the future. New code might use the
	# commit-graph at a distance. To detect such cases introduce a
	# "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
	# corruption tests, and test that a "write/verify" combo works after
	# every one of our current test cases where we now detect commit-graph
	# corruption.
	#
	# Some of the code changes here might be strictly unnecessary, e.g. I
	# was unable to find cases where the parse_commit() called from
	# write_graph_chunk_data() didn't exit early due to
	# "item->object.parsed" being true in
	# repo_parse_commit_internal() (before the use_commit_graph=1 has any
	# effect). But let's also convert those cases for good measure, we do
	# not have exhaustive tests for all possible types of commit-graph
	# corruption.
	#
	# This might need to be re-visited if we learn to write the commit-graph
	# incrementally, but probably not. Hopefully we'll just start by finding
	# out what commits we have in total, then read the old graph(s) to see
	# what they cover, and finally write a new graph file with everything
	# that's missing. In that case the new graph writing code just needs to
	# continue to use e.g. a parse_commit() that doesn't consult the
	# existing commit-graphs.
	#
	# Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
	# Signed-off-by: Junio C Hamano <gitster@pobox.com>
	if test "$2" != "no-copy"
	then
		cp full/$objdir/info/commit-graph commit-graph-pre-write-test
2019-03-25 15:08:33 +03:00
|
|
|
fi &&
|
2023-07-24 19:39:28 +03:00
|
|
|
git -C full status --short &&
|
|
|
|
GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE=true git -C full commit-graph write &&
|
|
|
|
chmod u+w full/$objdir/info/commit-graph &&
|
|
|
|
git -C full commit-graph verify
|
2019-02-22 01:37:46 +03:00
|
|
|
}
|
|
|
|
|
2019-01-16 01:25:51 +03:00
|
|
|
# usage: corrupt_graph_and_verify <position> <data> <string> [<zero_pos>]
|
2018-06-27 16:24:33 +03:00
|
|
|
# Manipulates the commit-graph file at the position
|
2019-01-16 01:25:51 +03:00
|
|
|
# by inserting the data, optionally zeroing the file
|
|
|
|
# starting at <zero_pos>, then runs 'git commit-graph verify'
|
2018-06-27 16:24:33 +03:00
|
|
|
# and places the output in the file 'err'. Test 'err' for
|
|
|
|
# the given string.
|
|
|
|
corrupt_graph_and_verify() {
|
|
|
|
pos=$1
|
|
|
|
data="${2:-\0}"
|
|
|
|
grepstr=$3
|
2019-02-22 01:37:46 +03:00
|
|
|
corrupt_graph_setup &&
|
2023-07-24 19:39:28 +03:00
|
|
|
orig_size=$(wc -c <full/$objdir/info/commit-graph) &&
|
2019-01-16 01:25:51 +03:00
|
|
|
zero_pos=${4:-${orig_size}} &&
|
2023-07-24 19:39:28 +03:00
|
|
|
printf "$data" | dd of="full/$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
|
|
|
|
dd of="full/$objdir/info/commit-graph" bs=1 seek="$zero_pos" if=/dev/null &&
|
|
|
|
test-tool genzeros $(($orig_size - $zero_pos)) >>"full/$objdir/info/commit-graph" &&
|
2019-02-22 01:37:46 +03:00
|
|
|
corrupt_graph_verify "$grepstr"
|
|
|
|
|
2018-06-27 16:24:33 +03:00
|
|
|
}
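The core trick in corrupt_graph_and_verify is overwriting bytes at a fixed offset without changing the file's length. A minimal standalone sketch of that step follows; the helper name overwrite_bytes and the scratch file are hypothetical and not part of the test script, and the truncate-and-zero-fill step is left out.

```shell
# Hypothetical helper (not part of t5318): overwrite bytes in place at a
# given offset, leaving the rest of the file and its size untouched.
overwrite_bytes () {
	file=$1
	pos=$2
	data=$3
	# printf expands octal escapes such as "\01" found in its format
	# string; conv=notrunc stops dd from truncating the file after the
	# bytes it writes.
	printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc 2>/dev/null
}

scratch=$(mktemp) &&
printf 'ABCDEFGH' >"$scratch" &&
overwrite_bytes "$scratch" 2 "\130\131" &&	# octal escapes for "XY"
cat "$scratch"	# prints ABXYEFGH
```

Because only the addressed bytes change, every other offset in the file stays valid, which is what lets each corruption test target a single field.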
|
|
|
|
|
2019-03-25 15:08:32 +03:00
|
|
|
test_expect_success POSIXPERM,SANITY 'detect permission problem' '
|
|
|
|
corrupt_graph_setup &&
|
2023-07-24 19:39:28 +03:00
|
|
|
chmod 000 full/$objdir/info/commit-graph &&
|
2019-03-25 15:08:33 +03:00
|
|
|
corrupt_graph_verify "Could not open" "no-copy"
|
2019-03-25 15:08:32 +03:00
|
|
|
'
|
|
|
|
|
2019-02-22 01:37:47 +03:00
|
|
|
test_expect_success 'detect too small' '
|
|
|
|
corrupt_graph_setup &&
|
2023-07-24 19:39:28 +03:00
|
|
|
echo "a small graph" >full/$objdir/info/commit-graph &&
|
2019-02-22 01:37:47 +03:00
|
|
|
corrupt_graph_verify "too small"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:33 +03:00
|
|
|
test_expect_success 'detect bad signature' '
|
|
|
|
corrupt_graph_and_verify 0 "\0" \
|
|
|
|
"graph signature"
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect bad version' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
|
|
|
|
"graph version"
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect bad hash version' '
|
2019-12-21 22:49:24 +03:00
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_HASH "\03" \
|
2018-06-27 16:24:33 +03:00
|
|
|
"hash version"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:34 +03:00
|
|
|
test_expect_success 'detect low chunk count' '
|
commit-graph: fix parsing the Chunk Lookup table
The commit-graph file format specifies that the chunks may be in any
order. However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:
fatal: invalid commit position. commit-graph is likely corrupt
In this case the error is wrong, the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.
The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
by the hash size. However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offset of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label. This is currently implemented in a
bit complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk). Alas, while parsing
the Chunk Lookup table we only iterate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calculate the size of the last chunk. Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and in turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().
Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.
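The size calculation described above can be sketched in shell. This is only an illustration, not Git's parser: the function name chunk_sizes is hypothetical, and the chunk IDs and offsets are made up to model a SHA-1 graph with 10 commits whose OID Lookup chunk comes last.

```shell
# Hypothetical input: one "<id> <starting offset>" pair per line. The
# last line is the terminating label (id 0); its offset marks where the
# final real chunk ends.
chunk_sizes () {
	prev_id=
	prev_off=
	while read -r id off
	do
		# A chunk's size is only known once the next entry's
		# offset has been read, terminating label included.
		if test -n "$prev_id"
		then
			echo "$prev_id $(($off - $prev_off))"
		fi
		prev_id=$id
		prev_off=$off
	done
}

printf '%s\n' "OIDF 56" "CDAT 1080" "OIDL 1440" "0 1640" | chunk_sizes
# prints:
# OIDF 1024
# CDAT 360
# OIDL 200
```

Had the loop stopped before the terminating label, the OIDL line would never be printed, which mirrors the bug: with a 20-byte hash the commit count (200 / 20 = 10) would never be computed.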
Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.
This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 16:00:24 +03:00
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
|
2021-02-18 17:07:35 +03:00
|
|
|
"final chunk has non-zero id"
|
2018-06-27 16:24:34 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect missing OID fanout chunk' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
|
2023-11-09 10:14:34 +03:00
|
|
|
"commit-graph required OID fanout chunk missing or corrupted"
|
2018-06-27 16:24:34 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect missing OID lookup chunk' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
|
2023-11-09 10:14:34 +03:00
|
|
|
"commit-graph required OID lookup chunk missing or corrupted"
|
2018-06-27 16:24:34 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect missing commit data chunk' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
|
2023-11-09 10:14:34 +03:00
|
|
|
"commit-graph required commit data chunk missing or corrupted"
|
2018-06-27 16:24:34 +03:00
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:35 +03:00
|
|
|
test_expect_success 'detect incorrect fanout' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
|
|
|
|
"fanout value"
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect incorrect fanout final value' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
|
commit-graph: use fanout value for graph size
Commit-graph, midx, and pack idx files all have both a lookup table of
oids and an oid fanout table. In midx and pack idx files, we take the
final entry of the fanout table as the source of truth for the number of
entries, and then verify that the size of the lookup table matches that.
But for commit-graph files, we do the opposite: we use the size of the
lookup table as the source of truth, and then check the final fanout
entry against it.
As noted in 4169d89645 (commit-graph: check consistency of fanout
table, 2023-10-09), either is correct. But there are a few reasons to
prefer the fanout table as the source of truth:
1. The fanout entries are 32-bits on disk, and that defines the
maximum number of entries we can store. But since the size of the
lookup table is only bounded by the filesystem, it can be much
larger. And hence computing it as the commit-graph does means that
we may truncate the result when storing it in a uint32_t.
2. We read the fanout first, then the lookup table. If we're verifying
the chunks as we read them, then we'd want to take the fanout as
truth (we have nothing yet to check it against) and then we can
check that the lookup table matches what we already know.
3. It is pointlessly inconsistent with the midx and pack idx code.
Since the three have to do similar size and bounds checks, it is
easier to reason about all three if they use the same approach.
So this patch moves the assignment of g->num_commits to the fanout
parser, and then we can check the size of the lookup chunk as soon as we
try to load it.
There's already a test covering this situation, which munges the final
fanout entry to 2^32-1. In the current code we complain that it does not
agree with the table size. But now that we treat the munged value as the
source of truth, we'll complain that the lookup table is the wrong size
(again, either is correct). So we'll have to update the message we
expect (and likewise for an earlier test which does similar munging).
There's a similar test for this situation on the midx side, but rather
than making a very-large fanout value, it just truncates the lookup
table. We could do that here, too, but the very-large fanout value
actually shows an interesting corner case. On a 32-bit system,
multiplying to find the expected table size would cause an integer
overflow. Using st_mult() would detect that, but cause us to die()
rather than falling back to the non-graph code path. Checking the size
using division (as we do with existing chunk-size checks) avoids the
overflow entirely, and the test demonstrates this when run on a 32-bit
system.
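A rough shell sketch of the division-based check follows. It is an assumption-laden illustration rather than Git's C code: the function name lookup_size_ok is hypothetical, the remainder test is an extra strictness added here, and shell arithmetic is usually 64-bit, so the sketch shows the shape of the check rather than the 32-bit overflow itself.

```shell
# Hypothetical check: does a lookup chunk of chunk_size bytes hold
# exactly num_entries entries of hash_len bytes each?
lookup_size_ok () {
	chunk_size=$1
	num_entries=$2
	hash_len=$3
	# Dividing the known chunk size avoids ever forming the product
	# num_entries * hash_len, which is the value that could overflow
	# when the fanout entry has been munged to something huge.
	test "$(($chunk_size / $hash_len))" = "$num_entries" &&
	test "$(($chunk_size % $hash_len))" = 0
}

lookup_size_ok 200 10 20 && echo "consistent"
lookup_size_ok 200 4294967295 20 || echo "OID lookup chunk is the wrong size"
```

The second call models the munged fanout value of 2^32-1 from the test: the quotient (10) does not match, so the check fails without any multiplication.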
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09 10:24:35 +03:00
|
|
|
"OID lookup chunk is the wrong size"
|
2018-06-27 16:24:35 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect incorrect OID order' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
|
|
|
|
"incorrect OID order"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:36 +03:00
|
|
|
test_expect_success 'detect OID not in object database' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
|
|
|
|
"from object database"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:37 +03:00
|
|
|
test_expect_success 'detect incorrect tree OID' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
|
|
|
|
"root tree OID for commit"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:38 +03:00
|
|
|
test_expect_success 'detect incorrect parent int-id' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
|
|
|
|
"invalid parent"
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect extra parent int-id' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
|
|
|
|
"is too long"
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect wrong parent' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
|
|
|
|
"commit-graph parent for"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:39 +03:00
|
|
|
test_expect_success 'detect incorrect generation number' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
|
|
|
|
"generation for commit"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:40 +03:00
|
|
|
test_expect_success 'detect incorrect commit date' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
|
|
|
|
"commit date"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:41 +03:00
|
|
|
test_expect_success 'detect incorrect parent for octopus merge' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
|
|
|
|
"invalid parent"
|
|
|
|
'
|
|
|
|
|
2018-06-27 16:24:42 +03:00
|
|
|
test_expect_success 'detect invalid checksum hash' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
|
|
|
|
"incorrect checksum"
|
|
|
|
'
|
|
|
|
|
2019-01-16 01:25:51 +03:00
|
|
|
test_expect_success 'detect incorrect chunk count' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\377" \
|
commit-graph: simplify parse_commit_graph() #1
While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read is still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.
So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks. While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.
Note that this necessitates a change to the error message and,
consequently, an update to the 'detect incorrect chunk count'
test in 't5318-commit-graph.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 16:00:29 +03:00
|
|
|
"commit-graph file is too small to hold [0-9]* chunks" \
|
|
|
|
$GRAPH_CHUNK_LOOKUP_OFFSET
|
2019-01-16 01:25:51 +03:00
|
|
|
'
|
|
|
|
|
2023-08-22 00:34:40 +03:00
|
|
|
test_expect_success 'detect mixed generation numbers (non-zero to zero)' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION_LAST "\0\0\0\0" \
|
2023-08-22 00:34:42 +03:00
|
|
|
"both zero and non-zero generations"
|
2023-08-22 00:34:40 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'detect mixed generation numbers (zero to non-zero)' '
|
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\0\0\0\0" \
|
2023-08-22 00:34:42 +03:00
|
|
|
"both zero and non-zero generations"
|
2023-08-22 00:34:40 +03:00
|
|
|
'
|
|
|
|
|
2021-10-15 23:16:29 +03:00
|
|
|
test_expect_success 'git fsck (checks commit-graph when config set to true)' '
|
2023-07-24 19:39:28 +03:00
|
|
|
git -C full fsck &&
|
2018-06-27 16:24:43 +03:00
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
|
|
|
|
"incorrect checksum" &&
|
2023-07-24 19:39:28 +03:00
|
|
|
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
|
|
|
|
test_must_fail git -C full -c core.commitGraph=true fsck
|
2021-10-15 23:16:29 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
|
2023-07-24 19:39:28 +03:00
|
|
|
git -C full fsck &&
|
2021-10-15 23:16:29 +03:00
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
|
|
|
|
"incorrect checksum" &&
|
2023-07-24 19:39:28 +03:00
|
|
|
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
|
|
|
|
git -C full -c core.commitGraph=false fsck
|
2021-10-15 23:16:29 +03:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'git fsck (checks commit-graph when config unset)' '
|
2023-07-24 19:39:28 +03:00
|
|
|
test_when_finished "git -C full config core.commitGraph true" &&
|
2021-10-15 23:16:29 +03:00
|
|
|
|
2023-07-24 19:39:28 +03:00
|
|
|
git -C full fsck &&
|
2021-10-15 23:16:29 +03:00
|
|
|
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
|
|
|
|
"incorrect checksum" &&
|
2023-07-24 19:39:28 +03:00
|
|
|
test_unconfig -C full core.commitGraph &&
|
|
|
|
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
|
|
|
|
test_must_fail git -C full fsck
|
2018-06-27 16:24:43 +03:00
|
|
|
'
|
|
|
|
|
fsck: suppress commit-graph output with `--no-progress`
Since e0fd51e1d7 (fsck: verify commit-graph, 2018-06-27), `fsck` runs
`git commit-graph verify` to check the integrity of any commit-graph(s).
Originally, the `git commit-graph verify` step would always print to
stdout/stderr, regardless of whether `fsck` was invoked with
`--progress` or `--no-progress`. But in 7371612255 (commit-graph: add
--[no-]progress to write and verify, 2019-08-26), the commit-graph
machinery learned the `--[no-]progress` option, though `fsck` was not
updated to pass this flag along.
This led to seeing output from running `git fsck`, even with
`--no-progress` on repositories that have a commit-graph:
$ git.compile fsck --connectivity-only --no-progress --no-dangling
Verifying commits in commit graph: 100% (4356/4356), done.
Verifying commits in commit graph: 100% (131912/131912), done.
Ensure that `fsck` passes `--[no-]progress` as appropriate when calling
`git commit-graph verify`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-08 03:31:31 +03:00
|
|
|
test_expect_success 'git fsck shows commit-graph output with --progress' '
|
|
|
|
git -C "$TRASH_DIRECTORY/full" fsck --progress 2>err &&
|
|
|
|
grep "Verifying commits in commit graph" err
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'git fsck suppresses commit-graph output with --no-progress' '
|
|
|
|
git -C "$TRASH_DIRECTORY/full" fsck --no-progress 2>err &&
|
|
|
|
! grep "Verifying commits in commit graph" err
|
|
|
|
'
|
|
|
|
|
2018-07-12 01:42:42 +03:00
|
|
|
test_expect_success 'setup non-the_repository tests' '
|
|
|
|
rm -rf repo &&
|
|
|
|
git init repo &&
|
|
|
|
test_commit -C repo one &&
|
|
|
|
test_commit -C repo two &&
|
|
|
|
git -C repo config core.commitGraph true &&
|
|
|
|
git -C repo rev-parse two | \
|
|
|
|
git -C repo commit-graph write --stdin-commits
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'parse_commit_in_graph works for non-the_repository' '
|
|
|
|
test-tool repository parse_commit_in_graph \
|
|
|
|
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
|
2018-08-13 03:30:10 +03:00
|
|
|
{
|
|
|
|
git -C repo log --pretty=format:"%ct " -1 &&
|
|
|
|
git -C repo rev-parse one
|
|
|
|
} >expect &&
|
2018-07-12 01:42:42 +03:00
|
|
|
test_cmp expect actual &&
|
|
|
|
|
|
|
|
test-tool repository parse_commit_in_graph \
|
|
|
|
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
|
2018-08-13 03:30:10 +03:00
|
|
|
git -C repo log --pretty="%ct" -1 one >expect &&
|
2018-07-12 01:42:42 +03:00
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'get_commit_tree_in_graph works for non-the_repository' '
|
|
|
|
test-tool repository get_commit_tree_in_graph \
|
|
|
|
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
|
2018-08-13 03:30:10 +03:00
|
|
|
git -C repo rev-parse two^{tree} >expect &&
|
2018-07-12 01:42:42 +03:00
|
|
|
test_cmp expect actual &&
|
|
|
|
|
|
|
|
test-tool repository get_commit_tree_in_graph \
|
|
|
|
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
|
2018-08-13 03:30:10 +03:00
|
|
|
git -C repo rev-parse one^{tree} >expect &&
|
2018-07-12 01:42:42 +03:00
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
commit-graph.c: handle commit parsing errors
To write a commit graph chunk, 'write_graph_chunk_data()' takes a list
of commits to write and parses each one before writing the necessary
data, and continuing on to the next commit in the list.
Since the majority of these commits are not parsed ahead of time (an
exception is made for the *last* commit in the list, which is parsed
early within 'copy_oids_to_commits'), it is possible that calling
'parse_commit_no_graph()' on them may return an error. Failing to catch
these errors before de-referencing later calls can result in an undefined
memory access and a SIGSEGV.
One such example of this is 'get_commit_tree_oid()', which expects a
parsed object as its input (in this case, the commit-graph code passes
'*list'). If '*list' causes a parse error, the subsequent call will
fail.
Prevent such an issue by checking the return value of
'parse_commit_no_graph()' to avoid passing an unparsed object to a
function which expects a parsed object, thus preventing a segfault.
It is worth noting that this fix is really skirting around the issue in
object.c's 'parse_object()', which makes it difficult to tell how
corrupt an object is without digging into it. Presumably one could
change the meaning of 'parse_object' returns, but this would require
adjusting each callsite accordingly. Instead of that, add an additional
check to the object parsed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06 01:04:55 +03:00
|
|
|
test_expect_success 'corrupt commit-graph write (broken parent)' '
|
2019-09-06 01:04:53 +03:00
|
|
|
rm -rf repo &&
|
|
|
|
git init repo &&
|
|
|
|
(
|
|
|
|
cd repo &&
|
|
|
|
empty="$(git mktree </dev/null)" &&
|
|
|
|
cat >broken <<-EOF &&
|
|
|
|
tree $empty
|
2020-02-07 03:52:49 +03:00
|
|
|
parent $ZERO_OID
|
2019-09-06 01:04:53 +03:00
|
|
|
author whatever <whatever@example.com> 1234 -0000
|
|
|
|
committer whatever <whatever@example.com> 1234 -0000
|
|
|
|
|
|
|
|
broken commit
|
|
|
|
EOF
|
|
|
|
broken="$(git hash-object -w -t commit --literally broken)" &&
|
|
|
|
git commit-tree -p "$broken" -m "good commit" "$empty" >good &&
|
|
|
|
test_must_fail git commit-graph write --stdin-commits \
|
|
|
|
<good 2>test_err &&
|
2023-10-31 08:23:30 +03:00
|
|
|
test_grep "unable to parse commit" test_err
|
2019-09-06 01:04:53 +03:00
|
|
|
)
|
|
|
|
'
|
|
|
|
|
2019-09-06 01:04:57 +03:00
|
|
|
test_expect_success 'corrupt commit-graph write (missing tree)' '
|
2019-09-06 01:04:53 +03:00
|
|
|
rm -rf repo &&
|
|
|
|
git init repo &&
|
|
|
|
(
|
|
|
|
cd repo &&
|
|
|
|
tree="$(git mktree </dev/null)" &&
|
|
|
|
cat >broken <<-EOF &&
|
2020-02-07 03:52:49 +03:00
|
|
|
parent $ZERO_OID
|
2019-09-06 01:04:53 +03:00
|
|
|
author whatever <whatever@example.com> 1234 -0000
|
|
|
|
committer whatever <whatever@example.com> 1234 -0000
|
|
|
|
|
|
|
|
broken commit
|
|
|
|
EOF
|
|
|
|
broken="$(git hash-object -w -t commit --literally broken)" &&
|
|
|
|
git commit-tree -p "$broken" -m "good" "$tree" >good &&
|
|
|
|
test_must_fail git commit-graph write --stdin-commits \
|
|
|
|
<good 2>test_err &&
|
2023-10-31 08:23:30 +03:00
|
|
|
test_grep "unable to parse commit" test_err
|
2019-09-06 01:04:53 +03:00
|
|
|
)
|
|
|
|
'
|
|
|
|
|
commit-graph: implement generation data chunk
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of the
prerequisites before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.
We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.
Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).
We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.
To minimize the space required to store corrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instead
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:
We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.
If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.
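The overflow scheme can be sketched as follows. The constant values (2^31 - 1 as the largest inline offset, 2^31 as the flag bit) and the helper name encode_offset are assumptions made for illustration, and the sketch relies on the shell having 64-bit arithmetic.

```shell
# Assumed constants: offsets up to 2^31 - 1 are stored inline; the
# most-significant bit of the 32-bit slot flags an overflow entry.
OFFSET_MAX=$(( (1 << 31) - 1 ))
OVERFLOW_BIT=$(( 1 << 31 ))

# Hypothetical helper: print the 32-bit value stored for a commit, given
# its offset and the position its corrected date would occupy in GDOV.
encode_offset () {
	offset=$1
	gdov_index=$2
	if test "$offset" -gt "$OFFSET_MAX"
	then
		# Too large to store inline: set the MSB and keep the GDOV
		# position in the remaining bits.
		echo $(( $OVERFLOW_BIT | $gdov_index ))
	else
		echo "$offset"
	fi
}

encode_offset 1000 0		# prints 1000
encode_offset 2147483648 3	# prints 2147483651
```

A reader would reverse this by testing the top bit and, when it is set, masking it off to get the index into the overflow chunk.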
We test the overflow-related code with the following repo history:
          F - N - U
         /         \
U - N - U           N
         \         /
          N - F - N
Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.
The largest offset observed is 2 ^ 31, just large enough to overflow.
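To see where the 2 ^ 31 offset comes from, the arithmetic can be walked through for the upper branch (F - N - U). The sketch assumes the usual rule that a commit's corrected date is the larger of its own committer date and one more than its parent's corrected date; the function name corrected_offsets is hypothetical, each commit is treated as having a single parent, and 64-bit shell arithmetic is assumed.

```shell
# Usage: corrected_offsets <parent corrected date> <committer date>...
# Prints "<committer date> <corrected date> <offset>" for each commit
# in a linear chain, oldest first.
corrected_offsets () {
	parent_corrected=$1
	shift
	for date in "$@"
	do
		corrected=$(($parent_corrected + 1))
		if test "$date" -gt "$corrected"
		then
			corrected=$date
		fi
		echo "$date $corrected $(($corrected - $date))"
		parent_corrected=$corrected
	done
}

# The third commit of the lower chain (U - N - U) has corrected date
# 1112354056: one more than its N parent.
corrected_offsets 1112354056 2147483646 1112354055 0
# prints:
# 2147483646 2147483646 0
# 1112354055 2147483647 1035129592
# 0 2147483648 2147483648
```

The last line is the U commit after F and N: its committer date is zero, so its offset equals its corrected date, 2 ^ 31, one past the largest value that fits in 31 bits.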
[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16 21:11:15 +03:00
|
|
|
# We test the overflow-related code with the following repo history:
|
|
|
|
#
|
|
|
|
#                 4:F - 5:N - 6:U
|
|
|
|
#                /               \
|
|
|
|
# 1:U - 2:N - 3:U                 M:N
|
|
|
|
#                \               /
|
|
|
|
#                 7:N - 8:F - 9:N
|
|
|
|
#
|
|
|
|
# Here the commits denoted by U have committer date of zero seconds
|
|
|
|
# since Unix epoch, the commits denoted by N have committer date
|
|
|
|
# starting from 1112354055 seconds since Unix epoch (default committer
|
|
|
|
# date for the test suite), and the commits denoted by F have committer
|
|
|
|
# date of (2 ^ 31 - 2) seconds since Unix epoch.
|
|
|
|
#
|
|
|
|
# The largest offset observed is 2 ^ 31, just large enough to overflow.
|
|
|
|
#
|
|
|
|
|
|
|
|
test_expect_success 'set up and verify repo with generation data overflow chunk' '
|
|
|
|
UNIX_EPOCH_ZERO="@0 +0000" &&
|
|
|
|
FUTURE_DATE="@2147483646 +0000" &&
|
2023-07-24 19:39:28 +03:00
|
|
|
|
|
|
|
git init repo &&
|
|
|
|
(
|
|
|
|
cd repo &&
|
|
|
|
|
|
|
|
test_commit --date "$UNIX_EPOCH_ZERO" 1 &&
|
|
|
|
test_commit 2 &&
|
|
|
|
test_commit --date "$UNIX_EPOCH_ZERO" 3 &&
|
|
|
|
git commit-graph write --reachable &&
|
|
|
|
graph_read_expect 3 generation_data &&
|
|
|
|
test_commit --date "$FUTURE_DATE" 4 &&
|
|
|
|
test_commit 5 &&
|
|
|
|
test_commit --date "$UNIX_EPOCH_ZERO" 6 &&
|
|
|
|
git branch left &&
|
|
|
|
git reset --hard 3 &&
|
|
|
|
test_commit 7 &&
|
|
|
|
test_commit --date "$FUTURE_DATE" 8 &&
|
|
|
|
test_commit 9 &&
|
|
|
|
git branch right &&
|
|
|
|
git reset --hard 3 &&
|
|
|
|
test_merge M left right &&
|
|
|
|
git commit-graph write --reachable &&
|
|
|
|
graph_read_expect 10 "generation_data generation_data_overflow" &&
|
|
|
|
git commit-graph verify
|
|
|
|
)
|
commit-graph: implement generation data chunk
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.
We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.
Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).
We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.
To minimize the space required to store corrrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instea
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:
We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.
If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.
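The overflow scheme above can be sketched in shell arithmetic. This is only an illustration, not the C implementation: the helper names encode_offset, is_overflow and gdov_position are hypothetical, and the constants assume that GENERATION_NUMBER_V2_OFFSET_MAX is 2^31-1 and that the overflow marker is the most-significant bit of the 32-bit slot. It also assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Assumed constants: offsets up to 2^31-1 are stored inline, and the
# most-significant bit of the 32-bit slot marks an overflow.
OFFSET_MAX=2147483647      # 2^31 - 1
OVERFLOW_BIT=2147483648    # 2^31

# encode_offset <offset> <overflows-so-far>   (hypothetical helper)
# Prints the value that would go into the 32-bit GDAT slot.
encode_offset () {
	if test "$1" -gt "$OFFSET_MAX"
	then
		# Too large: store the GDOV position with the MSB set.
		echo $((OVERFLOW_BIT | $2))
	else
		echo "$1"
	fi
}

# is_overflow <stored>: succeeds if the MSB is set.
is_overflow () {
	test $(($1 & OVERFLOW_BIT)) -ne 0
}

# gdov_position <stored>: prints the position encoded in the low 31 bits.
gdov_position () {
	echo $(($1 & OFFSET_MAX))
}

stored=$(encode_offset 2147483648 3)
if is_overflow "$stored"
then
	echo "read corrected date from GDOV position $(gdov_position "$stored")"
fi
```

A reader that finds the marker bit set but has no GDOV chunk to consult cannot recover the corrected commit date, which is why the two chunks must be written consistently.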
We test the overflow-related code with the following repo history:
              F - N - U
             /         \
    U - N - U           N
             \         /
              N - F - N
Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.
The largest offset observed is 2 ^ 31, just large enough to overflow.
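As a rough check of that figure, the sketch below walks the upper branch of the history (U - N - U - F - N - U) and tracks the offset of each commit. It assumes the usual definition of the corrected commit date (the larger of the committer date and one more than the largest corrected date among the parents); the helper name corrected_date is hypothetical, and 64-bit shell arithmetic is assumed.

```shell
#!/bin/sh
U=0             # zero seconds since the Unix epoch
N=1112354055    # default committer date in the test suite
F=2147483646    # 2^31 - 2

# corrected_date <committer-date> <largest-parent-corrected-date>
# Assumed definition: max(committer date, parent corrected date + 1).
corrected_date () {
	if test "$1" -gt "$2"
	then
		echo "$1"
	else
		echo $(($2 + 1))
	fi
}

corrected=0     # no parents yet
max_offset=0
for date in $U $N $U $F $N $U
do
	corrected=$(corrected_date "$date" "$corrected")
	offset=$((corrected - date))
	if test "$offset" -gt "$max_offset"
	then
		max_offset=$offset
	fi
done
echo "$max_offset"
```

The final U commit follows an F and an N commit, so its corrected date is pushed to 2^31 while its committer date is zero, giving the offset of 2^31.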
[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16 21:11:15 +03:00
'

graph_git_behavior 'generation data overflow chunk repo' repo left right
commit-graph: fix corrupt upgrade from generation v1 to v2
The previous commit demonstrates a bug where a commit-graph using
generation v2 could enter a state where one of the GDA2 values has its
most-significant bit set (indicating that its value should be read from
the extended offset table in the GDO2 chunk) without having a GDO2 chunk
to read from.
This results in the following error message being displayed to the
caller:
fatal: commit-graph requires overflow generation data but has none
This bug arises in the following scenario:
- We decide to write a commit-graph using generation number v2, and
decide (correctly) that no GDO2 chunk is necessary (e.g., because
all of the committer date offsets are no larger than 2^31-1).
- The v2 generation numbers are stored in the `->generation` member of
the commit slab holding `struct commit_graph_data`'s.
- Later on, `load_commit_graph_info()` is called, overwriting the
v2 generation data in the aforementioned slab with any existing v1
generation data.
Then, when the commit-graph code goes to write the GDA2 chunk via
`write_graph_chunk_generation_data()`, we use the overwritten generation
v1 data in a place where we expect to use a v2 generation number:
offset = commit_graph_data_at(c)->generation - c->date;
...because `commit_graph_data_at(c)->generation` used to hold the v2
generation data, but it was overwritten to contain the v1 generation
number via `load_commit_graph_info()`.
If the `offset` computation above overflows the v2 generation number
max, then `write_graph_chunk_generation_data()` will update its count of
large offsets and write the marker accordingly:
if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
num_generation_data_overflows++;
}
and reads will look for the GDO2 chunk containing the overflowing v2
generation number, *after* the commit-graph code decided that no such
chunk was necessary.
The main problem is that the slab containing `struct commit_graph_data`
has a dual purpose. It is used to hold data that we are about to write
to disk while generating a commit-graph, as well as hold data that was
read from an existing commit-graph.
When the two mix, namely when the result of reading the commit-graph has
a side-effect that mixes poorly with an in-progress commit-graph write,
we end up with corrupt data.
A complete fix might be to introduce a new slab that is used exclusively
for writing, and gate access between the two slabs based on context
provided by the caller (e.g., whether this computation is part of a
"read" or "write" operation).
But a more minimal fix addresses the only known path which overwrites
the slab data, which is `compute_bloom_filters()` ->
`get_or_compute_bloom_filter()` -> `load_commit_graph_info()` ->
`fill_commit_graph_info()` by avoiding the last call which clobbers the
data altogether.
This path only needs to learn the graph position of a given commit so
that it can be used in `load_bloom_filter_from_graph()`. By replacing
the last steps of the above with one that records the graph position
into a temporary variable which is then used to load the existing Bloom
data, we eliminate the clobbering, removing the corruption.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-13 02:10:33 +03:00
test_expect_success 'overflow during generation version upgrade' '
2022-07-13 02:10:28 +03:00
	git init overflow-v2-upgrade &&
	(
		cd overflow-v2-upgrade &&

		# This commit will have a date at two seconds past the Epoch,
		# and a (v1) generation number of 1, since it is a root commit.
		#
		# The offset will then be computed as 1-2, which will underflow
		# to 2^31, which is greater than the v2 offset small limit of
		# 2^31-1.
		#
		# This is sufficient to need a large offset table for the v2
		# generation numbers.
		test_commit --date "@2 +0000" base &&
		git repack -d &&

		# Test that upgrading from generation v1 to v2 correctly
		# produces the overflow table.
		git -c commitGraph.generationVersion=1 commit-graph write &&
		git -c commitGraph.generationVersion=2 commit-graph write \
			--changed-paths &&

		git rev-list --all
	)
'
2023-10-09 23:59:51 +03:00
corrupt_chunk () {
	graph=full/.git/objects/info/commit-graph &&
	test_when_finished "rm -rf $graph" &&
	git -C full commit-graph write --reachable &&
	corrupt_chunk_file $graph "$@"
}

check_corrupt_chunk () {
	corrupt_chunk "$@" &&
	git -C full -c core.commitGraph=false log >expect.out &&
	git -C full -c core.commitGraph=true log >out 2>err &&
	test_cmp expect.out out
}

test_expect_success 'reader notices too-small oid fanout chunk' '
	# make it big enough that the graph file is plausible,
	# otherwise we hit an earlier check
	check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
	cat >expect.err <<-\EOF &&
	error: commit-graph oid fanout chunk is wrong size
2023-11-09 10:14:34 +03:00
	error: commit-graph required OID fanout chunk missing or corrupted
2023-10-09 23:59:51 +03:00
	EOF
	test_cmp expect.err err
'
commit-graph: check consistency of fanout table
We use bsearch_hash() to look up items in the oid index of a
commit-graph. It also has a fanout table to reduce the initial range in
which we'll search. But since the fanout comes from the on-disk file, a
corrupted or malicious file can cause us to look outside of the
allocated index memory.
One solution here would be to pass the total table size to
bsearch_hash(), which could then bounds check the values it reads from
the fanout. But there's an inexpensive up-front check we can do, and
it's the same one used by the midx and pack idx code (both of which
likewise have fanout tables and use bsearch_hash(), but are not affected
by this bug):
1. We can check the value of the final fanout entry against the size
of the table we got from the index chunk. These must always match,
since the fanout is just slicing up the index.
As a side note, the midx and pack idx code compute it the other
way around: they use the final fanout value as the object count, and
check the index size against it. Either is valid; if they
disagree we cannot know which is wrong (a corrupted fanout value,
or a too-small table of oids).
2. We can quickly scan the fanout table to make sure it is
monotonically increasing. If it is, then we know that every value
is less than or equal to the final value, and therefore less than
or equal to the table size.
It would also be sufficient to just check that each fanout value is
smaller than the final one, but the midx and pack idx code both do
a full monotonicity check. It's the same cost, and it catches some
other corruptions (though not all; the checks done by "commit-graph
verify" are more complete but more expensive, and our goal here is
to be fast and memory-safe).
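The two checks can be sketched in shell as follows. This is only an illustration of the idea, not the C code: check_fanout is a hypothetical helper, its messages are made up for the example, and it takes a handful of fanout values as arguments rather than the 256 entries of a real table.

```shell
#!/bin/sh
# check_fanout <num-oids> <fanout-entry>...   (hypothetical helper)
# Prints "ok" if the final fanout entry matches the number of oids
# and the entries never decrease; otherwise prints a reason.
check_fanout () {
	expected=$1
	shift

	# Check 1: the final fanout entry must equal the table size.
	last=0
	for last in "$@"
	do
		:
	done
	if test "$last" -ne "$expected"
	then
		echo "final fanout entry does not match table size"
		return 1
	fi

	# Check 2: entries must be monotonically increasing, which
	# bounds every entry by the final one.
	prev=0
	for value in "$@"
	do
		if test "$value" -lt "$prev"
		then
			echo "fanout values out of order"
			return 1
		fi
		prev=$value
	done
	echo ok
}

check_fanout 3 0 1 1 3
check_fanout 3 0 9 8 3
```

The second call keeps the final entry intact so that it passes the size check, and is then rejected by the ordering check because one of the earlier entries is larger than its successor.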
There are two new tests. One just checks the final fanout value (this is
the mirror image of the "too small oid lookup" case added for the midx
in the previous commit; it's flipped here because commit-graph considers
the oid lookup chunk to be the source of truth).
The other actually creates a fanout with many out-of-bounds entries, and
prior to this patch, it does cause the segfault you'd expect. But note
that the error is not "your fanout entry is out-of-bounds", but rather
"fanout value out of order". That's because we leave the final fanout
value in place (to get past the table size check), making the index
non-monotonic (the second-to-last entry is big, but the last one must
remain small to match the actual table).
We need adjustments to a few existing tests, as well:
- an earlier test in t5318 corrupts the fanout and runs "commit-graph
verify". Its message is now changed, since we catch the problem
earlier (during the load step, rather than the careful validation
step).
- in t5324, we test that "commit-graph verify --shallow" does not do
expensive verification on the base file of the chain. But the
corruption it uses (munging a byte at offset 1000) happens to be in
the middle of the fanout table. And now we detect that problem in
the cheaper checks that are performed for every part of the graph.
We'll push this back to offset 1500, which is only caught by the
more expensive checksum validation.
Likewise, there's a later test in t5324 which munges an offset 100
bytes into a file (also in the fanout table) that is referenced by
an alternates file. So we now find that corruption during the load
step, rather than the verification step. At the very least we need
to change the error message (like the case above in t5318). But it
is probably good to make sure we handle all parts of the
verification even for alternate graph files. So let's likewise
corrupt byte 1500 and make sure we found the invalid checksum.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10 00:04:58 +03:00
test_expect_success 'reader notices fanout/lookup table mismatch' '
	check_corrupt_chunk OIDF 1020 "FFFFFFFF" &&
	cat >expect.err <<-\EOF &&
commit-graph: use fanout value for graph size
Commit-graph, midx, and pack idx files all have both a lookup table of
oids and an oid fanout table. In midx and pack idx files, we take the
final entry of the fanout table as the source of truth for the number of
entries, and then verify that the size of the lookup table matches that.
But for commit-graph files, we do the opposite: we use the size of the
lookup table as the source of truth, and then check the final fanout
entry against it.
As noted in 4169d89645 (commit-graph: check consistency of fanout
table, 2023-10-09), either is correct. But there are a few reasons to
prefer the fanout table as the source of truth:
1. The fanout entries are 32-bits on disk, and that defines the
maximum number of entries we can store. But since the size of the
lookup table is only bounded by the filesystem, it can be much
larger. And hence computing it as the commit-graph does means that
we may truncate the result when storing it in a uint32_t.
2. We read the fanout first, then the lookup table. If we're verifying
the chunks as we read them, then we'd want to take the fanout as
truth (we have nothing yet to check it against) and then we can
check that the lookup table matches what we already know.
3. It is pointlessly inconsistent with the midx and pack idx code.
Since the three have to do similar size and bounds checks, it is
easier to reason about all three if they use the same approach.
So this patch moves the assignment of g->num_commits to the fanout
parser, and then we can check the size of the lookup chunk as soon as we
try to load it.
There's already a test covering this situation, which munges the final
fanout entry to 2^32-1. In the current code we complain that it does not
agree with the table size. But now that we treat the munged value as the
source of truth, we'll complain that the lookup table is the wrong size
(again, either is correct). So we'll have to update the message we
expect (and likewise for an earlier test which does similar munging).
There's a similar test for this situation on the midx side, but rather
than making a very-large fanout value, it just truncates the lookup
table. We could do that here, too, but the very-large fanout value
actually shows an interesting corner case. On a 32-bit system,
multiplying to find the expected table size would cause an integer
overflow. Using st_mult() would detect that, but cause us to die()
rather than falling back to the non-graph code path. Checking the size
using division (as we do with existing chunk-size checks) avoids the
overflow entirely, and the test demonstrates this when run on a 32-bit
system.
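The difference between the two ways of checking the size can be sketched with shell arithmetic. The helpers lookup_size_ok and wrapped_product are hypothetical; the second one only simulates what a 32-bit multiplication would produce, and the sketch assumes a shell with 64-bit arithmetic and a 20-byte hash.

```shell
#!/bin/sh
# lookup_size_ok <chunk-size-bytes> <hash-len> <num-commits>
# Division-based check: no product is ever computed, so nothing can
# overflow. (This sketch ignores any remainder of the division.)
lookup_size_ok () {
	test $(($1 / $2)) -eq "$3"
}

# wrapped_product <a> <b>
# Simulates a * b truncated to 32 bits, as on a 32-bit size_t.
wrapped_product () {
	echo $((($1 * $2) % 4294967296))
}

# A three-commit graph whose final fanout entry was munged to 2^32-1.
if lookup_size_ok 60 20 4294967295
then
	echo "sizes agree"
else
	echo "lookup chunk is the wrong size"
fi

# The expected size computed by multiplication would wrap around.
wrapped_product 4294967295 20
```

With division, the munged count is simply found to disagree with the 60-byte table, and the caller can fall back to the non-graph code path instead of dying on an overflow.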
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09 10:24:35 +03:00
	error: commit-graph OID lookup chunk is the wrong size
	error: commit-graph required OID lookup chunk missing or corrupted
2023-10-10 00:04:58 +03:00
	EOF
	test_cmp expect.err err
'

test_expect_success 'reader notices out-of-bounds fanout' '
	# Rather than try to corrupt a specific hash, we will just
	# wreck them all. But we cannot just set them all to 0xFFFFFFFF or
	# similar, as they are used for hi/lo starts in a binary search (so if
	# they are identical, that indicates that the search should abort
	# immediately). Instead, we will give them high values that differ by
	# 2^24, ensuring that any that are used would cause an out-of-bounds
	# read.
	check_corrupt_chunk OIDF 0 $(printf "%02x000000" $(test_seq 0 254)) &&
	cat >expect.err <<-\EOF &&
	error: commit-graph fanout values out of order
2023-11-09 10:25:07 +03:00
	error: commit-graph required OID fanout chunk missing or corrupted
2023-10-10 00:04:58 +03:00
	EOF
	test_cmp expect.err err
'
2023-10-10 00:05:36 +03:00
test_expect_success 'reader notices too-small commit data chunk' '
	check_corrupt_chunk CDAT clear 00000000 &&
	cat >expect.err <<-\EOF &&
	error: commit-graph commit data chunk is wrong size
2023-11-09 10:14:34 +03:00
	error: commit-graph required commit data chunk missing or corrupted
2023-10-10 00:05:36 +03:00
	EOF
	test_cmp expect.err err
'
commit-graph: detect out-of-bounds extra-edges pointers
If an entry in a commit-graph file has more than 2 parents, the
fixed-size parent fields instead point to an offset within an "extra
edges" chunk. We blindly follow these, assuming that the chunk is
present and sufficiently large; this can lead to an out-of-bounds read
for a corrupt or malicious file.
We can fix this by recording the size of the chunk and adding a
bounds-check in fill_commit_in_graph(). There are a few tricky bits:
1. We'll switch from working with a pointer to an offset. This makes
some corner cases just fall out naturally:
a. If we did not find an EDGE chunk at all, our size will
correctly be zero (so everything is "out of bounds").
b. Comparing "size / 4" lets us make sure we have at least 4 bytes
to read, and we never compute a pointer more than one element
past the end of the array (computing a larger pointer is
probably OK in practice, but is technically undefined
behavior).
c. The current code casts to "uint32_t *". Replacing it with an
offset avoids any comparison between different types of pointer
(since the chunk is stored as "unsigned char *").
2. This is the first case in which fill_commit_in_graph() may return
anything but success. We need to make sure to roll back the
"parsed" flag (and any parents we might have added before running
out of buffer) so that the caller can cleanly fall back to
loading the commit object itself.
It's a little non-trivial to do this, and we might benefit from
factoring it out. But we can wait on that until we actually see a
second case where we return an error.
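The offset-based bounds check from point 1 can be sketched in shell. The helper edge_in_bounds is hypothetical and only mirrors the "size / 4" comparison described above, with the position counted in 4-byte entries.

```shell
#!/bin/sh
# edge_in_bounds <position> <chunk-size-bytes>   (hypothetical helper)
# Succeeds only if a full 4-byte entry exists at the given position.
edge_in_bounds () {
	test "$1" -lt $(($2 / 4))
}

# A missing EDGE chunk has size zero, so every position is rejected.
edge_in_bounds 0 0 || echo "extra-edges pointer out of bounds"

# An 8-byte chunk holds two entries: positions 0 and 1 are valid.
edge_in_bounds 1 8 && echo "position 1 is readable"
edge_in_bounds 2 8 || echo "extra-edges pointer out of bounds"
```

Because the comparison works on a count of entries, a chunk that is cut off in the middle of an entry is handled as well: the integer division rounds down, so the partial entry is never considered readable.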
As a bonus, this lets us drop the st_mult() call. Since we've already
done a bounds check, we know there won't be any integer overflow (it
would imply our buffer is larger than a size_t can hold).
The included test does not actually segfault before this patch (though
you could construct a case where it does). Instead, it reads garbage
from the next chunk which results in it complaining about a bogus parent
id. This is sufficient for our needs, though (we care that the fallback
succeeds, and that stderr mentions the out-of-bounds read).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10 00:05:38 +03:00
test_expect_success 'reader notices out-of-bounds extra edge' '
	check_corrupt_chunk EDGE clear &&
	cat >expect.err <<-\EOF &&
	error: commit-graph extra-edges pointer out of bounds
	EOF
	test_cmp expect.err err
'
2023-10-10 00:05:44 +03:00
test_expect_success 'reader notices too-small generations chunk' '
	check_corrupt_chunk GDA2 clear 00000000 &&
	cat >expect.err <<-\EOF &&
	error: commit-graph generations chunk is wrong size
	EOF
	test_cmp expect.err err
'
commit-graph: introduce envvar to disable commit existence checks
Our `lookup_commit_in_graph()` helper tries to look up commits from the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.
We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.
In practice, there is no inherent reason why either of those functions
should have the sanity check whereas the other one does not: either both
of them are able to detect this issue or none of them should be. This
also means that the default of whether we do the check should likely be
the same for both. To err on the side of caution, we thus rather want to
make `repo_parse_commit_internal()` stricter than to loosen the checks
that we already have in `lookup_commit_in_graph()`.
The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 10:16:13 +03:00
test_expect_success 'stale commit cannot be parsed when given directly' '
	test_when_finished "rm -rf repo" &&
	git init repo &&
	(
		cd repo &&
		test_commit A &&
		test_commit B &&
		git commit-graph write --reachable &&

		oid=$(git rev-parse B) &&
		rm .git/objects/"$(test_oid_to_path "$oid")" &&

		# Verify that it is possible to read the commit from the
		# commit graph when not being paranoid, ...
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark was run against linux.git with a fully
built commit-graph:
Benchmark 1: git.v2.42.1 rev-list --count HEAD
Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
Range (min … max): 650.2 ms … 666.0 ms 10 runs
Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
Range (min … max): 1.302 s … 1.361 s 10 runs
Summary
git.v2.42.1 rev-list --count HEAD ran
2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
Range (min … max): 2.943 s … 2.949 s 3 runs
Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
Range (min … max): 2.704 s … 2.759 s 3 runs
Summary
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 14:08:21 +03:00
		git rev-list B &&
commit-graph: introduce envvar to disable commit existence checks
Our `lookup_commit_in_graph()` helper tries to look up a commit in the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.
We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we had added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.
In practice, there is no inherent reason why one of those functions
should have the sanity check whereas the other one does not: either both
of them should be able to detect this issue or neither of them should.
This also means that the default of whether we do the check should likely
be the same for both. To err on the side of caution, we would thus rather
make `repo_parse_commit_internal()` stricter than loosen the checks
that we already have in `lookup_commit_in_graph()`.
The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 10:16:13 +03:00
		# ... but parsing the commit when double checking that
		# it actually exists in the object database should fail.
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark was run against linux.git with a
fully built commit-graph:
    Benchmark 1: git.v2.42.1 rev-list --count HEAD
      Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
      Range (min … max): 650.2 ms … 666.0 ms 10 runs
    Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
      Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
      Range (min … max): 1.302 s … 1.361 s 10 runs
    Summary
      git.v2.42.1 rev-list --count HEAD ran
        2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
    Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
      Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
      Range (min … max): 2.943 s … 2.949 s 3 runs
    Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
      Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
      Range (min … max): 2.704 s … 2.759 s 3 runs
    Summary
      GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
        1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 14:08:21 +03:00
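The commit message above says the check stays tweakable through GIT_COMMIT_GRAPH_PARANOIA and is now off unless the variable asks for it. The following is a minimal sketch of that decision in plain shell, not Git's actual code. The function name `paranoia_enabled` is hypothetical, and the accepted spellings (true/yes/on/1, false/no/off/0/empty, case-insensitive) are an assumption modelled on Git's usual boolean parsing; Git also accepts other integers, which this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git or its test suite): decide whether
# the paranoid object existence check would be active for the current
# value of GIT_COMMIT_GRAPH_PARANOIA.
# Returns 0 if enabled, 1 if disabled, 2 if the value is not a boolean.
paranoia_enabled () {
	# Unset variable: fall back to the default, which is "disabled"
	# after the change described above.
	if test -z "${GIT_COMMIT_GRAPH_PARANOIA+set}"
	then
		return 1
	fi

	# Assumed boolean spellings, compared case-insensitively.
	case "$(printf '%s' "$GIT_COMMIT_GRAPH_PARANOIA" | tr '[:upper:]' '[:lower:]')" in
	true|yes|on|1)
		return 0 ;;
	false|no|off|0|'')
		return 1 ;;
	*)
		echo "bad boolean value: $GIT_COMMIT_GRAPH_PARANOIA" >&2
		return 2 ;;
	esac
}
```

A caller could write `if paranoia_enabled; then ...; fi` to mirror what the tests below do when they prefix a command with `env GIT_COMMIT_GRAPH_PARANOIA=true`.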
		test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-list -1 B
2023-10-31 10:16:13 +03:00
	)
'

commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.
As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.
To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function checks for the commit's
existence in both the commit graph and the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.
We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.
As far as I can see, this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermediate commit is missing.
It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.
This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:
    Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
      Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
      Range (min … max): 2.894 s … 2.950 s 10 runs
    Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
      Range (min … max): 3.780 s … 3.961 s 10 runs
    Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
      Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
      Range (min … max): 13.714 s … 13.995 s 10 runs
    Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
      Range (min … max): 13.645 s … 14.038 s 10 runs
    Summary
      git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
        1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
This is a ~30% regression with the commit graph enabled, but we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 10:16:18 +03:00
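The "~30%" figure in the commit message above follows directly from the mean times reported by the benchmark. As a small worked check, the sketch below recomputes the ratios with awk; the helper name `ratio` is made up for this illustration, and the inputs are the mean times in seconds copied from the benchmark output.

```shell
#!/bin/sh
# Hypothetical helper: print how many times faster the second run is than
# the first, given both mean times in seconds.
ratio () {
	LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Commit graph enabled: with the new existence check vs. without it.
ratio 3.834 2.913

# Commit graph disabled (master) vs. commit graph enabled (master).
ratio 13.841 2.913
```

The first ratio corresponds to the "1.32 times faster" line in the summary, i.e. roughly a 30% slowdown from the added check, while the second shows that reading from the commit graph remains several times faster than parsing every commit from the object database.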
test_expect_success 'stale commit cannot be parsed when traversing graph' '
	test_when_finished "rm -rf repo" &&
	git init repo &&
	(
		cd repo &&

		test_commit A &&
		test_commit B &&
		test_commit C &&
		git commit-graph write --reachable &&

		# Corrupt the repository by deleting the intermediate commit
		# object. Commands should notice that this object is absent and
		# thus that the repository is corrupt even if the commit graph
		# exists.
		oid=$(git rev-parse B) &&
		rm .git/objects/"$(test_oid_to_path "$oid")" &&

		# Again, we should be able to parse the commit when not
		# being paranoid about commit graph staleness...
2023-11-24 14:08:21 +03:00
		git rev-parse HEAD~2 &&
2023-10-31 10:16:18 +03:00
		# ... but fail when we are paranoid.
2023-11-24 14:08:21 +03:00
		test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-parse HEAD~2 2>error &&
2023-10-31 10:16:18 +03:00
		grep "error: commit $oid exists in commit-graph but not in the object database" error
	)
'
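
The test above corrupts the repository by removing the loose object file for commit B, using `test_oid_to_path` to turn the object ID into a path below `.git/objects`. Loose objects live in a directory named after the first two hex digits of the ID, with the remaining digits as the file name. The sketch below shows that mapping in plain shell; `oid_to_path` is a hypothetical stand-in written for this illustration and is only assumed to behave like the test suite's helper.

```shell
#!/bin/sh
# Hypothetical stand-in for the mapping from an object ID to the relative
# path of its loose object file: "<first two hex digits>/<remaining digits>".
oid_to_path () {
	rest=${1#??}        # everything after the first two characters
	prefix=${1%"$rest"} # the first two characters
	printf '%s/%s\n' "$prefix" "$rest"
}

# Example with a made-up, shortened object ID.
oid_to_path 0123456789abcdef
```

Deleting `.git/objects/<that path>` only removes the commit if it is stored as a loose object, which is the case in the test because the repository is freshly created and never repacked.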
2018-04-02 23:34:20 +03:00
test_done