The 'read_generation_data' member of 'struct commit_graph' was
introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire
chain does, 2021-01-16). The intention was to avoid using corrected
commit dates if not all layers of a commit-graph had that data stored.
The logic in validate_mixed_generation_chain() at that point incorrectly
initialized read_generation_data to 1 if and only if the tip
commit-graph contained the Corrected Commit Date chunk.
This was "fixed" in 448a39e65 (commit-graph: validate layers for
generation data, 2021-02-02) to validate that read_generation_data was
either non-zero for all layers, or it would set read_generation_data to
zero for all layers.
The problem here is that read_generation_data is not initialized to be
non-zero anywhere!
This change initializes read_generation_data immediately after the chunk
is parsed, so each layer will have its value present as soon as
possible.
The read_generation_data member is used in fill_commit_graph_info() to
determine if we should use the corrected commit date or the topological
levels stored in the Commit Data chunk. Due to this bug, all previous
versions of Git were defaulting to topological levels in all cases!
This can be measured with some performance tests. Using the Linux kernel
as a testbed, I generated a complete commit-graph containing corrected
commit dates and tested the 'new' version against the previous, 'old'
version.
First, rev-list with --topo-order demonstrates a 26% improvement using
corrected commit dates:
hyperfine \
-n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \
-n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \
--warmup=10
Benchmark 1: old
Time (mean ± σ): 57.1 ms ± 3.1 ms
Range (min … max): 52.9 ms … 62.0 ms 55 runs
Benchmark 2: new
Time (mean ± σ): 45.5 ms ± 3.3 ms
Range (min … max): 39.9 ms … 51.7 ms 59 runs
Summary
'new' ran
1.26 ± 0.11 times faster than 'old'
These performance improvements are due to the algorithmic improvements
given by walking fewer commits due to the higher cutoffs from corrected
commit dates.
However, this comes at a cost. The additional I/O cost of parsing the
corrected commit dates is visible in case of merge-base commands that do
not reduce the overall number of walked commits.
hyperfine \
-n "old" "$OLD_GIT merge-base v4.8 v4.9" \
-n "new" "$NEW_GIT merge-base v4.8 v4.9" \
--warmup=10
Benchmark 1: old
Time (mean ± σ): 110.4 ms ± 6.4 ms
Range (min … max): 96.0 ms … 118.3 ms 25 runs
Benchmark 2: new
Time (mean ± σ): 150.7 ms ± 1.1 ms
Range (min … max): 149.3 ms … 153.4 ms 19 runs
Summary
'old' ran
1.36 ± 0.08 times faster than 'new'
Performance issues like this are what motivated 702110aac (commit-graph:
use config to specify generation type, 2021-02-25).
In the future, we could fix this performance problem by inserting the
corrected commit date offsets into the Commit Date chunk instead of
having that data in an extra chunk.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The graph_git_behavior helper is useful for testing that certain Git
commands behave the same when using the commit-graph and when not using
the commit-graph. Extract it to a new lib-commit-graph.sh file for use
in new test scripts that will split out from t5318.
While doing this extraction, also extract graph_read_expect and the
logic for priming the test_oid_cache.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>