The codepath to show progress meter while writing out commit-graph
file has been improved.
* ab/commit-graph-write-progress:
commit-graph write: emit a percentage for all progress
commit-graph write: add itermediate progress
commit-graph write: remove empty line for readability
commit-graph write: add more descriptive progress output
commit-graph write: show progress for object search
commit-graph write: more descriptive "writing out" output
commit-graph write: add "Writing out" progress output
commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily
commit-graph: rename "large edges" to "extra edges"
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.
* jt/fetch-v2-sideband:
tests: define GIT_TEST_SIDEBAND_ALL
{fetch,upload}-pack: sideband v2 fetch response
sideband: reverse its dependency on pkt-line
pkt-line: introduce struct packet_writer
pack-protocol.txt: accept error packets in any context
Use packet_reader instead of packet_read_line
Update the protocol message specification to allow only the limited
use of scaled quantities. This is ensure potential compatibility
issues will not go out of hand.
* js/filter-options-should-use-plain-int:
filter-options: expand scaled numbers
tree:<depth>: skip some trees even when collecting omits
list-objects-filter: teach tree:# how to handle >0
"git fetch --recurse-submodules" may not fetch the necessary commit
that is bound to the superproject, which is getting corrected.
* sb/submodule-recursive-fetch-gets-the-tip:
fetch: ensure submodule objects fetched
submodule.c: fetch in submodules git directory instead of in worktree
submodule: migrate get_next_submodule to use repository structs
repository: repo_submodule_init to take a submodule struct
submodule: store OIDs in changed_submodule_names
submodule.c: tighten scope of changed_submodule_names struct
submodule.c: sort changed_submodule_names before searching it
submodule.c: fix indentation
sha1-array: provide oid_array_filter
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.
Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.
If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".
This will be tested in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When communicating with a remote server or a subprocess, use
expanded numbers rather than numbers with scaling suffix in the
object filter spec (e.g. "limit:blob=1k" becomes
"limit:blob=1024").
Update the protocol docs to note that clients should always perform this
expansion, to allow for more compatibility between server
implementations.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All config extensions are described in technical/repository-version.txt.
I made a mistake of adding it in config.txt instead. This patch moves
it back to where it belongs.
Since repository-version.txt is not part of officially generated
documents (it's not even part of DOC_HTML target), it's only visible
to developers who read plain .txt files. Let's include it in
gitrepository-layout.5 for more visibility. Some minor asciidoc fixes
are required in repository-version.txt to make this happen.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no users of OPT_DATE except for test-parse-options; its
only caller went away in 27ec394a97 (prune: introduce OPT_EXPIRY_DATE()
and use it, 2013-04-25).
It also has a bug: it does not specify PARSE_OPT_NONEG, but its callback
does not respect the "unset" flag, and will feed NULL to approxidate()
and segfault. Probably this should be marked with NONEG, or the callback
should set the timestamp to some sentinel value (e.g,. "0", or
"(time_t)-1").
But since there are no callers, deleting it means we don't even have to
think about what the right behavior should be.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new extension to the index file has been introduced, which allows
the file to be read in parallel.
* bp/read-cache-parallel:
read-cache: load cache entries on worker threads
ieot: add Index Entry Offset Table (IEOT) extension
read-cache: load cache extensions on a worker thread
config: add new index.threads config setting
eoie: add End of Index Entry (EOIE) extension
read-cache: clean up casting and byte decoding
read-cache.c: optimize reading index format v4
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".
* nd/the-index: (23 commits)
revision.c: reduce implicit dependency the_repository
revision.c: remove implicit dependency on the_index
ws.c: remove implicit dependency on the_index
tree-diff.c: remove implicit dependency on the_index
submodule.c: remove implicit dependency on the_index
line-range.c: remove implicit dependency on the_index
userdiff.c: remove implicit dependency on the_index
rerere.c: remove implicit dependency on the_index
sha1-file.c: remove implicit dependency on the_index
patch-ids.c: remove implicit dependency on the_index
merge.c: remove implicit dependency on the_index
merge-blobs.c: remove implicit dependency on the_index
ll-merge.c: remove implicit dependency on the_index
diff-lib.c: remove implicit dependency on the_index
read-cache.c: remove implicit dependency on the_index
diff.c: remove implicit dependency on the_index
grep.c: remove implicit dependency on the_index
diff.c: remove the_index dependency in textconv() functions
blame.c: rename "repo" argument to "r"
combine-diff.c: remove implicit dependency on the_index
...
Doc update.
* ma/commit-graph-docs:
Doc: refer to the "commit-graph file" with dash
git-commit-graph.txt: refer to "*commit*-graph file"
git-commit-graph.txt: typeset more in monospace
git-commit-graph.txt: fix bullet lists
The recently introduced commit-graph auxiliary data is incompatible
with mechanisms such as replace & grafts that "breaks" immutable
nature of the object reference relationship. Disable optimizations
based on its use (and updating existing commit-graph) when these
incompatible features are in use in the repository.
* ds/commit-graph-with-grafts:
commit-graph: close_commit_graph before shallow walk
commit-graph: not compatible with uninitialized repo
commit-graph: not compatible with grafts
commit-graph: not compatible with replace objects
test-repository: properly init repo
commit-graph: update design document
refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
refs.c: migrate internal ref iteration to pass thru repository argument
This patch enables addressing the CPU cost of loading the index by adding
additional data to the index that will allow us to efficiently multi-
thread the loading and conversion of cache entries.
It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file. To make
this work for V4 indexes, when writing the cache entries, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT. Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.
The EOIE extension is always written out to the index file including to
the shared index when using the split index feature. Because it is always
written out, the SHA checksums in t/t1700-split-index.sh were updated
to reflect its inclusion.
It is written as an optional extension to ensure compatibility with other
git implementations that do not yet support it. It is always written out
to ensure it is available as often as possible to speed up index operations.
Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.
The extension consists of:
- 32-bit offset to the end of the index entries
- 160-bit SHA-1 over the extension types and their sizes (but not
their contents). E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:
SHA-1("TREE" + <binary representation of N> +
"REUC" + <binary representation of M>)
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sample code calls `get_revision()' followed by `graph_update()',
but the documentation and source code indicate that `get_revision()'
already calls `graph_update()' for you.
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file processed by `git commit-graph` is referred to as the
"commit-graph file", also with a dash. We have a few references to the
"commit graph file", though, without the dash. These occur in
git-commit-graph.txt as well as in Doc/technical/commit-graph.txt. Fix
them.
Do not change the references to the "commit graph" (without "... file")
as a data structure.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new variant repo_diff_setup() is added that takes 'struct repository *'
and diff_setup() becomes a thin macro around it that is protected by
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_....
The plan is these macros will always be defined for all library files
and the macros are only accessible in builtin/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify a part of technical documentation for rerere.
* tg/rerere-doc-updates:
rerere: add note about files with existing conflict markers
rerere: mention caveat about unmatched conflict markers
Fixes to "git rerere" corner cases, especially when conflict
markers cannot be parsed in the file.
* tg/rerere:
rerere: recalculate conflict ID when unresolved conflict is committed
rerere: teach rerere to handle nested conflicts
rerere: return strbuf from handle path
rerere: factor out handle_conflict function
rerere: only return whether a path has conflicts or not
rerere: fix crash with files rerere can't handle
rerere: add documentation for conflict normalization
rerere: mark strings for translation
rerere: wrap paths in output in sq
rerere: lowercase error messages
rerere: unify error messages when read_cache fails
4af3220 ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.
However this is just a heuristic to punt on such conflicts better, and
doesn't deal with any unmatched conflict markers. Make that clearer
in the documentation.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As it exists right now, the commit-graph feature may provide
inconsistent results when combined with commit grafts, replace objects,
and shallow clones. Update the design document to discuss why these
interactions are difficult to reconcile and how we will avoid errors by
preventing updates to and reads from the commit-graph file when these
other features exist.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ds/multi-pack-index: (23 commits)
midx: clear midx on repack
packfile: skip loading index if in multi-pack-index
midx: prevent duplicate packfile loads
midx: use midx in approximate_object_count
midx: use existing midx when writing new one
midx: use midx in abbreviation calculations
midx: read objects from multi-pack-index
config: create core.multiPackIndex setting
midx: write object offsets
midx: write object id fanout chunk
midx: write object ids in a chunk
midx: sort and deduplicate objects from packfiles
midx: read pack names into array
multi-pack-index: write pack names in chunk
multi-pack-index: read packfile list
packfile: generalize pack directory list
t5319: expand test data
multi-pack-index: load into memory
midx: write header information to lockfile
multi-pack-index: add 'write' verb
...
Documentation update.
* ab/newhash-is-sha256:
doc hash-function-transition: pick SHA-256 as NewHash
doc hash-function-transition: note the lack of a changelog
Rendered documentation can be easier to read than raw text because
headings and emphasized phrases stand out. Add the missing markup and
Makefile rule required to render this design document using asciidoc.
Tested by running
make -C Documentation technical/partial-clone.html
and viewing the output in a browser.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256,
K12, and so on are all believed to have similar security properties.
All are good options from a security point of view.
SHA-256 has a number of advantages:
* It has been around for a while, is widely used, and is supported by
just about every single crypto library (OpenSSL, mbedTLS, CryptoNG,
SecureTransport, etc).
* When you compare against SHA1DC, most vectorized SHA-256
implementations are indeed faster, even without acceleration.
* If we're doing signatures with OpenPGP (or even, I suppose, CMS),
we're going to be using SHA-2, so it doesn't make sense to have our
security depend on two separate algorithms when either one of them
alone could break the security when we could just depend on one.
So SHA-256 it is. Update the hash-function-transition design doc to
say so.
After this patch, there are no remaining instances of the string
"NewHash", except for an unrelated use from 2008 as a variable name in
t/t9700/test.pl.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Acked-by: Dan Shumow <danshu@microsoft.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts. Do that by recursively calling the
'handle_conflict' function to normalize the conflict.
Note that a conflict like this would only be produced if a user
commits a file with conflict markers, and gets a conflict including
that in a susbsequent operation.
The conflict ID calculation here deserves some explanation:
As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.
The conflict ID is however is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output. e.g. say there's the following conflict:
<<<<<<< HEAD
1
=======
<<<<<<< HEAD
3
=======
2
>>>>>>> branch-2
>>>>>>> branch-3~
it would be recorde as follows in the preimage:
<<<<<<<
1
=======
<<<<<<<
2
=======
3
>>>>>>>
>>>>>>>
and the conflict ID would be calculated as
sha1(1<NUL><<<<<<<
2
=======
3
>>>>>>><NUL>)
Stripping out vs. leaving the conflict markers in place in the inner
conflict should have no practical impact, but it simplifies the
implementation.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some documentation for the logic behind the conflict normalization
in rerere.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" learns to make sure the optional commit-graph file is in
a sane state.
* ds/commit-graph-fsck: (23 commits)
coccinelle: update commit.cocci
commit-graph: update design document
gc: automatically write commit-graph files
commit-graph: add '--reachable' option
commit-graph: use string-list API for input
fsck: verify commit-graph
commit-graph: verify contents match checksum
commit-graph: test for corrupted octopus edge
commit-graph: verify commit date
commit-graph: verify generation number
commit-graph: verify parent list
commit-graph: verify root tree OIDs
commit-graph: verify objects exist
commit-graph: verify corrupt OID fanout and lookup
commit-graph: verify required chunks are present
commit-graph: verify catches corrupt signature
commit-graph: add 'verify' subcommand
commit-graph: load a root tree from specific graph
commit: force commit to parse from object database
commit-graph: parse commit from chosen graph
...
Unlike ref advertisement, client capabilities and the first want are
separated by SP, not NUL, in the implementation. Fix the documentation
to align with the implementation. pack-protocol.txt is already fixed.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changelog embedded in the document pre-dates the addition of the
document to git.git (it used to be a Google Doc), so it only goes up
to 752414ae43 ("technical doc: add a design doc for hash function
transition", 2017-09-27).
Since then I made some small edits to it, which would have been worthy
of including in this changelog (but weren't). Instead of amending it
to include these, just note that future changes will be noted in the
log.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Partial clone support of "git clone" has been updated to correctly
validate the objects it receives from the other side. The server
side has been corrected to send objects that are directly
requested, even if they may match the filtering criteria (e.g. when
doing a "lazy blob" partial clone).
* jt/partial-clone-fsck-connectivity:
clone: check connectivity even if clone is partial
upload-pack: send refs' objects despite "filter"
"git fetch" failed to correctly validate the set of objects it
received when making a shallow history deeper, which has been
corrected.
* jt/connectivity-check-after-unshallow:
fetch-pack: write shallow, then check connectivity
fetch-pack: implement ref-in-want
fetch-pack: put shallow info in output parameter
fetch: refactor to make function args narrower
fetch: refactor fetch_refs into two functions
fetch: refactor the population of peer ref OIDs
upload-pack: test negotiation with changing repository
upload-pack: implement ref-in-want
test-pkt-line: add unpack-sideband subcommand
"git rebase" behaved slightly differently depending on which one of
the three backends gets used; this has been documented and an
effort to make them more uniform has begun.
* en/rebase-consistency:
git-rebase: make --allow-empty-message the default
t3401: add directory rename testcases for rebase and am
git-rebase.txt: document behavioral differences between modes
directory-rename-detection.txt: technical docs on abilities and limitations
git-rebase.txt: address confusion between --no-ff vs --force-rebase
git-rebase: error out when incompatible options passed
t3422: new testcases for checking when incompatible options passed
git-rebase.sh: update help messages a bit
git-rebase.txt: document incompatible options
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.
We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.
Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.
Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>