Граф коммитов

9 Коммитов

Автор SHA1 Сообщение Дата
Jonathan Nieder c846e41078 vcs-svn: let deltas use data from preimage
The copyfrom_source instruction appends data from the preimage buffer
to the end of output.  Its arguments are a length and an offset
relative to the beginning of the source view.

With this change, the delta applier is able to reproduce all 5,636,613
blobs in the early history of the ASF repository.  Tested with

	mkfifo backflow
	svn-fe <svn-asf-public-r0:940166 3<backflow |
	git fast-import --cat-blob-fd=3 3>backflow

with svn-asf-public-r0:940166 produced by whatever version of
Subversion the dumps in /dump/ on svn.apache.org use (presumably
1.6.something).

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-28 00:33:48 -05:00
Jonathan Nieder d3f131b57e vcs-svn: let deltas use data from postimage
The copyfrom_target instruction copies appends data that is already
present in the current output view to the end of output.  (The offset
argument is relative to the beginning of output produced in the
current window.)

The region copied is allowed to run past the end of the existing
output.  To support that case, copy one character at a time rather
than calling memcpy or memmove.  This allows copyfrom_target to be
used once to repeat a string many times.  For example:

	COPYFROM_DATA 2
	COPYFROM_OUTPUT 10, 0
	DATA "ab"

would produce the output "ababababababababababab".

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:27 -05:00
Jonathan Nieder 4c9b93ed76 vcs-svn: verify that deltas consume all inline data
By constraining the format of deltas, we can more easily detect
corruption and other breakage.

Requiring deltas not to provide unconsumed data also opens the
possibility of ignoring the declared amount of novel data and simply
streaming the data as needed to fulfill copyfrom_data requests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:02 -05:00
Jonathan Nieder ec71aa2e1f vcs-svn: implement copyfrom_data delta instruction
The copyfrom_data instruction copies a few bytes verbatim from the
novel text section of a window to the postimage.

[jn: with memory leak fix from David]

Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:27:55 -05:00
Jonathan Nieder ef2ac77e9f vcs-svn: read instructions from deltas
Buffer the instruction section upon encountering it for later
interpretation.

An alternative design would involve parsing the instructions
at this point and buffering them in some processed form.  Using
the unprocessed form is simpler.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:02:05 -05:00
Jonathan Nieder fc4ae43b2c vcs-svn: read inline data from deltas
Each window of an svndiff0-format delta includes a section for novel
text to be copied to the postimage (in the order it appears in the
window, possibly interspersed with other data).

Slurp in this data when encountering it.  It is not actually necessary
to do so --- it would be just as easy to copy from delta to output
as part of interpreting the relevant instructions --- but this way,
the code that interprets svndiff0 instructions can proceed very
quickly because it does not require I/O.

Subversion's svndiff0 parser rejects deltas that do not consume all
the novel text that was provided.  Omit that check for now so we can
test the new functionality right away, rather than waiting to learn
instructions that consume data.

Do check for truncated data sections.  Subversion's parser rejects
deltas that end in the middle of a declared novel-text section, so it
should be safe for us to reject them, too.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 22:51:00 -05:00
Jonathan Nieder bcd254621f vcs-svn: read the preimage when applying deltas
The source view offset heading each svndiff0 window represents a
number of bytes past the beginning of the preimage.  Together with the
source view length, it dictates to the delta applier what portion of
the preimage instructions will refer to.  Read that portion right away
using the sliding window code.

Maybe some day we will use mmap to read data more lazily.

Subversion's implementation tolerates source view offsets pointing
past the end of the preimage file but we do not, for simplicity.

This does not teach the delta applier to read instructions or copy
data from the source view.  Deltas that could produce nonempty output
will still be rejected.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 22:50:01 -05:00
Jonathan Nieder 252712111f vcs-svn: parse svndiff0 window header
Each window in a subversion delta (svndiff0-format file) starts with a
window header, consisting of five integers with variable-length
representation:

	source view offset
	source view length
	output length
	instructions length
	auxiliary data length

Parse it.  The result is not usable for deltas with nonempty postimage
yet; in fact, this only adds support for deltas without any
instructions or auxiliary data.  This is a good place to stop, though,
since that little support lets us add some simple passing tests
concerning error handling to the test suite.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:42:53 -05:00
Jonathan Nieder ddcc8c5b46 vcs-svn: skeleton of an svn delta parser
A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows of a certain well
specified format (starting with five integers).

Add an svndiff0_apply function and test-svn-fe -d commandline tool to
parse such a delta in the special case of not including any windows.

Later patches will add features to turn this into a fully functional
delta applier for svn-fe to use to parse the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".

The content of symlinks starts with the word "link " in Subversion's
worldview, so we need to be able to prepend that text to input for the
sake of delta application.  So initialization of the input state of
the delta preimage is left to the calling program, giving callers a
chance to seed the buffer with text of their choice.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:41:38 -05:00