Commit graph

49 commits

Author SHA1 Message Date
Jonathan Nieder 150f75467c vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not be
able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently to represent file lengths (in place
of using uint32_t in some contexts) so we can handle that.

Most svn-fe code is already ready to do that without this patch and
passes values of type off_t around.  The type mismatch from stragglers
was noticed with gcc -Wtype-limits.

While at it, tighten the parsing of the Text-content-length field to
make sure it is a number and does not overflow, and tighten other
overflow checks as that value is passed around and manipulated.

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:03:30 -08:00
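The stricter Text-content-length parsing described in 150f75467c can be sketched roughly as below. This is a minimal sketch, not the commit's code: `parse_length` and `max_off_t` are hypothetical names, and the sketch assumes the field is plain decimal text with no sign or whitespace.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: largest value a (signed) off_t can hold. */
uintmax_t max_off_t(void)
{
	return (UINTMAX_C(1) << (sizeof(off_t) * CHAR_BIT - 1)) - 1;
}

/*
 * Parse a decimal length field into *out.
 * Returns 0 on success, -1 if the text is not a plain number
 * or the value does not fit in off_t.
 */
int parse_length(const char *val, off_t *out)
{
	char *end;
	uintmax_t n;

	if (*val < '0' || *val > '9')	/* rejects "", "-1", "+5", " 7" */
		return -1;
	errno = 0;
	n = strtoumax(val, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;		/* overflow or trailing junk */
	if (n > max_off_t())
		return -1;
	*out = (off_t)n;
	return 0;
}
```

Parsing into the widest unsigned type first and range-checking afterwards keeps the check independent of whether off_t is 32 or 64 bits wide on a given platform.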
Junio C Hamano d475536658 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn into jn/svn-fe
This simplifies svn-fe a great deal and fulfills a longstanding wish:
support for dumps with deltas in them, and incremental imports.

The cost is that commandline usage of the svn-fe tool becomes a little
more complicated since it no longer keeps state itself but instead reads
blobs back from fast-import in order to copy them between revisions and
apply deltas to them.

Also removes a couple of custom data structures and replaces them with
strbufs like other parts of Git.

* 'svn-fe' of git://repo.or.cz/git/jrn: (32 commits)
  vcs-svn: reset first_commit_done in fast_export_init
  vcs-svn: do not initialize report_buffer twice
  vcs-svn: avoid hangs from corrupt deltas
  vcs-svn: guard against overflow when computing preimage length
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
  vcs-svn: implement text-delta handling
  vcs-svn: let deltas use data from preimage
  vcs-svn: let deltas use data from postimage
  vcs-svn: verify that deltas consume all inline data
  vcs-svn: implement copyfrom_data delta instruction
  vcs-svn: read instructions from deltas
  vcs-svn: read inline data from deltas
  vcs-svn: read the preimage when applying deltas
  vcs-svn: parse svndiff0 window header
  vcs-svn: skeleton of an svn delta parser
  vcs-svn: make buffer_read_binary API more convenient
  vcs-svn: learn to maintain a sliding view of a file
  Makefile: list one vcs-svn/xdiff object or header per line
  vcs-svn: avoid using ls command twice
  ...

Conflicts:
	Makefile
	contrib/svn-fe/svn-fe.txt
2012-01-27 11:20:00 -08:00
David Barr 7a75e661c5 vcs-svn: implement text-delta handling
Handle input in Subversion's dumpfile format, version 3.  This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.

To handle such nodes, we find which node the text and properties are
based on, handle its property changes, use the cat-blob command to
request the basis blob from the fast-import backend, use the
svndiff0_apply() helper to apply the text delta on the fly, writing
output to a temporary file, and then measure that postimage file's
length and write its content to the fast-import stream.

The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.

The svn-fe interface needs to be more complicated to accommodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.

NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26 02:28:04 -05:00
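One small building block of applying a text delta (7a75e661c5) is reading the integers in an svndiff0 stream. The sketch below rests on an assumption taken from Subversion's svndiff notes rather than from this commit message: integers are stored 7 bits per byte, most significant group first, with the high bit set on every byte except the last. `read_varint` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one variable-length integer from buf.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or the value would not fit in 64 bits.
 */
size_t read_varint(const unsigned char *buf, size_t len, uint64_t *out)
{
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (val >> 57)		/* shifting left by 7 would lose bits */
			return 0;
		val = (val << 7) | (buf[i] & 0x7f);
		if (!(buf[i] & 0x80)) {	/* high bit clear: last byte */
			*out = val;
			return i + 1;
		}
	}
	return 0;	/* ran out of input before the final byte */
}
```

Reporting truncated or oversized input explicitly is in the spirit of the overflow and corrupt-delta guards listed in the merge above.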
Jonathan Nieder c19d653c4f Merge branch 'db/svn-fe-code-purge' into svn-fe
* db/svn-fe-code-purge:
  vcs-svn: drop obj_pool
  vcs-svn: drop treap
  vcs-svn: drop string_pool
  vcs-svn: pass paths through to fast-import

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/repo_tree.h
	vcs-svn/string_pool.c
	vcs-svn/svndump.c
	vcs-svn/trp.txt
2011-05-26 02:12:14 -05:00
Jonathan Nieder 9ecfa8ae4c Merge branch 'db/vcs-svn-incremental' into svn-fe
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of less convenient UI.  Think of
it as growing pains.  This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.

* db/vcs-svn-incremental:
  vcs-svn: avoid using ls command twice
  vcs-svn: use mark from previous import for parent commit
  vcs-svn: handle filenames with dq correctly
  vcs-svn: quote paths correctly for ls command
  vcs-svn: eliminate repo_tree structure
  vcs-svn: add a comment before each commit
  vcs-svn: save marks for imported commits
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: set up channel to read fast-import cat-blob response

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-05-26 02:02:44 -05:00
Ramsay Jones c51477229e sparse: Fix some "symbol not declared" warnings
In particular, sparse issues the "symbol 'a_symbol' was not declared.
Should it be static?" warnings for the following symbols:

    attr.c:468:12: 'git_etc_gitattributes'
    attr.c:476:5:  'git_attr_system'
    vcs-svn/svndump.c:282:6: 'svndump_read'
    vcs-svn/svndump.c:417:5: 'svndump_init'
    vcs-svn/svndump.c:432:6: 'svndump_deinit'
    vcs-svn/svndump.c:445:6: 'svndump_reset'

The symbols in attr.c only require file scope, so we add the static
modifier to their declaration.

The symbols in vcs-svn/svndump.c are external symbols, and they
already have extern declarations in the "svndump.h" header file,
so we simply include the header in svndump.c.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-22 10:04:27 -07:00
Michael Witten 9e113988d3 vcs-svn: a void function shouldn't try to return something
As v1.7.4-rc0~184 (2010-10-04) and C99 §6.8.6.4.1 remind us, standard
C does not permit returning an expression of type void, even for a
tail call.

Noticed with gcc -pedantic:

 vcs-svn/svndump.c: In function 'handle_node':
 vcs-svn/svndump.c:213:3: warning: ISO C forbids 'return' with expression,
  in function returning void [-pedantic]

[jn: with simplified log message]

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-29 14:47:02 -05:00
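The pattern fixed by 9e113988d3 can be illustrated as follows. This is a hedged sketch with hypothetical function bodies; only the name `handle_node` comes from the warning text.

```c
static int nodes_handled;

void finish_node(void)
{
	nodes_handled++;
}

/*
 * ISO C rejects "return finish_node();" here because handle_node()
 * returns void.  Call first, then return without an expression.
 */
void handle_node(int is_delete)
{
	if (is_delete) {
		finish_node();
		return;
	}
	finish_node();
}
```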
David Barr 43155cfe14 vcs-svn: avoid using ls command twice
Currently there are two functions to retrieve the mode and content
at a path:

	const char *repo_read_path(const uint32_t *path);
	uint32_t repo_read_mode(const uint32_t *path);

Replace them with a single function with two return values.  This
means we can use one round-trip to get the same information from
fast-import that previously took two.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 01:00:05 -05:00
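The "one function, two return values" shape from 43155cfe14 might look like the sketch below. Everything here is an assumption for illustration: the lookup table stands in for the round trip to fast-import, paths are plain C strings rather than the tokenised `uint32_t` paths of the original, and `read_path` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the tree that fast-import would be asked about. */
struct entry {
	const char *path;
	uint32_t mode;
	const char *dataref;
};

static const struct entry entries[] = {
	{ "hello.c", 0100644, ":7382321" },
	{ "run.sh",  0100755, ":7382322" },
};

/*
 * One lookup, two results: the data reference is returned and the
 * mode is stored through mode_out.  Returns NULL and leaves *mode_out
 * untouched when the path is missing.
 */
const char *read_path(const char *path, uint32_t *mode_out)
{
	size_t i;

	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
		if (!strcmp(entries[i].path, path)) {
			*mode_out = entries[i].mode;
			return entries[i].dataref;
		}
	}
	return NULL;
}
```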
Jonathan Nieder 195b7ca6f2 vcs-svn: handle log message with embedded NUL
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.

Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.

While at it, stop including system headers from repo_tree.h.  git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:49:37 -05:00
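The difference between the two output calls named in 195b7ca6f2 can be shown with a short sketch. The helper names `write_log` and `write_log_printf` are hypothetical.

```c
#include <stdio.h>

/* Write exactly len bytes of a log message, NUL bytes included. */
size_t write_log(FILE *out, const char *msg, size_t len)
{
	return fwrite(msg, 1, len, out);
}

/* For contrast: "%s" stops at the first NUL byte. */
int write_log_printf(FILE *out, const char *msg)
{
	return fprintf(out, "%s", msg);
}
```

Given the 7-byte message "one", NUL, "two", the first helper writes all 7 bytes while the second writes only the 3 bytes before the NUL.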
Jonathan Nieder 4c3169b03e vcs-svn: avoid unnecessary copying of log message and author
Use strbuf_swap when storing the svn:log and svn:author properties, so
pointers to rather than the contents of buffers get copied.  The main
effect should be to make the code a little easier to read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:41:38 -05:00
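Swapping buffers instead of copying their contents, as 4c3169b03e describes, can be sketched with a cut-down buffer type. `struct buf` and `buf_swap` are hypothetical and only loosely modelled on git's strbuf.

```c
#include <stddef.h>

/* Hypothetical cut-down buffer type. */
struct buf {
	size_t alloc;
	size_t len;
	char *data;
};

/* Exchange two buffers by swapping their fields; no bytes are copied. */
void buf_swap(struct buf *a, struct buf *b)
{
	struct buf tmp = *a;

	*a = *b;
	*b = tmp;
}
```

After the swap the destination owns the freshly read bytes and the source holds the old storage, ready to be reused for the next property.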
Jonathan Nieder e7d04ee147 vcs-svn: make reading of properties binary-safe
svn-fe errors out on revision 59151 of the ASF repository:

 fatal: invalid dump: unexpected end of file

The proximate cause is a property with an embedded NUL character.
Previously such anomalies were ignored but commit c9d1c8ba
(2010-12-28) introduced a check strlen(val) == len to avoid reading
uninitialized data when a property list ends early and unfortunately
this test does not distinguish between "foo" followed by EOF and the
string "foo\0bar\0baz".

Fix it by using buffer_read_binary to read to a strbuf and checking
the actual length read.  Most consumers of properties still use
C-style strings, so in practice an author or log message with embedded
NULs will be truncated, but at least this way svn-fe won't error out
(fixing the regression).

Reported-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:15:10 -05:00
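Why the strlen(val) == len test in e7d04ee147 misfires on binary data can be seen in a small sketch. Both function names are hypothetical; `nread` stands for whatever byte count the reader reports.

```c
#include <string.h>

/* The older check: a short strlen() is taken to mean a truncated read. */
int looks_complete_strlen(const char *val, size_t len)
{
	return strlen(val) == len;
}

/* Binary-safe variant: compare the bytes actually read with the
 * length announced in the dump. */
int looks_complete_binary(size_t nread, size_t len)
{
	return nread == len;
}
```

For the 11-byte value "foo", NUL, "bar", NUL, "baz", strlen() reports 3, so the first check wrongly signals an early end of file even though all 11 bytes arrived.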
Jonathan Nieder 41b9dd9d4f Merge branch 'db/length-as-hash' into svn-fe
* db/length-as-hash:
  vcs-svn: use strchr to find RFC822 delimiter
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys

Conflicts:
	vcs-svn/svndump.c
2011-03-22 18:44:49 -05:00
David Barr 030879718f vcs-svn: pass paths through to fast-import
Now that there is no internal representation of the repo, it is not
necessary to tokenise paths.  Use strbuf instead and bypass
string_pool.

This means svn-fe can handle arbitrarily long paths (as long as a
strbuf can fit them), with arbitrarily many path components.

While at it, since we now treat paths in their entirety, only quote
when necessary.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
Jonathan Nieder fa6c4bceab Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purge
* db/strbufs-for-metadata:
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-03-22 18:19:46 -05:00
Jonathan Nieder 5c674860eb Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purge
* 'db/length-as-hash' (early part):
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/svndump.c
2011-03-22 18:11:59 -05:00
David Barr f1602054e3 vcs-svn: use strchr to find RFC822 delimiter
This is a small optimisation (4% reduction in user time) but is the
largest artifact within the parsing portion of svndump.c

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00
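Finding the header delimiter with strchr, as in f1602054e3, might look like the following. This is a sketch under the assumption that header lines have the form "Key: value" and that keys contain no colon; `split_header` is a hypothetical name.

```c
#include <string.h>

/*
 * Split a "Key: value" header line in place.  NUL-terminates the key
 * and returns a pointer to the value, or returns NULL when the line
 * has no ": " delimiter.
 */
char *split_header(char *line)
{
	char *colon = strchr(line, ':');

	if (!colon || colon[1] != ' ')
		return NULL;
	*colon = '\0';
	return colon + 2;
}
```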
David Barr 90c0a3cfe3 vcs-svn: implement perfect hash for top-level keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

Another small step towards removing dependence on string_pool
altogether.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00
David Barr 044ad2906a vcs-svn: implement perfect hash for node-prop keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

This is a small step towards removing dependence on string_pool.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:02 -05:00
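Looking keys up by string length, as the two "perfect hash" commits describe, works when no two keys of interest share a length. The sketch below assumes a key set of five well-known Subversion property names; the enum and `classify_prop` are hypothetical, and the actual table in svndump.c may differ.

```c
#include <string.h>

enum prop {
	PROP_UNKNOWN, PROP_LOG, PROP_DATE, PROP_AUTHOR,
	PROP_SPECIAL, PROP_EXECUTABLE
};

/*
 * The five keys have five different lengths, so the length selects
 * the only candidate and a single memcmp confirms it.
 */
enum prop classify_prop(const char *key, size_t len)
{
	switch (len) {
	case 7:
		return memcmp(key, "svn:log", 7) ? PROP_UNKNOWN : PROP_LOG;
	case 8:
		return memcmp(key, "svn:date", 8) ? PROP_UNKNOWN : PROP_DATE;
	case 10:
		return memcmp(key, "svn:author", 10) ? PROP_UNKNOWN : PROP_AUTHOR;
	case 11:
		return memcmp(key, "svn:special", 11) ? PROP_UNKNOWN : PROP_SPECIAL;
	case 14:
		return memcmp(key, "svn:executable", 14) ? PROP_UNKNOWN : PROP_EXECUTABLE;
	default:
		return PROP_UNKNOWN;
	}
}
```

Unknown keys that happen to have a matching length, such as "svn:ignore" (10 bytes), still fall through to PROP_UNKNOWN because of the confirming memcmp.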
David Barr 7c5817d3ba vcs-svn: use strbuf for author, UUID, and URL
Use strbufs and strings instead of interned strings for values of rev,
dump, and node fields that happen to be strings.  After this change,
the only remaining string_pool use is for paths in the repo_tree API
and internals.

Functional change: treat an empty author, UUID, or URL as none at all.
So for example, in repos where the first revision has an empty
svn:author property, the first rev will be treated as by "nobody"
rather than by a person with empty name and email address created by
prepending an @ sign to the repository UUID.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:01:48 -05:00
David Barr dce33c9c18 vcs-svn: use strbuf for revision log
obj_pool is overkill for this application: all that is needed is a
buffer that can resize from rev to rev to accommodate differently-sized
strings.  In the spirit of commit deadcef4 (2010-11-06), use a strbuf
instead.

This is a small step towards removing dependence on obj_pool.h.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:36 -05:00
Jonathan Nieder c9d1c8ba05 vcs-svn: improve reporting of input errors
Catch input errors and exit early enough to print a reasonable
diagnosis based on errno.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:09 -05:00
Jonathan Nieder 723b7a2789 vcs-svn: eliminate repo_tree structure
Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information, even for
v2 dumps.  On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.

[db: adjusted to support final version of the cat-blob patch]
[jn: avoiding hard-coding git's name for the empty tree for
 portability to other backends]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder 7e11902c99 vcs-svn: add a comment before each commit
Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it.  A better
strategy would be to use inline blobs:

	commit
	mark :3
[...]
	M 100644 inline hello.c
	data 5
	hello
[...]

As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
David Barr 41529bbce4 vcs-svn: set up channel to read fast-import cat-blob response
Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferably the cat-blob stream of a
fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long.  A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.

[jn: with new change description]

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
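Reading one response line from a caller-supplied file descriptor, the plumbing 41529bbce4 sets up, can be sketched as below. This is not the commit's get_response_line helper: `read_response_line` is a hypothetical name, the byte-at-a-time read is chosen for brevity, and interrupted reads are not retried.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one newline-terminated response from fd into buf (size bytes).
 * Returns the line length without the newline, or -1 on EOF, error,
 * or an over-long line.  Reading byte by byte means nothing past the
 * newline is consumed; a real implementation would buffer.
 */
long read_response_line(int fd, char *buf, size_t size)
{
	size_t n = 0;
	char ch;

	while (n + 1 < size) {
		ssize_t r = read(fd, &ch, 1);

		if (r <= 0)
			return -1;
		if (ch == '\n') {
			buf[n] = '\0';
			return (long)n;
		}
		buf[n++] = ch;
	}
	return -1;
}
```

With a descriptor that is not open, read() fails and the helper returns -1, which corresponds to the "Bad file descriptor" failure quoted in the commit message.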
Jonathan Nieder e75316de53 vcs-svn: simplify repo_modify_path and repo_copy
Restrict the repo_tree API to functions that are actually needed.

 - decouple reading the mode and content of dirents from other
   operations.
 - remove repo_modify_path.  It is only used to read the mode from
   dirents.
 - remove the ability to use repo_read_mode on a missing path.  The
   existing code only errors out in that case, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder 5a38b186d3 vcs-svn: handle_node: use repo_read_path
svn-fe processes each commit in two stages: first decide on the
correct content for all paths and export the relevant blobs, then
export a commit with the result.

But we can keep less state and simplify svn-fe a great deal by
exporting the commit in one step: use 'inline' blobs for each path and
remember nothing.  This way, the repo_tree structure could be
eliminated, and we would get support for incremental imports 'for
free'.

Reorganize handle_node along these lines.  This is just a code
cleanup; the changes in repo_tree and handle_revision will come later.

[db: backported to apply without text delta support]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder a62bbf8f01 Merge commit 'jn/svn-fe' of git://github.com/gitster/git into svn-fe
* git://github.com/gitster/git:
  vcs-svn: Allow change nodes for root of tree (/)
  vcs-svn: Implement Prop-delta handling
  vcs-svn: Sharpen parsing of property lines
  vcs-svn: Split off function for handling of individual properties
  vcs-svn: Make source easier to read on small screens
  vcs-svn: More dump format sanity checks
  vcs-svn: Reject path nodes without Node-action
  vcs-svn: Delay read of per-path properties
  vcs-svn: Combine repo_replace and repo_modify functions
  vcs-svn: Replace = Delete + Add
  vcs-svn: handle_node: Handle deletion case early
  vcs-svn: Use mark to indicate nodes with included text
  vcs-svn: Unclutter handle_node by introducing have_props var
  vcs-svn: Eliminate node_ctx.mark global
  vcs-svn: Eliminate node_ctx.srcRev global
  vcs-svn: Check for errors from open()
  vcs-svn: Allow simple v3 dumps (no deltas yet)

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/svndump.c
2011-02-26 05:21:29 -06:00
Jonathan Nieder e5e45ca1e3 vcs-svn: teach line_buffer to handle multiple input files
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.

The tests don't take advantage of the new features, but I think that's
okay.  It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:59 -06:00
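The idea behind e5e45ca1e3, moving reader state out of file-scope statics and into a struct so several inputs can be read at once, can be sketched like this. `struct line_reader` and `reader_next` are hypothetical and much simpler than the real line_buffer API.

```c
#include <stdio.h>
#include <string.h>

/* All state lives in the struct; nothing is kept in static variables. */
struct line_reader {
	FILE *in;
	char line[256];
};

/* Return the next line without its newline, or NULL at end of input. */
char *reader_next(struct line_reader *r)
{
	size_t len;

	if (!fgets(r->line, sizeof(r->line), r->in))
		return NULL;
	len = strlen(r->line);
	if (len && r->line[len - 1] == '\n')
		r->line[len - 1] = '\0';
	return r->line;
}
```

Two such readers can be advanced in any interleaving, which is what a delta applier needs when it streams the delta from one file and the preimage from another.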
Ramsay Jones 5ee5f5a65d svndump.c: Fix a printf format compiler warning
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

        CC vcs-svn/svndump.o
    vcs-svn/svndump.c: In function `svndump_read':
    vcs-svn/svndump.c:215: warning: int format, uint32_t arg (arg 2)

In order to suppress the warning we use the C99 format specifier
macro PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 16:48:47 -08:00
Jonathan Nieder 9e8c532108 vcs-svn: Allow change nodes for root of tree (/)
It is not uncommon for a svn repository to include change records for
properties at the top level of the tracked tree:

	Node-path:
	Node-kind: dir
	Node-action: change
	Prop-delta: true
	Prop-content-length: 43
	Content-length: 43

	K 10
	svn:ignore
	V 11
	build-area

	PROPS-END

Unfortunately a recent svn-fe change (vcs-svn: More dump format sanity
checks, 2010-11-19) causes such nodes to be rejected with the error
message

	fatal: invalid dump: path to be modified is missing

The repo_tree module does not keep a dirent for the root of the tree.
Add a block to the dump parser to take care of this case.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:04:56 -08:00
David Barr 6b01b67658 vcs-svn: Implement Prop-delta handling
The rules for what file is used as delta source for each file are not
documented in dump-load-format.txt.  Luckily, the Apache Software
Foundation repository has rich enough examples to figure out most of
the rules:

Node-action: replace implies the empty property set and empty text as
preimage for deltas.  Otherwise, if a copyfrom source is given, that
node is the preimage for deltas.  Lastly, if none of the above applies
and the node path exists in the current revision, then that version
forms the basis.

[jn: refactored, with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:59 -08:00
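The preimage rules listed in 6b01b67658 can be written down as a small decision function. This is a sketch: the enum and `choose_preimage` are hypothetical names, and falling back to the empty preimage when no rule applies is an assumption, since the message only claims to cover most of the rules.

```c
enum preimage { PREIMAGE_EMPTY, PREIMAGE_COPYFROM, PREIMAGE_CURRENT };

/* Pick the basis for property and text deltas of one node. */
enum preimage choose_preimage(int is_replace, int has_copyfrom,
			      int path_exists)
{
	if (is_replace)
		return PREIMAGE_EMPTY;		/* replace: start from nothing */
	if (has_copyfrom)
		return PREIMAGE_COPYFROM;	/* copy source is the basis */
	if (path_exists)
		return PREIMAGE_CURRENT;	/* current version of the path */
	return PREIMAGE_EMPTY;			/* assumption: no basis known */
}
```

The order of the tests matters: a replace node that also names a copy source still starts from the empty preimage under these rules.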
Jonathan Nieder 6263c06d49 vcs-svn: Sharpen parsing of property lines
Prepare to add a new type of property line (the 'D' line) to
handle property deltas.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder 2a48afe1c2 vcs-svn: Split off function for handling of individual properties
The handle_property function is the part of read_props that would be
interesting for most people: semantics of properties rather than the
algorithm for parsing them.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder 3f3e676d6e vcs-svn: Make source easier to read on small screens
Remove some newlines from handle_node() that are not needed for
clarity.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder c7dbf35e91 vcs-svn: More dump format sanity checks
Node-action: change is not appropriate when switching between file and
directory or adding a new file.  Current svn-fe silently accepts such
nodes and the resulting tree has missing files in the "changed when
meant to add" case.

Node-action: add requires some content (text or directory); there is
no such thing as an "intent to add" node in svn dumps.  Current svn-fe
accepts such contentless adds but produces an invalid fast-import
stream that refers to nonexistent mark :0 in response.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:52:51 -08:00
Jonathan Nieder 414e569e45 vcs-svn: Reject path nodes without Node-action
It would be better to flag such errors and let the import proceed
anyway, but for now it is simpler not to worry about recovery
from such weird cases.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:52:47 -08:00
Jonathan Nieder 1c7bb31616 vcs-svn: Delay read of per-path properties
The mode for each file in an svn-format dump is kept in the properties
section.  The properties section is read as soon as possible to allow
the correct mode to be filled in when registering the file with the
repo_tree lib.

To support nodes with a missing properties section, svn-fe determines
the mode in three stages:

 - The kind (directory or file) of the node is read from the dump and
   used to make an initial estimate (040000 or 100644).
 - Properties are read in and allowed to override this for symlinks
   and executables.
 - If there is no properties section, the mode from the previous
   content of the path is left alone, overriding the above
   considerations.

This is a bit of a mess, and worse, it would get even more complicated
once we start to support property deltas.  If we could only register
the file with a provisional value for mode and then change it later
when properties say so, the procedure would be much simpler.

... oh, right, we can.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
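The three-stage mode computation that 1c7bb31616 sets out to simplify can be sketched as one function. The values 040000 and 100644 come from the message; the symlink and executable modes (120000 and 100755) are git's usual values and are an assumption here, as are the function name `node_mode` and its parameters.

```c
#include <stdint.h>

#define MODE_DIR     040000
#define MODE_FILE    0100644
#define MODE_EXE     0100755	/* assumption: git's executable mode */
#define MODE_SYMLINK 0120000	/* assumption: git's symlink mode */

/*
 * old_mode is the mode of the path's previous content, or 0 if the
 * path did not exist before.
 */
uint32_t node_mode(int is_dir, int have_props, int is_symlink,
		   int is_executable, uint32_t old_mode)
{
	/* Stage 1: initial estimate from the node kind. */
	uint32_t mode = is_dir ? MODE_DIR : MODE_FILE;

	/* Stage 2: properties may override the estimate. */
	if (have_props) {
		if (is_symlink)
			mode = MODE_SYMLINK;
		else if (is_executable)
			mode = MODE_EXE;
	}
	/* Stage 3: with no properties section, keep the previous mode. */
	if (!have_props && old_mode)
		mode = old_mode;
	return mode;
}
```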
Jonathan Nieder 08c39b5c44 vcs-svn: Combine repo_replace and repo_modify functions
There are two functions to change the staged content for a path in the
svn importer's active commit: repo_replace, which changes the text and
returns the mode, and repo_modify, which changes the text and mode and
returns nothing.

Worse, there are more subtle differences:

 - A mark of 0 passed to repo_modify means "use the existing content".
   repo_replace uses it as mark :0 and produces a corrupt stream.

 - When passed a path that is not part of the active commit,
   repo_replace returns without doing anything.  repo_modify
   transparently adds a new directory entry.

Get rid of both and introduce a new function with the best features of
both: repo_modify_path modifies the mode, content, or both for a path,
depending on which arguments are zero.  If no such dirent already
exists, it does nothing and reports the error by returning 0.
Otherwise, the return value is the resulting mode.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
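The contract of repo_modify_path described in 08c39b5c44 (zero means "leave alone", 0 is returned for a missing dirent, otherwise the resulting mode) can be sketched against a toy table. The table, the struct, and the name `modify_path` are hypothetical; the real function works on the repo_tree structures.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-entry table standing in for the active commit. */
struct dirent_sketch {
	const char *path;
	uint32_t mode;
	uint32_t mark;
};

static struct dirent_sketch tree[] = {
	{ "hello.c", 0100644, 1 },
};

/*
 * Change the mode, the content mark, or both; a zero argument leaves
 * that field alone.  Returns the resulting mode, or 0 if the path has
 * no entry (in which case nothing is changed).
 */
uint32_t modify_path(const char *path, uint32_t mode, uint32_t mark)
{
	size_t i;

	for (i = 0; i < sizeof(tree) / sizeof(tree[0]); i++) {
		if (strcmp(tree[i].path, path))
			continue;
		if (mode)
			tree[i].mode = mode;
		if (mark)
			tree[i].mark = mark;
		return tree[i].mode;
	}
	return 0;
}
```

Because 0 is never a valid mode for an existing entry, the return value doubles as the error indicator.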
Jonathan Nieder 6ee4a9be48 vcs-svn: Replace = Delete + Add
Simplify by reducing the "Node-action: replace" case to "Node-action:
add".  This way, the main part of handle_node() only has to deal with
"add" and "change" nodes.

Functional change: replacing a symlink or executable without setting
properties will reset the mode.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder 5af8fae2df vcs-svn: handle_node: Handle deletion case early
Take care of "Node-action: delete" as soon as possible, so we can stop
worrying about that case in the rest of the function.

Functional change: catch deletion nodes with features that would not
apply to them (text, properties, or origin data) and error out for
those cases.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder 462e1f51a5 vcs-svn: Use mark to indicate nodes with included text
Allocate a mark if needed as soon as possible so later code can use
"if (mark)" to check if this node has text attached rather than
explicitly checking for Text-content-length.

While at it, reject directory nodes with text attached; the presence
of such a node would indicate a bug in the dump generator or svn-fe's
understanding.  In the long term, it would be nice to be able to
continue parsing and save the error for later, but for now it is
simpler to error out right away.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder d6e81a0315 vcs-svn: Unclutter handle_node by introducing have_props var
It is possible for a path node in an SVN-format dump file to leave out
the properties section.  svn-fe handles this by carrying over the
properties (in particular, file type) from the old version of that
node.

To support this, handle_node tests several times whether a
Prop-content-length field is present.  Ancient Subversion actually
leaves out the Prop-content-length field even for nodes with
properties, so that's not quite the right check.  Besides, this detail
of mechanism is distracting when the question at hand is instead what
content the new node should have.

So introduce a local have_props variable.  The semantics are the same
as before; the adaptations to support ancient streams that leave out
the prop-content-length can wait until someone needs them.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder da3e217447 vcs-svn: Eliminate node_ctx.mark global
The mark variable is only used in handle_node().  Its life is
very short and simple: first, a new mark number is allocated if
this node has text attached, then that mark is recorded in the
in-core tree being built up, and lastly the mark is communicated
to fast-import in the stream along with the associated text.

A new reader may worry about interaction with other code, especially
since mark is not initialized to zero in handle_node() itself.
Disperse such worries by making it local.  No functional change
intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder 1d13e9f600 vcs-svn: Eliminate node_ctx.srcRev global
The srcRev variable is only used in handle_node(); its purpose
is to hold the old mode for a path, to only be used if properties
are not being changed.  Narrow its scope to make its meaningful
lifetime more obvious.

No functional change intended.  Add some tests as a sanity-check
for the simplest case (no renames).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder 5c28a8b054 vcs-svn: Check for errors from open()
test-svn-fe segfaults when passed a bogus path.  Simplify debugging by
exiting with a meaningful error message instead.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
David Barr 1f05d07c45 vcs-svn: Allow simple v3 dumps (no deltas yet)
Since the dumpfile version 1 days, the Subversion dump format
gained some new fields:

 - a unique identifier for the repository (version 2 format)
 - whether the text and properties for a node should be
   interpreted as deltas
 - checksums for a delta's preimage
 - SHA-1 sums as alternatives to the existing MD5 checksums for
   copy source and the payload (delta).

For now what is relevant to us is the Text-delta and Prop-delta
fields, since not noticing these causes a dump file to be
misinterpreted (see the previous commit).

[jn: with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:48:54 -08:00
Jonathan Nieder b3e5bce1aa vcs-svn: Error out for v3 dumps
By ignoring the Text-Delta and Prop-Delta node fields, current svn-fe
happily mistakes deltas for full text and instead of cleanly erroring
out, it produces a valid but semantically bogus fast-import stream
when fed a dump file in the modern "svnadmin dump --deltas" format.

Dump file parsers are supposed to ignore header fields they don't
understand (to allow for backward-compatible extensions), but they are
also supposed to check the SVN-fs-dump-format-version header to
prevent misinterpretation of non backward-compatible extensions.
Do so.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:48:52 -08:00
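The policy in b3e5bce1aa, ignore unknown header fields but refuse an unsupported format version, might be sketched as follows. `check_dump_version` and `MAX_SUPPORTED_VERSION` are hypothetical names, and the limit of 2 reflects this commit's point in the history, before v3 support was added.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: the reader understands dump format versions 1 and 2. */
#define MAX_SUPPORTED_VERSION 2

/* Returns 0 if the header is acceptable or unrelated, -1 to reject. */
int check_dump_version(const char *key, const char *val)
{
	char *end;
	long version;

	if (strcmp(key, "SVN-fs-dump-format-version"))
		return 0;	/* other or unknown fields are ignored */
	version = strtol(val, &end, 10);
	if (end == val || *end != '\0' || version < 1)
		return -1;	/* not a usable version number */
	return version > MAX_SUPPORTED_VERSION ? -1 : 0;
}
```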
Ramsay Jones 5418d96ddc vcs-svn: Fix some printf format compiler warnings
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

      CC vcs-svn/fast_export.o
  vcs-svn/fast_export.c: In function `fast_export_modify':
  vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3)
  vcs-svn/fast_export.c: In function `fast_export_commit':
  vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5)
  vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c: In function `fast_export_blob':
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3)
      CC vcs-svn/svndump.o
  vcs-svn/svndump.c: In function `svndump_read':
  vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3)

In order to suppress the warnings we use the C99 format specifier
macros PRIo32 and PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12 10:24:55 -07:00
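Using the `<inttypes.h>` macros named in 5418d96ddc looks roughly like this. The sketch formats a fast-import "M" line; `format_modify` is a hypothetical name and is not the fast_export.c function.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format "M <mode> :<mark> <path>" into buf.  PRIo32 and PRIu32 expand
 * to the right conversion for uint32_t whether it is defined as
 * unsigned int or unsigned long.
 */
int format_modify(char *buf, size_t size, uint32_t mode,
		  uint32_t mark, const char *path)
{
	return snprintf(buf, size, "M %06" PRIo32 " :%" PRIu32 " %s\n",
			mode, mark, path);
}
```

The macros are string literals, so they are pasted into the format string by adjacent-literal concatenation rather than passed as arguments.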
David Barr 21746aa34f SVN dump parser
svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

[rr: allow input from files other than stdin]
[jn: with test, more error reporting]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00