Some improvements and corrections.

Thanks to Joe Brockmeier <zonker@opensuse.org> for proofreading.
Andreas Schneider 2008-12-23 10:26:21 +01:00
Parent 0f6a55bb23
Commit 69bb51d7af
2 changed files with 119 additions and 55 deletions

@@ -3,10 +3,16 @@ CSYNC User Guide
Andreas Schneider <mail@cynapses.org>
:Author Initials: ADS
csync is a bidirectional file synchronizer for Linux and allows to keep two
copies of files and directories in sync. It uses uses widly adopted protocols
like smb or sftp so that there is no need for a server component of csync. It
is a user-level program which means you don't need to be a superuser.
csync is a lightweight utility to synchronize files between two directories
on a system, or between multiple systems.
It synchronizes bidirectional and allows to keep two copies of files and
directories in sync. It uses uses widly adopted protocols, like smb or sftp so
that there is no need for a server component of csync. It is a user-level
program which means you don't need to be a superuser.
Together with a PAM modules the intention is to provide Roaming Home
Directories for Linux.
Introduction
------------
@@ -15,10 +21,10 @@ It is often the case that we have multiple copies (called replicas) of a
filesystem or part of a filesystem (for example on a notebook and on a desktop
computer). Changes to each replica are often made independently and as a
result they do not contain the same information. In that case a file
synchronizer is used to make them consistent again, without loosing any
synchronizer is used to make them consistent again, without losing any
information.
The goal is to detect conflicting <<X13, updates>> (files which has been
The goal is to detect conflicting updates (files which has been
modified) and propagate non-conflicting updates to each replica. If there
are no conflicts left we are done and the replicas are identical.
@@ -36,11 +42,9 @@ just a sequence of names separated by '/'.
NOTE: The path separator is always a forward slash '/', even for Windows.
csync is always using the absolute path. This could be '/home/gladiac' or
csync always uses the absolute path. This could be '/home/gladiac' or
for sftp 'sftp://gladiac:secret@myserver/home/gladiac'.
[[X13]]
What is an update?
~~~~~~~~~~~~~~~~~~
The contents of a path could be a file, a directory or a symbolic link
@@ -50,7 +54,7 @@ to:
- a regular file, the the contents of the file are the byte stream and the
metatdata of the file.
- a directory, then the content is the metadata of the directory.
- a symbolic link, then the content is the string where the link points to.
- a symbolic link, then the content is the named file the link points to.
csync keeps a record of each path which has been successfully synchronized. The
path gets compared with the record and if it has changed since the last
@@ -70,10 +74,10 @@ File Synchronization
The main goal of a file synchronizer is correctness. It changes whole or
separated pieces of a users file system. So a user is not able to monitor the
complete file synchronization process. So the synchronizer is in a position
complete file synchronization process, and the synchronizer is in a position
where it can damage the file system. It is important that the implementation
behaves correctly under all conditions, even if there is an unexpected error
(for example disk full).
(for example, disk full).
On problem concerning correctness is the handling of conflicts. Each file
synchronizer tries to propagate conflicting changes to the other replica. At
@@ -103,14 +107,14 @@ been renamed.
Reconciliation
~~~~~~~~~~~~~~
The most improtant component is the update detector cause the reconciler depends
on it. The correctness of reconciler is mandatory cause it can damage a
The most improtant component is the update detector, because the reconciler depends
on it. The correctness of reconciler is mandatory because it can damage a
filesystem. It decides which file:
* keeps untouched
* has a conflict
* gets synchronized
* or gets *deleted*
* Stays untouched
* Has a conflict
* Gets synchronized
* or is *deleted*
A wrong decision of the reconciler leads in most cases to a loss of data. So there
are several conditions a the file synchronizer has to follow.
@@ -153,6 +157,7 @@ replica. This has the advantage that we can check if file which has been copied
to the opposite replica has been transfered successfully. If the connection
gets interruppted during the transfer we still have the orignal states of the
file. This means no data will be lost.
In the second phase the the file on the opposite replica will be overwritten by
the temporary file.
@@ -165,8 +170,8 @@ Robustness
~~~~~~~~~~
This is a really important topic. The file synchronizer should not crash and if
it crashed, there should be no loss of data. To achieve this goal there are
several mechanism to prevent this. These mechnanism will be discussed in the
it has crashed, there should be no loss of data. To achieve this goal there are
several mechanisms to prevent this. These mechnanisms will be discussed in the
following sections.
Crash resistance
@@ -187,6 +192,8 @@ invariant:
IMPORTANT: At every moment of the synchronization each file has either its
original content or its correct final content.
This means, that the original content can not be incorrect, no data can't be
lost until we overwrite it after a successful synchronization.
So each interupted synchronization process is a partial sync and can be
continued and completed by simply running csync again. The only problem could
be an error of the filesystem. So we reach this invariant only approximatly.
@@ -195,11 +202,12 @@ Transfer errors
^^^^^^^^^^^^^^^
With the Two-Phase-Commit we check the file size after the file has
transferred. So we can detect transfer erros. Better would be a transfer
protocol with checksums. This could possibly done in the future.
transferred. So we can detect transfer errors. Better would be a transfer
protocol with checksums. We may do this in the future.
Future filesystems like btrfs will help to compare checksums instead of the
filesize. This will make the synchronization itself safer.
Future filesystems, like btrfs, will help to compare checksums instead of the
filesize. This will make the synchronization safer. This doen't mean that it is
unsafe now, but checkums are better then just filesize checks.
Database loss
^^^^^^^^^^^^^
@@ -208,9 +216,10 @@ It could be possible, that the state database get corrupted. If this happens
all files get evaluated. In this case the file synchronizer wont delete any
file, but it could occur that deleted files will be restored from the other
replica.
To prevent a corruption or loss of the database if an error occurs or the user
forces an abort, the synchronizer is working on a copy of the database and will
use a 2-Phase-Commit to save it at the end.
use a Two-Phase-Commit to save it at the end.
Getting started
---------------
@@ -233,6 +242,29 @@ DESTINATION can be a local directory or a remote file server.
csync /home/csync scheme://user:password@server:port/full/path
Examples
^^^^^^^^
To synchronize two local directories:
csync /home/csync/replica1 /home/csync/relplica2
Two synchronizer a local directory with an smb server, use
csync /home/csync smb://rupert.galaxy.site/Users/csync
If you use kerberos you don't have to specify a username or a password. If you
don't use kerberos, the commandline client will ask about the user and the
password.
If you don't want to be ask, you can specify it on the commandline:
csync /home/csync smb://csync:secret@rupert.galaxy.site/Users/csync
If you use the sftp protocol and want to specify a port, you do it the
following way:
csync /home/csync sftp://csync@krikkit.galaxy.site:2222/home/csync
The remote destination is supported by plugins. By default csync ships with smb
and sftp support. For more information, see the manpage of csync(1).

@@ -427,10 +427,14 @@ function generateToc(toclevels) {
</div>
<div id="preamble">
<div class="sectionbody">
<div class="para"><p>csync is a bidirectional file synchronizer for Linux and allows to keep two
copies of files and directories in sync. It uses uses widly adopted protocols
like smb or sftp so that there is no need for a server component of csync. It
is a user-level program which means you don't need to be a superuser.</p></div>
<div class="para"><p>csync is a lightweight utility to synchronize files between two directories
on a system, or between multiple systems.</p></div>
<div class="para"><p>It synchronizes bidirectional and allows to keep two copies of files and
directories in sync. It uses uses widly adopted protocols, like smb or sftp so
that there is no need for a server component of csync. It is a user-level
program which means you don't need to be a superuser.</p></div>
<div class="para"><p>Together with a PAM modules the intention is to provide Roaming Home
Directories for Linux.</p></div>
</div>
</div>
<h2 id="_introduction">1. Introduction</h2>
@@ -439,9 +443,9 @@ is a user-level program which means you don't need to be a superuser.</p></div>
filesystem or part of a filesystem (for example on a notebook and on a desktop
computer). Changes to each replica are often made independently and as a
result they do not contain the same information. In that case a file
synchronizer is used to make them consistent again, without loosing any
synchronizer is used to make them consistent again, without losing any
information.</p></div>
<div class="para"><p>The goal is to detect conflicting <a href="#X13">updates</a> (files which has been
<div class="para"><p>The goal is to detect conflicting updates (files which has been
modified) and propagate non-conflicting updates to each replica. If there
are no conflicts left we are done and the replicas are identical.</p></div>
</div>
@@ -461,9 +465,9 @@ just a sequence of names separated by <em>/</em>.</p></div>
<td class="content">The path separator is always a forward slash <em>/</em>, even for Windows.</td>
</tr></table>
</div>
<div class="para"><p>csync is always using the absolute path. This could be <em>/home/gladiac</em> or
<div class="para"><p>csync always uses the absolute path. This could be <em>/home/gladiac</em> or
for sftp <em>sftp://gladiac:secret@myserver/home/gladiac</em>.</p></div>
<h3 id="X13">2.2. What is an update?</h3><div style="clear:left"></div>
<h3 id="_what_is_an_update">2.2. What is an update?</h3><div style="clear:left"></div>
<div class="para"><p>The contents of a path could be a file, a directory or a symbolic link
(symbolic links are not supported yet). To be more precise, if the path refers
to:</p></div>
@@ -481,7 +485,7 @@ a directory, then the content is the metadata of the directory.
</li>
<li>
<p>
a symbolic link, then the content is the string where the link points to.
a symbolic link, then the content is the named file the link points to.
</p>
</li>
</ul></div>
@@ -513,10 +517,10 @@ its contents in are not identical.
<div class="sectionbody">
<div class="para"><p>The main goal of a file synchronizer is correctness. It changes whole or
separated pieces of a users file system. So a user is not able to monitor the
complete file synchronization process. So the synchronizer is in a position
complete file synchronization process, and the synchronizer is in a position
where it can damage the file system. It is important that the implementation
behaves correctly under all conditions, even if there is an unexpected error
(for example disk full).</p></div>
(for example, disk full).</p></div>
<div class="para"><p>On problem concerning correctness is the handling of conflicts. Each file
synchronizer tries to propagate conflicting changes to the other replica. At
the end both replicas should be identical. There are different strategies to
@@ -539,28 +543,28 @@ store in the statedb too. If we don't find the file by the name in the database
we search for the inode number. If the inode number is found then the file has
been renamed.</p></div>
<h3 id="_reconciliation">3.2. Reconciliation</h3><div style="clear:left"></div>
<div class="para"><p>The most improtant component is the update detector cause the reconciler depends
on it. The correctness of reconciler is mandatory cause it can damage a
<div class="para"><p>The most improtant component is the update detector, because the reconciler depends
on it. The correctness of reconciler is mandatory because it can damage a
filesystem. It decides which file:</p></div>
<div class="ilist"><ul>
<li>
<p>
keeps untouched
Stays untouched
</p>
</li>
<li>
<p>
has a conflict
Has a conflict
</p>
</li>
<li>
<p>
gets synchronized
Gets synchronized
</p>
</li>
<li>
<p>
or gets <strong>deleted</strong>
or is <strong>deleted</strong>
</p>
</li>
</ul></div>
@@ -588,8 +592,8 @@ operation.</p></div>
replica. This has the advantage that we can check if file which has been copied
to the opposite replica has been transfered successfully. If the connection
gets interruppted during the transfer we still have the orignal states of the
file. This means no data will be lost.
In the second phase the the file on the opposite replica will be overwritten by
file. This means no data will be lost.</p></div>
<div class="para"><p>In the second phase the the file on the opposite replica will be overwritten by
the temporary file.</p></div>
<div class="para"><p>After a successfull propagation we have to merge the trees to reflect the
current state of the filesystem tree. This updated tree will be written as a
@@ -597,8 +601,8 @@ journal into a database. The database is called the state database. It will be
used during the update detection of the next synchronization. See above.</p></div>
<h3 id="_robustness">3.4. Robustness</h3><div style="clear:left"></div>
<div class="para"><p>This is a really important topic. The file synchronizer should not crash and if
it crashed, there should be no loss of data. To achieve this goal there are
several mechanism to prevent this. These mechnanism will be discussed in the
it has crashed, there should be no loss of data. To achieve this goal there are
several mechanisms to prevent this. These mechnanisms will be discussed in the
following sections.</p></div>
<h4 id="_crash_resistance">3.4.1. Crash resistance</h4>
<div class="para"><p>The synchronization process can be interrupted by different events, this can
@@ -641,23 +645,26 @@ invariant:</p></div>
original content or its correct final content.</td>
</tr></table>
</div>
<div class="para"><p>So each interupted synchronization process is a partial sync and can be
<div class="para"><p>This means, that the original content can not be incorrect, no data can't be
lost until we overwrite it after a successful synchronization.
So each interupted synchronization process is a partial sync and can be
continued and completed by simply running csync again. The only problem could
be an error of the filesystem. So we reach this invariant only approximatly.</p></div>
<h4 id="_transfer_errors">3.4.2. Transfer errors</h4>
<div class="para"><p>With the Two-Phase-Commit we check the file size after the file has
transferred. So we can detect transfer erros. Better would be a transfer
protocol with checksums. This could possibly done in the future.</p></div>
<div class="para"><p>Future filesystems like btrfs will help to compare checksums instead of the
filesize. This will make the synchronization itself safer.</p></div>
transferred. So we can detect transfer errors. Better would be a transfer
protocol with checksums. We may do this in the future.</p></div>
<div class="para"><p>Future filesystems, like btrfs, will help to compare checksums instead of the
filesize. This will make the synchronization safer. This doen't mean that it is
unsafe now, but checkums are better then just filesize checks.</p></div>
<h4 id="_database_loss">3.4.3. Database loss</h4>
<div class="para"><p>It could be possible, that the state database get corrupted. If this happens
all files get evaluated. In this case the file synchronizer wont delete any
file, but it could occur that deleted files will be restored from the other
replica.
To prevent a corruption or loss of the database if an error occurs or the user
replica.</p></div>
<div class="para"><p>To prevent a corruption or loss of the database if an error occurs or the user
forces an abort, the synchronizer is working on a copy of the database and will
use a 2-Phase-Commit to save it at the end.</p></div>
use a Two-Phase-Commit to save it at the end.</p></div>
</div>
<h2 id="_getting_started">4. Getting started</h2>
<div class="sectionbody">
@@ -676,6 +683,31 @@ DESTINATION can be a local directory or a remote file server.</p></div>
<div class="content">
<pre><tt>csync /home/csync scheme://user:password@server:port/full/path</tt></pre>
</div></div>
<h4 id="_examples">4.2.1. Examples</h4>
<div class="para"><p>To synchronize two local directories:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>csync /home/csync/replica1 /home/csync/relplica2</tt></pre>
</div></div>
<div class="para"><p>Two synchronizer a local directory with an smb server, use</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>csync /home/csync smb://rupert.galaxy.site/Users/csync</tt></pre>
</div></div>
<div class="para"><p>If you use kerberos you don't have to specify a username or a password. If you
don't use kerberos, the commandline client will ask about the user and the
password.
If you don't want to be ask, you can specify it on the commandline:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>csync /home/csync smb://csync:secret@rupert.galaxy.site/Users/csync</tt></pre>
</div></div>
<div class="para"><p>If you use the sftp protocol and want to specify a port, you do it the
following way:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>csync /home/csync sftp://csync@krikkit.galaxy.site:2222/home/csync</tt></pre>
</div></div>
<div class="para"><p>The remote destination is supported by plugins. By default csync ships with smb
and sftp support. For more information, see the manpage of <tt>csync(1)</tt>.</p></div>
<h3 id="_the_pam_module">4.3. The PAM module</h3><div style="clear:left"></div>
@@ -693,7 +725,7 @@ directory).</p></div>
</div>
<div id="footer">
<div id="footer-text">
Last updated 2008-12-17 15:38:27 CEST
Last updated 2008-12-23 10:15:54 CEST
</div>
</div>
</body>