Architecture
============

.. index:: architecture

The ownCloud project provides desktop sync clients that synchronize the
contents of local directories on desktop machines with an ownCloud server.

The syncing is done with csync_, a bidirectional file synchronization tool that
provides both a command line client and a library. A special csync module was
written to synchronize with ownCloud's built-in WebDAV server.

The ownCloud sync client is based on a tool called mirall, initially written by
Duncan Mac Vicar. Klaas Freitag later joined the project and enhanced it to work
with the ownCloud server. Both mirall and the ownCloud Client (oCC) are built
from the same source, currently hosted in the ownCloud source repository on
Gitorious.

oCC is written in C++ using the `Qt Framework`_. As a result, oCC runs on the
three major desktop platforms: Linux, Windows, and macOS.

.. _csync: http://www.csync.org
.. _`Qt Framework`: http://www.qt-project.org

The Sync Process
----------------

First, it is important to recall what syncing is: it tries to keep the files
in both repositories the same. If a file is added to one repository, it is
copied to the other. If a file is changed in one repository, the change is
propagated to the other. Likewise, if a file is deleted on one side, it is
deleted on the other. In ownCloud syncing there is no typical client/server
relationship in which the server is always the master.

This is the major difference from systems such as file backup, where only
changes and new files are propagated and files never get deleted.

oCC checks both repositories for changes at regular intervals. Each such check
is referred to as a sync run. Between sync runs, a file system monitor watches
the local repository and starts a sync run immediately if something is edited,
added, or removed.

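
The two triggers for a sync run can be sketched in Python as follows. This is
a minimal illustration, not oCC code: the function ``should_start_sync_run``
and the 30 second interval are assumptions made for the example.

```python
def should_start_sync_run(now: float, last_run: float, interval: float,
                          local_change_seen: bool) -> bool:
    """Decide whether a sync run is due.

    A run starts either because the file system monitor reported a local
    change or because the polling interval has elapsed.
    """
    if local_change_seen:
        return True
    return now - last_run >= interval


POLL_INTERVAL = 30.0  # seconds; hypothetical value, not taken from oCC

# A local edit triggers a run even though only 10 seconds have passed.
run_after_edit = should_start_sync_run(100.0, 90.0, POLL_INTERVAL, True)
# Without local changes, the client waits for the interval to elapse.
run_too_early = should_start_sync_run(100.0, 90.0, POLL_INTERVAL, False)
```
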
Sync by Time versus ETag
------------------------

.. index:: time stamps, file times, etag, unique id

Until the release of ownCloud 4.5 and ownCloud Client 1.1, ownCloud used
a single file property to decide which file is newer and hence needs to be
synced to the other repository: the file's modification time.

The *modification timestamp* is part of the file's metadata. It is available on
every relevant filesystem and is the natural indicator of a file change.
Modification timestamps require no special action to create and have a
generally understood meaning. One design goal of csync is to not require a
special server component, which is why it was chosen as the backend component.

To compare the modification times of two files from different systems, both
systems must operate on the same time base. Before version 1.1.0, csync
required both sides to run on exactly the same time, which can be achieved
through enterprise-standard `NTP time synchronisation`_ on all machines.

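
Why a shared time base matters can be shown with a small Python sketch. The
function ``newer_side`` and the sample timestamps are hypothetical and not part
of csync. The sketch only demonstrates how a clock offset on one machine can
flip the result of a plain modification-time comparison.

```python
def newer_side(local_mtime: float, remote_mtime: float) -> str:
    """Return which replica a naive mtime comparison considers newer."""
    if local_mtime > remote_mtime:
        return "local"
    if remote_mtime > local_mtime:
        return "remote"
    return "equal"


# The server copy was really written 60 seconds after the local copy.
local_true = 1_000_000.0
remote_true = local_true + 60.0

# With synchronized clocks the remote copy is correctly seen as newer.
in_sync = newer_side(local_true, remote_true)

# If the client clock runs 5 minutes fast, the local timestamp is inflated
# and the older local copy wins the comparison.
client_clock_offset = 300.0
skewed = newer_side(local_true + client_clock_offset, remote_true)
```

In the skewed case the older local version would overwrite the newer remote
one, which is the kind of failure the time stamp-based method is exposed to
without NTP.
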
Since this strategy is rather fragile without NTP, ownCloud 4.5 introduced a
unique number that changes whenever the file changes. Although the value is
unique, it is not a hash of the file but a randomly chosen number, which
ownCloud transmits in the ETag_ field. Since this number is guaranteed to
change when the file changes, it can be used to determine whether one of the
files has changed.

.. note:: oCC 1.1 and newer require file ID capabilities on the ownCloud server,
   hence using them with a server earlier than 4.5.0 is not supported.

Before the 1.3.0 release of the client, the sync process could create faux
conflict files if the clocks deviated. The original and the conflict file
differed only in their timestamps, not in content. This behaviour was changed
to a binary comparison that checks whether the files actually differ.

Just like files, directories also hold a unique id, which changes whenever
one of the contained files or directories is modified. Since this is a
recursive process, it significantly reduces the effort required for a sync
cycle, because the client only walks directories with a modified unique id.

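
The pruning effect of directory ids can be illustrated with a short Python
sketch. The ``RemoteNode`` class, the ``changed_files`` function, and the
in-memory tree are assumptions made for this example and do not reflect how
csync represents the remote tree. The point is only that a subtree whose id
is unchanged is never entered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RemoteNode:
    """Hypothetical in-memory model of a remote file or directory."""
    etag: str
    children: Dict[str, "RemoteNode"] = field(default_factory=dict)
    is_dir: bool = False


def changed_files(node: RemoteNode, path: str, known_etags: Dict[str, str],
                  visited: List[str]) -> List[str]:
    """Collect changed files, skipping subtrees whose id is unchanged."""
    visited.append(path)  # record every node the walk actually inspects
    if known_etags.get(path) == node.etag:
        return []  # same id as last time: nothing below this node changed
    if not node.is_dir:
        return [path]
    changed: List[str] = []
    for name, child in sorted(node.children.items()):
        changed.extend(
            changed_files(child, f"{path}/{name}", known_etags, visited))
    return changed


# Only /photos/x.jpg changed, so the ids of /photos and the root changed too.
root = RemoteNode("r2", is_dir=True, children={
    "docs": RemoteNode("d1", is_dir=True,
                       children={"a.txt": RemoteNode("a1")}),
    "photos": RemoteNode("p2", is_dir=True,
                         children={"x.jpg": RemoteNode("x2"),
                                   "y.jpg": RemoteNode("y1")}),
})
known = {"": "r1", "/docs": "d1", "/docs/a.txt": "a1",
         "/photos": "p1", "/photos/x.jpg": "x1", "/photos/y.jpg": "y1"}

visited: List[str] = []
changed = changed_files(root, "", known, visited)
```

Here ``/docs`` is inspected once, found unchanged, and its contents are never
visited.
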
The following table outlines the sync methods attempted for each
server/client combination:

.. index:: compatibility table


+--------------------+-------------------+----------------------------+
| Server Version     | Client Version    | Sync Methods               |
+====================+===================+============================+
| 4.0.x or earlier   | 1.0.5 or earlier  | Time Stamp                 |
+--------------------+-------------------+----------------------------+
| 4.0.x or earlier   | 1.1 or later      | n/a (incompatible)         |
+--------------------+-------------------+----------------------------+
| 4.5 or later       | 1.0.5 or earlier  | Time Stamp                 |
+--------------------+-------------------+----------------------------+
| 4.5 or later       | 1.1 or later      | File ID, Time Stamp        |
+--------------------+-------------------+----------------------------+

Upgrading to ownCloud 4.5 or later together with ownCloud Client 1.1 or later
is highly recommended, since the time stamp-based sync mechanism can lead to
data loss in certain edge cases, especially when multiple clients are involved
and one of them is not in sync with NTP time.

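
The compatibility table can also be expressed as a small lookup function. The
Python sketch below is illustrative only: ``version_tuple`` and
``sync_methods`` are hypothetical helpers, and the version thresholds are
taken directly from the table.

```python
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "4.5.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def sync_methods(server_version: str, client_version: str) -> List[str]:
    """Return the sync methods for a server/client pair, per the table.

    An empty list stands for the incompatible combination.
    """
    server_has_ids = version_tuple(server_version) >= (4, 5)
    client_uses_ids = version_tuple(client_version) >= (1, 1)
    if client_uses_ids and not server_has_ids:
        return []  # client 1.1+ needs file IDs the old server lacks
    if client_uses_ids and server_has_ids:
        return ["File ID", "Time Stamp"]
    return ["Time Stamp"]


recommended = sync_methods("4.5.0", "1.1.0")
unsupported = sync_methods("4.0.8", "1.1.0")
```
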
.. _`NTP time synchronisation`: http://en.wikipedia.org/wiki/Network_Time_Protocol
.. _Etag: http://en.wikipedia.org/wiki/HTTP_ETag

Comparison and Conflict Cases
-----------------------------

In a sync run, the client first has to detect whether either of the two
repositories has changed files. On the local repository, the client traverses
the file tree and compares the modification time of each file with its
previous value, which is stored in the client's database. If the database has
no entry for a file, the file has been added to the local repository. Note
that on the local side, the modification time is a good attribute for
detecting changes, because it does not depend on time shifts and the like.

For the remote (i.e. ownCloud) repository, the client compares the ETag of each
file with its previous value. Again, the previous value is queried from the
database. If the ETag is still the same, the file has not changed.

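
Both comparisons follow the same pattern: look up the previous value in the
database and compare it with the current one. The Python sketch below models
the database as a plain dictionary. ``JournalEntry``, ``local_status``, and
``remote_status`` are hypothetical names, and treating a remote file without
a database entry as new is an assumption that mirrors the local case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class JournalEntry:
    """Hypothetical record of what the client saw at the last sync run."""
    mtime: float
    etag: str


def local_status(path: str, current_mtime: float,
                 journal: Dict[str, JournalEntry]) -> str:
    """Classify a local file by comparing its mtime with the stored value."""
    entry = journal.get(path)
    if entry is None:
        return "new"  # no previous value: the file was added locally
    return "changed" if current_mtime != entry.mtime else "unchanged"


def remote_status(path: str, current_etag: str,
                  journal: Dict[str, JournalEntry]) -> str:
    """Classify a remote file by comparing its ETag with the stored value."""
    entry = journal.get(path)
    if entry is None:
        return "new"  # assumption: handled like the local case
    return "changed" if current_etag != entry.etag else "unchanged"


journal = {"notes.txt": JournalEntry(mtime=1000.0, etag="a1")}

local_result = local_status("notes.txt", 1000.0, journal)
remote_result = remote_status("notes.txt", "b7", journal)
added_result = local_status("draft.txt", 1234.0, journal)
```
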
So what happens if a file has changed in both the local and the remote
repository since the last sync run? In that case it cannot easily be decided
which version of the file should be used. Moreover, changes on either side
must not be lost. This is called the conflict case, and the client resolves it
by creating a conflict file from the older of the two files and saving the
newer one under the original file name. Conflict files are always created on
the client and never on the server. The conflict file has the same name as the
original file, with the timestamp of the conflict detection appended.

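
The conflict handling can be sketched in Python as follows. The helper names
and, in particular, the ``_conflict-YYYYMMDD-HHMMSS`` suffix are assumptions
made for illustration. The text only states that a timestamp is appended, not
its exact format.

```python
from datetime import datetime
from typing import Dict


def conflict_file_name(original_name: str, detected_at: datetime) -> str:
    """Append the conflict detection time to the original file name.

    The suffix format is hypothetical.
    """
    stamp = detected_at.strftime("%Y%m%d-%H%M%S")
    return f"{original_name}_conflict-{stamp}"


def resolve_conflict(name: str, local_mtime: float, remote_mtime: float,
                     detected_at: datetime) -> Dict[str, str]:
    """Plan how a file that changed on both sides is handled.

    The newer version keeps the original name and the older version is
    saved on the client as a conflict file. Treating equal times as
    "local is newer" is an assumption of this sketch.
    """
    newer = "local" if local_mtime >= remote_mtime else "remote"
    older = "remote" if newer == "local" else "local"
    return {
        "keep_as_original": newer,
        "save_as_conflict": older,
        "conflict_file": conflict_file_name(name, detected_at),
    }


plan = resolve_conflict("report.odt", local_mtime=1000.0, remote_mtime=2000.0,
                        detected_at=datetime(2013, 1, 2, 3, 4, 5))
```

In this example the remote version is newer, so it is stored as
``report.odt`` and the local version is kept on the client under the conflict
name.
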
The Sync Journal
----------------

The client stores the ETag number in a per-directory database called the
journal. It is located in the application directory (until version 1.1) or as
a hidden file directly in the directory to be synced (later versions).

If the journal database is removed, oCC's csync backend rebuilds the database
by comparing the files and their modification times. It is therefore important
to make sure that both server and client are synchronized to NTP time before
restarting the client after a database removal.

oCC also provides a button in the Settings Dialog that allows the user to
"reset" the journal. This can be used to recreate the journal database.