Doc: Add sync algorithm overview and comments (#5277)

* Doc: Add sync algorithm overview and comments
ckamm 2016-11-08 16:10:55 +01:00 committed by GitHub
parent f1f27221a7
commit 36d61ef3a9
4 changed files with 122 additions and 2 deletions


@ -65,7 +65,21 @@ static c_rbnode_t *_csync_check_ignored(c_rbtree_t *tree, const char *path, int
}
}
/*
/**
* The main function in the reconcile pass.
*
* It's called once for each entry in the local and remote rbtrees by
* csync_reconcile().
*
* Before the reconcile phase the trees already know about changes
* relative to the sync journal. This function's job is to spot conflicts
* between local and remote changes and adjust the nodes accordingly.
*
* See doc/dev/sync-algorithm.md for an overview.
*
*
* Older detail comment:
*
* We merge replicas at the file level. The merged replica contains the
* superset of files that are on the local machine and server copies of
* the replica. In the case where the same file is in both the local


@ -158,7 +158,19 @@ static bool _csync_mtime_equal(time_t a, time_t b)
return false;
}
/**
* The main function of the discovery/update pass.
*
* It's called (indirectly) by csync_update(), once for each entity in the
* local filesystem and once for each entity in the server data.
*
* It has two main jobs:
* - figure out whether anything happened compared to the sync journal
* and set (primarily) the instruction flag accordingly
* - build the ctx->local.tree / ctx->remote.tree
*
* See doc/dev/sync-algorithm.md for an overview.
*/
static int _csync_detect_update(CSYNC *ctx, const char *file,
const csync_vio_file_stat_t *fs, const int type) {
uint64_t h = 0;

doc/dev/sync-algorithm.md Normal file

@ -0,0 +1,84 @@
Sync Algorithm
==============
Overview
--------
This is a technical description of the synchronization (sync) algorithm used by the ownCloud client.

The sync algorithm examines the local and remote file system trees together with the sync journal and decides which steps are needed to bring the two trees into sync. It is distinct from the propagator, whose job is to actually execute these steps.

Definitions
-----------
- local tree: The files and directories on the local file system that shall be kept in sync with the remote tree.
- remote tree: The files and directories on the ownCloud server that shall be kept in sync with the local tree.
- sync journal (journal): A snapshot of file and directory metadata that the sync algorithm uses as a baseline to detect local or remote changes. Typically stored in a database.
- file and directory metadata:
    - mtimes
    - sizes
    - inodes (journal and local only): Representation of a filesystem object. Useful for rename detection.
    - etags (journal and remote only): The server assigns a new etag when a file or directory changes.
    - checksums (journal and remote only): The result of a checksum algorithm applied to a file's contents.
    - permissions (journal and remote only)
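
As a rough illustration of how this metadata can serve as a baseline, the sketch below defines a hypothetical record and two change checks. The type `example_meta_t`, its field layout, and the exact comparison rules (mtime or size for local entries, etag for remote entries) are assumptions made for this example; the real csync structures and checks are more involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical metadata record; not the actual csync data structure. */
typedef struct {
    time_t   mtime;
    int64_t  size;
    uint64_t inode;     /* journal and local only */
    char     etag[64];  /* journal and remote only */
} example_meta_t;

/* Assumed rule: a local entry counts as changed if its mtime or size
 * differs from the journal entry. */
static bool example_local_changed(const example_meta_t *journal,
                                  const example_meta_t *local)
{
    return journal->mtime != local->mtime || journal->size != local->size;
}

/* Assumed rule: a remote entry counts as changed if the server has
 * assigned it an etag different from the one in the journal. */
static bool example_remote_changed(const example_meta_t *journal,
                                   const example_meta_t *remote)
{
    return strcmp(journal->etag, remote->etag) != 0;
}
```

Keeping the local and remote checks separate mirrors the definitions above: inodes are only known locally, while etags only exist on the server side.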

Phases
------

### Discovery (aka Update)

The discovery phase collects file and directory metadata from the local and remote trees, detecting differences between each tree and the journal.

Afterwards, we have two trees that tell us what happened relative to the journal. However, conflicts may remain if something happened to an entity both locally and on the remote.

- Input: file system, server data, journal
- Output: two c_rbtree_t*, representing the local and remote trees
- Note on remote discovery: Since a change to a file on the server causes the etags of all parent folders to change, folders with an unchanged etag can be read from the journal directly and don't need to be walked into.
- Details
    - csync_update() uses csync_ftw() on the local and remote trees, one after the other.
    - csync_ftw() iterates through the entities in a tree and calls csync_walker() for each.
    - csync_walker() calls _csync_detect_update() for each entity.
    - _csync_detect_update() compares the item to its journal entry (if any) to detect new, changed or renamed files. This is the main function of this pass.
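
The note on remote discovery can be sketched as a recursive walk that stops at any folder whose etag matches the journal. This is a minimal illustration only: `example_dir_t` and `example_discover_remote()` are hypothetical names, and the real client works on server listings and the journal database rather than an in-memory tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory folder node; not a csync type. */
typedef struct example_dir {
    const char *server_etag;        /* etag currently reported by the server */
    const char *journal_etag;       /* etag stored in the journal, NULL if unknown */
    struct example_dir **children;  /* sub-folders */
    size_t child_count;
} example_dir_t;

/* Walks the remote tree and returns how many folders had to be listed
 * on the server. Folders whose etag matches the journal are skipped,
 * together with everything below them. */
static size_t example_discover_remote(const example_dir_t *dir)
{
    if (dir->journal_etag != NULL &&
        strcmp(dir->journal_etag, dir->server_etag) == 0) {
        return 0;  /* unchanged: contents can be taken from the journal */
    }

    size_t listed = 1;  /* this folder must be listed */
    for (size_t i = 0; i < dir->child_count; i++) {
        listed += example_discover_remote(dir->children[i]);
    }
    return listed;
}
```

The pruning is only valid because a change anywhere below a folder changes that folder's etag as well, so an unchanged etag implies an unchanged subtree.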

### Reconcile

The reconcile phase compares and adjusts the local and remote trees (in both directions), detecting conflicts.

Afterwards, there are still two trees, but conflicts are marked in them.

- Input: c_rbtree_t* for the local and remote trees, journal (for some rename-related queries)
- Output: the c_rbtree_t* trees, modified in place
- Details
    - csync_reconcile() runs csync_reconcile_updates() for the local and remote trees, one after the other.
    - csync_reconcile_updates() uses c_rbtree_walk() to iterate through the entries, calling _csync_merge_algorithm_visitor() for each.
    - _csync_merge_algorithm_visitor() checks whether the other tree also has an entry for that node and merges the actions, detecting conflicts. This is the main function of this pass.
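
The shape of this pass can be sketched as follows. The instruction values, the array-based "trees" and the conflict rule (both sides changed relative to the journal) are simplifying assumptions for illustration; the real visitor works on rbtrees and handles many more cases, such as renames and removals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced instruction set; not the csync enum. */
typedef enum {
    EXAMPLE_NONE,     /* unchanged relative to the journal */
    EXAMPLE_NEW,
    EXAMPLE_CHANGED,
    EXAMPLE_CONFLICT
} example_instruction_t;

typedef struct {
    const char *path;
    example_instruction_t instruction;
} example_node_t;

/* Linear lookup standing in for an rbtree search. */
static example_node_t *example_find(example_node_t *tree, size_t n,
                                    const char *path)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tree[i].path, path) == 0) {
            return &tree[i];
        }
    }
    return NULL;
}

/* Visits every node of `tree` and marks it as a conflict if the other
 * tree has an entry for the same path and both sides carry a change. */
static void example_reconcile_updates(example_node_t *tree, size_t n,
                                      example_node_t *other, size_t other_n)
{
    for (size_t i = 0; i < n; i++) {
        example_node_t *peer = example_find(other, other_n, tree[i].path);
        if (peer == NULL || tree[i].instruction == EXAMPLE_NONE ||
            peer->instruction == EXAMPLE_NONE) {
            continue;  /* only one side changed: nothing to reconcile */
        }
        tree[i].instruction = EXAMPLE_CONFLICT;
    }
}
```

Calling `example_reconcile_updates()` once with the local tree first and once with the remote tree first mirrors the two runs described above, and leaves the conflict marked in both trees.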

### Post-Reconcile

The post-reconcile phase merges the two trees into one set of SyncFileItems.

Afterwards, there is a list of items that can tell the propagator what needs to be done.

- Input: c_rbtree_t* for the local and remote trees
- Output: QMap<QString, SyncFileItemPtr>
- Note that some "propagations", specifically cheap metadata-only updates, are already done at this stage.
- Details
    - csync_walk_local_tree() and csync_walk_remote_tree() are called.
    - They use _csync_walk_tree() to run SyncEngine::treewalkFile() on each entry.
    - treewalkFile() creates and fills SyncFileItems for each entry, ensuring that each file only has a single instance. This is the main function of this pass.
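
The "single instance per file" property can be illustrated with a small find-or-create routine. This is a sketch in plain C under stated assumptions: the fixed-size array stands in for the QMap used by the real code, and `example_item_t`, `example_item_list_t` and `example_treewalk_file()` are hypothetical names, not the SyncEngine API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_ITEMS 64

/* Hypothetical stand-in for a SyncFileItem. */
typedef struct {
    const char *path;
    bool seen_local;   /* an entry for this path exists in the local tree */
    bool seen_remote;  /* an entry for this path exists in the remote tree */
} example_item_t;

/* Hypothetical stand-in for the path-to-item map. */
typedef struct {
    example_item_t items[EXAMPLE_MAX_ITEMS];
    size_t count;
} example_item_list_t;

/* Called once per tree entry. Reuses the item for `path` if one exists,
 * otherwise creates it. Returns 0 on success, -1 on a NULL path or a
 * full list. */
static int example_treewalk_file(example_item_list_t *list,
                                 const char *path, bool remote)
{
    if (path == NULL) {
        return -1;
    }

    example_item_t *item = NULL;
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->items[i].path, path) == 0) {
            item = &list->items[i];
            break;
        }
    }

    if (item == NULL) {
        if (list->count == EXAMPLE_MAX_ITEMS) {
            return -1;
        }
        item = &list->items[list->count++];
        item->path = path;
        item->seen_local = false;
        item->seen_remote = false;
    }

    if (remote) {
        item->seen_remote = true;
    } else {
        item->seen_local = true;
    }
    return 0;
}
```

Walking the local entries first and the remote entries second then yields one item per path, regardless of whether the path appears in one tree or in both.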

See Also
--------

An overview of the propagation steps is still missing. The sync protocol is documented at https://github.com/cernbox/smashbox/blob/master/protocol/protocol.md.


@ -334,6 +334,16 @@ int SyncEngine::treewalkRemote( TREE_WALK_FILE* file, void *data )
return static_cast<SyncEngine*>(data)->treewalkFile( file, true );
}
/**
* The main function in the post-reconcile phase.
*
* Called on each entry in the local and remote trees by
* csync_walk_local_tree()/csync_walk_remote_tree().
*
* It merges the two csync rbtrees into a single map of SyncFileItems.
*
* See doc/dev/sync-algorithm.md for an overview.
*/
int SyncEngine::treewalkFile( TREE_WALK_FILE *file, bool remote )
{
if( ! file ) return -1;