Algebraic File Synchronization: Adequacy and Completeness

Keywords: file synchronization, algebraic approach, distributed systems, reconciliation, data replication

Abstract

With distributed computing and mobile applications, synchronizing diverging replicas of data structures is a more and more common problem. We use algebraic methods to reason about filesystem operations, and introduce a simplified definition of conflicting updates to filesystems. We also define algorithms for update detection and reconciliation and present rigorous proofs that they not only work as intended, but also cannot be improved on.

To achieve this, we introduce a novel, symmetric set of filesystem commands with higher information content, which removes edge cases and increases the predictive powers of our algebraic model. We also present a number of generally useful classes and properties of sequences of commands.

These results are often intuitive, but providing exact proofs for them is far from trivial. They contribute to our understanding of this special type of algebraic model, and toward building more complete algebras of filesystem trees and extending algebraic approaches to other data storage protocols. They also form a theoretical basis for specifying and guaranteeing the error-free operation of applications that implement an algebraic approach to synchronization.

Full paper

Read the full paper on arXiv.org »

See also

A number of other papers extending these results: An Algebraic Approach to File Synchronization (2001); Algebra of Data Reconciliation (2021), in Studia (2022); Data Synchronization: A Complete Theoretical Solution for Filesystems (2022), in Future Internet (2022); Synchronizing Many Filesystems in Near Linear Time (2023), in Future Internet (2023).