Implement a file syncing algorithm for two computers over a low-bandwidth network. What if we know the files in the two computers are mostly the same?
Since the network is low-bandwidth (and possibly unreliable), syncing files continuously as they are being edited is impractical or error-prone. We need a reliable and efficient way to sync the entire directory structure.
A point to note is that most of the directory structure is going to be the same, so we should avoid transferring the parts of the directory structure that have not changed.
We are going to use a data structure called a Merkle Tree here.
A Merkle Tree is a tree in which each leaf is the hash of the data block it represents and each parent node is the hash of the concatenated hashes of its children.
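A minimal Python sketch of such a tree built over a directory, assuming the directory has already been loaded into a nested dict (a hypothetical representation chosen so the example runs without touching the file system). It uses SHA-256 and, unlike the bare definition above, also feeds each child's name into its parent's hash so that renames are detected. `MerkleNode` and `build` are illustrative names, not an existing API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Union

# Assumed in-memory view of a directory: a file maps to its bytes,
# a subdirectory maps to a nested dict. (Hypothetical representation.)
Entry = Union[bytes, Dict[str, "Entry"]]


@dataclass
class MerkleNode:
    name: str
    digest: str
    is_dir: bool
    children: Dict[str, "MerkleNode"] = field(default_factory=dict)


def build(name: str, entry: Entry) -> MerkleNode:
    if isinstance(entry, bytes):
        # Leaf: hash of the file's contents (tagged so a file and a directory never collide).
        return MerkleNode(name, hashlib.sha256(b"file\0" + entry).hexdigest(), False)
    children = {child: build(child, sub) for child, sub in entry.items()}
    h = hashlib.sha256(b"dir\0")
    # Parent: hash over the children's (name, digest) pairs, sorted by name so
    # the result does not depend on the order in which entries were listed.
    for child in sorted(children):
        h.update(child.encode() + b"\0" + children[child].digest.encode() + b"\n")
    return MerkleNode(name, h.hexdigest(), True, children)
```

Changing one file changes only the digests on the path from that file up to the root; every sibling subtree keeps its old digest, which is what lets the sync skip it.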
An overview of the algorithm is as follows:
1. Whenever a file is changed, we recompute the hashes along the branch of the directory tree that the file is part of, from the file up to the root.
2. To sync the changes, we compare the Merkle Trees of both systems, starting from the root.
3. Nodes whose hashes match can be skipped during syncing, since (barring hash collisions) the entire directory structure below them must also be the same; we only descend into, and transfer, the subtrees whose hashes differ.
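The steps above can be sketched in Python as follows, reusing the same hypothetical `build` helper. As a simplification, both trees live in one process; in an actual sync the two machines would exchange only the digests of the nodes being compared, level by level, and then transfer just the files that `diff` reports. The sketch also rebuilds the whole tree instead of recomputing only the changed branch as in step 1. `diff` and its `("send", path)` / `("delete", path)` output format are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical in-memory directory: file -> bytes, subdirectory -> nested dict.
Entry = Union[bytes, Dict[str, "Entry"]]


@dataclass
class MerkleNode:
    name: str
    digest: str
    is_dir: bool
    children: Dict[str, "MerkleNode"] = field(default_factory=dict)


def build(name: str, entry: Entry) -> MerkleNode:
    if isinstance(entry, bytes):
        return MerkleNode(name, hashlib.sha256(b"file\0" + entry).hexdigest(), False)
    children = {child: build(child, sub) for child, sub in entry.items()}
    h = hashlib.sha256(b"dir\0")
    for child in sorted(children):
        h.update(child.encode() + b"\0" + children[child].digest.encode() + b"\n")
    return MerkleNode(name, h.hexdigest(), True, children)


def diff(local: Optional[MerkleNode], remote: Optional[MerkleNode],
         path: str = "") -> List[Tuple[str, str]]:
    """Return the (action, path) pairs needed to make `remote` match `local`."""
    if local is not None and remote is not None and local.digest == remote.digest:
        return []  # Step 3: identical hashes, so the whole subtree is skipped.
    if local is None:
        return [("delete", path)]  # Exists only on the remote side.
    if remote is None or not local.is_dir or not remote.is_dir:
        return [("send", path)]  # New entry, changed file, or file/dir swap.
    changes: List[Tuple[str, str]] = []
    # Both are directories with different hashes: descend into each child.
    for name in sorted(set(local.children) | set(remote.children)):
        child_path = f"{path}/{name}" if path else name
        changes += diff(local.children.get(name), remote.children.get(name), child_path)
    return changes
```

For example, if only `sub/b.txt` was edited, `diff` compares the root, skips every sibling whose digest matches, and reports just `("send", "sub/b.txt")`.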
The above approach can be implemented as follows: