erictaneda

  1. Dear Bittorrent Sync Developers,

     I have almost 30 years of experience and have worked in about 20 programming languages, so I have a good intuitive sense of the logic behind a program from observing how it works. I have also managed multiple projects in which I supplied a programmer with an outline of how a program should function and got back a good, functional program.

     I am impressed with the glue that binds the nodes together in Bittorrent Sync, i.e., using the Bittorrent protocol to establish a private cluster, and I want to help in any way I can with applying this to synchronizing files across the cluster. I humbly suggest that the algorithm currently used to determine what should be authoritative in synchronization may have systemic flaws, and I respectfully suggest it may be easier (for just this part) to start the logic from scratch rather than attempt line-item fixes. In this spirit, and based on my experience, I am submitting the following for your consideration.

     Eric Taneda's Suggested Synchronization Algorithm

     1. Establish the current time over the Internet, independent of the local clock and based on UTC, and keep this independent time available.

     2. Use a hidden metadata file (or an existing metadata file that can perform the function) to keep a list of every file in the current directory, along with the version timestamp for each file that is valid within the cluster. This timestamp is independent of the local computer. When a file is transferred in from the cluster, the cluster timestamp follows the file and is written to the metadata file, so the timestamp matches exactly across multiple nodes.

     3. Only when a file is observed changing in real time should its timestamp in the metadata file be updated, using the independent clock's time. "Observed changing" means the file is known to exist in a covered directory and its modified date is seen changing to the current local clock time. The synchronization part of the program picks up the new metadata timestamp, notices that a newer version exists on the cluster, and distributes it. This means that any changes made while the synchronization program is not running are treated as "stale". Once synchronization comes back online, the stale file is overwritten by whatever file is considered the most recent version in the cluster. This is by design.

     4. Put an option in the synchronization program, "Temporarily treat files moved into sync directory as authoritatively being the latest version". When enabled, it causes the metadata file to be written with the time at which a file was observed entering a synchronized directory, temporarily overruling the general rule that the modify timestamp must be observed changing to the current local time. This option should be off by default and, when enabled, should stay enabled only for a short time (such as 5 minutes). It provides a way to manually override what is considered a current file on a network-wide basis, but because it requires a manual procedure, it can only be done intentionally and not by accident. Also, because the current actual time is written into the metadata, the option does not need to affect the timestamps on the files themselves, and it does not cause confusion when a node that has been offline comes back online.

     5. For synchronization purposes, compare only the timestamps written into the metadata file in the synchronization directory. Whichever file has the latest timestamp according to this metadata file should be authoritative as the most current version and should overwrite older versions elsewhere.

     6. Write a detailed log of synchronization activity as follows:
        a. When a file receives a metadata timestamp (from the independent clock, in the metadata file), which should only occur:
           i. when the file is received from another node in the cluster,
           ii. when the modify timestamp is observed changing to the local clock time in real time while the synchronization program is running, or
           iii. when a file is observed being moved into a synchronization directory while the option to recognize this as authoritative is enabled.
        b. When a file is transferred into a node, the following information should be recorded in the log:
           i. whether or not the file existed on this node before it was received from the cluster,
           ii. if it existed, what its timestamp was (according to the entry in the metadata file, not the local file system timestamp) before being overwritten,
           iii. also if it existed, the sha256 hash of the older file that was overwritten,
           iv. the cluster timestamp of the received file,
           v. the sha256 hash of the file that replaced the older file.

     7. Provide a tab in the synchronization program that allows access to older versions of files. It would show a tree view of the directory structure for a given synchronized directory on the left side and a file list on the right side. Each file with older versions available would have a button next to it with text such as "13 versions" which, when clicked, shows the older versions in a list from which one can be selected and restored. When an older version is restored, it is saved under the original filename with text appended, such as "(from 2013-12-18 07 33 45 UTC)", to indicate which version it is, using the cluster timestamp. Older versions of files can be stored under the sha256 value of the file rather than the filename, and the list of older versions can be constructed from the detailed log of synchronization activity above.

     This procedure avoids common problems such as:

     1. The user has the time zone set wrong, so the underlying UTC value is obscured and hidden from the user. The clock appears correct but is actually off by hours (e.g., if the user has a misconfigured computer set to EST instead of PST and has adjusted the clock to look right, then even though the system clock appears to show the right time, it is in fact off by 3 hours).

     2. The user's clock is off by several minutes, which makes it hard to accurately identify which file is the most recent when a file is often edited from multiple nodes.

     3. The user's clock is wildly wrong, which, if not detected, makes it hard to accurately identify which file is the most recent.

     4. The user does not have the synchronization program running, makes a bunch of changes intentionally or unintentionally, or makes changes while the clock is wrong, and this causes unexpected replication when the node comes back online (i.e., when the synchronization program is up and running again).

     5. If a node goes offline, a bunch of changes get made, and the node comes back online, those random changes are not synchronized across the cluster. Only changes that occur while a node is online get replicated to other nodes. If someone wants to replicate files intentionally, they can move the files out of the replicated directory, enable the option to treat files moved into the directory as new, and then move the files back in to force them to be recognized as authoritative copies. These steps are hard to do accidentally. The general rule is therefore that only files changed under stringent rules (the modify timestamp changes to the local clock time) get flagged as having changed, using the universal clock rather than the local clock as the authoritative timestamp, and only by explicit procedures that are hard to perform accidentally can a set of files be treated as authoritative on the cluster.

     6. If there is a need to get an older version of a file due to an accidental overwrite, the procedure provides a relatively user-friendly way to do it.

     If you find this approach promising and would like my further help, please do not hesitate to contact me. I am happy to help in any way I can, because I believe strongly in the BT Sync concept and would like to help make it successful.
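The rules of the suggested algorithm can be sketched roughly as follows. This is only a minimal Python illustration under stated assumptions, not BT Sync's actual implementation: the names `Node`, `Entry`, `observe_change`, `enable_override`, `observe_moved_in`, and `receive` are hypothetical, the independent UTC clock is passed in as a plain function instead of being fetched over the Internet, and the metadata file and log are modeled as in-memory structures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    cluster_ts: float  # timestamp taken from the independent clock
    sha256: str        # hash of the file contents


@dataclass
class Node:
    now: Callable[[], float]  # independent clock (assumed; injected by caller)
    meta: Dict[str, Entry] = field(default_factory=dict)  # stands in for the metadata file
    log: List[Tuple] = field(default_factory=list)        # stands in for the sync log
    override_until: float = 0.0

    def observe_change(self, path: str, sha256: str) -> None:
        """Rule 3: a known file was seen changing while the program was running."""
        self.meta[path] = Entry(self.now(), sha256)
        self.log.append(("changed", path, self.meta[path].cluster_ts, sha256))

    def enable_override(self, seconds: float = 300.0) -> None:
        """Rule 4: open the short window in which moved-in files are authoritative."""
        self.override_until = self.now() + seconds

    def observe_moved_in(self, path: str, sha256: str) -> bool:
        """Rule 4: stamp a moved-in file only while the override window is open."""
        if self.now() > self.override_until:
            return False  # no stamp, so the file stays stale
        self.meta[path] = Entry(self.now(), sha256)
        self.log.append(("moved_in", path, self.meta[path].cluster_ts, sha256))
        return True

    def receive(self, path: str, remote: Entry) -> bool:
        """Rules 2, 5, 6b: accept a remote file only if its cluster timestamp is newer."""
        old = self.meta.get(path)
        if old is not None and old.cluster_ts >= remote.cluster_ts:
            return False
        # The cluster timestamp follows the file unchanged.
        self.meta[path] = Entry(remote.cluster_ts, remote.sha256)
        self.log.append((
            "received", path,
            old is not None,                    # existed before?
            old.cluster_ts if old else None,    # previous metadata timestamp
            old.sha256 if old else None,        # hash of the overwritten file
            remote.cluster_ts, remote.sha256,
        ))
        return True
```

In this sketch a file that was edited or moved in while the program was not watching never gets a fresh `Entry`, so any copy on the cluster with a metadata timestamp wins when `receive` runs, which matches the "stale by design" behavior described in the proposal.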
  2. I have two computers (Windows Server 2003 and Windows XP) linked by BT Sync. I have a WordPad document in the shared folder with a Date Modified timestamp of "12/21/2013 2:25 a.m.", which is synchronized across both computers. When I open this file in Windows XP, make changes, and save it, the Modified timestamp changes to "12/21/2013 3:34 a.m." A few seconds later, the timestamp for this file changes back to "12/21/2013 2:25 a.m.", because the file has been copied over from the other computer. I have confirmed that both computers have synchronized clocks.

     I got curious and started looking at this more closely, and I believe I have identified the cause, which I believe is a serious bug. The "Date Created" timestamp on Windows XP is set to "12/21/2013 2:18 a.m." and stays at this time even when I modify the file. The "Date Created" timestamp on Windows Server 2003 is set to "12/21/2013 2:21 a.m." So it appears that what is being compared is the "Date Created" attribute and not the "Date Modified" attribute. What this means is that if I open a document, modify it, and save it, Windows updates the "Date Modified" value to the current time but "Date Created" remains the same. BT Sync notices that the files are different and, since "Date Created" is older on the newly modified file, grabs the contents from the older file (which has a newer "Date Created" value because it was the recipient of an earlier synchronization).

     I suggest this behavior is not correct. If a node modifies a file, and the "Date Modified" stamp is newer on file B than on file A, then B should overwrite A, and not the other way around. Please advise if this description is unclear, or if I can help the developers reproduce this problem with more clarity. Thank you, Eric
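The difference between the two comparison rules can be illustrated with the timestamps from this report. This is a small Python sketch under assumptions: `FileTimes`, `winner_by_created`, and `winner_by_modified` are hypothetical names, and `winner_by_created` only models the behavior suspected above, not confirmed BT Sync internals.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileTimes:
    node: str
    created: datetime
    modified: datetime


def winner_by_created(a: FileTimes, b: FileTimes) -> FileTimes:
    """Suspected behavior: the copy with the newer Date Created wins."""
    return a if a.created >= b.created else b


def winner_by_modified(a: FileTimes, b: FileTimes) -> FileTimes:
    """Suggested behavior: the copy with the newer Date Modified wins."""
    return a if a.modified >= b.modified else b


# Timestamps taken from the report, right after the edit on Windows XP.
xp = FileTimes("Windows XP",
               created=datetime(2013, 12, 21, 2, 18),
               modified=datetime(2013, 12, 21, 3, 34))
server = FileTimes("Windows Server 2003",
                   created=datetime(2013, 12, 21, 2, 21),
                   modified=datetime(2013, 12, 21, 2, 25))

print(winner_by_created(xp, server).node)   # the unedited copy wins
print(winner_by_modified(xp, server).node)  # the freshly edited copy wins
```

Comparing by "Date Created" picks the Windows Server 2003 copy and so reverts the edit, while comparing by "Date Modified" picks the Windows XP copy that was just saved.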