Reference Node: Master Nodes Fighting With Each Other


stanha


This topic has to do with the situation when there are multiple master (r/w) nodes and some of them do things to their subfolders or files that do not reconcile with the other master nodes.

The r/o nodes' logic is quite complex, but we have addressed it already, and so far there have been no arguments or objections to it, for whatever reason.

But the master node logic is even more complex and its effects are more far reaching.

Consider the situation where there are several master nodes.

One of them, for whatever reason, decides to modify some subfolder name.

What happens?

Well, what happens is, whether he realized it or not, he created a new subfolder, and it MUST be propagated, including to the master nodes. Even worse, the old folder that he thought he renamed will still be re-downloaded to his node from the other nodes, and his newly named folder will be propagated to all other nodes, r/w and r/o. Thus, everyone gets two differently named folders with the same contents.

According to the current logic, the reference collection is the union of all the master node collections. That is, if one master node does NOT have some files that some other master node does, then it downloads those files, so the resulting collection becomes bigger.

The only way one master node may prevent downloading and propagating a folder with duplicate contents (if another master node made a mistake and renamed some folder) is to place that folder name into its .SyncIgnore file. But that does not prevent all the other master nodes from getting a duplicate copy of that folder.

So, currently, there is no way for the master node to tell the other master nodes that some folder is a duplicate.

There is an interesting case of r/w nodes "fighting". If you delete some files or folders on one r/w node, it won't help anything if you have multiple r/w nodes. Even deleting them locally will not prevent them from being re-downloaded again in a few seconds, because they still exist on the other master nodes. As soon as those come online, you will get your deleted stuff downloaded from them again. So, you MUST use .SyncIgnore if you decide to delete some files/folders when you have multiple r/w nodes and some of them have already synced the files that you have deleted, unless I am mistaken, of course.

The other alternative, for the case when some master node deleted some directory, is for the other master nodes to also delete it. This is "and" logic.

Only if this node AND that node contain the same data does that data persist and the databases reconcile between the master nodes. So, if "and" logic were used, then if one master node changed a directory name by merely replacing the underscore characters "_" with blanks " ", the contents of both of these directories would disappear. From the information standpoint this would be a "loss of fidelity": the situation would cause the loss of some information.

So, the "or" logic or union operation seems to be more facilitating to the information presence or persistence. As far as information goes, logically, the "rule" is: no information should disappear. Duplicate copies are not as bad as information disappearance, at least for most cases.

And if we end up with two copies of the same information because some master node decided to change a folder name into something "more suitable" in HIS opinion, while he, in fact, just created a second copy of the same information, then from the informational standpoint this could be considered noise, nothing more than a nuisance that does not cause the loss of information. According to information theory, noise is something that does not carry information.

This all means that the conflicts between master nodes containing different information, as it stands now, are resolved via the notion of a union, to maintain database consistency across all the r/w nodes. They could also be resolved via exclusion (the "and" operation), so that all the nodes hold the minimal amount of information present across all r/w nodes. But that would lead to information loss.

One way to resolve it could be to have an external "arbiter" who would dictate HIS view of the information set, and any node, master or slave (r/o), would have to comply with its reference view of the total information set. But in that case there must be only one arbiter, probably the original creator of the information set, which also has a logical problem: if that creator sooner or later permanently leaves the share, then who becomes the arbiter from then on?

It could probably be resolved by a rule: the arbiter is the one that has the most complete set of information.

And if we have more than one arbiter, then we are back to the same problem, only now the arbiters start fighting over the "right view" of the aggregate information set.

We are talking about one more type of node, the Reference Node. But then there is another problem: what if the reference node is not online? Do we create a notion of a dynamic or "current" reference node?

But any way you look at it, this problem seems even more complex logically than the problem with r/o nodes, and it could eventually turn out that the current solution of using .SyncIgnore is the most viable way to accommodate all the conflicting situations in the long run, and we just have to live with some unpleasant side effects if some r/w node does something inconsistent.

This seems to be a tough cookie to crack...


One of them, for whatever reason, decides to modify some subfolder name.
What happens?

It depends on the OS. In some cases, BTSync detects the fact of the folder rename and propagates only the rename event to other peers. If not, it reports a folder deletion and a new folder creation, which will be synced to other peers. We are working to improve this mechanism to make it more reliable and to avoid unnecessary file transfer.
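For what it's worth, on Linux the OS-level mechanism is typically inotify, which reports a rename as an IN_MOVED_FROM / IN_MOVED_TO pair of events sharing a numeric cookie. Here is a sketch of how a client might collapse such pairs into rename events; the event tuples and field layout are made up for illustration and are not BTSync's internals:

```python
# Sketch: pair MOVED_FROM / MOVED_TO events by cookie to recognize renames.
# Events are simulated as (kind, path, cookie) tuples, purely for illustration.

def pair_renames(events):
    """Collapse (MOVED_FROM, MOVED_TO) pairs with matching cookies into
    RENAME events; unmatched moves degrade to DELETE / CREATE."""
    pending = {}   # cookie -> old path, waiting for its destination
    out = []
    for kind, path, cookie in events:
        if kind == "MOVED_FROM":
            pending[cookie] = path
        elif kind == "MOVED_TO" and cookie in pending:
            out.append(("RENAME", pending.pop(cookie), path))
        elif kind == "MOVED_TO":
            out.append(("CREATE", path, None))
    # A MOVED_FROM that never got a destination looks like a deletion.
    out.extend(("DELETE", p, None) for p in pending.values())
    return out

events = [("MOVED_FROM", "my_videos", 7), ("MOVED_TO", "my videos", 7)]
print(pair_renames(events))  # [('RENAME', 'my_videos', 'my videos')]
```

If the two halves of the pair are never matched (e.g. the watcher was not running when the rename happened), the same operation degrades into a delete plus a create, which is exactly the duplicated-folder situation described in this thread.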

 

There is an interesting case of r/w nodes "fighting". If you delete some files or folders on one r/w node, it won't help anything if you have multiple r/w nodes. Even deleting them locally will not prevent them from being re-downloaded again in a few seconds, because they still exist on the other master nodes. As soon as those come online, you will get your deleted stuff downloaded from them again. So, you MUST use .SyncIgnore if you decide to delete some files/folders when you have multiple r/w nodes and some of them have already synced the files that you have deleted, unless I am mistaken, of course.

The fact that a file is deleted is recorded in the database. So, even when the file is gone, there is some information in the DB recording the fact that it was deleted. As soon as other peers come online, they'll see that the file was deleted and when it was deleted, so they can decide whether to move the file to SyncArchive or to declare that their copy is newer than the deletion and redistribute it back.
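The decision rule described here can be sketched as follows; the function name and the plain numeric timestamps are assumptions for illustration, not BTSync's actual schema:

```python
# Sketch of the deletion rule: a "tombstone" records when a file was deleted,
# and a reconnecting peer compares that time with its own copy's mtime.

def on_peer_reconnect(tombstone_time, peer_mtime):
    """Return what the reconnecting peer should do with its local copy."""
    if peer_mtime > tombstone_time:
        # The peer modified the file after the deletion: its version wins
        # and gets redistributed back to the other nodes.
        return "redistribute"
    # Otherwise the deletion wins: move the local copy to SyncArchive.
    return "archive"

print(on_peer_reconnect(tombstone_time=1000, peer_mtime=900))   # archive
print(on_peer_reconnect(tombstone_time=1000, peer_mtime=1500))  # redistribute
```

Because the tombstone lives in the database rather than in a transient event, the rule works even if the peer was offline for a week when the deletion happened.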

 

If you encounter a file getting restored after deletion when it was not modified on another peer, most likely you have hit a bug.


The issue:

Consider the situation where there are several master nodes.

One of them, for whatever reason, decides to modify some subfolder name.

What happens in this case?

It depends on the OS. In some cases, BTSync detects the fact of the folder rename and propagates only the rename event to other peers. If not, it reports a folder deletion and a new folder creation, which will be synced to other peers. We are working to improve this mechanism to make it more reliable and to avoid unnecessary file transfer.

I would like to understand this issue better in the context of events.

When the master r/w node completely syncs with another r/w node, and that other r/w node goes offline, and, meanwhile, the master r/w node renames some folder and perhaps takes more steps, such as file deletion or modification, then the logic gets really complicated in a hurry.

I wish there were a design document about BTSync somewhere that explains all its logic, because it can get really complex and the effects can be drastic if you don't understand the exact rules of behavior.

The fact that a file is deleted is recorded in the database. So, even when the file is gone, there is some information in the DB recording the fact that it was deleted. As soon as other peers come online, they'll see that the file was deleted and when it was deleted, so they can decide whether to move the file to SyncArchive or to declare that their copy is newer than the deletion and redistribute it back.

If you encounter a file getting restored after deletion when it was not modified on another peer, most likely you have hit a bug.

Again, I do not know the rules of behavior and how the events are stored in the database and processed when some other nodes come back online.

But what I am seeing here is that I have a duplicate of a pretty large folder with videos, which at some point was renamed. By whom and at exactly what time, I cannot possibly remember. And the only way to stop that old folder from reappearing on the reference master node was to include it in .SyncIgnore.

Again, if you guys would consider releasing the detailed rules of behavior, that would certainly help. I am kind of concerned about covering some issues in the detailed manual being written without knowing whether my understanding is correct. The last thing I'd like to see is people getting the wrong picture of how it works in detail just because I do not quite see how it works.

And my sense is that the issue of a detailed description of the logic is not going to go away, but will keep creating more and more misery for people, because they do not realize which exact action brings about which results.

Anyway, thanks for your information.


stanha

 

The events are not stored in the database. When you move/rename a folder while BTSync is off, it won't receive notifications from the OS, so there is little chance that BTSync recognizes the move/rename.

When BTSync is running, it gets a notification from the OS about the rename and is likely to process it correctly. There are some improvements that can be made in move/rename recognition, which we'll do in an upcoming release.

 

Again, if you guys would consider releasing the detailed rules of behavior, that would certainly help.

We would really like to make detailed documentation, but right now the Sync team is concentrating on releasing new versions.


stanha

The events are not stored in the database. When you move/rename a folder while BTSync is off, it won't receive notifications from the OS, so there is little chance that BTSync recognizes the move/rename.

Well, that is what I was afraid of. My original reply, which you are following here, did have a more extensive analysis of the logic, and the conclusion was that it is not going to be easy to make the logic work.

Let me just mention a few things even though I have not seen your design spec, (if you have one to begin with).

I can see that everything should work just fine when the nodes are online: the "reference" r/w node, as I call it, and the second, "evil" r/w node, which only shows up once in a while and whose database does not contain all the modifications, deletions or file exclusions that happened while it was offline.

When the nodes are online, you are pretty much guaranteed that the events will be received. But if the events are tied directly to the O/S level and are generated as a result of O/S events on the reference node, that means to me that the events are "one time only".

So, if the "evil" r/w node comes back a week later, he won't get the event, and, therefore, his database would be inconsistent with the reference node.

So, if the reference node renamed a folder while the evil node was offline, then, when it comes back online, it is pretty simple for it to get the renamed folder with the NEW name. But it has no clue about what happened to the old folder and cannot process it properly, probably because the old folder was marked as "dead" on the reference node; I do not quite see yet the whole logic of the updates between the r/w nodes.

So, that is why I was thinking that in order for the state to be appropriately updated across the sessions, the event list has to be stored in the database. In that case, the 2nd node would be guaranteed to receive the event.

But then we have an even more complicated problem: are the events stored on a per-client basis? That is, how do we know which events were received and processed by every client that might have come and gone?

In a general public application, you might have thousands of clients coming and going, never to come back again. So, even if you decided to store the event lists in the database to survive across the sessions for each client individually, which looks like a royal mess, you would have yet another problem. If the reference node gets damaged for whatever reason and decides to delete and re-add the folder, then it is going to be reindexed from what you have on disk, and all the event memory will be gone. In that case, with multiple r/w nodes, you have total database inconsistency.

But if you do NOT store the events in the database, the very fact that you use a concept of events implies that events might be lost and that mod actions do not survive across client disconnects.

But what blew my mind is that after I wiped out the contents of the folder and reloaded it with the source files, while that "evil" node was online at the time, I could not believe my eyes when I got back from him all the files that were deleted on the reference node, even though the time stamps on the files were certainly fresher than his.

From what I see, since the reference node was reindexed afresh, it no longer had any idea of the previous version of the database, and its database did not know anything about the files that were renamed or deleted. But on the "evil" node those files were still there, and, even though his file time stamps were ancient, he still fed the reference node files that should no longer exist on the share, as they were deleted from the reference node.

The only solution was to include those files or folders that were renamed or deleted from the reference node to its .SyncIgnore.

Well, what can I say but to wish you, guys, good luck in trying to resolve this logic.

But, no matter how you cut it, you, guys, did a great job so far on BTSync and I applaud you for what you were able to get accomplished by now. Yes, some things are extremely complex logic-wise, and that is, partially, a reason for me to keep going through all the myriads of logical conditions arising out of various user actions and connect/disconnect conditions and so on.

I just hope that it might stimulate you to create precise operation tables that cover every single logically conceivable condition and to see if you can provide universal logic that would be consistent and reconcile with all the other user actions and so on.

When BTSync is running, it gets a notification from the OS about the rename and is likely to process it correctly.

Well, if I recall correctly, it states somewhere that BTSync is most likely to detect the O/S level file modification events. But I had the impression that it only helps to resync the mods with all other nodes as soon as possible.

But it was not part of the logic. And now what I am hearing is that the consistency across the nodes does indeed rely upon events, and those events may or may not arrive, for a variety of reasons.

That means to me that the events are just performance level tricks that do not fundamentally affect the logic of database consistency level issues.

That is why I proposed a while back, in one of my posts, to add more bits to the database about the states of a file, such as "does not exist on disk during the last sync attempt", "updatable" and "propagatable", which, in combination, give you much more precise control and deliver a more precise state of the system regardless of the operation.

From what I recall, currently you have only one or two bits, and "update" (downloadable) and "propagate" are essentially merged into one: "dead file" or "alive file". And "dead" files, those that were either deleted or renamed, become permanently dead, and their states can no longer be restored, at least as far as r/o nodes are concerned.
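The proposed extra state bits could look something like this sketch; the flag names are taken from the post above, and this is not BTSync's actual database format:

```python
from enum import IntFlag

# Sketch of the proposed per-file state bits (names from the post above).
class FileState(IntFlag):
    EXISTS_ON_DISK = 1   # file was present during the last sync attempt
    UPDATABLE      = 2   # node may accept newer versions of this file
    PROPAGATABLE   = 4   # node may send this file to other peers

# A renamed-away ("dead") entry: gone from disk, must neither be
# re-downloaded nor propagated.
dead = FileState(0)

# A live file: present, updatable and propagatable.
alive = FileState.EXISTS_ON_DISK | FileState.UPDATABLE | FileState.PROPAGATABLE

print(FileState.UPDATABLE in alive)   # True
print(FileState.PROPAGATABLE in dead) # False
```

With independent bits like these, a "deleted here, keep it deleted" state and a "deleted here, but accept it back if someone edits it" state become distinguishable, instead of both collapsing into a single dead/alive bit.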

The "bottom line" is that you can not rely upon events in terms of database consistency across the nodes and disconnects.

There needs to be a different and more non-volatile mechanism of assuring consistency.
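One well-known non-volatile mechanism for exactly this problem (offered here as an illustration of the idea, not as something BTSync is known to use) is to persist a version vector per file, so that consistency decisions are made from stored state rather than from transient OS events:

```python
# Illustrative sketch: a version vector maps node-id -> per-node edit counter
# and is stored in the database alongside each file entry.

def compare(vv_a, vv_b):
    """Return 'a', 'b', 'equal', or 'conflict' for two version vectors."""
    nodes = set(vv_a) | set(vv_b)
    a_newer = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_newer = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_newer and b_newer:
        return "conflict"   # concurrent edits: needs an explicit merge policy
    if a_newer:
        return "a"
    if b_newer:
        return "b"
    return "equal"

print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a
print(compare({"n1": 2}, {"n2": 3}))                    # conflict
```

Because the vectors are persisted, a node that was offline for a week can still tell, on reconnect, whether its copy is stale, newer, or in genuine conflict, without anyone having to replay a per-client event stream.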

There are some improvements that can be made in move/rename recognition, which we'll do in an upcoming release.

I am really happy to hear that. The problem is too complex, and its effects are much more severe than they might look at first glance. Because once the user loses confidence that ALL his files are in tip-top condition, then we really have quite a problem to resolve, and to restore confidence is much harder than to create it initially.

We would really like to make detailed documentation, but right now the Sync team is concentrating on releasing new versions.

Well, that is a typical developer's problem. They are too concerned with releasing more and more code and "features", but the understanding of all the possible cases is put on the back-burner.

But, in the case of BTSync, where the logic is so complex that it is not easy even to enumerate all the possible or "impossible" cases, creating documentation may force and help the developers to clarify the issues in their own minds, instead of constantly adding new bells and whistles while the basic system is not quite "up to snuff" in those cases that they might not have even considered when they were creating it.

Well, any way you cut it, I wish you good luck and I am doing all I can to make sure every single bit of logic is covered in all the possible permutations, and that is precisely the purpose of my detailed manual.

Because the very idea behind BTSync is a dynamite idea, and it is urgently needed for all sorts of uses today. In my mind, there is probably nothing more important in the modern information world than p2p dynamic information distribution. Torrents are static, and that is their main disadvantage: the information cannot be extended or even modified, so they get "stale".

But syncing files on dynamic basis gives you the real-time updates, which is exactly what is urgently needed today.

So, considering all the complexities, there is simply a dire need to clarify every single permutation of every possible condition. There should be no "magic" in ANY situation, from what I see.

Otherwise, the forums are busy with all sorts of questions and issues that should not even arise if everything is described in detail in the operations manual. Because it is just a royal waste of everyone's energy that could be used on productive things.

That is about ALL I can do, unless I see your design doc with extensive and detailed coverage, so I don't have to sit here guessing and saying "wow, look at that one! What planet did it come from?".

:)

