stanha

Node Updating Stops Until Btsync Is Restarted

I noticed this problem quite a while back: some nodes do not update. I see particular nodes that I know for a fact are fully in sync, but whenever they go offline and come back online they show an upload amount equal to the total size of the share, as if they were empty, and they never get updated until the node goes offline and back online again. I just saw a more specific example with a node that otherwise works perfectly.

Today I saw the main r/w node stop uploading new files to a newly created r/o node after about 50 megs out of a 400 meg share.

The r/w node is Windows 7 and the r/o node is Linux Ubuntu 13.10.

Both running BTSync 1.3.94.

A 3rd node (r/w) was uploading about 150 megs more to that share. Meanwhile, I added one r/o node that I know for a fact works fine with other shares and updates without problems. It started downloading, got about 50 megs, and then stopped getting new files. It just went dead with respect to downloading, even though you can still see it. There are several other shares on that node and they all work fine. I have never seen them fail to update, at least eventually.

At the same time the 3rd node was still uploading more to the share. But the r/o node stopped downloading even though it still had hundreds of megs to download. I waited at least half an hour to see if it would restart, but it did not.

Then I restarted BTSync on the main r/w node. It resumed uploading to the newly created r/o node and the upload completed 100%.

That means to me that even though you can see the node, it somehow loses the transfer part of the connection, like something got stuck and refuses to continue.
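The "visible but not transferring" condition described above can be detected mechanically instead of by waiting and watching. The following is only a rough sketch of that idea, not anything BTSync provides: the `StallDetector` class, its parameters, and the idea of feeding it a "bytes remaining" figure are all assumptions made for illustration.

```python
import time


class StallDetector:
    """Flags a peer whose 'bytes remaining' figure has not changed for too long.

    Hypothetical helper: BTSync does not expose this class or these numbers.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.last_remaining = None    # last value we were given
        self.last_change = None       # time at which that value last changed

    def update(self, bytes_remaining):
        """Record the latest figure; return True if the peer looks stalled."""
        now = self.clock()
        if bytes_remaining != self.last_remaining:
            # Any change counts as progress and resets the timer.
            self.last_remaining = bytes_remaining
            self.last_change = now
            return False
        # Unchanged: stalled only if data is still outstanding and the
        # timeout has elapsed since the last change.
        return bytes_remaining > 0 and (now - self.last_change) >= self.timeout
```

With a 30 minute timeout, a peer that sits at the same non-zero remaining amount for half an hour would be reported as stalled, while a fully synced peer (0 bytes remaining) would never be.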

This problem cannot be a result of the r/w node not having enough to upload to the r/o node: at the moment that share was added to the new node, the r/w node already had about 250-300 megs of files.

Again, I am consistently seeing two particular nodes that are fully synced, but when they go offline and back online they do not update and still show the entire share as needing to be uploaded to them. These nodes have been sitting there for days and exhibit the exact same behavior. Sometimes, after they come back online, they do show as fully in sync and do get updated, but most of the time they simply sit there for days and do not update.

I also noticed similar behavior with several other new r/o nodes. Once they are created, they never start downloading. EVER. They just sit there dead, for hours. You can see them, but they never start downloading.

Eventually, some of them leave forever, never to be seen again, probably giving up on the whole thing.

Interestingly enough, when they appear "dead", there are no log entries for them. You can wait for hours and you won't see a trace of those nodes in the log files if you search for them by node name. It is like they do not exist.

Since I can only learn their IP from the log itself, I have no idea whether there are log entries for them under an IP address, but there are none under their node names.

In this respect, it would help to have a "resolve IP" option, as in uTorrent for example, which you can check to switch the display back and forth between the IP and the node name.
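Searching a log for a peer by node name and, when it is known, by IP at the same time can be scripted. This is a minimal sketch that makes no assumption about BTSync's actual log format: it treats the log as plain text lines, and the function name `find_peer_lines` and its parameters are made up for illustration.

```python
def find_peer_lines(log_lines, name, ip=None):
    """Return (line_number, line) pairs that mention the peer by name or IP.

    Plain case-insensitive substring match, so an IP such as 10.0.0.1
    would also match 10.0.0.12.
    """
    needles = [name.lower()]
    if ip:
        needles.append(ip.lower())
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            hits.append((number, line.rstrip("\n")))
    return hits
```

It can be fed an open file directly, for example `find_peer_lines(open("sync.log"), "somenode", "192.0.2.7")`, where the file name, node name and address are placeholders.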

Sometimes I see one particular fully synced node that, after going offline and back online, shows as empty on one of the two shares it is connected to.

On one share it shows (from the r/w node) as fully synced, which is correct. But on the other share it shows as empty, which is incorrect because it is fully synced on that one as well. And it will sit there showing the incorrect node state for hours and days until it goes offline and back online again.

And again, when it sits there dead, there are no log entries for it if you try to find it in the log by the node name. And it also does not update in that state, like it is locked in some state.

So the problem seems to be per share and not program-wide. It could be that the update thread is either blocked on a synchronization object or stuck in an endless loop.
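To illustrate how a per-share locking problem could produce exactly this picture (one share dead, the others fine), here is a small sketch. It is purely an assumption about how such a program might be structured and says nothing about BTSync's real internals. The share names and the `try_sync` function are hypothetical.

```python
import threading


def try_sync(share_lock, timeout=0.1):
    """Attempt one sync pass; report 'stuck' if the share's lock never frees up."""
    if not share_lock.acquire(timeout=timeout):
        return "stuck"
    try:
        # The real work (hash exchange, transfers) would happen here.
        return "synced"
    finally:
        share_lock.release()


# One lock per share, as a hypothetical design.
locks = {"share_a": threading.Lock(), "share_b": threading.Lock()}

# Simulate some code path that took share_a's lock and never released it.
locks["share_a"].acquire()

results = {name: try_sync(lock) for name, lock in locks.items()}
```

Here `results` ends up as `{"share_a": "stuck", "share_b": "synced"}`: the peer is still "visible" on both shares, but only the share whose lock is held makes no progress. A restart clears the lock, which would match a restart getting things moving again.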

In this respect, it would be useful if BTSync showed the exact status of the situation in the Transfers tab. When a new node has just added a share, it obviously does not start syncing immediately. On a large share I have seen it take more than 10 minutes for the node to start downloading files. Meanwhile, the user has no clue what is going on, and I have seen some users simply go away forever because they did not see the transfer start for several minutes.

This means that when a node connects to the share and has to do some work before it actually starts updating, it should probably show some message, like "indexing, please wait" or whatever the appropriate message for the situation is. It should not simply sit there like it is dead while it is doing preparation work under the hood.

Generally, it is a good idea to always display a descriptive message in a message window of the program that shows the progress regardless of operation, because it tells the user that something is working and things are progressing. If the program has to do something before it can actually start the transfer, it should show a status, a state, some progress of the operation, instead of sitting there for minutes like it is dead.
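One simple way to picture the suggestion is a small set of named states, each with a user-visible message. The sketch below is only an illustration of that idea. The state names, the messages other than "indexing, please wait", and the `status_text` function are all invented here and are not part of BTSync.

```python
from enum import Enum


class PeerStatus(Enum):
    """Hypothetical per-share states, each with a message for the UI."""
    CONNECTING = "connecting to peers"
    INDEXING = "indexing, please wait"
    TRANSFERRING = "transferring"
    IN_SYNC = "in sync"
    STALLED = "no progress, still connected"


def status_text(status, files_done=0, files_total=0):
    """Build the line a Transfers tab could show for one peer on one share."""
    if status is PeerStatus.TRANSFERRING and files_total > 0:
        return f"{status.value} ({files_done}/{files_total} files)"
    return status.value
```

For example, `status_text(PeerStatus.TRANSFERRING, 3, 10)` gives `"transferring (3/10 files)"`, while a node still preparing would show `"indexing, please wait"` instead of nothing at all.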

While a file is being transferred, it would be very useful to see a running percentage of its progress, updated in real time on a per-file basis. This way, especially on huge files that may take minutes if not hours to download, you see the progress and get a sense that the program is actually running. Otherwise, it looks like it is stuck or dead while the transfer is in fact going on.
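The per-file percentage itself is trivial arithmetic once the byte counts are known. This is a minimal sketch, assuming the program can report bytes received and total bytes for each file. The function names are made up for illustration.

```python
def percent_done(bytes_done, bytes_total):
    """Percentage of a file transferred so far, capped at 100."""
    if bytes_total <= 0:
        # An empty file has nothing left to transfer.
        return 100.0
    return min(100.0, 100.0 * bytes_done / bytes_total)


def progress_line(filename, bytes_done, bytes_total):
    """One display line per file, e.g. for a Transfers tab."""
    return f"{filename}: {percent_done(bytes_done, bytes_total):.1f}%"
```

Calling `progress_line("video.mkv", 50, 200)` gives `"video.mkv: 25.0%"`. Refreshing such a line every second or so would be enough to show that a long transfer is alive.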

But what is interesting about the update issue is that when a node reconnects to that share and stays in a dead state, showing as empty (as seen from the r/w node), it does not seem to communicate with the r/w node to get whatever it has to get, probably hashes, in order to discover its state relative to the r/w node.

So, if the transfer or communication thread is common to both cases, that would explain some things. But if these processes run in separate threads, then we are dealing with something else here, from what I see.

Today I had one more interesting case.

There is one share of 180 megs in size.

One new node joined that share today. It downloaded all but about 16 megs, then stopped downloading and has been sitting there for hours without making any progress.

I tried restarting the r/w node (Win 7), just as I did in the original case, to see if it would complete, just as the other node did in the original case a few days ago.

After I restarted BTSync, that node no longer shows that it has only 16 megs left to download. Now it shows that the node is totally empty and has to download the entire share. And for about an hour since the BTSync restart, I do see that node, but it does not update the amount it has already downloaded and still shows as an empty node.

Nor does it continue to complete the sync.

We are both running v. 1.3.94.

I do have a dump file from a running process from the r/w node (Win 7).

And I also have a log (from the r/w node) that seems to have some relevant information.

Let me know if you want those and we'll arrange for you to get them.

There is an interesting aspect to this that may shed some light on it.

There is a third node on the same share. I know for a fact that it works perfectly well, and the r/w node has it on its predefined hosts list.

Now, interestingly enough, the node that is "stuck" and does not finish the download as seen from the r/w node does not seem to complete its download via the 3rd node either. When I look at that share via the third node, it definitely shows about 16 megs left to upload out of a total of 180 megs, and the upload speed shows no transfer in progress.

Therefore, the node is certainly "stuck". This means to me that the problem lies in the node that is "stuck" and not in the other nodes.

That node has been showing as utterly empty from the r/w node for several hours now. On the 3rd node, where BTSync was not restarted, it shows as stuck in the middle of being updated. On the r/w node, where BTSync WAS restarted a couple of times just to make sure it is not the r/w node that is stuck, it shows as utterly empty.

How could this be possible if it was able to start the download and completed it up to 90% and then got "stuck"?

This might be a problem with a locked synchronization object, such as a mutex, semaphore or lock, whatever the thread is using, and that is probably why the node does not go through the proper synchronization sequence.

In that respect the core dump might provide some useful information and reveal whether some thread is indeed blocked or stuck in a dead loop.

I do not understand how it is possible for a node to be seen but not to perform transfers.

@stanha

No dump needed. First of all I need debug logs from your peers - let's see what is happening inside. Please send them to syncapp@bittorrent.com, and mention this topic in the message body.

@stanha

No dump needed. First of all I need debug logs from your peers - lets see what is happening inside. Please send them to syncapp@bittorrent.com, also mention this topic in message body.

I do not have debug logs from the node that is stuck. This is a general public situation and I have no clue about that node.

I can provide you logs from the r/w node and from the 3rd (r/o) node that is sitting right now trying to complete the upload.

Btw, is debug.txt (on Linux Ubuntu) picked up in real time, or do I have to restart BTSync on that r/o node? Because if I have to restart, the state of the node will change as far as I can see.

Let me know if those logs that I can provide might be useful to you. Else, I have no time to waste on this.

Thanx.

@stanha

I need debug logs from the peer that is hanging. And yes, debug logging starts only when you restart BTSync on the peer. Other logs might not be useful.
