Noel Hibbard

Packet Loss on Win8x64


I have been having very bad packet loss on my machine for the past 3 weeks and have tried everything under the sun to find the cause: new cable, different port on the switch, different NIC, firewall changes, and on and on. I finally figured out what was causing the problem. BTSync! As soon as I close BTSync the loss goes away. Start it back up and I am back to lost packets. I am not talking about a packet here and there, I am talking about 60-70% loss. I am extremely excited to have found the source, but not about being unable to use BTSync anymore.

Has anyone ever experienced this and if so, any ideas?

This is on Win8x64 and client version 1.1.69.


I don't have any problems transferring files. The problem is packet loss on my whole computer. BTSync works fine otherwise. This isn't the first machine I have experienced this on either. I was running it on a server that is running Server 2008 R2 x64 and it was doing the same thing. It would drop packets so bad you could hardly even connect to the machine via RDP. As soon as you would kill BTSync the machine would go back to normal.

What is odd too is BTSync causes me to drop packets even when it is totally in sync and idle. BTSync isn't creating any CPU activity or disk activity either.

I have never seen a Windows app cause packet loss before. Bad drivers, bad NIC, bad cable, bad port on the switch or a bad switch, but never an application.


One thing I want to add to this. I am using PingPlotter from another machine on our network to monitor the packet loss on my machine. This packet loss is all local. This isn't a case of flooding all my outgoing bandwidth or anything like that.


Like UDP, ICMP (ping) is an unacknowledged protocol. Measuring packet loss from another machine via ICMP ping does not necessarily indicate packet loss at the machine running BTS; the loss can occur at any intermediate switch or router. Some packet loss is normal on Ethernet (and most other L2 transports) and on IP networks. Higher-level protocols like TCP take congestion into account to try to avoid packet loss, and retransmit when it does happen. UDP and ICMP don't. This is a very simple explanation, and I don't know what your network infrastructure looks like, so I can't help more. Just be careful about the conclusions you are drawing. It may be that your network is congested.

That being said, it is odd that even when BTS is in sync and idle that you see this problem. I could understand it if your network was under load, but it sounds like it is not. There might be a BTS client that is babbling somewhere on your network.

Are all your machines on a LAN or other?
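To make the point about unacknowledged probes concrete, here is a minimal Python sketch. It is not how PingPlotter works; it uses UDP echo instead of ICMP so it runs without admin rights, and the names `start_echo_server` and `probe_loss` are made up for illustration. Notice that the prober only ever learns "no reply came back in time". It cannot tell whether the request or the reply was dropped, or at which hop.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Start a tiny UDP echo server on an ephemeral port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))

    def loop():
        while True:
            try:
                data, addr = srv.recvfrom(2048)
            except OSError:
                break  # socket was closed
            srv.sendto(data, addr)

    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe_loss(host, port, count=20, timeout=0.5):
    """Send `count` numbered datagrams; return the percentage that got no echo."""
    lost = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")
            sock.sendto(payload, (host, port))
            try:
                while True:
                    data, _ = sock.recvfrom(2048)
                    if data == payload:
                        break  # matching echo; skip late replies to older probes
            except OSError:
                # Timeout or a socket error: all we know is that no reply arrived.
                lost += 1
    return 100.0 * lost / count
```

Running `start_echo_server()` on one machine (bound to a reachable address) and `probe_loss()` from several different vantage points would at least show whether the loss follows the machine or the path.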


Yeah, I understand ICMP isn't guaranteed. The thing is, I will be doing something like remote controlling another machine, where it is very obvious that packets are dropped because the whole screen freezes for a few seconds, and then I look over at my second monitor to see PingPlotter showing a bunch of red. Then it recovers and my RDP session starts responding again. Same goes for websites: I click a link and it just sits. Or I have a file transfer going (via HTTP) and it will just hang for a few secs and then resume. Shut down BTSync and all goes back to normal.

I thought maybe it was related to other machines on my LAN, so I shut down the one and only client on the LAN other than my own. I also shut down the one running on my machine at the house, which is connected via IPSec, just to rule that out too. The problem persists. Then I started removing shares one by one, and once they are all gone the packet loss goes away. I tried adding them back one at a time and in different orders, trying to find one share that may have an older client with a bug or something, but the packet loss comes back the second I add a share.

I have watched the traffic (with shares but all in sync and idle) with Wireshark and there is still a lot of traffic, but really any traffic in Wireshark looks like a ton. hahaha. I need to dig in a little closer with Wireshark and see if I can find anything consistent.

So right now I am just starting BTSync on demand. Kind of a pain. :(


Okay... I think I am getting somewhere. As I said, packet loss is still happening even when syncing is paused (not to be confused with idle). So with syncing paused I decided to take a look at the traffic with Wireshark and I see that the only traffic going on now is broadcasts and what looks like some talk with trackers. So I then went into all my shares and disabled the "Search LAN" option. So now while syncing is paused I am only seeing traffic with trackers. The broadcast traffic stopped. After making that change I saw the packet loss go away. To be sure I left it for about 10mins (really 10secs is long enough seeing how bad it was before). It stayed clean the whole time. So I just added back all of my shares which should dramatically increase the load. When I first added them all back I saw some packet loss again but after going into these shares and disabling the "Search LAN" option the loss has gone away again.

I am going to let it run for several hours and see how it goes. It has been clean for another 10mins so I think this is the answer. So now the question is why would the "Search LAN" option cause my machine to have packet loss that is so bad I can hardly get anything done?


"my machine at the house which is connected via IPSec"

Could it be that the IPSec implementation you are using at either end is misconfigured or buggy in a way that it is not properly dealing with multicast, thus causing a multicast reflection storm of sorts? It's just a theory, but can you disable your IPSec connectivity and see what happens?


That is a good idea. So I shut down my tunnel and now that peer can't be reached (without the relay). When looking at the debug log I see these lines:

[2013-08-28 12:47:30] Peer 7: 76.123.123.123:38820 003B8615EF1D57ED65734960A43EB8C9DA0D3222
[2013-08-28 12:47:30] Peer 7: local IP 10.18.1.1:49525

I masked my public IP in this log but you can see the port number on the public IP is different from the private. 49525 is the port that I have set for the listening port and is auto forwarded by UPnP on my router. I have no clue where port 38820 came from and how it expects people to communicate on that port if it isn't requesting that port to be opened via UPnP. So my peer at home is going through the relay now. I manually added 76.123.123.123:49525 (again, a fake IP) to my peer list and now it is direct again.

I guess I am a little confused about how peers discover each other.

But I guess I am getting a little off topic. I will report back my findings related to the packet loss stuff.


@Noel:

BTSync sends broadcasts if you enable the "Search LAN" feature. I guess you have some kind of network loop. As this loop seems to be triggered by IP broadcasts but not by the usual Ethernet broadcasts (ARP), I guess you have one computer that acts as a bridge in this scenario.

Do you by any chance have a computer that is connected via two interfaces, e.g. via two LAN ports or via WiFi and LAN at the same time?


When you ask if I have any machines with two NICs I assume you mean machines running BTSync. If so, no. There are however a bunch of machines on the network with two NICs and possibly even both connected (WiFi and Ethernet) but none of those machines are running BTSync. At this point there is only one machine on this network running BTSync and that is my own. The only other place I have BTSync running is at the house on another subnet which is connected via IPSec.

I shut down the IPSec tunnel, which broke the peer that was running at the house. I figured it would recover and start routing over the internet rather than trying to route through the IPSec tunnel, but it never did (without the help of the relay). I finally just turned the IPSec tunnel back on, but even then my peers refused to talk to each other. I removed the shares and re-added them a few times (on both peers) and restarted BTSync a few times. I even left it alone while I went out for lunch. Came back and it still wasn't talking. So I shut down BTSync, wiped out my <UserProfile>\AppData\Roaming\BitTorrent Sync folder, relaunched BTSync and added my shares. It instantly started syncing and isn't dropping packets either.

I am seriously confused. Hahaha

Can't wait until we get an API for BTSync so I can write a tool to export and import all my shares so I don't have to do it by hand every time.


No, I meant any computer connected twice. If this computer forwards the broadcast from one interface to the other, that could lead to a forwarding loop. You could test this theory if you connect each computer only once and re-enable the "Search LAN" feature.

If I am right, your problem is not caused by BTSync but by your network setup, and is just triggered by the broadcast packets.


I just did some closer examination of the broadcast traffic with Wireshark and I am seeing some malformed UDP packets. The length of the data on some of these packets doesn't match the length given in the header. Some of these multicast packets are truncated.

Here is a snippet of a few good packets along with a few bad ones:

https://dl.dropbox.c...k/Sample.pcapng

Right now I have 4 shares. If I have "Search LAN" enabled on all of them I get packet loss, and in Wireshark I see these truncated multicast packets. I have a filter set in Wireshark to ip.dst == 239.192.0.0 so I can easily see the stuff I am concerned about. I then started turning off "Search LAN" one share at a time and found one share that isn't producing these truncated packets. So I can actually enable "Search LAN" on this one share and don't get any packet loss or truncated packets. But on any of the other shares I get instant truncated packets and packet loss if I turn "Search LAN" on.
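For anyone who wants to watch the same multicast group without Wireshark, here is a rough Python sketch that joins 239.192.0.0 and tallies datagrams per sender, which might help spot a babbling client. The port number 3838 is an assumption (it is not stated anywhere in this thread), and `membership_request` and `tally_senders` are just illustrative names. Note that the OS will most likely discard datagrams with an inconsistent UDP length before a normal socket sees them, so this only counts well-formed traffic; a capture tool is still needed to see the malformed ones.

```python
import socket
import struct
import time
from collections import Counter

GROUP = "239.192.0.0"  # taken from the Wireshark filter in this post
PORT = 3838            # ASSUMPTION: discovery port, not stated in the thread


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq structure (group address + local interface address)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def tally_senders(group=GROUP, port=PORT, seconds=10.0):
    """Join the multicast group and count datagrams per source IP for `seconds`."""
    counts = Counter()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow sharing the port with another listener on the same machine.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(0.5)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            try:
                _, (src_ip, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                continue  # nothing arrived in this interval; keep waiting
            counts[src_ip] += 1
    finally:
        sock.close()
    return counts
```

Calling `tally_senders(seconds=30)` and printing the result with `most_common()` gives a quick per-host count to compare against what the capture shows.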

I shot a video of my desktop (both monitors) while I was running through several scenarios. You can view it here:

http://www.ustream.tv/recorded/37963169

The video contradicts what I typed up here though. It seems very inconsistent. I typed that post up, then had LAN search disabled on all shares for several minutes while I tracked down some screen recording software. Then when I got back to it I couldn't reproduce the exact same results. Now it seems no single share will produce the malformed packets. You have to enable it on two or more to generate the malformed packets. But the one thing that is consistent is that the packet loss coincides with the malformed packets.


I have duplicated this situation on three machines on the office LAN so far. I just tested it on my LAN at home and none of the machines have this (malformed packet) problem. How could the network cause BTSync to build malformed packets?


Ok that is interesting.

The packets are broken in a very special way: the length field in the IP header is correct, but the length field in the UDP header is off by 4 bytes. That either means that you have something in the network that chops off 4 bytes and corrects the IP header but not the UDP header, or somehow these packets are corrupted while being sent.

Looking at the packet contents it seems that there are actually 4 bytes missing, and I think the last byte is "e". Are you capturing the packets at the source or somewhere else in the network?
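The check I did by eye can be sketched in a few lines of Python, assuming you have the raw IPv4 packet bytes (Ethernet header already stripped). The function names are made up for illustration, and the addresses and ports in the packet builder are placeholders, not values read from the capture. The idea is simply that the UDP length field should equal the IP total length minus the IP header length.

```python
import struct


def udp_length_mismatch(ip_packet: bytes) -> int:
    """Return the UDP length field minus the byte count the IP header leaves for UDP.

    0 means the two headers agree; a positive value means the UDP header
    claims more bytes than the IP packet actually carries.
    """
    if len(ip_packet) < 28:
        raise ValueError("too short for IPv4 + UDP headers")
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (version_ihl & 0x0F) * 4                       # IP header length in bytes
    total_length = struct.unpack_from("!H", ip_packet, 2)[0]
    if ip_packet[9] != 17:                               # protocol 17 = UDP
        raise ValueError("not a UDP packet")
    udp_length = struct.unpack_from("!H", ip_packet, ihl + 4)[0]
    return udp_length - (total_length - ihl)


def build_ipv4_udp(payload: bytes, udp_length_error: int = 0) -> bytes:
    """Build a test packet; checksums are left at 0 and addresses are placeholders."""
    udp_length = 8 + len(payload)
    ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_length, 0, 0,
                            1, 17, 0,
                            bytes([10, 18, 1, 1]), bytes([239, 192, 0, 0]))
    udp_header = struct.pack("!HHHH", 49525, 49525,
                             udp_length + udp_length_error, 0)
    return ip_header + udp_header + payload
```

For a healthy packet `udp_length_mismatch(build_ipv4_udp(b"abc"))` is 0, while `build_ipv4_udp(b"abc", udp_length_error=4)` simulates a UDP header that claims 4 bytes more than the IP header accounts for.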


Yes you are right, if you look at the data in my packets it should look something like this:

d<RecordLength>:<Record><RecordLength>:<Record>e

But you can see in my data the packets are truncated so it is something like:

d4:test10:012345

When it should look something like:

d4:test10:0123456789e

I am capturing at the source, which is what makes it odd: what could possibly be manipulating the packets before they leave the machine? For the other machines that I tested I was just capturing traffic from a remote machine, so hardware in between could have been a factor.
