Using Windows BTSync for DFS Between 10-100 Nodes in the Amazon EC2 Cloud


klytus


We run a suite of grid computing applications (sadly, all Windows-only) in the EC2 cloud to produce rendered computer-graphics content for film and television. Most recently we used our EC2 system to produce several shots for an upcoming episode of the new Cosmos series.

 

In doing so we ran up against severe I/O problems as 80 nodes attempted to access a 1.3 GB particle cache file from the EC2 instance acting as a server. Since the job was a several-thousand-frame sequence, every machine needed to access this file dozens of times.

 

We solved the issue by burning the cache into the C: drive of a one-off Amazon Machine Image that was then spun up into 80 nodes to complete the work. This work-around is fine, but time-consuming to set up, so we decided to implement a Distributed File System between all the nodes using BTSync.

 

BTSync works wonderfully between our studio and the cloud; we are in love with the software already. However, once in the cloud we run into the following problems:

 

1. We have found in EC2 that any application that searches the cloud subnet for other running instances of itself will not find any other nodes; Amazon don't allow certain kinds of packets across their network. BTSync works fine if each node is explicitly set up with the IPs and ports of the other nodes, but no matter how we configure the firewall rules on the EC2 side, searches of the local subnet don't work: the nodes can't find each other and nothing syncs.

 

2. Because every render node is spun up from the same master image, they all come up with the exact same BTSync device name, which seems to cause problems. Because there is no command-line interface in the Windows version of BTSync, there is no automated way to set this. Manually editing the config files (which don't appear to be ASCII) always fails because it disrupts a checksum, and the config file is renamed by the app to .bad.

 

While there is probably no way around problem 1, for problem 2 the addition of a command-line interface would be immensely useful to us: we could run a startup script to set the BTSync device name for each node to its unique EC2 machine name, and, as part of the same script, create sync shares with the appropriate secret and the IPs and ports of the other BTSync nodes on the local subnet.
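For reference, a startup script can derive a unique per-node name either from the Windows machine name or from the EC2 instance metadata service. A minimal, purely illustrative Python sketch (which value you treat as the "machine name" is a matter of taste):

```python
# Sketch: two ways a startup script could derive a unique per-node name
# to use as the BTSync device name. Purely illustrative.
import socket
import urllib.request

def windows_machine_name():
    # The hostname assigned to the instance (e.g. ip-10-0-0-11).
    return socket.gethostname()

def ec2_instance_id():
    # The instance metadata service is reachable from inside any EC2 instance.
    url = "http://169.254.169.254/latest/meta-data/instance-id"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode().strip()

if __name__ == "__main__":
    print(windows_machine_name(), ec2_instance_id())
```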

 

As an alternative to a command-line interface, a build of Windows BTSync that uses plain-text config files would be hugely welcome, as we could set those up with scripts.

 

As I said, we love the software, and these problems notwithstanding we see great potential for it in large-scale cloud applications.


Hi klytus,

 

First of all, there is a possible solution: reconfigure the peers with a script after rolling out the image. BTSync can be controlled with a configuration file. On Windows, you need to launch it with the /config <path_to_config> parameter to force it to take all of its parameters from the config file (here is a sample config file).
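As a rough illustration of what such a config might contain, here is a Python sketch that writes one out. The field names follow the published sample config as best I can recall, so verify them against the sample linked above; the secret, paths, and peer IPs are placeholders, not real values:

```python
# Sketch: write a minimal BTSync config to use with "BTSync.exe /config sync.conf".
# Field names should be checked against the official sample config; all values
# below are placeholders.
import json

config = {
    "device_name": "render-node-01",       # unique per node
    "listening_port": 44444,               # fixed so peers can list it in known_hosts
    "storage_path": "C:\\btsync\\storage",
    "shared_folders": [
        {
            "secret": "REPLACE_WITH_FOLDER_SECRET",
            "dir": "C:\\renderfarm\\cache",
            "use_tracker": True,
            # Explicit peers, since subnet searches are blocked in EC2:
            "known_hosts": ["10.0.0.11:44444", "10.0.0.12:44444"],
        }
    ],
}

with open("sync.conf", "w") as f:
    json.dump(config, f, indent=2)
```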

 

Also, there is one more issue that can happen if you roll out the same image to several computers. Every peer has its own unique ID, which it presents to the tracker and to other peers to identify itself. The peer ID is stored in the settings.dat file. When all of the peers present the same peer ID, they will not see each other.

The peer ID is generated on the first run of the app, or whenever settings.dat (and its backup, settings.dat.old) is not found. So either make sure that BTSync has never been started before rolling out the image, or remove settings.dat to regenerate the peer ID.
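A minimal sketch of that cleanup step, assuming the storage path used in the config sketch above (adjust it to wherever your image keeps BTSync's data):

```python
# Sketch: remove settings.dat (and its backup) before launch so the clone
# generates a fresh peer ID. The storage path is a placeholder.
import os

STORAGE = r"C:\btsync\storage"

for name in ("settings.dat", "settings.dat.old"):
    path = os.path.join(STORAGE, name)
    if os.path.exists(path):
        os.remove(path)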

 

Also, here is the information on how peers communicate, so you can troubleshoot connectivity issues (a small connectivity check is sketched after the list).

1. Peer discovery on the LAN:

Peers must be in the same subnet.

Peers multicast to 239.192.0.0 on UDP port 3838 to discover each other.

If discovery is successful, peers connect to each other directly, using the listening ports configured in Preferences.

2. Peer discovery using the tracker server:

Peers try to connect to t.usyncapp.com over UDP port 3000 to discover each other over the Internet.

Peers try UPnP and NAT-PMP on the local router to map ports to themselves.

If that fails, they connect through the relay server r.usyncapp.com over UDP port 3000.

If it succeeds, they connect directly.
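As a quick way to test point 1 on a given network, the Python sketch below joins the discovery multicast group and waits for any datagram. It does not speak BTSync's protocol; it only shows whether multicast traffic is forwarded on the subnet at all (which, as it turns out, EC2 blocks):

```python
# Sketch: check whether LAN-discovery multicast traffic reaches this node.
# Joins 239.192.0.0 on UDP 3838 and waits for any datagram from a peer.
import socket
import struct

GROUP, PORT = "239.192.0.0", 3838

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
sock.settimeout(30)

try:
    data, addr = sock.recvfrom(4096)
    print("Multicast discovery packet received from", addr)
except socket.timeout:
    print("No multicast seen in 30s - discovery is probably blocked on this subnet")
```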


Roman,  

 

Thank you! I was able to get everything working thanks to the information you provided. Amazon do indeed block all multicast and broadcast packets on their network. The solution was to bind a public IP to every EC2 instance on boot (a checkbox in the launch screen); they all then connect to the tracker server, find each other that way, and transmit all data over the local subnet. The more nodes I spin up, the more efficient the system seems to be; it's perfect so far.
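For anyone scripting the launch instead of using the console, that checkbox corresponds to auto-assigning a public IP on the instance's primary network interface. A rough boto3 sketch of the equivalent; the AMI, subnet, security group, and instance type are placeholders, not our actual setup:

```python
# Sketch: launch render nodes from the master image with a public IP
# auto-assigned to the primary interface, so each node can reach the tracker.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",            # render-node master image (placeholder)
    InstanceType="c3.2xlarge",         # placeholder
    MinCount=80,
    MaxCount=80,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-xxxxxxxx",   # placeholder
        "Groups": ["sg-xxxxxxxx"],       # placeholder
        "AssociatePublicIpAddress": True,
    }],
)
print("Launched", len(response["Instances"]), "instances")
```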

 

The solution to starting a unique BTSync device on each cloned machine is to start them with an empty config directory on boot (as you advised). I have a small script that launches BTSync with /CONFIG pointed at a config file that sets up all the folders to sync, with the device name field commented out so the sync device always equals the unique Amazon machine name for each instance.

 

For some reason, if I reboot an instance the only way BTSync comes back up properly is with the storage directory wiped clean again, so my startup script takes care of that.
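For what it's worth, the boot-time flow looks roughly like the following Python sketch. This is not the actual script, just an illustration under the assumptions that BTSync.exe and its storage live under C:\btsync and that the secret and folder paths are placeholders; device_name is deliberately omitted so the machine name is used:

```python
# Sketch of a per-node boot script: wipe the storage dir (fresh peer ID and DB),
# write a config without device_name, and launch BTSync against it.
import json
import shutil
import subprocess
from pathlib import Path

BTSYNC_EXE = r"C:\btsync\BTSync.exe"        # placeholder install location
STORAGE = Path(r"C:\btsync\storage")
CONFIG_PATH = Path(r"C:\btsync\sync.conf")

# 1. Wipe storage so settings.dat (peer ID) and the folder DB are rebuilt.
if STORAGE.exists():
    shutil.rmtree(STORAGE)
STORAGE.mkdir(parents=True)

# 2. Write the config; see the sample-config sketch in the earlier reply for
#    more fields. device_name is omitted so the unique machine name is used.
config = {
    "storage_path": str(STORAGE),
    "listening_port": 44444,
    "shared_folders": [{
        "secret": "REPLACE_WITH_FOLDER_SECRET",   # placeholder
        "dir": r"C:\renderfarm\cache",            # placeholder
        "use_tracker": True,
    }],
}
CONFIG_PATH.write_text(json.dumps(config, indent=2))

# 3. Launch BTSync against that config.
subprocess.Popen([BTSYNC_EXE, "/config", str(CONFIG_PATH)])
```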

 

I've been testing the system on a new job and it's performing well. The cloud farm nodes and my local machine's filesystem are seamlessly blended, and as our render nodes are now working off their own locally attached drives, the whole thing works wonderfully efficiently.

 

 

 


klytus,

 

Glad to hear that it is working for you now.

 

"For some reason, if I reboot an instance the only way BTSync comes back up properly is with the storage directory wiped clean again, so my startup script takes care of that."

 

 

That's rather strange. The storage contains a DB with all the info regarding folders, and it should not need to be purged before every start of BTSync. Could you please describe what BTSync does, and what happens, if you do not clean up the storage dir?

 

Thanks!

