Proof Of Concept - Bittorrent Sync Based Vpn


root-12


Bittorrent hack - Virtual Private Network

It has always annoyed me a little that sync providers have the mindset: “Either it all goes here, or it goes there…” The fact is that I want remote control over my files from any device at any time, so this is the specification of how I have created my own VPN using BitTorrent Sync running as an OS-agnostic background service…

A Network Attached Storage.

I have a 3 TB NAS with five folders:

  • /BTsyncApp/
  • /BTsync/catalogue
  • /BTsync/files
  • /BTsync/shared
  • /BTsync/devices

/BTsyncApp contains the server application. It is not synchronized using BTsync. It is just the application.

 

/BTsync/catalogue contains an HTML page (catalogue.html) listing all files. It is BTsync’ed to, and therefore available on, all devices. This is about 100 kB of simple HTML text.

 

The webpage looks like this:

+---------------------------+
| HTML5 -- simple app ---   |
+---------------------------+
| files/folder/             |
| files/folder/name.ext     |
| files/folder/name2.ext    |
| files/folder2             |
| files/folder2/name.ext    |
| files/folder2/name2.ext   |
+---------------------------+

When I click on any hyperlink I get a box with the following options:

  • [get]
  • [open]
  • [delete]
  • [destroy]
  • [move]
  • [share] / [revoke]
  • [create torrent]
  • [create 1-time torrent]

If it is a folder, I get the following additional options:

  • [create folder]
  • [share folder] / [revoke share]

 

/BTsync/files contains approximately 2 TB of data, and there is no way I want BTsync to synchronize this data across my devices.

/BTsync/devices contains a folder for each device. For example:

  • /BTsync/devices/HTCphone
  • /BTsync/devices/win7Laptop
  • /BTsync/devices/LinuxLaptop

And finally /BTsync/shared is used in a special manner which I’ll explain later.

The Devices

All devices have BTsync installed and have two folders: catalogue and files. For example:

My old HTCwildfire has the folders:

  • /BTsync/catalogue
  • /BTsync/files

My windows laptop:

  • D:/BTsync/catalogue
  • D:/BTsync/files

My linux laptop:

  • /home/me/BTsync/catalogue
  • /home/me/BTsync/files

I can now open the html file from ./catalogue as a webpage in any browser on any device.

The attentive reader will have noticed that only the NAS has the folder /BTsyncApp; the devices do not. That is because the server generates the catalogue webpage in /catalogue. The devices append a hidden worklist to the webpage file, which the server analyses. Thereby the only place where changes to the catalogue can be made is on the server, which prevents synchronization conflicts in the file catalogue. Other files may be updated asynchronously, but that will influence the content of a file, not its location, which is what the BTsyncApp looks for.
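
To make the server-side generation of the catalogue concrete, here is a minimal sketch, assuming the page is nothing more than a list of links. The function names build_catalogue and write_catalogue are hypothetical, and the hidden worklist and the option box are left out because their format is not specified above.

```python
import html
import os


def build_catalogue(files_root):
    """Return a simple HTML page listing every folder and file below files_root."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(files_root):
        dirnames.sort()  # visit subfolders in a stable order
        rel_dir = os.path.relpath(dirpath, files_root)
        if rel_dir != ".":
            rows.append("files/" + rel_dir.replace(os.sep, "/") + "/")
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            rows.append("files/" + rel.replace(os.sep, "/"))
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(html.escape(row, quote=True))
        for row in rows
    )
    return "<!DOCTYPE html>\n<html><body><ul>\n" + items + "\n</ul></body></html>\n"


def write_catalogue(files_root, catalogue_dir):
    """Write catalogue.html via a temporary file and a rename, so a
    half-written page never appears under the final name."""
    target = os.path.join(catalogue_dir, "catalogue.html")
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(build_catalogue(files_root))
    os.replace(tmp, target)
    return target
```

On the NAS this would be called as write_catalogue("/BTsync/files", "/BTsync/catalogue") whenever the file tree changes.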

HTML view

When a given file is not on the device, the link [get] is bold.

 

If I click on the link [get], the HTML file is saved with a setting “get” for the filename. The HTML file is then effectively delta-synced using BTsync, whereupon a python script on the NAS discovers the delta and creates a hard link from the file on the NAS to the BTsync folder of the device. Very simple.

 

For example: on HTCphone, [get] “drive/folder/name.ext” will hard link /BTsync/files/drive/folder/name.ext into /BTsync/devices/HTCphone, which is BTsync’ed with the phone’s folder /BTsync/files/. So after the hard link is established, the file is automatically downloaded to the phone.

The same occurs if I [get] a folder: The hard link implies that the rest of the tree should be synchronized.
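
The [get] step can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the function name link_to_device is hypothetical, and because most file systems refuse hard links to directories, the folder case is assumed to recreate the directory tree and hard link each file in it. Hard links also require that /BTsync/files and /BTsync/devices live on the same file system.

```python
import os


def link_to_device(files_root, device_root, rel_path):
    """Hard link a file, or every file below a folder, into a device folder."""
    source = os.path.join(files_root, rel_path)
    if os.path.isdir(source):
        # Assumption: a folder is handled file by file, keeping the tree layout.
        pairs = []
        for dirpath, _dirnames, filenames in os.walk(source):
            for name in filenames:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, files_root)
                pairs.append((src, os.path.join(device_root, rel)))
    else:
        pairs = [(source, os.path.join(device_root, rel_path))]

    linked = []
    for src, dst in pairs:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if not os.path.exists(dst):
            os.link(src, dst)  # second name for the same data, nothing is copied
        linked.append(dst)
    return linked
```

For the example above the call would be link_to_device("/BTsync/files", "/BTsync/devices/HTCphone", "drive/folder/name.ext").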

 

Warning: because of fumbling, I have learned to prompt a “confirm/cancel” box in the webpage with very large buttons.

 

When the synchronization is completed, the python script on the NAS updates the webpage-file and synchronizes it across all devices.

 

At this point the file should have been synced to my device, and the link [open] is changed to bold in the HTML code, whilst the link [get] is normal, i.e. no longer bold.

 

Of course, if it is a large volume of data, and I want to know how far my download is in the meantime, I can just open the BTsync application and see what it is doing.

 

Now if I choose to delete a file in BTsync on a device, the local file is of course removed, but on the NAS only the hard link is removed from the particular device folder.

 

So for example: me@linux$ rm /home/me/BTsync/files/drive/folder2/name2.ext will delete name2.ext locally, and the NAS will follow by removing the hard link.

The file itself is not deleted from files/drive/folder2/name2.ext, only the hard link for the device. I will get back to how this may be achieved further down.
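
Why the master copy survives can be shown with the link count of the file. A minimal sketch, with the hypothetical helper name remove_device_link:

```python
import os


def remove_device_link(device_root, rel_path):
    """Unlink the device's hard link and return how many names for the data remain."""
    link = os.path.join(device_root, rel_path)
    remaining = os.stat(link).st_nlink - 1  # link count before removal, minus this name
    os.unlink(link)                         # removes one name; data stays while count > 0
    return remaining
```

As long as the returned count is at least 1, the copy under /BTsync/files is untouched.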

 

When BTsync makes a change to the folder, pyinotify reports a change on the file system and triggers the python script to update the simple webpage. Hereby the field [open] is changed to [get]. I’m sure this could be done using some internal mechanism in BTsync, but I don’t know how.
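
Once the script has been triggered, deciding which of [get] and [open] should be bold for a device could look like the sketch below. It is a simplification under assumptions: the function name link_states is hypothetical, and a file counts as “open” as soon as the device folder on the NAS holds a hard link to it, which ignores whether the transfer to the device has already finished.

```python
import os


def link_states(files_root, device_root):
    """Map each file below files_root to "open" if the device folder holds
    a hard link to it, otherwise to "get"."""
    states = {}
    for dirpath, _dirnames, filenames in os.walk(files_root):
        for name in filenames:
            master = os.path.join(dirpath, name)
            rel = os.path.relpath(master, files_root)
            candidate = os.path.join(device_root, rel)
            linked = os.path.exists(candidate) and os.path.samefile(master, candidate)
            states[rel.replace(os.sep, "/")] = "open" if linked else "get"
    return states
```

The result can be fed straight into whatever routine rewrites the catalogue page for that device.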

Share with others

If I want to share a particular file with somebody else (this is the cool thing), I press the hyperlink [share]. This prompts the python script to create a folder in /BTsync/shared/ named with the SHA-1 hash of the file concatenated with the time, which it adds to BTsync’s list of synchronized folders using command line instructions. Example:

 

  • /BTsync/shared/c191687881d9345c914186591ea850f7c1efb63c-2013-10-28-235959/

 

This permits me to share the same file with multiple persons through different folders while keeping the ability to revoke each share individually.

Next, the python script creates a hard link from the file to the shared folder, which is then ready for remote access using the read-only “secret”. And, keep in mind that the overhead for setting up the hard link is negligible. Example: 

  • /BTsync/shared/c191687881d9345c914186591ea850f7c1efb63c-2013-10-28-235959/name.ext

 

Finally I can find the read-only “secret” in the BitTorrent Sync application and give the code to the person I am sharing with, within seconds, as hard linking is very fast.
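
The folder naming and hard linking for [share] can be sketched as below. The names sha1_of_file and create_share are hypothetical, and the step that registers the new folder with BTsync is deliberately omitted, since the exact command line instructions are not given here.

```python
import hashlib
import os
import time


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_share(shared_root, file_path, now=None):
    """Create <sha1>-<YYYY-MM-DD-HHMMSS>/ under shared_root and hard link the file into it."""
    stamp = time.strftime("%Y-%m-%d-%H%M%S", time.localtime(now))
    folder = os.path.join(shared_root, sha1_of_file(file_path) + "-" + stamp)
    os.makedirs(folder)
    link = os.path.join(folder, os.path.basename(file_path))
    os.link(file_path, link)  # no data is copied, so this is fast even for large files
    # Assumption: adding `folder` to BTsync's list of synced folders happens elsewhere.
    return folder, link
```

Because the timestamp is part of the folder name, sharing the same file twice yields two folders that can be revoked independently.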

The menu again

  • So my menu is limited for files to:
    • [get] which makes the file available on a selected remote device.
    • [open] which indicates where the local file should be.
    • [delete] which deletes the local file and unlinks the device’s hard link on the NAS.
    • [destroy] which deletes the local file and makes the NAS traverse all links to remove the file and all hard links permanently.
    • [move] which sends a file on a listed device to another (set of) device(s). So I can push the hard link off my mobile phone and onto my laptop. When BTsync sees that my phone is next to my laptop, it concludes that it is easier to move the file peer-to-peer than to request it via the NAS.
    • [share] which makes the file available in a separate folder as read-only. When shared, the option changes to [revoke], which FIRST removes the folder from BTsync’s list, THEN unlinks the file from the [share] and then deletes the folder.
    • [create torrent] which makes it possible for people who cannot install BTsync to download the file. For some corporate users this is helpful.
    • [create 1-time torrent] which - surprise :-) - can only be used once.
  • The following options appear in addition if it is a folder:
    • [create folder] which prompts for a name.
    • [share folder] which prompts for a size in MB and makes a shared folder on the NAS, whose secret any remote device can read and add to its BTsync. Any size limit greater than -1 will be translated into how much space is available for the user.
    • [revoke share] which first creates the folder [name-revoked-dd-mm-yy], then transfers the hard links of all files in the folder [name] to the newly created folder, and then removes the folder [name] from BTsync’s list of sync’ed folders. Finally it is up to me to decide whether I share the read-only or the full-access secret using my BTsync application.
    • [destroy folder] which wipes the content of the folder on all sync’ed devices. The python script deletes any files in the folder and places a flat text note: “This folder was destroyed by the owner. Date/time”. The folder can obviously not be deleted as this would not synchronize across to other devices (doh!) but you get the point.
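
The traversal behind [destroy] can be sketched like this: every name that points at the same data shares one inode, so walking the tree and comparing inode numbers finds all hard links. The function names find_links and destroy are hypothetical, and the sketch assumes everything lives below one root such as /BTsync.

```python
import os


def find_links(root, target):
    """Return every path below root that is a hard link to target (target included)."""
    wanted = os.stat(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat, so symlinks are not followed
            if (info.st_dev, info.st_ino) == (wanted.st_dev, wanted.st_ino):
                matches.append(path)
    return sorted(matches)


def destroy(root, target):
    """Remove the file and all of its hard links below root; return the removed paths."""
    removed = find_links(root, target)
    for path in removed:
        os.unlink(path)
    return removed
```

Since BTsync mirrors the device and shared folders, removing the links on the NAS is what would propagate the deletion to the other devices.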

This is simple, but it works.

The key element is of course a machine with high on-line availability. Since we do not want to use public cloud systems, I wrote this for the lowest-power device I could imagine: the Raspberry Pi with an external hard drive.

Python is as close to OS-agnostic as it gets, but I leave the concepts described here for others to review, tune and criticize. My objective was to develop the required functionality and prove the concept as a scripted transactional system, which proved to work. Maybe the team at BitTorrent can supersede this by integrating it into, or augmenting, the existing UIs?

 

Todo:

  • Cleanup function that, during idle/night time, identifies duplicates in /BTsync/files on the NAS and substitutes them with hard links. It saves space and works faster. This must not work outside /BTsync/files as it could erase backups.
  • New (4 Jan 2014). Function that load-balances storage on devices. Each device will be given a config file (maybe the BTsync config file) that states the permitted volume. If a device requires files to be synced/downloaded for which there is no space, then this option will permit the application to take the least used, oldest files and remove them from the device to make space for the newly required file. The removed files will still be available on the NAS. The objective is to save the user from having to worry about local disk space.
  • New (6 Jan 2014). Function that maps the device IP address, for multiple NAS, so that all files are available in different netzones. The initial idea is to use the first three IP address octets (ip://a.b.c.x) to assure availability in different zones. Hereby if one IP provider cuts the network to the NAS in zone a.b.c, the files can be available from another zone (x.y.z). This will apply only to devices holding a "/files" folder.
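
The first Todo item, the cleanup function, could be sketched as follows. This is one possible approach under assumptions, not an existing part of the scripts: duplicates are detected by file size plus SHA-256 digest, and the names file_digest and deduplicate are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(files_root):
    """Replace byte-identical files below files_root with hard links to one copy."""
    first_seen = {}
    replaced = []
    for dirpath, _dirnames, filenames in os.walk(files_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            key = (os.path.getsize(path), file_digest(path))
            keeper = first_seen.setdefault(key, path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already linked to it
            tmp = path + ".dedup-tmp"
            os.link(keeper, tmp)   # create the new link next to the duplicate
            os.replace(tmp, path)  # then swap it in place of the duplicate
            replaced.append(path)
    return replaced
```

Two caveats follow from how hard links work: files merged this way also share any later edits, and a duplicate that was already linked into a device folder keeps its old data there until that link is refreshed. Calling the function only with /BTsync/files as the root keeps it away from backups, as the Todo item requires.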

Sounds great! If I understand correctly, this could be the full bitsync-replacement of Dropbox. I'm very excited about this...

 

Are you planning to go open source?

 

Yes, he has effectively turned p2p distribution into a traditional model of central content delivery. When each person has their own share, they can't seed to each other. This is fine if his bandwidth is sufficient to share everything. But then what is the point of BTSync? Some Dropbox clone with a web UI would do pretty much the same; the only advantage here is the resume option for huge files. Implementing this with encrypted read-only keys also seems like a pain.

 

If we had nested shares and different encryption for each file and folder, we would finally be able to set up shares from a massive share. Then you could make a share containing the right files and folders. And all the users would be able to cross share. This would maintain the p2p abilities.

If we could also split a massive share into pieces, we could store it across several nodes. Think if you could split the share into 1GB pieces with different offsets. 4GB would then create 8 pieces, the first four at 0-1GB, 1-2GB, 2-3GB, and 3-4GB. The other four starting at 500MB-1500MB, 1500-2500, and so on. This would be good for when a node goes down: if 500-1500MB goes down, the load will be shared across two nodes rather than one.
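
The offset idea can be worked through in a few lines. This is only a sketch of the arithmetic, with hypothetical function names, 1GB counted as 1000MB to match the numbers above, and the assumption that the last shifted piece is simply cut off at the end of the share.

```python
def piece_ranges(total_mb, piece_mb, offset_mb):
    """Return (start, end) ranges in MB: one aligned layer of pieces
    plus a second layer shifted by offset_mb."""
    ranges = []
    for shift in (0, offset_mb):
        start = shift
        while start < total_mb:
            # Assumption: a piece that would run past the end is truncated.
            ranges.append((start, min(start + piece_mb, total_mb)))
            start += piece_mb
    return ranges


def covering_pieces(ranges, lost):
    """Return the other pieces that overlap a lost piece."""
    lost_start, lost_end = lost
    return [r for r in ranges if r != lost and r[0] < lost_end and r[1] > lost_start]
```

With piece_ranges(4000, 1000, 500) there are eight pieces, and covering_pieces(..., (500, 1500)) returns (0, 1000) and (1000, 2000), i.e. the two nodes that would pick up the load. Under the truncation assumption the first 500MB exists in one piece only.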


That sounds like Tahoe-LAFS: https://tahoe-lafs.org/trac/tahoe-lafs


Tahoe-LAFS is far harder to set up, and is also a distributed central system. The advantage of BTSync is that I can share files with people who can then also share them. I can't install Tahoe-LAFS on my grandmother's computer, I can't set it up and leave it, and I can't have her seed the files she has downloaded. The advantage of BTSync is the p2p aspect. I can share a massive amount of data with ease, and the more I share the more available it will be. If I share my photos with my grandmother, I can download them utilizing her bandwidth. If the encryption scheme was changed it would make managing this so much simpler.

Tahoe-LAFS would probably be a good solution for OP, because his solution already removes the biggest advantages of BTSync. Nested shares, and being able to create virtual shares using pieces of an existing one, would enable far more complex distribution models while still getting the full advantage of P2P. It is the difference between having an easy way to send a limited set of files to a few users and the ability to share massive amounts of data with multiple people. Having one share with ALL my files, and then being able to tailor shares for each person while still enabling cross-seeding, will increase the size of my swarm.

If I used OP's setup I would have to seed every file to every person I want to share with. Then what would be the difference to making a centralized setup with a DNS setup that makes people download from the server with the most free resources? And he has also removed the ability to use encrypted read-only secrets to store his files in insecure locations. It also requires that he has full access to the servers hosting the files to create the hard links. The good thing about BTSync is I could store my files on my friend's desktop, without him seeing the files, and without me having any access to his desktop.

If it was done using my solution I would simply give him an encrypted read-only secret, attach the files and folders I want him to store and distribute, and then I'd have replication and additional bandwidth using his spare space and bandwidth.


An interesting suggestion/comment.

While it is correct that the description above is "centralized" given the presence of the NAS, I believe it is a small step to allow for multiple NAS's by extending the `/catalogue`, and possibly introducing the element of stochastic profiling to assure redundancy based on availability.

 

I believe all that this will require, is:

 

1. A log of availability on each device. 

2. A function to compute locally the set of files for which a device may provide availability, and in which time window.

 

To create the distributed scheduling method with availability forecasting sounds like an interesting challenge.
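
As a starting point for the two items above, here is a rough sketch, assuming the log is a list of (device, unix timestamp, online) samples and that a time window is one hour of the day in UTC. Both function names and the 0.8 threshold are hypothetical. The files a device can serve in a window would then be whatever is hard linked into its device folder.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_availability(log):
    """log: iterable of (device, unix_timestamp, online) samples.
    Returns {device: {hour_of_day: fraction of samples seen online}} in UTC."""
    seen = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for device, stamp, online in log:
        hour = datetime.fromtimestamp(stamp, tz=timezone.utc).hour
        counts = seen[device][hour]
        counts[0] += 1 if online else 0  # samples where the device was up
        counts[1] += 1                   # all samples in this hour
    return {
        device: {hour: up / total for hour, (up, total) in hours.items()}
        for device, hours in seen.items()
    }


def devices_for_window(availability, hour, threshold=0.8):
    """Return the devices whose observed availability in that hour meets the threshold."""
    return sorted(
        device for device, hours in availability.items()
        if hours.get(hour, 0.0) >= threshold
    )
```

A real forecast would need more than a plain average per hour, but this is enough to rank devices per window.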


The presence of a NAS does not make it centralized. It is still P2P if, whenever peers come up, they can share between each other. That is the important part of P2P: every single node can share what it has with every single other node. BTSync has, as I mentioned, a few problems, and no cross seeding (unless you make hundreds of shares that contain limited amounts of data, but that is pretty unmanageable).

Logging availability seems fairly simple and reasonable. The function to calculate availability at any given time doesn't seem too useful, but rating the peers and grouping them would make sense. Some user input should be allowed (metadata like ISP, location and bandwidth would be useful). Then you can avoid your share being distributed on twelve nodes using the same internet connection. It also seems pointless to use bad nodes as encrypted read-only nodes; sending them more than they can upload is just a waste of bandwidth (ADSL connections can have terrible upload, but decent download).
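
Rating and grouping peers could be sketched like this. Everything here is hypothetical: the peer metadata keys ("name", "isp", "location", "upload_kbps") are made up for the example, and treating peers with the same ISP and location as one connection is only a crude stand-in for real grouping.

```python
def pick_replica_nodes(peers, copies, min_upload_kbps=0):
    """Pick up to `copies` peers, at most one per (isp, location) group,
    preferring the fastest upload and skipping peers below min_upload_kbps."""
    chosen = []
    used_groups = set()
    for peer in sorted(peers, key=lambda p: p["upload_kbps"], reverse=True):
        if len(chosen) >= copies:
            break
        group = (peer["isp"], peer["location"])
        if peer["upload_kbps"] < min_upload_kbps or group in used_groups:
            continue  # too slow, or this connection already holds a copy
        chosen.append(peer["name"])
        used_groups.add(group)
    return chosen
```

This avoids twelve replicas behind one internet connection and leaves out nodes whose upload is too poor to be worth filling.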

 


  • 3 weeks later...

Sounds very good. I can see the potential of this, and even though you can technically do this using something like Dropbox, the big difference is that you would have control of where the data is, and you would know where it sits at all times.

 

Now I may have missed something but I am assuming using the BTSync GUI, you can see which files have been seeded to which computers, correct?

 

Additionally can you please post the link here, when you put it on GitHub? 

I wouldn't mind seeing what I can possibly do with this coding.
