The way you update/change files is implemented... wrong/bad :(



Hello, I'm new here, so hello everyone :) Before I get into my actual issue, I just want to say I LOVE BTSync. I use it at home now and it's a VERY good step in the right direction! I'm not intending to flame the project, just to raise a technical concern as I understand it at this time.

I'll start off by saying this is related to delta synchronization, because this is where BTSync just... dies. I believe it fundamentally can't do it in its current implementation, and the whole internal structure/architecture would need to be rethought and re-implemented, if I'm not completely wrong about how BTSync currently works and behaves. This is sad, because I wish it could have started off as a perfect sync option instead of having a flaw like this. It means we need a v2 or a new product to get around it; hopefully a v2 is all that's needed, but I don't have a say in that.

All of what I'm saying may be off and if so... I apologise lol but I believe I need to at least mention this.

As I understand it, the "secret" is basically like a .torrent repository file: a non-static repository torrent, specific to this product/project, that holds static sub-torrents inside of it, one per file. This is what allows the files to be sent and received so perfectly and in a distributed manner. THIS I LOVE; to me, torrents as a file distribution mechanism are perfect.

The problem we're all facing is that when a file in the folder is changed, it's as if a completely new .torrent file is created for the new file, 100% separate from the original, and this is what causes the entire file to be re-transferred to the destination sync points. Unless the BitTorrent team has some way to diff/delta a .torrent file, this is exactly why the current way of handling a changed file is flawed and just can't do delta transfers. In theory, since all torrents are chunked into blocks, the delta seems simple and trivial. But a .torrent file is just the hashes of all the blocks plus tracker info, and I think it's almost impossible to build a diff from a .torrent file's contents, since they're all one-way checksums meant to verify the file contents.

What I'm proposing is for the team to implement diff/delta functionality into the client similar to this:

Use an open source implementation of a GOOD diff program like xdelta3 (http://xdelta.org/ or https://code.google.com/p/xdelta/), since it is VERY fast, open source, and doesn't require much memory at all, or make your own if you really want to... And WHEN A FILE IS ADDED OR CHANGED, delta it right away, while both the source AND the final file are available to compare. Or find a way for both files to co-exist during the update so the diff can be computed without needing the disk space of source + destination file, for example by doing the diff in memory on import. Or do it however you want, really.

Once that problem is solved, however they manage to compare the source and destination files to create the delta, then instead of forcing clients to re-sync the ENTIRE file, as BTSync does now because it uses a static .torrent file for each file, just sync the diff file as its own static .torrent file. That way all the clients on the endpoints need to do is use the old file and the diff to create the final updated/modified file. Since the client/protocol already detects the most recent file, clients only care about the full file plus the most recent diff. So even long after a file was initially synced, a client can just pull in the delta files, apply them, and have the updated file ready to go, no matter how much time has passed between its last sync and the most recent version.
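The "old file + diff = new file" idea above can be sketched in a few lines. This is only an illustration: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for a real delta tool such as xdelta3, and the names `make_delta` and `apply_delta` are hypothetical, not part of BTSync or xdelta.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The peer already has these bytes in its old copy.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only these bytes would have to travel over the network.
            ops.append(("data", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old_file = b"The quick brown fox jumps over the lazy dog"
new_file = b"The quick red fox jumps over the very lazy dog"

delta = make_delta(old_file, new_file)
rebuilt = apply_delta(old_file, delta)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
print(rebuilt == new_file, literal_bytes, len(new_file))
```

In this sketch the delta is the only thing a peer that already holds the old file would need to receive. `SequenceMatcher` is far too slow for large files, so a real implementation would need a purpose-built delta encoder.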

So this is how I propose they go about delta syncing of files, especially LARGE files, which the torrent protocol is truly the best way to transfer. If they want to be the actual solution to this problem, then awesome, this should work; if they don't, a replacement WILL come from somewhere else and take over from BTSync as the superior alternative anyway. I honestly hope the official team at BitTorrent Labs will just do this for a v2 instead of having some random 3rd party make it, even though I don't care who makes it so long as it's open source/free of charge.

It is quite a technical problem in multiple ways to get BitTorrent + deltas working in harmony, if Sync works the way I think it does now, and it will merit a v2 or some replacement to do exactly this type of functionality.

Well, that's it from me for now. I hope I didn't seem too offensive or get this completely technically wrong. Thanks for reading, and I hope the official team considers this or a similar solution for delta syncing, as it IS VERY IMPORTANT, important enough for people to stop using BTSync when an alternative that does do it crops up.


Firstly, welcome to the forum!

Secondly, have you tried searching the forum for "delta" or "diff"? There are a number of topics discussing this issue, for example here and here.

According to the developers "actually Sync supports delta copying if file size stays the same. However implementing rsync like diff in distributed environment without dedicated server, is kind of complex task." (Source)
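To see why "file size stays the same" matters for block-based delta copying, here is a minimal sketch. It assumes fixed-size blocks compared by hash at the same index; the thread does not say how Sync does this internally, and `BLOCK_SIZE`, `block_hashes` and `changed_blocks` are illustrative names only.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is readable; real blocks are far larger


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash every fixed-size, block-aligned chunk of the data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes) -> list:
    """Indices of blocks whose hash differs or that exist on one side only."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    count = max(len(old_h), len(new_h))
    return [
        i for i in range(count)
        if i >= len(old_h) or i >= len(new_h) or old_h[i] != new_h[i]
    ]


original = b"AAAABBBBCCCCDDDD"
overwritten = b"AAAAXBBBCCCCDDDD"   # same size, one byte changed in place
inserted = b"XAAAABBBBCCCCDDDD"     # one byte inserted at the front

print(changed_blocks(original, overwritten))  # [1]
print(changed_blocks(original, inserted))     # [0, 1, 2, 3, 4]
```

An in-place change touches one block, so only that block needs to be re-sent. A single inserted byte shifts every later block boundary, so every block hash changes and a block-aligned comparison ends up re-sending the whole file.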

Given the "alpha" nature of Sync (Sync has only been publicly available for the past two months!), and the complexity of implementing true "diff" sync, this is likely something that will be implemented at a much later stage. There are several requests already for it in the wishlist thread, and the developers "do understand importance of the diff and its advantages." (Source)

So please be patient and remember that Sync is only an "alpha" at this stage - the developers can't implement EVERY feature request at once, it will take time!

Thirdly, your understanding of "secrets" as .torrent files is incorrect. Please read the FAQ and Unofficial FAQ for more about how secrets actually work (the Unofficial FAQ also gives you answers about the partial "delta" sync that's already available in Sync!)


Thank you for the welcome :)

I have looked around, and the Sync team does mention that "However implementing rsync like diff in distributed environment without dedicated server, is kind of complex task." Syncing a diff that the client can manage and apply to the file it already has could still work within their current infrastructure, as long as creating and applying the diff can be done without too much trouble. And if versioning becomes a future feature, diffing between versions becomes a piece of cake, as I mentioned.

They also mention the "files whose size does not change" aspect, which applies almost exclusively to VM images and fixed-size database files. That may be practical for corporate environments, but I hope this product is for end-users/consumers. Diffing shouldn't be a bad idea for files smaller than 100MB either, since I imagine that's what the majority of us are syncing. I just hope the complexity tradeoff is worth their time to implement diffs, since it's always a time & space win.

Honestly though, each end-user client running this program IS a server. BTSync is even closer to being a true BT server/tracker, since each of us knows every client connected to our secret, as we can see in the GUI on the Devices tab.

Personally though, I don't think rsync is good enough in a distributed setting like this, since it's too dependent on having only one destination to sync to; completely unscalable in torrent terms.

So having our clients handling diff application per secret/file should be workable for this.


Also, regarding those secrets: of course they aren't a .torrent file per se. I meant it more as a concept, mapping to what a container .torrent file could be like, encoded in that string, since encoding + reversible compression is possible if the description is small enough to fit inside the secret. Something like a small and tidy magnet link URL could be coded into the secret, but that's just a guess at how the secret is actually implemented.


I don't see what the problem is as they already implement a big chunk of the rsync algorithm when they transfer the metadata (.torrent) file.

For torrent they have to generate a file with the hashes of chunks of the file and send it to the receiving node.

For rsync they have to generate a file with pairs of hashes of chunks of the file and send it to the receiving node.

For torrent they copy matching block aligned chunks from the old file to preseed the new file.

For rsync they scan the old file looking for byte aligned chunks that match the little hashes (CRCs) from the metadata file; if the big hash also matches, they copy the chunk to preseed the new file.

The only other difference is that rsync normally uses smaller chunks; okay?

Easy! :)

PS: Don't use MD5 as the big hash as it's been broken for this use case.
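The byte-aligned scan described in this post can be sketched as follows. This is a simplified illustration, not rsync's or Sync's actual code: the weak hash is an rsync-style rolling sum, SHA-256 is an assumed choice for the "big" hash, the trailing partial block is ignored, and `signature` and `find_local_blocks` are hypothetical names.

```python
import hashlib

BLOCK = 4     # tiny block size, for illustration only
MOD = 65536


def weak_sum(block: bytes) -> tuple:
    """Cheap 'little hash' of one block (rsync-style pair of sums)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple:
    """Slide the window one byte to the right without re-reading the block."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def signature(new: bytes) -> dict:
    """Metadata for the new file: weak sum -> [(strong hash, block index)]."""
    sig = {}
    for idx in range(len(new) // BLOCK):
        blk = new[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault(weak_sum(blk), []).append((hashlib.sha256(blk).digest(), idx))
    return sig


def find_local_blocks(old: bytes, sig: dict) -> dict:
    """Scan the old file at every byte offset for blocks of the new file."""
    found = {}
    if len(old) < BLOCK:
        return found
    a, b = weak_sum(old[:BLOCK])
    pos = 0
    while True:
        for strong, idx in sig.get((a, b), []):
            # Confirm a weak match with the big hash before trusting it.
            if idx not in found and hashlib.sha256(old[pos:pos + BLOCK]).digest() == strong:
                found[idx] = pos
        if pos + BLOCK >= len(old):
            break
        a, b = roll(a, b, old[pos], old[pos + BLOCK], BLOCK)
        pos += 1
    return found


old_file = b"AAAABBBBCCCCDDDD"
new_file = b"XAAAABBBBCCCCDDDD"   # one byte inserted at the front
local = find_local_blocks(old_file, signature(new_file))
print(local)  # {1: 3, 2: 7, 3: 11}
```

Here three of the four full blocks of the new file are found in the old file at unaligned offsets, so only the first block and the trailing byte would need to be fetched from peers.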

