Feature Request: A smarter way to handle tagged files (MP3, FLAC, AAC, etc.)

Hi!

I'm using btsync for my music collection. A long time ago, I ripped all my CDs to FLAC, but only now am I adding cover art and correcting the tags. For this I'm using several computers, and btsync is very handy for making sure everything stays in sync.

As you can imagine, adding a cover image or correcting a typo in a song title changes only a small part of a FLAC file, while the actual music content stays the same. However, the btsync checksum algorithm cannot detect this and, afaict, transfers the whole file all over again.
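To make the effect concrete, here is a toy Python sketch (my own illustration, not btsync's actual code) of why fixed-offset block checksums re-transfer almost everything when a tag edit changes the file length: every block boundary after the edit shifts, so every block hash changes.

```python
import hashlib

# Toy illustration (not btsync's actual algorithm): hash fixed-size
# blocks at fixed offsets, then see what a one-byte-longer "tag" does.
BLOCK = 16  # tiny block size for demonstration; real tools use far larger

def block_hashes(data: bytes) -> list:
    """SHA-1 of each fixed-size block, taken at fixed offsets."""
    return [hashlib.sha1(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

audio = bytes(range(256)) * 4              # stand-in for the audio frames
original = b"TAG:old-title....." + audio   # 18-byte made-up metadata header
edited   = b"TAG:corrected-title" + audio  # 19 bytes: one byte longer

old, new = block_hashes(original), block_hashes(edited)
changed = sum(1 for a, b in zip(old, new) if a != b)
print(f"{changed} of {len(old)} blocks differ")
```

Because the edit changed the file length, all the audio bytes shifted by one position, so every fixed-offset block hashes differently even though the music itself is untouched.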

Is there no smarter way to handle this?

Would it be possible to add some extra knowledge about popular tagged file formats (such as MP3, FLAC, AAC) to compare two versions of a file that otherwise has the same name and content length?

A similar problem exists with compressed file containers (ZIP, tar.gz, etc.); there are even file formats, such as OpenOffice/LibreOffice documents, that are ZIPs under the hood. Make a very small change to the content and the container file changes significantly, making incremental file transfer very difficult.
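The amplification is easy to demonstrate with Python's zlib (DEFLATE, the same compression ZIP uses). The exact byte counts depend on the input, but a one-character edit makes the compressed streams diverge almost immediately:

```python
import zlib

# Change one character of the content and see how early the two
# compressed streams diverge (illustration only; numbers vary by input).
text = b"lorem ipsum dolor sit amet " * 200
edited = text.replace(b"dolor", b"color", 1)     # a single-byte edit

za, zb = zlib.compress(text), zlib.compress(edited)
same_prefix = next((i for i, (x, y) in enumerate(zip(za, zb)) if x != y),
                   min(len(za), len(zb)))
print(f"compressed streams agree for only {same_prefix} of ~{len(za)} bytes")
```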

Thanks!

I think parsing individual file formats would be a lot of work for very little overall gain. I'd rather see moved-block detection in the sync algorithm, so that all file formats can benefit from the improvement.
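For comparison, here is a rough Python sketch of the rsync-style idea behind moved-block detection (an assumption about how such a feature might work, not btsync's implementation): hash the old file's fixed blocks, then slide a window over the new file so that shifted content is still recognized at its new offset.

```python
import hashlib

BLOCK = 16  # tiny block size for demonstration

def old_block_map(data: bytes) -> dict:
    """Hashes of the old file's fixed blocks, keyed for O(1) lookup."""
    return {hashlib.sha1(data[i:i + BLOCK]).digest(): i
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}

def literal_bytes_needed(old: bytes, new: bytes) -> int:
    """Count bytes of `new` that match no old block at *any* offset."""
    known = old_block_map(old)
    i, literal = 0, 0
    while i <= len(new) - BLOCK:
        if hashlib.sha1(new[i:i + BLOCK]).digest() in known:
            i += BLOCK        # found an unchanged (possibly moved) block
        else:
            literal += 1      # this byte would have to be sent verbatim
            i += 1
    return literal + (len(new) - i)   # plus the unmatched tail

audio = bytes(range(256)) * 4
old = b"TAG:old-title....." + audio
new = b"TAG:corrected-title" + audio
print(literal_bytes_needed(old, new), "literal bytes out of", len(new))
```

With fixed offsets, every block after a length-changing edit would differ; with the sliding match, only the rewritten tag region (plus some alignment slack) needs transfer. Production implementations use a cheap rolling checksum plus a strong hash so the per-byte sliding step stays fast.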

Hey there.

I'm a little torn. On the one hand I agree, and on the other I really dislike this.

You're suggesting one thing:

Btsync could have deep knowledge of different file formats, could recognize the common kinds of changes made to them, and could therefore use more efficient diffing algorithms for each file type.

But there is one thing I'm afraid of:

This amounts to reimplementing the storage engines of other programs inside btsync, which leads directly to some "magic".

It's not that btsync couldn't change only a few bits of a file where necessary. But there are file formats where a kind of randomness is part of the algorithm for persisting data. With those formats, saving the same file twice produces files with different hashes, which are therefore treated as different. That situation clearly has to be avoided.
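A concrete instance of this from Python's standard library: the gzip header embeds a modification timestamp, so compressing identical data at two different times produces byte-different files even though the payload is unchanged (the explicit `mtime` values below just make the effect deterministic):

```python
import gzip

data = b"identical payload" * 100

a = gzip.compress(data, mtime=1)   # "saved" at time t=1
b = gzip.compress(data, mtime=2)   # "saved" again at time t=2

print(a == b)                                     # containers differ
print(gzip.decompress(a) == gzip.decompress(b))   # the content does not
```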

And since this could easily turn into an endless discussion about which file formats are important and easy to track, I would suggest implementing none of them and instead offering a kind of plugin concept that allows individual differs/resamplers for different file types. Whoever wants their particular file type synchronized in an individual, performant way is then free to publish a diffing plugin for it.

I have no source here to quote, but I remember a thread where BitTorrent staff talked about block-wise synchronization somewhere. I don't know the exact block size, but I guess it's something like 4 MB. So whatever you do in a file, as long as a contiguous change in its bitstream does not exceed 4 MB, you should never have to synchronize more than 8 MB per file (which is two blocks). Usually you only need to synchronize 4 MB.
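Assuming the 4 MB figure above (a guess from the thread, not a documented btsync constant), the two-block worst case is simple arithmetic:

```python
BLOCK = 4 * 1024 * 1024  # assumed 4 MiB block size, per the post above

def blocks_touched(offset: int, length: int) -> int:
    """How many fixed-size blocks a contiguous change overlaps."""
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    return last - first + 1

mib = 1024 * 1024
print(blocks_touched(0, 1 * mib))            # 1: change inside one block
print(blocks_touched(BLOCK - 512, 1 * mib))  # 2: change straddles a boundary
```

So a contiguous change smaller than one block touches at most two blocks (8 MB), and usually just one.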

It might be nice to be able to adjust this block size, but currently I only know of the "max_file_size_diff_for_patching" parameter, which is the total file size a file must not exceed to be eligible for partial synchronization.

You should move your comment to the wishlist thread, btw. That's the place for feature requests.

Regards,

Stephan.
