zemon

Bug: Syncing changed filenames and deleted directories

I tried syncing a directory from a Linux machine to a Windows machine. Several filenames were changed on the (source) Linux box and several directories vanished from the (source) Linux box.

The cause was filename characters that are legal on Linux but illegal on Windows, such as the colon ":". For example:

This filename:
.../Medley: song1, song2, song3.mp3

turned into:
.../Medley_ song1, song2, song3.mp3

This directory vanished:
.../Best of Some Band: 1970-1979/...

I don't think that BT Sync should ever change a filename, especially not on the source machine. It certainly should not delete a directory on the source machine, even if the directory cannot be created on the destination.

-- Art Z.

Did you share the Linux directory (source) as read-only secret to the Windows machine (destination)?

No, it was a regular (read/write) secret. I wanted a true shared directory. But sharing a directory should not silently delete a subdirectory and it should not silently cause files to be renamed.

-- Art Z.

@zemon - you should make sure to contact tech support with this issue. I had a similar bug where the receiver could not rename the !sync file back to the original name, and after a while it deleted the original on the source. This might be related. They have fixed it in some test builds, and it would be good to know whether that fixes your problem too (well, the deletion part anyway).

If possible I'd follow the steps in this post to recreate the problem and capture more info in the log - then send it off to them.

Okay. Well I agree that files shouldn't be deleted but there must be something done about the incompatible characters between filesystems. Without building and maintaining a compatibility mapping for these cases (which is a significant undertaking) I think you should expect either a rename or lack of sync for incompatible files.

The directory that went missing isn't by chance in the .SyncTrash subdirectory is it? At the very least it should be--otherwise that's an additional bug...

Colons aren't permitted in file or folder names on windows.

Ah - there you go then - that likely explains why a folder containing a ":" got deleted by BitTorrent Sync!

I'm not quite sure what the ultimate solution would be here, given that windows doesn't allow ":" - or indeed any of the following either \ / : * ? " < > |

If you have two files in a directory on a linux device, one named "file:.mp3" and the other named "file_.mp3", and BitTorrent Sync's current solution is to convert ":" to "_" on Windows... how would/should these two files behave when syncing across to a windows device? Hmmm... I can't see an easy answer!? :unsure:
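
To make the collision concrete, here is a minimal Python sketch. It assumes the only rule applied is "replace each Windows-illegal character with an underscore", which is what the rename in the first post suggests; the function names are made up for illustration and are not anything from BitTorrent Sync itself. It also ignores other Windows rules such as reserved names.

```python
# Characters Windows rejects in file and folder names (the list quoted above).
WINDOWS_ILLEGAL = set('\\/:*?"<>|')


def to_windows_name(name: str, replacement: str = "_") -> str:
    """Replace each Windows-illegal character with `replacement` (hypothetical rule)."""
    return "".join(replacement if ch in WINDOWS_ILLEGAL else ch for ch in name)


def find_collisions(names):
    """Group source names that end up with the same Windows name."""
    mapped = {}
    for name in names:
        mapped.setdefault(to_windows_name(name), []).append(name)
    # Keep only targets that more than one source name maps to.
    return {target: sources for target, sources in mapped.items() if len(sources) > 1}


collisions = find_collisions(["file:.mp3", "file_.mp3", "other.mp3"])
print(collisions)  # {'file_.mp3': ['file:.mp3', 'file_.mp3']}
```

Both source files want the same name on the Windows side, so a plain character substitution cannot keep them apart.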

Colons aren't permitted in file or folder names on windows.

He said that in his OP. His issue was that it seems the source directory was deleted simply because it couldn't be created on the destination. Perhaps the destination treated it as a delete and resynced that back to the source?

We all know there are many characters illegal in Windows filenames. The point is, however rarely this happens, there can be conflicts between legal characters in the operating systems for which BTSync is built and used. Therefore, these cases need to be handled. Here are the possibilities I can think of:

  • Easiest: if there's a conflict, simply throw a warning and not sync (or delete!) those files
  • Harder: maintain a compatibility mapping table on each system that maps illegal characters from one system to legal characters on another system
  • Hardest: dealing with actual filename conflicts when a compatibility mapping (name change) is made when using the "Harder" method above. This could easily lead to a race condition.

  • Harder: maintain a compatibility mapping table on each system that maps illegal characters from one system to legal characters on another system

Harder? I'd say pretty near impossible! If you've got a "file:.mp3" and a "file_.mp3", BitTorrent Sync would "map" the illegal ":" in the first file to a "_" character instead, (the second file wouldn't get "mapped" at all as it doesn't contain any illegal characters)... but then you'd surely end up with two files with seemingly identical names of "file_.mp3" and "file_.mp3"!? ...not sure how BitTorrent Sync could then distinguish between them? (not to mention that your OS wouldn't allow you to have two files with identical names co-existing in the same folder at the same time!)

I think your first suggestion is probably the most logical; simply not syncing files that contain illegal characters to devices that don't support them (but still allow them to sync between devices that do support the characters). That way, if you're only ever syncing from Linux to Linux, there's not a problem, but if you add a Windows machine into that mix, whilst the file would still sync correctly between the Linux boxes, it won't appear on the Windows device (but then this should be denoted in some way - perhaps in the "History" tab - i.e. "Finished syncing with XXX - 1 invalid file(s) could not be sync'd")
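
A rough Python sketch of that "skip and report" idea, purely as an illustration: `plan_sync` and `history_line` are hypothetical names, and the sketch assumes the sender somehow knows the destination's operating system, which the real protocol may or may not expose.

```python
# Characters Windows rejects in file and folder names.
WINDOWS_ILLEGAL = set('\\/:*?"<>|')


def plan_sync(paths, destination_os):
    """Split relative paths into (to_sync, skipped) for one destination device."""
    if destination_os != "windows":
        # Linux-to-Linux (or Mac): nothing needs to be held back.
        return list(paths), []
    to_sync, skipped = [], []
    for path in paths:
        # "/" separates components on the source side, so check each component.
        parts = path.split("/")
        if any(ch in WINDOWS_ILLEGAL for part in parts for ch in part):
            skipped.append(path)
        else:
            to_sync.append(path)
    return to_sync, skipped


def history_line(peer, skipped):
    """Build the kind of History entry suggested above."""
    msg = f"Finished syncing with {peer}"
    if skipped:
        msg += f" - {len(skipped)} invalid file(s) could not be sync'd"
    return msg


paths = ["a/b.mp3", "Best of Some Band: 1970-1979/track.mp3", "what?.txt"]
to_sync, skipped = plan_sync(paths, "windows")
print(history_line("XXX", skipped))
```

The important property is that a skipped path is only left out of the transfer list for that one device; nothing is renamed or deleted on the source.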

Oh, c'mon. "Impossible"? :P

The concept is actually pretty simple. The implementation is the part that would require some effort. Perhaps you misunderstood what I meant by a mapping.

btsync already has two special files that are not sync'd between machines: .SyncTrash and .SyncIgnore. Let's just add another one, say .SyncMap. Here's how I envision it working.

Say Machine1 is UNIX and Machine2 is Windows:

  1. M1 wants to sync file:1 to M2.
  2. M2 responds that it will map the file as file_1 on its end.
  3. Both M1 and M2 write in their .SyncMap file the mapping.

I envision the file would look something like this:

Machine1

  • file:1 | M2 | file_1 | sha1

Machine2

  • file_1 | M1 | file:1 | sha1

(I'll get to the sha1 later.) The problem is that .SyncMap as a text file would need to be very specially crafted in order to honor all valid UNIX characters and still keep the mapping fields separable. Alternatively, the field delimiter could be a binary, non-printable character that's invalid in filenames on all systems. Either way, this file could very easily be corrupted by a user by accident.

So let's say M1's .SyncMap becomes corrupted:

  1. M1 deletes the corrupted file, and now doesn't know it has a pairing with all of those differently-named files on M2
  2. M2 wants to transfer all of the renamed files back to M1
  3. M1 now ends up with two closely named copies of every file
  4. But what happens when M1 wants to transfer the invalidly named files back to M2?

The two problems above can be handled by two methods.

First, the sha1 digest allows both machines to know if a file contains the exact same data, regardless of its filename. So when M2 wants to start transferring back to M1, or when M1 wants to start transferring to M2, the two machines can work out that the files are already present on both systems and simply use the mapping that already exists on M2.
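
Here's a small Python sketch of the digest idea, with the files held in memory as name -> bytes dictionaries to keep it self-contained. `rebuild_mapping` is a made-up helper name, and a real implementation would hash files on disk and would have to decide what to do when two different files happen to have identical contents.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()


def rebuild_mapping(local_files, remote_files):
    """Pair local and remote names whose contents have the same SHA-1.

    Both arguments are dicts of name -> file bytes (in-memory for illustration).
    """
    remote_by_digest = {sha1_of(data): name for name, data in remote_files.items()}
    mapping = {}
    for name, data in local_files.items():
        remote_name = remote_by_digest.get(sha1_of(data))
        if remote_name is not None:
            # Same content already exists remotely, possibly under another name.
            mapping[name] = remote_name
    return mapping


m1 = {"file:1": b"song data", "notes.txt": b"notes"}
m2 = {"file_1": b"song data", "notes.txt": b"notes"}
print(rebuild_mapping(m1, m2))  # {'file:1': 'file_1', 'notes.txt': 'notes.txt'}
```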

Or second, instead of each system maintaining its own unique .SyncMap, the mapping file itself can be synchronized across all systems and look something like this:

M1 | file:1 | M2 | file_1 | sha1

Now if one .SyncMap is corrupted, it can be resync'd from another machine with ease. But now we have another problem.

Great, we've come up with an elegant way to maintain a .SyncMap file across all systems that survives corruption. But let's address the corruption problem to begin with, and also allow for simultaneous, asynchronous changes to the .SyncMap: for example, when a third and a fourth machine are separated from M1 and M2, and both pairs of machines (M1 & M2, M3 & M4) are making changes to the .SyncMap without involving the others.

Instead of maintaining .SyncMap as a text file, make it a sqlite file. This accomplishes quite a few things:

  • Keeps users from corrupting the file as often with a simple text editor
  • Allows all legal filename characters to be easily handled without trying to make this fit nicely in a flat text file
  • Allows serialization of the records so that asynchronous changes can be made between all machines in the swarm and later synchronized together into a coherent database

Using sqlite is easier than you may think. It's FOSS and libraries are available for everything. It would be very easy to add into btsync. Working out the change in the btsync protocol is the tough part that would require significant effort.
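
To show how little code the storage side would take, here's a sketch using Python's built-in sqlite3 module. The table and column names are my own invention, modeled on the "M1 | file:1 | M2 | file_1 | sha1" record above, and it says nothing about how the records would be merged between machines.

```python
import sqlite3


def open_sync_map(path=":memory:"):
    """Open (or create) a hypothetical .SyncMap database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_map (
               source_machine TEXT NOT NULL,
               source_name    TEXT NOT NULL,
               target_machine TEXT NOT NULL,
               target_name    TEXT NOT NULL,
               sha1           TEXT NOT NULL,
               PRIMARY KEY (source_machine, source_name, target_machine)
           )"""
    )
    return conn


def add_mapping(conn, source_machine, source_name, target_machine, target_name, sha1):
    """Insert or update one mapping record; parameters avoid any delimiter issues."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO sync_map VALUES (?, ?, ?, ?, ?)",
            (source_machine, source_name, target_machine, target_name, sha1),
        )


def lookup(conn, source_machine, source_name, target_machine):
    """Return the mapped name on the target machine, or None if unmapped."""
    row = conn.execute(
        "SELECT target_name FROM sync_map "
        "WHERE source_machine = ? AND source_name = ? AND target_machine = ?",
        (source_machine, source_name, target_machine),
    ).fetchone()
    return row[0] if row else None


conn = open_sync_map()
add_mapping(conn, "M1", "file:1", "M2", "file_1", "placeholder-digest")
print(lookup(conn, "M1", "file:1", "M2"))  # file_1
```

Because the names are passed as bound parameters, a filename containing "|" or any other would-be delimiter is stored as-is.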

Now let's think a little bit about the "hardest" scenario I mentioned in my previous post. Let's use the same scenario again.

  1. M1 wants to sync file:1 to M2.
  2. M2 already has a different file_1, so map the file as file_1[1].
  3. But file_1[1] already exists, too, so map the file as file_1[2]. ...

You see where I'm going here. That can turn into an endless chain of name collisions. However unlikely it is to happen, it still could. So btsync must have a counter that stops after so many iterations. Five is probably plenty, but let's make it 100 just to be safe because why not? What matters is that there's a limit at which it finally stops trying to sync that particular file.
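
The counter could be as simple as the following Python sketch. `pick_mapped_name` is a hypothetical helper, the `[n]` suffix just follows the example above, and the limit of 100 is the number suggested in this post.

```python
MAX_ATTEMPTS = 100  # the limit suggested above


def pick_mapped_name(wanted, existing, max_attempts=MAX_ATTEMPTS):
    """Return `wanted`, or `wanted[n]` for the first free n, or None if the limit is hit."""
    if wanted not in existing:
        return wanted
    for n in range(1, max_attempts + 1):
        candidate = f"{wanted}[{n}]"
        if candidate not in existing:
            return candidate
    # Give up: the caller should skip this file and warn the user.
    return None


existing_on_m2 = {"file_1", "file_1[1]"}
print(pick_mapped_name("file_1", existing_on_m2))  # file_1[2]
```

A return value of None is the signal to stop trying and flag the file for the user.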

btsync then should throw a warning somewhere the user is likely to see when using the application, for instance with a "!" icon in either "Devices" or "Shared Folders" tab. Clicking the icon would take you to a filtered list in the "History" tab that would show the problematic files.

The whole situation is solvable, and it's really not that hard. But it would more than likely require another protocol change.

In the meantime, I would propose an update to btsync that would simply skip files that are illegally named for the destination machine. Use the "!" icon I mentioned above. Hopefully the btsync protocol already lets btsync know the types of the other machines in its swarm, so it could intelligently and actively avoid trying to sync UNIX files that would be illegally named on the destination machine.

I don't mean to revive a dead thread, but I've been having this issue as well.

The shared directory has a file with a "?" in its title. This syncs fine with all the Linux / Mac machines. But if we add a windows machine to the shared folder, the file gets deleted from every other machine!! The expected behavior would be the windows machine just failing to download that file, not deleting every copy of it! (Luckily it's in .SyncTrash, but I'd prefer not to have to drag it back out of there every time someone boots the windows machine.)
