Reproducible data loss (same as old file overwriting new files)


letic

Recommended Posts

Hi guys,

I was testing btsync to see if it could replace unison in my setup, and hit a very nasty bug that seems to be frequent for a couple of users (ie http://forum.bittorr...by-older-files/ ).

As it seems to be reproducible in all my tests I decided to set up a very simple environment to reproduce it. I don't think the configuration matters much as people in the forum seems to have the issue with Windows and Mac as well but here it is anyway :

  • btsync 1.0.134
  • debian sid on both nodes one is amd64 the other i386. Tested also with a freebsd 9 server (Nas4Free x64)
  • use the package provided by http://debian.yeasoft.net/btsync
  • conf on both node (running as the user letic) :

    {
    "device_name": "LeTic - btsync Server",
    "listening_port" : 0,
    "storage_path" : "/home/letic/.btsync2",
    "check_for_updates" : false,
    "use_upnp" : false,
    "download_limit" : 0,
    "upload_limit" : 0,
    "webui" :
    {
    "listen" : "0.0.0.0:8888"
    }
    }


Here are the steps to reproduce (host A is 1386, host B is amd64) :

  • On Host A :

    mkdir btsync_test
    cd btsync_test


  • Create a 1M file :
    dd if=/dev/urandom of=1 bs=1024 count=1024


  • Create a 10M file :
    dd if=/dev/urandom of=2 bs=1024 count=10240


  • Check the resulting timestamp. ie for me :

    ls -la
    total 11268
    drwxr-xr-x 1 letic letic 4 May 27 12:26 .
    drwxrwxr-x 1 letic letic 574 May 27 12:00 ..
    -rw-r--r-- 1 letic letic 1048576 May 27 12:09 1
    -rw-r--r-- 1 letic letic 10485760 May 27 12:10 2


  • On Host B :

    mkdir btsync_test
    cd btsync_test


  • Create simple 1 and 2 file :

    echo "blabla" > 1
    echo "test" > 2


  • Check the resulting timestamp. ie for me :

    ls -la
    total 24
    drwxrwxr-x 2 letic letic 4096 May 27 12:45 .
    drwxr-xr-x 89 letic letic 12288 May 27 12:45 ..
    -rw-rw-r-- 1 letic letic 7 May 27 12:45 1
    -rw-rw-r-- 1 letic letic 5 May 27 12:45 2


  • On Host A and B enable debug with debug.txt and start the btsync daemon
  • On Host B add the btsync_test folder through the webgui. I just click on generate and choose the folder (so it use the default writable conf)
  • In the GUI I get : /home/letic/btsync_test - 12 B in 2 files
  • On Host A add the key from host B and choose the corresponding folder
  • File from Host A are uploaded on Host B and replace the files that had a newer timestamp. Here is the result on Host A :

    ls -la
    total 11292
    drwxrwxr-x 3 letic letic 4096 May 27 12:50 .
    drwxr-xr-x 89 letic letic 12288 May 27 12:47 ..
    -rw-rw-r-- 1 letic letic 1048576 May 27 12:09 1
    -rw-rw-r-- 1 letic letic 10485760 May 27 12:10 2
    -rw-r--r-- 1 letic letic 32 May 27 12:48 .SyncID
    -rw-r--r-- 1 letic letic 783 May 27 12:48 .SyncIgnore
    drwxr-xr-x 2 letic letic 4096 May 27 12:48 .SyncTrash


  • Stopped both daemons

Of course I attached both debug log, Hoping it will help you.

Let me know if you need any more info.

Cheers

LeTic

PS : Also sent to support email

synclogs.tar.gz

Link to comment
Share on other sites

I had a look at the logs and there is something clearly wrong during the timestamp detection. This is an excerpt from the hostB logs (the host that is sharing the files and have the more recent file timestamps) :


[20130427 12:49:51] Got id message from peer LeTic idefix - btsync server (88D47FA93F2A0B698886BFD853D3F23A63575060) 1.0.134
[20130427 12:49:51] Merge: processing root message, remote hash F0B403102AD4C49DC22B7DAE7BAB0E1D7B8A89F6, timediff: 0
[20130427 12:49:51] Merge: requesting elements for root
[20130427 12:49:51] Merge: processing elements message for
[20130427 12:49:51] Merge: will request file for /1, ltime:1369651713 lhash:0AB28019C04B5615D780753D5D708CAAB040457A rtime:1369651791 rhash:72068E7D91A8ABB40EBADD1A18D539A1B063
1EC5
[20130427 12:49:51] Merge: will request file for /2, ltime:1369651713 lhash:644BDED24E5A1B15BCC688F99B2C7A93561CC684 rtime:1369651791 rhash:D32683D5D35CA5C092D089992453C1ADF6E4
5D4C
[20130427 12:49:51] Merge: sending get_files message
[20130427 12:49:51] Merge: processing files message with 2 files, timediff: 0
[20130427 12:49:51] SyncFilesController: Got 2 files from remote (192.168.1.95:17464)
[20130427 12:49:51] SyncFilesController: Got file from remote (192.168.1.95:17464): 1 state: 1 type: FT_FILE total:64 have:64 t:1369651791 mt:1369649385 A659E6EAD895FA11FFEA8D4
3B92C11CC82AB4278
[20130427 12:49:51] SyncFilesController: remote file 1 is newer and different from local. Removing local torrent.
[20130427 12:49:51] SyncFilesController: Got file from remote (192.168.1.95:17464): 2 state: 1 type: FT_FILE total:640 have:640 t:1369651791 mt:1369649420 64D32D02F3EF8D080DCAA
F425FA33D8AC164926E
[20130427 12:49:51] SyncFilesController: remote file 2 is newer and different from local. Removing local torrent.
[20130427 12:49:51] State sync finished for folder /home/letic/btsync_test

As I showed in my original post the local timestamp (ltime) should have been somethng around :


date -d "May 27 12:45" +"%s"
1369651500

while the remote (rtime) actually is :


stat -c %Y btsync_test/1
1369649385

Interestingly the local detected timestamp :


date -d @1369651713
Mon May 27 12:48:33 CEST 2013

corresponds to the time the folder was shared on HostB, it even appears as the last parameter to the webgui "&t="


[20130427 12:48:33] HTTP: IP 127.0.0.1: GET /gui/?token=seGvHc8DHbvSwn4cj8CO8H3a-vsiqoNYjp7VN50nemCPPGoJu_ZwGvY5o1EAAAAA&action=addsyncfolder&name=%2Fhome%2Fletic%2Fbtsync_test&secret=FT74ESZJZDC4AW7C4OKA2HMLB3ENWD3R&t=1369651713720
[20130427 12:48:33] SyncFilesController: started periodic scan
[20130427 12:48:33] SyncFilesController[file updated]: processing file /home/letic/btsync_test/1 1369651552 1

The same is also true for the remote which timestamp (1369651791) is the one from when the folder was added on HostA.

So from these logs we can understand several things :

  • Btsync should only be used with an emtpy remote directory, if not, you HAVE to make sure (using rsync or unison) that both copies are identical
  • If you still want to sync folder that are partially synced then you should create the share on the host that has the oldest version of files and then add the one with the newer files, this way the behaviour should be correct
  • It is not yet clear what happened on reconnects. What timestamps are used then ? I'd venture a guess that it's using the time the machine's reconnected thus resulting in old files overwriting newer files.

This seems quite a bogus way of working so I guess this is just a nasty bug that crept up, or a specific usage case the developers haven't tested.

Waiting for confirmation from the official devs.

Cheers

LeTic

PS : I know the secret is in the excerpt (it's also in the atatched logs but this is only a test folder that has already been deleted)

Link to comment
Share on other sites

Sync uses timestamp of the file, when it was added to the Sync directory. I.e. when you add folder to the Sync this time is treated as timestamp of the file. Sounds complex, but there is one use case why we did it this way.

Assume you have file 1.txt with timestamp of May 28. It was synchronized across computers. Then you move the file 1.txt inside the Sync folder and this file has timestamp May 26. If we compare only timestamps, May 26 will be overwritten by May 28 file from other computer. And this is not what you would expect.

Link to comment
Share on other sites

Kos,

You need to use the created time of the file.

On Windows it's easy; the filenodes have a created time.

On Unix it's not so easy. (ctime is not the created time)

You have to save a created time in your data. To initially find this time you use the last time you scanned (1/1/1970 for never scanned before) and the current time to decide the time range it may have been created over. If the modified time is in this range use it. If the modified time is before the range you should use the time of the start of the range. If the modified time is in the future you should obviously use "now".

It would be nice if you could store the created time in some standard format in the extended attributes of a file (eg on ext4) I expect samba would be something it'd be good to be compatible with.

What would be really nice is to take advantage of the fact that ext4 actually does have a file created time.

NB: Just to be clear, the time you use as the "sync conflict resolution time" for this file is the later of the created and modified times that are before 'now'.

edit: Oops on the NB

Link to comment
Share on other sites

Assume you have file 1.txt with timestamp of May 28. It was synchronized across computers. Then you move the file 1.txt inside the Sync folder and this file has timestamp May 26. If we compare only timestamps, May 26 will be overwritten by May 28 file from other computer. And this is not what you would expect.

Sorry for buttin in:

This is a true conflict. In that case both files should survive. The new and older file should be synced with a comment (e.g. "1 (conflicting version from node xyz).txt".

Link to comment
Share on other sites

In that case both files should survive.

Just for the record, I agree! Any timestamp is a very poor way of deciding which file is the one to survive. They can all be manually set to anything you wish and so confuse the program. IMO the timestamps are only useful to help distinguish changed files, not to decide which is better.

But your "Conflicting version from Node X" won't work. For stability within the network you need both nodes to make the same change to the filesystem. So in this case the timestamps can be used to decide which file keeps it's original name.

Link to comment
Share on other sites

Hey kos13,

Thanks for the reply. I understand your usecase but as said by rdebath and hawibtsync I also believe this should be a conflict when a file is replaced by one that has a older timestamp, this is probably a different/older file.

The original use case remains. Host partially synced by another method (unison/rsync/dropbox whatever) will fail as in people's logic they will always share the host that had the most recent copies of his files not the other way around.

Hope this help

LeTic

Link to comment
Share on other sites

Sorry for buttin in:

This is a true conflict. In that case both files should survive. The new and older file should be synced with a comment (e.g. "1 (conflicting version from node xyz).txt".

Unfortunately this is not a conflict. You have two files with different timestamps, so Sync needs to choose oldest one and overwrite with newest one. Of course this will work only if we work with timestamps only. Then you would like to detect this as conflict, you will add something like - file was synchronized at 10:50 May 29 and if timestamp of the file is older than that - this is conflict. But next you need to keep in mind that you might have different peers, different synchronization time for each peer and each file.

Pure conflict is when two files have the same timestamp but are different.

Link to comment
Share on other sites

Unfortunately this is not a conflict. You have two files with different timestamps, so Sync needs to choose oldest one and overwrite with newest one.

Aha, if one node copies an older file over an existing file and the timestamp of that file is older than the timestamp of the last completed syncronization: This is no conflict?

OMG. Seems that I need to remove BTSync from all my machines. We have lots of use cases where old files are put into the game again. Now I do understand lots of the problems I've seen in the past ...

In addition: BTSync itself behaves different in some situations. If you copy a file over an existing file and this file has the same content you don't copy this file, you don't even attach the new timestamp the the existing file on other nodes. So you produce files with different timestamps. I can't imaging what problems will arise from that ...

Link to comment
Share on other sites

Aha, if one node copies an older file over an existing file and the timestamp of that file is older than the timestamp of the last completed syncronization: This is no conflict?

Of course this is not a conflict. You just replaced file with older version that needs to be propagated across all devices. If files are different and have the same timestamp - then this is conflict, since you don't know which version you want to keep.

OMG. Seems that I need to remove BTSync from all my machines. We have lots of use cases where old files are put into the game again. Now I do understand lots of the problems I've seen in the past ...

There were bugs and we are fixing them, if you have any specific use case please report it to us.

In addition: BTSync itself behaves different in some situations. If you copy a file over an existing file and this file has the same content you don't copy this file, you don't even attach the new timestamp the the existing file on other nodes. So you produce files with different timestamps. I can't imaging what problems will arise from that ...

Copy operation means, that first you remove the original file, then you create a new one with the same content. It will have a different timestamp that is set by OS (nothing to do with Sync) and Sync will detect that content is similar just changed timestamp, so this information will be propagated to all nodes. I don't see any problem here, or I just don't understand what you are tying to say.

Link to comment
Share on other sites

It will have a different timestamp that is set by OS (nothing to do with Sync) and Sync will detect that content is similar just changed timestamp, so this information will be propagated to all nodes. I don't see any problem here, or I just don't understand what you are tying to say.

So does this mean that I can accidentally overwrite all other peers if my local clock is set in the future? Or similarly, I can get all my data overwritten if my local clock is set in the past (which can happen after a CMOS reset for example)?

Link to comment
Share on other sites

It is much better if you have local clock synchronized, which is an actual right way. Sync will deal with the case if clock is not synchronized between computers, but this will complicate things a little bit.

Answering your question: Sync will behave correctly if your clocks are not synchronized, it will detect these cases.

Link to comment
Share on other sites

It is much better if you have local clock synchronized, which is an actual right way. Sync will deal with the case if clock is not synchronized between computers, but this will complicate things a little bit.

Answering your question: Sync will behave correctly if your clocks are not synchronized, it will detect these cases.

That is good to know, thanks!

Yes of course, I wouldn't be keeping my clocks unsynchronized on purpose.

Link to comment
Share on other sites

Copy operation means, that first you remove the original file, then you create a new one with the same content. It will have a different timestamp that is set by OS (nothing to do with Sync) and Sync will detect that content is similar just changed timestamp, so this information will be propagated to all nodes. I don't see any problem here, or I just don't understand what you are tying to say.

Must be my bad english.I do overwrite an existing file.

The new file has the same content as the overwritten file but has a new timestamp. This operation is not propagated to the other nodes. The file on the other nodes is untouched and keeps the old timestamp. The history does not show those files. BTSync behaves as if there's no change at all. I can reproduce that at will on Windows 8.

EDIT: Added a file listing from my Windows 8 (the source) and one of my Linux Slackware machines. All machines use release 134. All files where produced at the same time at the same time. Files with unchanged content are not synced and keep their old timestamp:


C:\Privat\BTSync\Daten\unRAID\Dump>dir
Datenträger in Laufwerk C: ist OS
Volumeseriennummer: DA39-2EBA
Verzeichnis von C:\Privat\BTSync\Daten\unRAID\Dump
29.05.2013 20:20 <DIR> .
29.05.2013 20:20 <DIR> ..
29.05.2013 18:36 12.816 Tower disk01.txt
29.05.2013 18:36 12.719 Tower disk02.txt
29.05.2013 18:36 9.709 Tower disk03.txt
29.05.2013 18:36 13.375 Tower disk04.txt
29.05.2013 18:36 8.838 Tower disk05.txt
29.05.2013 18:36 2.526 Tower disk06.txt
29.05.2013 18:36 2.440 Tower disk07.txt
29.05.2013 18:36 2.072 Tower disk08.txt
29.05.2013 18:36 2.360 Tower disk09.txt
29.05.2013 18:36 1.174 Tower disk10.txt
29.05.2013 18:36 62 Tower disk11.txt
29.05.2013 18:36 62 Tower disk12.txt
29.05.2013 18:36 8.888 Tower disk13.txt
29.05.2013 18:36 7.939 Tower disk14.txt
root@Tower:/mnt/disk1/Privat/BTSync/Daten/unRAID/Dump# ls -lisa
total 281
406474 1 drwxrwxrwx 2 nobody users 1168 2013-05-30 13:45 ./
303584 0 drwxrwxrwx 8 nobody users 376 2013-05-09 10:58 ../
526075 16 -rw-rw-rw- 1 nobody users 12816 2013-05-29 18:36 Tower disk01.txt
487705 16 -rw-rw-rw- 1 nobody users 12719 2013-05-29 18:36 Tower disk02.txt
526076 12 -rw-rw-rw- 1 nobody users 9709 2013-05-29 18:36 Tower disk03.txt
526077 16 -rw-rw-rw- 1 nobody users 13375 2013-05-21 17:38 Tower disk04.txt
526054 12 -rw-rw-rw- 1 nobody users 8838 2013-05-26 15:44 Tower disk05.txt
487732 4 -rw-rw-rw- 1 nobody users 2526 2013-05-20 20:27 Tower disk06.txt
526060 4 -rw-rw-rw- 1 nobody users 2440 2013-05-20 20:27 Tower disk07.txt
526074 4 -rw-rw-rw- 1 nobody users 2072 2013-05-20 20:27 Tower disk08.txt
526045 4 -rw-rw-rw- 1 nobody users 2360 2013-05-20 20:27 Tower disk09.txt
526073 4 -rw-rw-rw- 1 nobody users 1174 2013-05-21 17:38 Tower disk10.txt
526210 4 -rw-rw-rw- 1 nobody users 62 2013-05-20 20:27 Tower disk11.txt
526213 4 -rw-rw-rw- 1 nobody users 62 2013-05-20 20:27 Tower disk12.txt
515392 12 -rwxr-xr-- 1 nobody users 8888 2013-05-29 18:36 Tower disk13.txt*
515029 8 -rwxr-xr-- 1 nobody users 7939 2013-05-29 18:36 Tower disk14.txt*

Link to comment
Share on other sites

Hey Kos,

Thanks for the hindsight into btsync behaviour. I think that the whole subject could fill many threads, maybe a summary table with most common use cases could prevent further incomprehension, but I am sure that you already have your hands full right now.

Regarding the simple use case I used to open this thread, could you answer me simply if you believe this needs fixing and if it will be fixed in a future version ? This is quite a common scenario in our environment and I probably won't continue testing btsync if I know this will stay this way.

Thanks in advance

Cheers

LeTic

Link to comment
Share on other sites

Regarding the simple use case I used to open this thread, could you answer me simply if you believe this needs fixing and if it will be fixed in a future version ? This is quite a common scenario in our environment and I probably won't continue testing btsync if I know this will stay this way.

You constantly add folders with new secret to the Sync in your environment? If so, could you please describe this use case in more details, so we know how you are using Sync. The problem you described comes from the fact that you have added new folder to the sync with the files.

We constantly improve Sync and your use case is recorded as a bug. Unfortunately I can't give you any timeline for the fix.

Link to comment
Share on other sites

If so, could you please describe this use case in more details, so we know how you are using Sync.

Well the use case is simple and should be fairly common for people already using another sync solution.

We have folders synced between host by unison. Some are constantly updated (home folders). So if we are to switch from our current solution, the most logical for people would be to first share their host (the PC they use daily) and the replicated host second (which would have an older version of their home).

In that case btsync will overwrite the most current copies by the old one on the server.

Btsync seems fine for new users, but this bug IMHO makes it impossible for people who are looking to replace an existing solution.

We constantly improve Sync and your use case is recorded as a bug. Unfortunately I can't give you any timeline for the fix.

No problem just wanted to know if you considered it as a bug.

Cheers

LeTic

Link to comment
Share on other sites

  • 1 month later...

Hi guys,

Sorry to revive an old thread. Just to let you know that I tested with the latest release (1.1.48) and the issue is still present with the simple use case scenario describe above.

I thought that 1.1.33 fixed the issue as the Changelog stated : "- Fixed overwrite new files with old ones." but still not.

kos13 do you know if a fix for this issue is still planned ?

Thanks in advance

LeTic

Link to comment
Share on other sites

  • 3 months later...

Hi guys,

Sorry to revive an old thread. Just to let you know that I tested with the latest release (1.1.82) and the issue is still present...

-------- sorry LeTic, it just seemed appropriate to copy it directly :D --------

 

Similar (perhaps same?) problem; very scary bug indeed. It's happening with win8/mavericks. I'm losing files each day by overwrites by older files. Clocks in sync, timezone in sync.. This is something else.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.