heypete

Possible to force rehash?

Hi all,

I'm running BT Sync 2.3.6 (378) on several systems (2 x Win7, 1 x Ubuntu Linux, and 1x Synology DS211+ NAS).

For reasons I'm still diagnosing, I noticed that several JPEG files kept in sync between these systems suffered what looks like a bit-flip on one system and were no longer recognized as valid JPEG images. However, this issue was limited to a single system and the corrupted files didn't propagate beyond it. I checked the SHA256 hashes of the affected files: they are consistent between all the systems except the affected one, where they differ. PAR2 recovery files confirm the mismatch on the affected system.

I was able to recover the corrupted files and no data was lost (yay!), and I'm still looking into the cause. However, I had to manually copy the files from the known-good systems to the affected system. The file name, size, and modified time of the corrupted files on the affected system are identical to those of the good files on the other systems, so when BT Sync does its quick-index to see if files have changed (such as the check it does when opening BT Sync after it's been closed), it doesn't notice the changes and doesn't re-sync the files.
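To illustrate the gap described above, here is a minimal Python sketch, not Sync's actual indexing code. It assumes a quick index compares only size and modification time; the function names are hypothetical. It flips one bit in place, restores the timestamps, and shows that the metadata comparison sees nothing while a SHA-256 of the contents does.

```python
import hashlib
import os
import tempfile


def quick_signature(path):
    """Return (size, mtime_ns): the cheap metadata a quick index might compare (assumption)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def content_hash(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of the file's contents, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def simulate_silent_corruption(path, offset):
    """Flip one bit at `offset` in place, then restore the original timestamps."""
    st = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)
        f.seek(offset)
        f.write(bytes([byte[0] ^ 0x01]))
    # Put atime/mtime back so only the content differs.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "photo.jpg")  # placeholder file, not a real JPEG
        with open(path, "wb") as f:
            f.write(b"\xff\xd8" + bytes(1000))
        before = (quick_signature(path), content_hash(path))
        simulate_silent_corruption(path, 500)
        after = (quick_signature(path), content_hash(path))
        print("metadata unchanged:", before[0] == after[0])
        print("content unchanged: ", before[1] == after[1])
```

The same reasoning applies to overwriting a bad file with a good copy that keeps the original timestamps: the metadata signature is identical either way.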

Is there some way to force BT Sync to re-hash all the files in a folder so it can detect changes to files that might otherwise go undetected by the quick-index?

I couldn't find any option or command to do this, and had to add/remove the folder from BT Sync to have it re-index and re-hash all the files in that folder.

Similarly, is there an option to schedule periodic re-hashing (e.g. once a week) to detect any subtle changes that might not change the file name, size, or modified time?

Thanks!


@heypete There is no option to re-hash files, so yes, re-adding the folder is the only way to force a re-hash. Note, though, that Sync re-hashes files automatically during transfer.

Could you please share a bit more about the corruption itself? Do I understand correctly that only a few bits have changed in your JPEG?


Hi Roman,

Cool, thanks. I don't mind occasionally re-adding a folder to force a re-hash, but perhaps there could be an option to do so in future versions of BT Sync? It'd be really handy. Additionally, I don't know if it's possible but it'd be nice if there was an option to push a re-hash request out to other peers for a folder so they would all re-hash files too. Obviously this request should only be possible to initiate for those with read/write privileges.

As for the corruption, it's unclear as to the source. It turns out the corruption was present on two systems but I don't know which one was the source. The corrupted files were recently added to the Sync folder in the last few days. The two systems have very little in common: one is a home-built desktop PC with Windows 7 and Western Digital Black hard disks, while the other is a Lenovo ThinkPad (also with Windows 7) with a Crucial SSD. SMART checks on both systems reveal no problems with the disks/SSD, no bad sectors, no CRC errors, etc. The files were originally added to the desktop PC and synced with the ThinkPad, the Ubuntu server, and the NAS. The latter two had the correct files, while only the two Windows systems had the bad files.

To answer your question, yes, only a few bits were changed. The file names, modification times, and sizes of the good and bad files were identical so Sync's quick/no-hash-index didn't detect the changes to the files.

Alas, I didn't save the corrupted files so I can't diff them to see where the issue cropped up in the files. If it happens again (fingers crossed), I'll be sure to do that.


From your description, it looks like you encountered a data degradation issue. It happens occasionally with pretty much all types of media. Sync does not operate on individual bits: the smallest chunk of data Sync deals with is 32 KB, and every chunk is hashed, so it is very unlikely that Sync caused this.

Also, Sync cannot distinguish whether a file has "rotted" or was simply changed by some third-party application in a way that leaves mtime and size unchanged. As a result, Sync decides that this damaged file is newer when it syncs it and starts distributing it everywhere.
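As a rough illustration of per-chunk hashing, the following Python sketch hashes a file in 32 KB pieces and reports which chunks differ between two copies. The 32 KB size comes from the reply above; the choice of SHA-256 and the function names are assumptions, since the thread does not say which hash Sync uses or how it stores chunk hashes.

```python
import hashlib

CHUNK_SIZE = 32 * 1024  # 32 KB, the chunk size mentioned in the reply above


def chunk_hashes(path, chunk_size=CHUNK_SIZE):
    """Return a list of SHA-256 digests, one per fixed-size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def differing_chunks(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return the indices of chunks whose hashes differ between two files."""
    a = chunk_hashes(path_a, chunk_size)
    b = chunk_hashes(path_b, chunk_size)
    # Pad the shorter list so extra trailing chunks count as differences.
    longest = max(len(a), len(b))
    a += [None] * (longest - len(a))
    b += [None] * (longest - len(b))
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Running `differing_chunks(good_copy, suspect_copy)` on a known-good file and a suspect one would narrow a mismatch down to 32 KB regions, which is handy when deciding whether a few bits or whole ranges were damaged.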


I agree that that's probably the cause. Fortunately, I had backups of the synced files in other directories, as well as using PAR2 error-correcting files to verify data integrity and allow for reconstruction of the degraded files.

I also understand that Sync has no knowledge of why a file has changed (e.g. due to intentional user action, a third-party program making changes to the file, malicious action, bitrot, etc.) and I don't expect it to know. I simply expect it to propagate any changes to a file to other peers. The fact that Sync was not detecting the changes because the mtime and size didn't change is troubling, hence my request for the ability to force a re-hash of the files (which will detect any such changes) and for user-configurable periodic re-hashing to detect any changes over time that the mtime/size checks might miss.

Also, I found it somewhat difficult to use Sync itself to propagate the good files: simply overwriting the degraded file with the good file didn't update the mtime/size of the file, so Sync didn't notice it changed and didn't push those changes out to other peers.

Similarly, deleting the degraded file and then copying the good file into the folder didn't propagate those changes either: it seems that when the primary system (the one that I used to delete the bad files and add the good replacements) tells the secondary systems (its peers) that the bad file was deleted, the secondary systems move the bad file to their respective folder archives as they are configured to do. When the primary tells them about the presence of the replacement good file (which has the same name/mtime/size as the bad one in the archive), the secondary systems just restore the bad file from their archive rather than re-transfer the file and without re-hashing the file. At the very least, a re-hash of the file should take place in such a situation.

Ultimately, I had to manually remove the bad files from the two affected systems and copy the good files over manually to replace the bad ones.


@heypete Re-hashing files is an extremely costly procedure, especially for large volumes of data, say from 1 GB of data or 100K files upward. Therefore, Sync only checks mtime and size initially. Even checking these two parameters takes a lot of time during rescans when there are 100K files or more.


Understood. I use Sync to keep numerous (not 100K, but probably around 10-20K) files of varying sizes (from a few KB to a few GB) synchronized and I know that indexing and hashing is a resource-intensive procedure.

Nevertheless, my primary reason for using BT Sync is keeping files in sync, not minimizing resource consumption. Yes, it's nice to keep the amount of resources needed for syncing minimized, but if files are not kept in sync then that's a major problem.

I would imagine that for most use cases checking the size/filename/mtime is sufficient for detecting if a file has changed. However, in the situation that I've experienced, those checks are not sufficient and I'd like to avoid such a situation in the future.

I use methods like PAR2 files to detect and repair corrupt files, but that's not very useful to me if I don't know if files are corrupted because they're not getting synced to all my systems.

I have no problem with Sync using the size/filename/mtime checks as the normal means of checking if files have changed, but it would be nice if:

  1. There was a manual "re-hash this folder" button so users can re-hash a folder with a single-click without needing to remove/add it from Sync. It may make sense to put it behind an "Advanced Options" button or something, but it'd be great to just have the option.
  2. There was an option to push a "re-hash this folder" request to all systems with which one is peered. In my case, I own all the systems I'm peered with and it'd be nice to force a hash check to ensure that all my systems have the exact same files and are on the same page, so to speak.
  3. There was an option for scheduling periodic re-hashes of a folder at user-defined intervals. Yes, this consumes resources, but one could schedule it for a time of low usage on one's systems. I do the same thing for scrubbing disks in my RAID arrays and on my NAS, so having Sync do something similar would also be good. This is something that is quite common with backup software: they use low-resource means of checking if files have been modified during normal use, but periodically do re-hashing of all the files to ensure any changed files they missed get detected and updated. I understand this not being the default behavior, but it'd be good to have the option.
  4. BT Sync re-hashed a file that's being restored from the folder archive (or at the very least, there's an option to enable such behavior). As I mentioned earlier, this was a problem when I would delete a corrupted file from one of the systems: the deletion would propagate out to the peers, who would put the file in their folder archives. When I added the known-good file with the same size, mtime, and filename, the other peers thought it was the same as the previously-deleted file in their archives so they just restored the corrupted file from the archives rather than transferring the good file.
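Until something like items 1 and 3 exists in Sync itself, a periodic re-hash can be approximated outside of Sync. The Python sketch below is one possible approach and uses no Sync API: it records size, mtime, and SHA-256 for every file under a folder, and on a later run flags files whose hash changed while size and mtime stayed the same. All function names and the JSON manifest format are hypothetical; the manifest should be stored outside the synced folder.

```python
import hashlib
import json
import os


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its size, mtime, and SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": sha256_of(full),
            }
    return manifest


def silent_changes(old, new):
    """Return files whose hash changed while size and mtime did not."""
    suspects = []
    for rel, entry in new.items():
        prev = old.get(rel)
        if prev is None:
            continue  # new file, nothing to compare against
        same_meta = (prev["size"], prev["mtime_ns"]) == (entry["size"], entry["mtime_ns"])
        if same_meta and prev["sha256"] != entry["sha256"]:
            suspects.append(rel)
    return sorted(suspects)


def save_manifest(manifest, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def load_manifest(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

A scheduled job (cron, Task Scheduler, or the NAS's own scheduler) could call `silent_changes(load_manifest(saved), build_manifest(folder))` during off-hours and then save the fresh manifest. Files that changed legitimately get a new mtime and are skipped, so only unexplained content changes are reported.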

I'd be happy to contribute some money or other resources (does beer count as a resource?) as a bounty for adding these features.

Thanks!


Update: the issue happened again, this time with an MP4 movie. It turns out it's not just a bit or two that got flipped, but the "bad" file was all zeros. Here's the first few lines of the "bad" file from xxd:

00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Compare that with the following lines from the "good" file:

00000000: 0000 0018 6674 7970 6d70 3432 0000 0000  ....ftypmp42....
00000010: 6973 6f6d 6d70 3432 0000 2796 6d6f 6f76  isommp42..'.moov
00000020: 0000 006c 6d76 6864 0000 0000 d32e 6b7d  ...lmvhd......k}
00000030: d32e 6b7d 0000 03e8 0000 52fa 0001 0000  ..k}......R.....
00000040: 0100 0000 0000 0000 0000 0000 0001 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0001 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 4000 0000  ............@...
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0003 0000 0026  ...............&
00000090: 7564 7461 0000 001e a978 797a 0012 15c7  udta.....xyz....
000000a0: 2b30 302e 3030 3030 2b30 3030 2e30 3030  +00.0000+000.000
000000b0: 302f 0000 0077 6d65 7461 0000 0021 6864  0/...wmeta...!hd
000000c0: 6c72 0000 0000 0000 0000 6d64 7461 0000  lr........mdta..
000000d0: 0000 0000 0000 0000 0000 0000 0000 2b6b  ..............+k
000000e0: 6579 7300 0000 0000 0000 0100 0000 1b6d  eys............m

(Note: line 000000a0 had metadata including my GPS-derived position, which I have obfuscated to all zeros in both hex and ASCII for privacy reasons.)

Both files are exactly 45,865,267 bytes long, but the "bad" file is all zeros.
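For anyone wanting to run the same comparison without reading hex dumps by eye, here is a small Python sketch (helper names are hypothetical) that checks whether a file consists entirely of zero bytes and finds the offset of the first byte at which two files differ.

```python
def is_all_zeros(path, block_size=1 << 20):
    """Return True if every byte in the file is 0x00 (an empty file also returns True)."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            if block.count(0) != len(block):
                return False
    return True


def first_difference(path_a, path_b, block_size=1 << 20):
    """Return the offset of the first differing byte, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Common prefix matches, so one file is simply shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)
```

Going by the two dumps above, the first three bytes are zero in both files and the fourth is 0x18 in the good one, so `first_difference` would report offset 3 for that pair.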

 

I'm afraid I can't reproduce this issue on demand, so I didn't have debug logs enabled, but I do have logs from all four systems involved and will submit them in accordance with the directions at http://sync-help.bittorrent.com/customer/portal/articles/1634349-collecting-debug-logs?b_id=3885


@heypete This is very unlikely to be bitrot. We've got your ticket in our Help Desk and will look into the logs shortly. We'll keep you updated in Help Desk and, once the issue is resolved, will post the results here in this topic.

BTW, thanks a lot for the very thorough description of your setup and timeline, and for collecting logs from all peers. It is really crucial for issue analysis.


@whoever-reads-this-topic

The issue seems to appear only when one is using parallel-nested folders (see diagram below) and does not occur if you use nested folders on one computer that are linked to separate folders on other computer(s). While we are working to get it solved, we advise against using parallel-nested folders.

 

[Diagram: Untitled_presentation_-_Google_Slides.png]
