Non-stop Indexing - 3 days and counting


mr.canada


I'm attempting to set up a sync between two Mac servers. The folder I'm trying to sync is very large: 2.3 TB (yes, TB, not GB). It contains 215,000 items. Both Macs are running the current OS (today that's 10.11.6) and the current Mac Server.

Each has a full 1 Gbps fiber internet connection.

Honestly, is this just too much for Sync to deal with? It's been "indexing" for 3 days. If I allow the machines to communicate, the sync does start to work: I left it for 24 hours and several GB synced, but it was also indexing at the same time. I imagine these processes compete for the CPU, so I've paused syncing in the hope that indexing can complete.

I wish there were a progress bar, % complete, and ETA for the indexing process, as there is for the sync process.


Open the Size column in the Sync UI and check whether the initial indexing has gone through, i.e. how big the Size is*. It's very likely that what you see in the UI is just the periodic rescan. Increase its interval in the power user settings, option "folder_rescan_interval". The value is in seconds: put a few hours or a day there (in seconds!).

*To compare sizes, it's better not to check the folder size in a file manager; instead, run the console command 'du -hs' in the folder. Compare that to what Sync shows to see whether Sync was able to index everything.
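If you also want the item count (the opening post mentions 215,000 items), a short Python sketch can walk the folder and total it. It sums apparent file sizes, so the figure can differ somewhat from 'du -hs', which counts allocated disk blocks; it also counts files only, not folders. The name folder_totals is just for illustration.

```python
import os


def folder_totals(root):
    """Return (file_count, total_bytes) for the regular files under root.

    Sums apparent sizes (st_size), so the total may differ a little from
    'du -hs', which counts allocated disk blocks. Symlinks are skipped.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # removed or unreadable while walking; skip it
            file_count += 1
    return file_count, total_bytes


# Example (the path is a placeholder):
# count, size = folder_totals("/path/to/shared/folder")
# print(f"{count} files, {size / 1024**4:.2f} TiB")
```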


  • 2 weeks later...

Thank you Helen, this was helpful. Revealing the Size column in the UI lets me see the difference between what is already synced and what still needs to be synced. It does not reveal the progress of the indexing process, but I suppose that's a chicken-and-egg dilemma: there's no way for Sync to know how much data it has not yet indexed until it has indexed it.

I increased the rescan interval (folder_rescan_interval) to 1 day. It no longer indexes endlessly. But I don't really understand what this process does. Now that indexing only happens once a day, how does Sync know which files to transmit during the day? Does it rely on filesystem change notifications from the OS to be told each time a file is added or modified? If so, why index every day; why not just do it once and be done with it?

Indexing 2.3 TB of data takes a long time, maybe an hour or so. Why repeat this process daily?

I've found some more detailed explanations here...

https://help.getsync.com/hc/en-us/articles/205458185-Setting-how-often-Sync-should-check-for-file-changes-

This article points out that OS filesystem notifications are not always perfect...

-------

Different operating systems have limitations for folder-monitoring applications. For example, if a file path is located very deep in a Windows file structure, it will not send change notifications for these deeper levels. To cover such cases, Sync periodically re-scans all folders for changes.

-------

Can you please tell me how reliable Mac OS X 10.11.6 is in this respect? I have a LOT of files and VERY deep folder paths. I would love to increase the rescan interval to a very long time, perhaps 1 week, but I also need Sync to keep up with new files in near real time.
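Since folder_rescan_interval takes seconds, a tiny Python helper (hypothetical, just arithmetic) makes the conversion explicit for the intervals discussed here: 1 day is 86,400 seconds and 1 week is 604,800.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def rescan_interval_seconds(hours=0, days=0, weeks=0):
    """Convert a duration to the whole number of seconds folder_rescan_interval expects."""
    total = hours * SECONDS_PER_HOUR + days * SECONDS_PER_DAY + weeks * SECONDS_PER_WEEK
    return int(round(total))


print(rescan_interval_seconds(hours=6))   # 21600
print(rescan_interval_seconds(days=1))    # 86400
print(rescan_interval_seconds(weeks=1))   # 604800
```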

 


Yes, exactly. 

Sync has two ways to learn about file updates: system notifications or a folder rescan. While notifications may fail (e.g. they do not work on some file systems, on network shares, etc.), Sync will learn about the files itself when it runs the periodic rescan job.
Sync starts scanning the folder and takes some time doing it (the bigger the folder and the more nested folders and files it contains, the longer it takes), then stops scanning. It then waits for the rescan interval and starts a new scan. If you increase folder_rescan_interval, it will put off the next scan. A folder rescan also starts when Sync is launched.
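The scan-then-wait cycle described above can be sketched as a toy Python model. It illustrates the described behaviour and is not Resilio's code; scan_schedule is a hypothetical name, and treating 0 as "launch scan only" is an assumption based on the advice elsewhere in this thread that 0 disables rescans.

```python
def scan_schedule(scan_seconds, interval_seconds, horizon_seconds):
    """List (start, end) times of folder scans that start before horizon_seconds.

    Toy model: a scan runs at launch (t=0), and each following scan starts
    interval_seconds after the previous one *finishes*. An interval of 0 is
    treated as "periodic rescan disabled" (assumption), so only the launch
    scan runs.
    """
    scans = []
    start = 0
    while start < horizon_seconds:
        end = start + scan_seconds
        scans.append((start, end))
        if interval_seconds <= 0:
            break
        start = end + interval_seconds
    return scans


# 1-hour scan, 600-second interval, first 10,000 seconds after launch
print(scan_schedule(3600, 600, 10000))  # [(0, 3600), (4200, 7800), (8400, 12000)]
```

With a 1-hour scan (the estimate given for the 2.3 TB folder) and a 600-second interval, a new scan starts only 10 minutes after the previous one ends, which would look like near-constant indexing.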
 

Unfortunately, I cannot tell how notifications will work in your particular case; there is no strict dependency. On Macs they are more or less reliable. I'd just keep an eye on it, and if syncing doesn't start within a reasonable wait time, decrease the rescan interval.


OK, so I see there are some challenges in knowing whether things changed. However, in my case the folder is a READ-ONLY one, so Sync has no need to scan anything at all. It should just act when it receives updates from other instances.

Surely that change can be worked on separately.

 


Well, it would know simply because it receives notifications from the remote instance of Sync on another computer. If you are syncing with the RO option (and especially with Overwrite enabled), then the only thing Sync needs to watch for is changes reported by the source, which would be the other computer. When that other computer's Sync tells this Sync that something changed, that's when it writes and downloads.

So when it receives notification from the remote Sync that files X, Y, and Z have had changes, then it can locally scan just those files, without having to routinely scan them. Hence, it would not need to re-scan the entire directory, and it would only need to scan anything when it gets updates from another Sync instance.

Am I missing something still?


And what if nothing changes on the RW peer, but something changes on the RO peer? Sync will never know, and the RO peer will hold a file that no longer matches. You won't know that either, and when it's time to restore from the backup RO folder, you won't find there what you expect.

So Sync does need to rescan the shares on all peers, all the shares it has. If you don't want Sync to rescan them, either increase the rescan interval to a day, or disable it entirely by putting 0 (zero) in the folder_rescan_interval parameter.


I'm not trying to argue but trying to understand better. As far as my situation, increasing the time is good enough for now.

There still seems to be some connection I'm not making here, though. If nothing changes on the RW peer, but something does change on the RO peer, what currently happens? Does the change made on the RO peer essentially get "undone" at the next rescan, assuming nothing changed on the RW peer?

 


11 hours ago, coewar said:

If nothing changes on the RW peer, but does change on the RO peer, currently what would happen?

The file gets invalidated and all syncing for it stops. Even if you edit this file on the RW peer later, it won't be synced. To avoid that, keep the option "Overwrite any changed files" enabled on the RO share. With that, if a file changes on the RO peer and Sync notices it, Sync will re-download the file from the RW peer. Here you can read examples of what will happen. Hope it helps. This is one of the reasons why folder rescan is necessary even on RO shares.
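A toy Python model of the rule described above: only a rescan of the RO peer notices that a file there was modified locally, and the "Overwrite any changed files" option decides whether that file is re-downloaded or invalidated. FileState and ro_rescan_actions are hypothetical names, only modified files are modelled (not new or deleted ones), and this is not how Resilio implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """What a scan records about one file (hypothetical, simplified)."""
    size: int
    mtime: float


def ro_rescan_actions(last_index, current_scan, overwrite_changed_files):
    """Compare the previous index of an RO folder with a fresh rescan.

    Both arguments map relative path -> FileState. Returns {path: action}
    for files present in both that no longer match: "redownload" if
    "Overwrite any changed files" is on, otherwise "invalidated"
    (syncing for that file stops).
    """
    action = "redownload" if overwrite_changed_files else "invalidated"
    return {
        path: action
        for path, state in current_scan.items()
        if path in last_index and last_index[path] != state
    }


last = {"a.txt": FileState(10, 1.0), "b.txt": FileState(5, 2.0)}
now = {"a.txt": FileState(10, 1.0), "b.txt": FileState(7, 3.0)}
print(ro_rescan_actions(last, now, overwrite_changed_files=False))  # {'b.txt': 'invalidated'}
print(ro_rescan_actions(last, now, overwrite_changed_files=True))   # {'b.txt': 'redownload'}
```

Without the rescan, current_scan would never be produced, so the local edit to b.txt would go unnoticed, which is the point made about RO shares.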

@Ronery,

Yes, as an integer: "folder_rescan_interval":99999. Here are other advanced options that you can put in the config.
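If you edit the config with a script, the key point is that the value must be an integer, not a string. A minimal Python sketch, assuming the config file is plain JSON without comments and that the option sits at the top level as the reply suggests; with_rescan_interval is a hypothetical helper.

```python
import json


def with_rescan_interval(config_text, seconds):
    """Return config JSON with folder_rescan_interval set as an integer.

    Hypothetical helper: assumes the config is plain JSON (no comments)
    and that the option is a top-level key.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 0:
        raise ValueError("folder_rescan_interval must be a non-negative integer")
    config = json.loads(config_text) if config_text.strip() else {}
    config["folder_rescan_interval"] = seconds  # 99999, not "99999"
    return json.dumps(config, indent=2)


print(with_rescan_interval("{}", 99999))
```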


  • 7 months later...

Windows 10, Same problem with same shared folder on multiple laptops indexing constantly and eating battery. 

Only the largest of three shared folders affected and I assume size / number of files is the cause.

Increasing folder_rescan_interval from 600 to 1200 and restarting Resilio Sync resolved the issue. 
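A rough back-of-the-envelope model of why a longer interval helps: if each scan takes S seconds and Sync then waits I seconds before the next one, it spends about S / (S + I) of its time scanning. The 1-hour figure below is the estimate given earlier for the 2.3 TB folder; the 15-minute scan is a made-up example, not a measurement from this thread.

```python
def scanning_fraction(scan_seconds, interval_seconds):
    """Approximate share of time spent scanning in a scan-then-wait cycle.

    Assumes every scan takes scan_seconds and the next one starts
    interval_seconds after it ends. An interval of 0 is treated as
    "periodic rescan disabled" (an assumption), giving 0.
    """
    if interval_seconds <= 0:
        return 0.0
    return scan_seconds / (scan_seconds + interval_seconds)


# Hypothetical 15-minute scan: doubling the interval from 600 to 1200 s
print(round(scanning_fraction(900, 600), 2))     # 0.6
print(round(scanning_fraction(900, 1200), 2))    # 0.43
# 1-hour scan vs. a 600 s and a 1-day interval
print(round(scanning_fraction(3600, 600), 2))    # 0.86
print(round(scanning_fraction(3600, 86400), 2))  # 0.04
```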

Thanks to everyone above for the advice


  • 5 months later...
  • 1 month later...

@Helen, you mention that you can set folder_rescan_interval to 0 so that folders are not rescanned.

But that applies to all folders/shares, right? I only have one folder with a very large number of files, but I have a few other folders that change more often and should be scanned.

So my question, is it possible to set the scan to 0 for just one folder?  The other folders are important for more timely and accurate syncing, while the large one is more of an archive/backup.


  • 11 months later...
  • 11 months later...
On 4/25/2017 at 8:20 PM, IvanGill said:

Windows 10, Same problem with same shared folder on multiple laptops indexing constantly and eating battery. 

Only the largest of three shared folders affected and I assume size / number of files is the cause.

Increasing folder_rescan_interval from 600 to 1200 and restarting Resilio Sync resolved the issue. 

Thanks to everyone above for the advice

Thanks for that solution, it worked for me. My sync was stuck at 1.5 GB of files and started a few seconds after this change (1200 instead of 600 for folder_rescan_interval).

Thanks again 

