Practical File Count Limitations - 100M Files


quantkiwi


IIRC, Sync uses about 1 KB of RAM per file.

It's actually a little lower than that: around 300-400 bytes per file.

Is there a practical file limitation?

There is no enforced limit on the number of files you can sync; however, the practical limit depends on factors such as free disk space and available memory.

I have 100M files to sync between servers over the internet ... How long do you think a sync like this would take (assuming a solid internet connection)?

It's impossible to say, especially without knowing things like the size of the files: 100 million 1 KB files would take a very different total time to sync than 100 million 100 MB files! Generally speaking, though, syncing is quicker when the two devices are on the same LAN rather than across the internet, so you may achieve better sync speeds if you set up a VPN between your two servers so that they appear to be on the same local network.
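To make the file-size point concrete, here is a minimal sketch that computes only the payload-transfer time (total bytes divided by link speed). The 100 Mbit/s link is a hypothetical figure, decimal units are assumed, and per-file overhead is ignored, so a real sync of many small files will take longer than this lower bound.

```python
# Lower bound on transfer time: total payload divided by link speed.
# Hypothetical assumptions: decimal units (1 KB = 1,000 bytes), a constant
# sustained link rate, and zero per-file overhead (which, for millions of
# small files, is unlikely to be negligible in practice).


def transfer_hours(file_count: int, bytes_per_file: int, link_mbit_per_s: float) -> float:
    """Hours to move file_count * bytes_per_file over a link of link_mbit_per_s."""
    total_bits = file_count * bytes_per_file * 8
    return total_bits / (link_mbit_per_s * 1_000_000) / 3600


FILES = 100_000_000
LINK_MBIT = 100  # hypothetical 100 Mbit/s link

# Compare the two file sizes used in the reply above.
for label, size in (("1 KB", 1_000), ("100 MB", 100_000_000)):
    print(f"{label:>6} files: {transfer_hours(FILES, size, LINK_MBIT):,.1f} h")
```

Under these assumptions, 100 million 1 KB files need about 2.2 hours of pure transfer, while 100 million 100 MB files need about 222,222 hours.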


Thank you for your quick replies.

The files are roughly 1 KB to 1 MB each; the average is 10 KB. Is 300 bytes per file only a problem if Sync has to hold them all in memory at once? Shouldn't it move through the list of files one by one? 300 bytes * 100M files means 30 GB of RAM :\
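The memory concern is a straight multiplication of file count by per-file cost. A quick sketch using the per-file figures quoted in this thread (300-400 bytes, and roughly 1 KB in later versions); these are approximations from the discussion, not measured values, and decimal gigabytes are assumed.

```python
# Back-of-the-envelope RAM needed to hold an in-memory entry for every file.
# Per-file costs are the approximate figures quoted in this thread,
# not measurements; decimal units assumed (1 GB = 10**9 bytes).

FILE_COUNT = 100_000_000


def tree_memory_gb(file_count: int, bytes_per_file: int) -> float:
    """Total memory in decimal gigabytes for file_count entries of bytes_per_file."""
    return file_count * bytes_per_file / 1e9


for per_file in (300, 400, 1000):
    print(f"{per_file:>5} B/file -> {tree_memory_gb(FILE_COUNT, per_file):.0f} GB")
```

For 100M files this gives 30 GB, 40 GB and 100 GB respectively.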

 

I don't mind if the initial sync takes a few days, but I add about 200 MB of files to the "master" server every day, and I'd like those synced to all the slaves within 2-3 hours.
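For the daily updates, raw bandwidth is probably not the limiting factor. A minimal sketch, assuming decimal units and a constant transfer rate, of the average throughput needed to move 200 MB within a 2- or 3-hour window:

```python
# Minimum average throughput needed to deliver one day's additions in time.
# Figures from the post: ~200 MB/day, delivered within 2-3 hours.
# Assumes decimal units (1 MB = 1,000,000 bytes) and a constant rate.


def required_kbit_per_s(daily_bytes: int, deadline_hours: float) -> float:
    """Average rate in kbit/s that moves daily_bytes within deadline_hours."""
    return daily_bytes * 8 / (deadline_hours * 3600) / 1000


DAILY_BYTES = 200_000_000
for hours in (2, 3):
    print(f"{hours} h deadline -> {required_kbit_per_s(DAILY_BYTES, hours):.0f} kbit/s")
```

Both deadlines need only about 222 and 148 kbit/s on average, which suggests the harder part is noticing which of the 100M files changed rather than moving the bytes.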

 

Thank you for your help,

Best


@quantkiwi, @GreatMarko

I'm afraid it is closer to 1 KB per file in the latest versions, so keeping the tree of all files in memory would take approximately 100 GB. Sync will do its best to keep these files in sync, although I can see a couple of obstacles that won't let you reach the desired 2-3 hour sync delay to all devices:

 

1. Memory. You will need either a really huge amount of RAM or a humongous swap file. If the swap file isn't on an SSD, it will be really slow.

2. Sync learns about changed files in two ways: notifications from the OS, and periodic folder rescans.

 - Notifications from the OS are unreliable. They have a number of limitations that differ from one OS to another; for example, Windows won't send notifications for very deeply nested folders, while Linux may miss changes if you have a large number of subfolders.

 - Rescan speed depends heavily on disk speed. While rescanning, Sync reads only file names, mtimes and sizes, but this is random access and puts a heavy load on classic HDDs (it is still pretty fast on SSDs). And, of course, you can't make the rescan period shorter than the time a single rescan takes.
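The rescan approach can be illustrated with a short sketch. This is not Sync's implementation; it is a hypothetical Python walker, using only the standard library's `os.scandir`, that records each file's size and modification time without reading file contents, then diffs two snapshots to find added, removed and modified files. Timing `scan()` on your own data gives a rough lower bound for a sensible rescan period, and the snapshot dictionary is a reminder that per-file state grows with file count.

```python
# Hypothetical metadata-only rescan: walk the tree, record (size, mtime) per
# file, and diff against the previous snapshot. NOT Sync's actual code; it
# only illustrates why a rescan must touch every directory and file entry.
import os
import time
from typing import Dict, Set, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # path -> (size in bytes, mtime in ns)


def scan(root: str) -> Snapshot:
    """Collect size and mtime for every regular file under root (contents are not read)."""
    snap: Snapshot = {}
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            entries = os.scandir(current)
        except OSError:
            continue  # skip directories we cannot open (permissions, deleted mid-scan)
        with entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    snap[entry.path] = (st.st_size, st.st_mtime_ns)
    return snap


def diff(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) paths between two snapshots."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    modified = {p for p in new.keys() & old.keys() if new[p] != old[p]}
    return added, removed, modified


def timed_scan(root: str) -> Tuple[Snapshot, float]:
    """Run scan() and return the snapshot plus the wall-clock seconds it took."""
    start = time.monotonic()
    snap = scan(root)
    return snap, time.monotonic() - start
```

For example, `snap, seconds = timed_scan("/path/to/folder")` (the path is a placeholder) reports how long one pass takes; on a classic HDD holding millions of files, that time is what bounds how often a rescan can usefully run.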
