tobeychris

Linux - Poor Handling of Large Number of Files


Hi,

I started using BTSync at home as soon as it came out publicly, and I love it. I use it to sync my music, documents, etc. between multiple computers at home and remotely so that my main system can act as a backup server for all of the others. Some of the folders hold ~300GB in 150,000 files, and all of those machines run Windows.

Now on to my problem:

After loving the product for personal use, I wanted to use it at work for a new project. We just bought 5 new Haswell servers and we wanted their configurations to all be the same.

-Each has an SSD with the OS mirrored across all of them and everything with that is fine.

-They also each have a 3TB HDD (WD RED) to store our compile tools.

-All on the same gigabit switch.

-iptables firewall disabled.

I wanted to use BTSync to keep a folder (/usr/tools/) the same across all servers. Ideally, if new tools are added they would instantly sync to all the other servers, and this was supposed to cut down on my setup time.

I have btsync installed on all systems and can view the webpage to manage them. When I go to add the folder (/usr/tools/) there is significant lag before the website registers that it is complete. I let this system finish indexing all the files (239GB in 2,400,000 files) which took over 8 hours (!!!!). I could not believe that it took 8 hours using 100% of one of the CPUs. When I added the second system to the swarm, it was immediately found and added, but the transfer rate was pathetic. It did about 2GB in an hour. I added the other three systems to see if that would help, but it only made things slower.
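To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes), exactly 8 hours of indexing, and the theoretical gigabit line rate with no protocol overhead; the variable names are just for illustration.

```python
# Rough rates implied by the figures in this post (assumptions: decimal
# gigabytes, exactly 8 hours of indexing, ideal gigabit line rate).
HOUR = 3600  # seconds

index_seconds = 8 * HOUR
files_indexed = 2_400_000
bytes_indexed = 239e9

# Indexing rate in files per second and megabytes per second.
files_per_sec = files_indexed / index_seconds
index_mb_per_sec = bytes_indexed / index_seconds / 1e6

# Observed transfer rate: about 2 GB in one hour.
transfer_mb_per_sec = 2e9 / HOUR / 1e6

# Theoretical gigabit Ethernet ceiling: 10^9 bits/s divided by 8 bits/byte.
gigabit_mb_per_sec = 1e9 / 8 / 1e6

print(f"indexing: {files_per_sec:.1f} files/s, {index_mb_per_sec:.1f} MB/s")
print(f"transfer: {transfer_mb_per_sec:.2f} MB/s "
      f"of a {gigabit_mb_per_sec:.0f} MB/s link")
```

So indexing ran at roughly 83 files per second, and the transfer used well under 1% of what the switch can theoretically carry.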

I ended up copying the folders over manually and letting them all index before connecting them back to the LAN. Four of the five servers now report that they are in sync, but one is fully indexed yet still thinks it needs all 239GB from the other servers (but doesn't transfer anything).

With only the four that are in sync online, btsync is still using 100% of one of the cores (presumably because it is constantly checking for changes?).

So my questions are:

-Is it expected that indexing 2.4 million files will take a very, very long time?

-Is there any way to let it use more than one core for indexing?

-What could be preventing one of the servers from syncing if they are all configured exactly the same?

-Would smaller folders be handled better?

-Why is the CPU usage so high?
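On the smaller-folders question, one way to see how a split might look is to count files and bytes per top-level subfolder first. This is only a rough sketch using the Python standard library; the function name `count_files_by_subfolder` is made up for illustration, and it says nothing about how BTSync itself would behave with the smaller folders.

```python
import os

def count_files_by_subfolder(root):
    """Return {subfolder name: (file count, total bytes)} for each
    top-level directory under root. Symlinked directories are skipped."""
    totals = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        count = 0
        size = 0
        # os.walk does not follow symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size += os.lstat(path).st_size
                    count += 1
                except OSError:
                    # File vanished or is unreadable; skip it.
                    pass
        totals[entry.name] = (count, size)
    return totals

# Example use (the path is the one from this post):
# for name, (count, size) in sorted(count_files_by_subfolder("/usr/tools").items()):
#     print(f"{name}: {count} files, {size / 1e9:.1f} GB")
```

If a handful of subfolders account for most of the 2.4 million files, those would be the obvious candidates to share as separate folders.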


I don't have a solution, but I am experiencing similar issues with a large folder. The folder holds 391GB in ~1,700 files. I've tried adding it several times, and it either indexes and then does not sync to other devices completely, or doesn't even index properly (for example, it has only indexed 133.9GB in 350 files this time).

I will follow this thread with interest.


I can confirm really poor transfer rates with my Linux NAS (QNAP). The CPU was only a little above idle the whole time, but the speed was less than 100 kb/s most of the time (Gigabit LAN).

I'm talking about 150 GB of data / 30,000 files / 900 folders. For the first full sync I had to wait 26 hours and was like... wtf?
