SQLite WAL files gouging disk on peers


jpap


In ~/Library/Application Support/Resilio Sync on macOS I can see the SQLite databases being used to track shared folders.

The WAL files are massive! In my case, for a share of two folders (1,654,405 files in many subfolders), the entire folder is 2.86 GB on each peer. About 1.1 GB of that is wasted on WAL files while Resilio Sync is idle.

I can appreciate the need for SQLite to track files, but surely these WAL files could be cleaned up periodically while Sync is idle? Or at least there could be an effort to limit the maximum WAL size.
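For context, SQLite itself offers what is being asked for here: `PRAGMA wal_checkpoint(TRUNCATE)` copies the WAL contents back into the main database and truncates the WAL file to zero bytes, and `PRAGMA journal_size_limit` caps how large a WAL file may remain on disk after a checkpoint. Whether Resilio Sync uses or exposes either of these is not known from this thread. The following is a minimal Python sketch on a throwaway database (the `scratch.db` file name and the `files` table are made up for illustration); it is not meant to be pointed at Sync's own databases while Sync is running.

```python
import os
import sqlite3
import tempfile


def wal_size(db_path):
    """Return the size in bytes of the -wal file next to db_path (0 if absent)."""
    wal_path = db_path + "-wal"
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0


def demo(rows=20000):
    """Grow a WAL file on a scratch database, then truncate it with a checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "scratch.db")  # hypothetical name
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            # Turn off automatic checkpoints so the WAL only ever grows,
            # which mimics the symptom described in this thread.
            conn.execute("PRAGMA wal_autocheckpoint=0")
            conn.execute("CREATE TABLE files (path TEXT, hash BLOB)")
            with conn:  # one committed write transaction
                conn.executemany(
                    "INSERT INTO files VALUES (?, ?)",
                    ((f"file-{i}", os.urandom(32)) for i in range(rows)),
                )
            before = wal_size(db_path)
            # Checkpoint everything into the main database file and
            # truncate the WAL file to zero bytes.
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
            after = wal_size(db_path)
            return before, after
        finally:
            conn.close()


if __name__ == "__main__":
    before, after = demo()
    print(f"WAL before checkpoint: {before} bytes, after: {after} bytes")
```

A checkpoint can only run to completion when no other connection is still reading old WAL frames, which is one possible (unconfirmed) reason a WAL could keep growing in a busy application.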


  • 2 weeks later...

I can confirm the problem - on a bigger scale. Debian system:

  • Linux XXXX 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15) x86_64 GNU/Linux,
  • System is bare metal Intel(R) Xeon(R) CPU E3-1220L V2 on a Supermicro server board with 16G ECC RAM and low temp (cpu and disks)
  • Resilio 2.5.9-1 (via deb package) on a paid License (should be home-pro)
  • ~15 Folders between 5 nodes
  • most files per share >100k
  • largest folders ~5T (~50k files)

The database cache (db-wal) grew to more than 25G for one folder, with others growing past 1G and even past 10G at the same time, until it completely filled the SSD holding the Resilio database. The SSD itself is a Samsung Pro series with good SMART status and a clean log history.

Then rslsync crashed multiple times (it might be "then", but it could also have happened earlier, because I didn't notice it before the end of the day):

  • [Mo Nov 20 11:45:24 2017] rslsync[9663]: segfault at 14 ip 0000564e786e8423 sp 00007ff0875f5470 error 4 in rslsync[564e784b8000+851000]
  • [Mo Nov 20 11:50:30 2017] rslsync[11596]: segfault at 14 ip 000055df3fa37423 sp 00007f624d94a470 error 4 in rslsync[55df3f807000+851000]
  • [Mo Nov 20 11:56:41 2017] rslsync[11644]: segfault at 14 ip 00005572cbf4a423 sp 00007fbadee2c470 error 4 in rslsync[5572cbd1a000+851000]
  • [Mo Nov 20 11:59:41 2017] rslsync[11706]: segfault at 14 ip 000055c2d4587423 sp 00007ff1160f1470 error 4 in rslsync[55c2d4357000+851000]
  • [Mo Nov 20 12:02:37 2017] rslsync[11749]: segfault at 14 ip 0000562b2bfac423 sp 00007f573bcb9470 error 4 in rslsync[562b2bd7c000+851000]
  • [Mo Nov 20 12:05:12 2017] rslsync[11783]: segfault at 14 ip 000055f1378e7423 sp 00007f4b87941470 error 4 in rslsync[55f1376b7000+851000]
  • [Mo Nov 20 12:07:46 2017] rslsync[11820]: segfault at 14 ip 0000565304d78423 sp 00007f3ea62ce470 error 4 in rslsync[565304b48000+851000]
  • [Mo Nov 20 22:38:20 2017] rslsync[17483]: segfault at 14 ip 000055dc300d2423 sp 00007fb25e18e470 error 4 in rslsync[55dc2fea2000+851000]

I don't know whether the disk filled up before the first crash or because of the crashes, as the db-wal files' timestamps were always close to the current time. The multiple crashes might be due to systemd restarting the vanishing rslsync task.

Manually stopping, starting (waiting until the web GUI is reachable), and stopping again made the db-wal files shrink to normal sizes (a few MB each). After some waiting (~4h), the disk had filled up again, with the same crashes, etc.
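To tell whether the WAL files are growing again before the disk fills, their sizes can be checked periodically. The following is a minimal Python sketch, assuming only that the WAL files sit somewhere under the Sync database directory and have names ending in `-wal` (as the db-wal files mentioned above do); the function names and the size limit are made up for illustration.

```python
import os


def wal_usage(db_dir):
    """Map every file under db_dir whose name ends in '-wal' to its size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(db_dir):
        for name in files:
            if name.endswith("-wal"):
                path = os.path.join(root, name)
                try:
                    sizes[path] = os.path.getsize(path)
                except OSError:
                    # The file may vanish between listing and stat; skip it.
                    continue
    return sizes


def oversized(sizes, limit_bytes):
    """Return (size, path) pairs above limit_bytes, largest first."""
    return sorted(
        ((size, path) for path, size in sizes.items() if size > limit_bytes),
        reverse=True,
    )


def report(db_dir, limit_bytes=1024 ** 3):
    """Build one human-readable line per WAL file larger than limit_bytes."""
    return [
        f"{size / 1024 ** 2:.1f} MiB  {path}"
        for size, path in oversized(wal_usage(db_dir), limit_bytes)
    ]
```

Calling `report()` with the database directory (for example from a cron job) returns one line per WAL file above the limit, which defaults to 1 GiB here.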

I recently upgraded to 2.5.9 from 2.5.7 (around 1-2 weeks ago). This has never happened before; I have never had any Resilio crashes. The last segfault or serious issue I had with this product happened in the early btsync days.

I have now moved the database to larger storage to see if the problem persists or reappears. It has been running for around 1h without problems so far. But I would really prefer to move the DB back to my SSD (>50G of free space should really be enough for a combined database size of less than 2G, right?).

 

Hope you can nail the problem down for the 2.5.10...


Update.

Disk space limitations were definitely not the reason for those segfaults. With my new setup and more than enough disk space, the segfaulting and restarting continues.

Maybe the segfaulting is the reason those db-wal files are not getting shrunk... I can't really check this one.

I manually restarted after moving the db directory to a large disk about the time of my last post. This is what happened about 8 hours later:

[Di Nov 21 07:21:16 2017] rslsync[18626]: segfault at 14 ip 0000562ae6ad1423 sp 00007f87b1a0f470 error 4 in rslsync[562ae68a1000+851000]
[Di Nov 21 07:31:11 2017] rslsync[23970]: segfault at 14 ip 00005612cd542423 sp 00007f5eeeda4470 error 4 in rslsync[5612cd312000+851000]
[Di Nov 21 07:40:38 2017] rslsync[24056]: segfault at 14 ip 000055d23eed2423 sp 00007f3f3af0a470 error 4 in rslsync[55d23eca2000+851000]
[Di Nov 21 07:48:21 2017] rslsync[24237]: segfault at 14 ip 00005601ed5e4423 sp 00007fc6ba8b6470 error 4 in rslsync[5601ed3b4000+851000]
[Di Nov 21 07:56:53 2017] rslsync[24316]: segfault at 14 ip 0000561ac8966423 sp 00007eff6b8d8470 error 4 in rslsync[561ac8736000+851000]
[Di Nov 21 08:06:09 2017] rslsync[24409]: segfault at 14 ip 0000562e30df8423 sp 00007f90306d3470 error 4 in rslsync[562e30bc8000+851000]
[Di Nov 21 08:17:43 2017] rslsync[24495]: segfault at 14 ip 000055f85f1c3423 sp 00007f59917ee470 error 4 in rslsync[55f85ef93000+851000]
[Di Nov 21 08:26:22 2017] rslsync[24686]: segfault at 14 ip 000055a55fa4b423 sp 00007f4c54c00470 error 4 in rslsync[55a55f81b000+851000]
[Di Nov 21 08:43:32 2017] rslsync[24784]: segfault at 14 ip 000055f953191423 sp 00007fd577e38470 error 4 in rslsync[55f952f61000+851000]
[Di Nov 21 08:49:12 2017] rslsync[24999]: segfault at 14 ip 0000558ecbabe423 sp 00007f7ace74c470 error 4 in rslsync[558ecb88e000+851000]
[Di Nov 21 08:55:01 2017] rslsync[25076]: segfault at 14 ip 00005566d3db5423 sp 00007f512c096470 error 4 in rslsync[5566d3b85000+851000]
[Di Nov 21 09:00:39 2017] rslsync[25158]: segfault at 14 ip 000055fb216f8423 sp 00007f79dcfc4470 error 4 in rslsync[55fb214c8000+851000]
[Di Nov 21 09:06:35 2017] rslsync[25238]: segfault at 14 ip 00005618edf7c423 sp 00007fa294e92470 error 4 in rslsync[5618edd4c000+851000]
[Di Nov 21 09:12:23 2017] rslsync[25317]: segfault at 14 ip 00005565a3010423 sp 00007f2458b9a470 error 4 in rslsync[5565a2de0000+851000]
[Di Nov 21 09:19:46 2017] rslsync[25457]: segfault at 14 ip 0000558f2e686423 sp 00007f7aa8a6c470 error 4 in rslsync[558f2e456000+851000]
[Di Nov 21 09:30:14 2017] rslsync[25542]: segfault at 14 ip 0000556bff872423 sp 00007f3edd744470 error 4 in rslsync[556bff642000+851000]
[Di Nov 21 09:45:19 2017] rslsync[25634]: segfault at 14 ip 00005588d0548423 sp 00007f7f834a7470 error 4 in rslsync[5588d0318000+851000]
[Di Nov 21 10:11:51 2017] rslsync[25834]: segfault at 14 ip 0000563c5b924423 sp 00007fe0b22c0470 error 4 in rslsync[563c5b6f4000+851000]
[Di Nov 21 10:27:50 2017] rslsync[26099]: segfault at 14 ip 0000558ba9fcf423 sp 00007f64e2c7d470 error 4 in rslsync[558ba9d9f000+851000]
[Di Nov 21 10:34:57 2017] rslsync[26229]: segfault at 14 ip 000056383aeb6423 sp 00007eff73b9b470 error 4 in rslsync[56383ac86000+851000]
[Di Nov 21 10:46:25 2017] rslsync[26308]: segfault at 14 ip 0000558ac6bf7423 sp 00007f8d8a968470 error 4 in rslsync[558ac69c7000+851000]
[Di Nov 21 10:56:28 2017] rslsync[26483]: segfault at 14 ip 00005595afbc9423 sp 00007fded8b0b470 error 4 in rslsync[5595af999000+851000]
[Di Nov 21 11:04:41 2017] rslsync[26579]: segfault at 14 ip 000055d1fd9d2423 sp 00007f04bb6c4470 error 4 in rslsync[55d1fd7a2000+851000]
[Di Nov 21 11:14:33 2017] rslsync[26655]: segfault at 14 ip 000055e93d1bc423 sp 00007f2aa3274470 error 4 in rslsync[55e93cf8c000+851000]
[Di Nov 21 11:25:12 2017] rslsync[26833]: segfault at 14 ip 0000557a0fb76423 sp 00007f7a83459470 error 4 in rslsync[557a0f946000+851000]
[Di Nov 21 11:35:27 2017] rslsync[26933]: segfault at 14 ip 000055806aef0423 sp 00007f947b5ff470 error 4 in rslsync[55806acc0000+851000]
[Di Nov 21 11:58:47 2017] rslsync[27031]: segfault at 14 ip 000055a3e0dba423 sp 00007eff2a522470 error 4 in rslsync[55a3e0b8a000+851000]
[Di Nov 21 12:09:42 2017] rslsync[27305]: segfault at 14 ip 0000564f68283423 sp 00007fc45bc27470 error 4 in rslsync[564f68053000+851000]
[Di Nov 21 12:17:54 2017] rslsync[27482]: segfault at 14 ip 0000563fbd963423 sp 00007f7eb21f4470 error 4 in rslsync[563fbd733000+851000]
[Di Nov 21 12:26:29 2017] rslsync[27561]: segfault at 14 ip 0000563fcf988423 sp 00007f8525fdb470 error 4 in rslsync[563fcf758000+851000]
[Di Nov 21 12:34:52 2017] rslsync[27654]: segfault at 14 ip 0000561df8142423 sp 00007ff3c3266470 error 4 in rslsync[561df7f12000+851000]
[Di Nov 21 12:43:35 2017] rslsync[27739]: segfault at 14 ip 0000556081357423 sp 00007fbc821cc470 error 4 in rslsync[556081127000+851000]
[Di Nov 21 12:56:19 2017] rslsync[27900]: segfault at 14 ip 0000556b69439423 sp 00007fb15e4e7470 error 4 in rslsync[556b69209000+851000]
[Di Nov 21 13:05:46 2017] rslsync[28028]: segfault at 14 ip 000055bdf4d45423 sp 00007fe218c7c470 error 4 in rslsync[55bdf4b15000+851000]
[Di Nov 21 13:18:48 2017] rslsync[28118]: segfault at 14 ip 00005594788c3423 sp 00007f05a98a1470 error 4 in rslsync[559478693000+851000]
[Di Nov 21 13:37:02 2017] rslsync[28313]: segfault at 14 ip 0000564c945d9423 sp 00007f6b6616f470 error 4 in rslsync[564c943a9000+851000]
[Di Nov 21 13:57:08 2017] rslsync[28450]: segfault at 14 ip 0000563bcc0b7423 sp 00007fb00398f470 error 4 in rslsync[563bcbe87000+851000]
[Di Nov 21 14:06:44 2017] rslsync[28696]: segfault at 14 ip 0000564d4f68e423 sp 00007fc1fd1f6470 error 4 in rslsync[564d4f45e000+851000]
[Di Nov 21 14:16:51 2017] rslsync[28788]: segfault at 14 ip 000056079a006423 sp 00007f58ae200470 error 4 in rslsync[560799dd6000+851000]
[Di Nov 21 14:26:35 2017] rslsync[28963]: segfault at 14 ip 000055f36cd2b423 sp 00007f219526b470 error 4 in rslsync[55f36cafb000+851000]
[Di Nov 21 14:36:29 2017] rslsync[29059]: segfault at 14 ip 000055b1e97c9423 sp 00007fe2dc944470 error 4 in rslsync[55b1e9599000+851000]
[Di Nov 21 14:50:02 2017] rslsync[29151]: segfault at 14 ip 00005605257a6423 sp 00007f82608c7470 error 4 in rslsync[560525576000+851000]
[Di Nov 21 16:35:26 2017] rslsync[29334]: segfault at 14 ip 00005603c1f65423 sp 00007ff30942c470 error 4 in rslsync[5603c1d35000+851000]
[Di Nov 21 16:47:10 2017] rslsync[30242]: segfault at 14 ip 00005577e6ffe423 sp 00007f2ad2a6b470 error 4 in rslsync[5577e6dce000+851000]
[Di Nov 21 16:55:05 2017] rslsync[30441]: segfault at 14 ip 00005584e9c47423 sp 00007f49087aa470 error 4 in rslsync[5584e9a17000+851000]
[Di Nov 21 17:05:26 2017] rslsync[30516]: segfault at 14 ip 0000562f3061b423 sp 00007efda5120470 error 4 in rslsync[562f303eb000+851000]
[Di Nov 21 17:12:56 2017] rslsync[30611]: segfault at 14 ip 000055bb493e1423 sp 00007f0ace83a470 error 4 in rslsync[55bb491b1000+851000]
[Di Nov 21 17:20:35 2017] rslsync[30760]: segfault at 14 ip 0000560876236423 sp 00007f64e000c470 error 4 in rslsync[560876006000+851000]
[Di Nov 21 17:28:08 2017] rslsync[30850]: segfault at 14 ip 000055be62258423 sp 00007f287e371470 error 4 in rslsync[55be62028000+851000]
[Di Nov 21 17:36:02 2017] rslsync[30931]: segfault at 14 ip 000055a15bde0423 sp 00007f94bf7ed520 error 4 in rslsync[55a15bbb0000+851000]
[Di Nov 21 17:44:16 2017] rslsync[31009]: segfault at 14 ip 000056104e467423 sp 00007f1d1d9fc470 error 4 in rslsync[56104e237000+851000]
[Di Nov 21 17:52:20 2017] rslsync[31173]: segfault at 14 ip 0000555ee7a16423 sp 00007fbe77d93470 error 4 in rslsync[555ee77e6000+851000]
[Di Nov 21 18:01:01 2017] rslsync[31249]: segfault at 14 ip 000055eda2475423 sp 00007fedb96aa470 error 4 in rslsync[55eda2245000+851000]
[Di Nov 21 18:19:24 2017] rslsync[31347]: segfault at 14 ip 000055cebaffa423 sp 00007f0e05b3e470 error 4 in rslsync[55cebadca000+851000]
[Di Nov 21 18:30:30 2017] rslsync[31564]: segfault at 14 ip 00005576a664d423 sp 00007f54543de470 error 4 in rslsync[5576a641d000+851000]
[Di Nov 21 18:39:06 2017] rslsync[31656]: segfault at 14 ip 000055cd7d4e0423 sp 00007f5f146f3470 error 4 in rslsync[55cd7d2b0000+851000]
[Di Nov 21 18:47:45 2017] rslsync[31823]: segfault at 14 ip 00005602cc3bd423 sp 00007fb9ca6bd470 error 4 in rslsync[5602cc18d000+851000]
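The absolute addresses in these kernel lines differ on every restart, but subtracting the module base (the number before the `+` in the brackets) from `ip` gives the crash location inside the rslsync binary. The following is a minimal Python sketch that does this for a few of the lines above; the regular expression and function name are illustrative, not part of any Resilio tooling. The offset comes out the same for the sampled lines, which suggests (but does not prove) one and the same faulting instruction. The fault address 14 together with error 4 (a user-mode read of an unmapped page on x86) would be consistent with a null-pointer dereference.

```python
import re

# Matches kernel segfault lines of the form shown in this thread.
SEGFAULT_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\]\s+(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# Three lines copied from the logs above.
SAMPLE = [
    "[Mo Nov 20 11:45:24 2017] rslsync[9663]: segfault at 14 ip 0000564e786e8423 "
    "sp 00007ff0875f5470 error 4 in rslsync[564e784b8000+851000]",
    "[Di Nov 21 07:21:16 2017] rslsync[18626]: segfault at 14 ip 0000562ae6ad1423 "
    "sp 00007f87b1a0f470 error 4 in rslsync[562ae68a1000+851000]",
    "[Di Nov 21 17:36:02 2017] rslsync[30931]: segfault at 14 ip 000055a15bde0423 "
    "sp 00007f94bf7ed520 error 4 in rslsync[55a15bbb0000+851000]",
]


def crash_offsets(lines):
    """Return ip minus module base for every segfault line found in lines."""
    offsets = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            offsets.append(int(match["ip"], 16) - int(match["base"], 16))
    return offsets


if __name__ == "__main__":
    # Print the distinct offsets as hex strings.
    print(sorted({hex(offset) for offset in crash_offsets(SAMPLE)}))
    # prints ['0x230423']
```

An offset like this is the kind of detail a support ticket can use, since it can be mapped to a location in the matching build of the binary.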

Looks a little like

do {
  segfault;
  restart by systemd;
} while not (end of world);

 

The regular sync.log shows close to nothing (just a couple of bad shareID, Ignoring "peers" message and the obvious signs of restarts, but no shutdowns or errors).

Is it possible to downgrade (via deb archive) from 2.5.9 to 2.5.7, or will this require a full rehash / reinstall?

 


Hello,

I see the same thing. Will this get resolved? Currently I have to restart Resilio at least once a day to get rid of the huge WAL files. I just bought a license for Resilio; not sure if this is connected. But I definitely expect more from a commercial product.

By the way, I am running Resilio on an AWS instance with an Ubuntu Server 16.04 installation.

Regards


I opened a support ticket a few days ago, and they might have already tackled the problem and found a solution, at least for my version of the problem.

If you plan to open your own ticket, you might want to reference #69062 to help them coordinate the support. But from my perspective, it looks like there will be an update / fix very soon(tm).

Which, by the way, is exactly what I expect from a commercial software product: support and fixes ;-)

