Metacache organization wastes 90%+ disk space?



When my backups ran the night after I started playing with BitTorrent Sync I was shocked to see how large the metacache folder was.

The actual disk usage for my metacache directory is:

$ du -hs metacache/
266M metacache/

A closer look reveals a ton of small files, many of which are smaller than my block size on ext4 (dumpe2fs reports "Block size: 4096"). Since each of those files still occupies at least one full block, a ton of space is likely wasted.
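
To put a rough number on that, here is a minimal sketch (not the script from the gist) that estimates allocated space by rounding each file size up to a whole number of blocks. The name allocated_bytes is made up for illustration, and the estimate assumes every non-empty file takes at least one full block; it ignores inodes, directories, and ext4 features such as inline data.

```shell
# Hypothetical helper: read file sizes in bytes (one per line) on stdin and
# print the space they would occupy if each were rounded up to whole blocks.
# $1 = block size in bytes (defaults to 4096, the value dumpe2fs reported).
allocated_bytes() {
    awk -v block="${1:-4096}" '
        { alloc += int(($1 + block - 1) / block) * block }
        END { printf "%d\n", alloc }
    '
}

# Example: 12804 files of 76 bytes hold under 1 MB of content,
# but at one 4096-byte block each they would take 12804 * 4096 bytes.
awk 'BEGIN { for (i = 0; i < 12804; i++) print 76 }' | allocated_bytes 4096
```

On a GNU system the real sizes could be fed in with find metacache -type f -printf '%s\n' | allocated_bytes 4096 (the -printf action is a GNU find extension).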

I wrote a quick awk script to understand the metacache directory (https://gist.github....lemanna/5653926)

$ ./real-size.sh metacache | tail -n23
47 occurrences of file size = 199
48 occurrences of file size = 480
56 occurrences of file size = 360
67 occurrences of file size = 380
77 occurrences of file size = 340
91 occurrences of file size = 320
107 occurrences of file size = 300
116 occurrences of file size = 280
156 occurrences of file size = 260
172 occurrences of file size = 240
242 occurrences of file size = 220
286 occurrences of file size = 200
469 occurrences of file size = 179
644 occurrences of file size = 159
993 occurrences of file size = 138
1917 occurrences of file size = 118
2598 occurrences of file size = 75
5211 occurrences of file size = 98
5222 occurrences of file size = 78
12804 occurrences of file size = 76
Total filesize: 6717161
Total filecount: 66142
Total files with size < 4096: 32436 -> 49.0399%
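
The gist link above is truncated, so as a rough stand-in, here is a sketch of how the three summary lines could be produced. This is not the original real-size.sh; the function name summarize_sizes and its interface (sizes on stdin, block size as an argument) are assumptions made for illustration.

```shell
# Hypothetical helper: read file sizes in bytes (one per line) on stdin and
# print the total size, the file count, and how many files are smaller than
# one block. $1 = block size in bytes (defaults to 4096).
summarize_sizes() {
    awk -v block="${1:-4096}" '
        { total += $1; count++; if ($1 < block) small++ }
        END {
            pct = (count > 0) ? 100 * small / count : 0
            printf "Total filesize: %d\n", total
            printf "Total filecount: %d\n", count
            printf "Total files with size < %d: %d -> %.4f%%\n", block, small, pct
        }
    '
}

# Possible usage on a GNU system (find -printf is a GNU extension):
#   find metacache -type f -printf '%s\n' | summarize_sizes 4096
```

Keeping the file enumeration outside the function means the same summary can be run on sizes collected any other way, for example from a saved listing.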

Out of 266 MB for the entire metacache directory, only 6.41 MB (6717161 bytes) is actually storing usable data. About 49% of the files are smaller than the 4096-byte block size.

If we count directory entries as well as file contents (du -b reports apparent size in bytes rather than allocated blocks), the figure grows:

$ du -bs metacache
14605231 metacache

That's 13.93 MB (14605231 bytes) of directory data and file data, all to store 6.41 MB of real data.

Space wasted: 1 - 6.41/266 = 97.59%
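
The same arithmetic as a small sketch, in case anyone wants to check their own directory. The helper name wasted_percent is made up for illustration, and the 266M from du -hs is taken as 266 * 1024 * 1024 bytes, which is an assumption because du -h rounds its output.

```shell
# Hypothetical helper: percentage of allocated space not holding file content.
# $1 = bytes of real content, $2 = bytes allocated on disk.
wasted_percent() {
    awk -v used="$1" -v alloc="$2" \
        'BEGIN { printf "%.2f\n", 100 * (1 - used / alloc) }'
}

# 6717161 bytes of content against roughly 266 MiB allocated.
wasted_percent 6717161 $((266 * 1024 * 1024))   # prints 97.59
```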

I can only imagine how this will scale as more and more files get added. Are there plans to use a database of some sort (sqlite?) to improve disk utilization, and probably performance as well?

I saw mention of a solution a month ago. Any updates on this?

Other than this, BitTorrent Sync is a very interesting tool. I look forward to it improving. I wish it was open source though...


My metacache (on Linux) is 464 MB for 265 GB of data. That's about 0.17%.

Are you using the latest 1.0.134 version?

I think we're looking at different things. I'm comparing how much disk space is actually used by the metacache (266 MB) with the actual data stored in the files (6.41 MB). This is the overhead of my filesystem (ext4 on Linux) inefficiently storing 66142 files.

My actual file data that is synced is 11GB and I'm running version 1.0.134.


The metacache is exactly what its name suggests: a cache.

It's actually a cache of torrent files, one for each file in your share. If a torrent file is needed and doesn't exist, it gets regenerated; with a modern machine (and a spinning disk) and a small file, I would expect regenerating the torrent file to be faster than fetching it from disk.

If you have "too many" you can delete them. (I don't know if you should stop BTSync first though).

I trust the next version will delete some of the files itself; especially in the case where the torrent file takes up as much disk space as the original file. This will be especially important if the data is moved to a sqlite DB as it'll be more difficult to purge old files manually.

