AcostaJA

Active Distributed Data Integrity Protection (RAID and Other Backup Obsolete) *updated*


I'd like BTSync to provide a sort of ZFS-like data integrity protection:

  • At the file-block level, keep the hash of each file block in a database (as ZFS does).
  • Verify each file block against its stored hash, periodically and/or when accessed.
    • If a block of data is damaged or altered, rebuild it by requesting that block again from other peers, as if it were *incomplete*, and write a new file with verified integrity.
  • Also provide (optionally) peer-distributed parity data, as an extra layer of file recovery for extreme cases when offline.
    • Since the parity is computed over every N blocks times the redundancy, it requires only an extra fraction of data, on the order of: block size / (number of blocks XORed) × (required redundancy).
    • The parity file relies on the hash database for its structure and organization, and might even be part of the hash database.
    • The parity file is calculated on each peer, not sent across the network.
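
The first two bullets could be sketched roughly like this in Python. The block size, the plain list standing in for the hash database, and every function name here are assumptions for illustration only; none of this reflects how BTSync actually stores or verifies data.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_hash_db(data: bytes) -> list[str]:
    """Record one SHA-256 digest per block; stands in for the hash database."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def find_damaged_blocks(data: bytes, hash_db: list[str]) -> list[int]:
    """Return the indexes of blocks whose current hash differs from the stored one."""
    blocks = split_blocks(data)
    damaged = [
        i for i, (block, stored) in enumerate(zip(blocks, hash_db))
        if hashlib.sha256(block).hexdigest() != stored
    ]
    # Blocks that are missing entirely (truncated file) also count as damaged.
    damaged.extend(range(len(blocks), len(hash_db)))
    return damaged


def repair(data: bytes, hash_db: list[str], fetch_block) -> bytes:
    """Rebuild the file, replacing damaged blocks with verified copies.

    `fetch_block(index)` is a hypothetical callback that asks a peer for one block.
    """
    blocks = split_blocks(data)
    blocks += [b""] * (len(hash_db) - len(blocks))
    for i in find_damaged_blocks(data, hash_db):
        candidate = fetch_block(i)
        if hashlib.sha256(candidate).hexdigest() != hash_db[i]:
            raise ValueError(f"peer copy of block {i} failed verification")
        blocks[i] = candidate
    return b"".join(blocks)
```

Only the damaged blocks are requested again, and each received block is checked against the stored hash before it is written, so a corrupted peer copy cannot silently replace a corrupted local one.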

My 2 cents.

Edited by AcostaJA


When the file differs from the stored hash, how would BTSync know whether the file has changed intentionally or been corrupted?

1. A file is corrupted if its hash changed without a recorded edit (the file's dates, creation and last modified, remain unchanged).

2. If someone other than the owner intentionally makes even a minimal modification, the hashes detect the change, allowing the owner to restore the file to its original state (this feature would work together with the other proposed feature, Time Machine-like snapshots).
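
Rule 1 could be sketched as follows in Python. The `FileRecord` layout and the function names are hypothetical, and the heuristic is only as reliable as the timestamps: any tool that rewrites a file while preserving its modification time would be misclassified as corruption.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """What was stored the last time the file was indexed (hypothetical layout)."""
    sha256: str
    mtime_ns: int


def index_file(path: str) -> FileRecord:
    """Hash the whole file and remember its modification time."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return FileRecord(digest, os.stat(path).st_mtime_ns)


def classify(path: str, stored: FileRecord) -> str:
    """Compare the file on disk with its stored record."""
    current = index_file(path)
    if current.sha256 == stored.sha256:
        return "unchanged"
    if current.mtime_ns == stored.mtime_ns:
        # Content changed but the timestamp did not: treat as silent corruption.
        return "corrupted"
    return "edited"
```

A file classified as "corrupted" would be re-requested from peers, while an "edited" one would be synced out as a normal change.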


1. A file is corrupted if its hash changed without a recorded edit (the file's dates, creation and last modified, remain unchanged).

 

Okay, that might work.

 

If someone other than the owner intentionally makes even a minimal modification, the hashes detect the change, allowing the owner to restore the file to its original state (this feature would work together with the other proposed feature, Time Machine-like snapshots).

 

Sorry, what? If the change was done intentionally by someone else I sync with, I want that change. I thought we were talking about bit rot. And I thought if file corruption was detected, the original should be recovered by resyncing that file, not by restoring an old version of the file from my own version archive (which also only works if the file was changed at least once since the first run).

I already have ZFS and/or traditional backups for that.


Chris, you miss the point: this feature is not aimed at an authorized modification of a shared file but at a malicious one, or at least at warning the owner about the modification and providing a safe path for recovery if wanted. This is why it must be offered together with the other proposed feature (file snapshot version protection).

Congrats on your ZFS storage; until Btrfs is released, it's the best for data protection. Sadly, common people can't afford a NAS adequate for ZFS (I assume you use RAID-Z2): it costs 2K$ and up if implemented by professionals, plus the hassle of supervising the server. Still, it's the best today. My suggestion tries to bring a bit of that to common people who only have a laptop, a tablet, a smartphone and, for a fortunate few, a cheap 300$ NAS.


Sorry, I still don't get it. How would BTSync know whether a change pushed by a peer is "malicious" or "authorized"?

Regarding ZFS: No, it did not cost 2000$. http://www.freenas.org/ + a HP MicroServer for ~200 EUR.

This is not a matter for this feature request.

An optimal ZFS implementation requires a box capable of hosting ECC RAM, plus at least 4 HDDs for RAID-Z2 (the minimum recommended RAID level), and at least 1 GB of ECC RAM per TB of available disk space, up to 5 GB of ECC RAM per TB if you want to use the deduplication feature. The cheapest hardware with these characteristics costs 1200$ (Core i3 on a C602 motherboard); add 16 GB of ECC RAM (about 200$) and 4 storage-class HDDs (WD Red or HGST enterprise), which adds nearly 600$ (4x2 TB).

A non-optimal deployment of ZFS means a sub-spec service.


This is not a matter for this feature request.

Then I give up.

I like parts of what I thought you were suggesting (an active countermeasure against silent data corruption), but you just seem to want a cheap ZFS replacement for local files, which has pretty much nothing to do with what BTSync does.

I therefore wouldn't hold high hopes for that request ever to make it into BTSync.

 

The cheapest hardware with these characteristics costs 1200$ (Core i3 on a C602 motherboard)

You're wrong there. My aforementioned HP Microserver runs 8 GB of ECC RAM.


That MicroServer still needs 4 HDDs (add about 600$ for 4x3 TB) plus ECC RAM, thus reaching about 1100$, not to mention that the CPU is not powerful enough for a decent NAS4Free implementation with encrypted storage, or adequate for remote SSL access.

Thanks for your time; however, I don't need you to share my focus. I'm only providing an idea here. If you judge it as *a cheap ZFS*, that's OK; I call it *something some people will find useful*.


I updated this part of the original post, improving readability and correcting typos in the first part, and generally improving the second part of the request.
 

I'd like BTSync to provide a sort of ZFS-like data integrity protection:

  • At the file-block level, keep the hash of each file block in a database (as ZFS does).
  • Verify each file block against its stored hash, periodically and/or when accessed.
    • If a block of data is damaged or altered, rebuild it by requesting that block again from other peers, as if it were *incomplete*, and write a new file with verified integrity.
  • Also provide (optionally) peer-distributed parity data, as an extra layer of file recovery for extreme cases when offline.
    • Since the parity is computed over every N blocks times the redundancy, it requires only an extra fraction of data, on the order of: block size / (number of blocks XORed) × (required redundancy).
    • The parity file relies on the hash database for its structure and organization, and might even be part of the hash database.
    • The parity file is calculated on each peer, not sent across the network.

My 2 cents.

 

 
Given that in a P2P environment the parity data must be complete on each peer, it makes no sense to split the parity file among peers, while having that parity on each peer is a useful resource.
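
A minimal sketch of per-peer XOR parity, in Python. It assumes all blocks have the same length (a short final block would need zero-padding first) and a redundancy of 1, meaning one parity block per group of N data blocks; the function names are made up for illustration.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks: list[bytes], group_size: int) -> list[bytes]:
    """Compute one parity block per group of `group_size` data blocks."""
    return [
        xor_blocks(blocks[i:i + group_size])
        for i in range(0, len(blocks), group_size)
    ]


def recover_block(group: list[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing (None) block of a group, fully offline."""
    survivors = [block for block in group if block is not None]
    if len(survivors) != len(group) - 1:
        raise ValueError("single XOR parity can rebuild exactly one block per group")
    # XOR of the parity with all surviving blocks yields the missing block.
    return xor_blocks(survivors + [parity])
```

With groups of 4 blocks, the parity adds one extra block per 4, that is 1/4 of the payload, which matches the fraction in the formula for a redundancy of 1. Surviving more than one lost block per group would need more than plain XOR (an erasure code, for example), which this sketch does not cover.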

