BTSync archive workflow question


alexmeyer


Hey, love BTSync, and I've been using it for a bit. Trying to figure out some workflow things, and I thought I'd throw this out there and see if anyone else has some different perspectives.

 

My setup:

My MacBook Pro with a 750GB hard drive is my main computer; I do lots of video editing projects, photography, etc. Lots of data being created all the time, and I'm always running short of space on there.

And then I've got a couple BTSync nodes (our home 6TB NAS server, an old tower with a 1TB hd sitting at work, etc.) syncing those folders read-only from my laptop.

 

The main thing I want to be able to do with BTSync is keep everything on my laptop continually backed up (which it does wonderfully), but then when I go to archive old projects to make room on my laptop hard drive, I'd love it if the nodes kept the old files and didn't delete anything. So basically using the nodes as a big distributed archival system. I know if I delete stuff from my laptop, the nodes will move the stuff into the .SyncArchive folders, but only for 30 days, and then I'd have to move stuff from that .SyncArchive folder to a safe unsynced folder, repeating for each node, etc.

 

Anyway, just wondering if anyone else is doing anything similar. Basically a one-way sync as a backup, but not syncing deletions. Kinda. Or some alternative to that. :P

 

Thanks!

Alex


I know if I delete stuff from my laptop, the nodes will move the stuff into the .SyncArchive folders, but only for 30 days, and then I'd have to move stuff from that .SyncArchive folder to a safe unsynced folder, repeating for each node, etc.

 

You can increase the advanced "sync_trash_ttl" setting from the default "30" to a higher value to keep files in your .SyncArchive folders for longer than 30 days!


IMHO BTSync is not meant to replace archival backups. The .SyncArchive does not record when a file was moved there, and offers no versioning of a file beyond this one copy.

I would recommend either using an additional tool like rsync for archiving the data, or ZFS to take snapshots, allowing you to back up and retrieve the data at the specified backup points.


IMHO BTSync is not meant to replace archival backups. The .SyncArchive does not record when a file was moved there, and offers no versioning of a file beyond this one copy.

I would recommend either using an additional tool like rsync for archiving the data, or ZFS to take snapshots, allowing you to back up and retrieve the data at the specified backup points.

 

I completely agree that BitTorrent Sync isn't primarily designed to be an archiving solution; however, just to pick up on the point about versioning - Sync does have some support for "versioning" of files within .SyncArchive. For example, a file named "myfile.txt" that's deleted will be moved to .SyncArchive. If another file with the same name is added to the sync folder and subsequently deleted, .SyncArchive will then contain "myfile.txt" and "myfile.1.txt"; the next deletion of "myfile.txt" will add "myfile.2.txt" to .SyncArchive, and so forth.

 

So again, whilst Sync's primary purpose isn't as a complete archiving solution, it does have some limited support for "versioning".


Hey guys, thanks for the replies.

 

Marko: I know I can change the 30-day limit; it's more that the .SyncArchive folder isn't where I'd want the long-term archival stuff to live. The legit archive stuff would also be mixed in with stuff I've actually intentionally deleted. Just not the ideal setup.

 

nils: I actually don't need versioning or anything fancy. Basically, when I'm done with a video project (mostly promo videos, etc.), I don't really need the project anymore, but I want to archive it so that if I need something in the future, I still have it backed up. Using the nodes as a virtual mirrored RAID, basically.

 

 

Hmm, just tested, and it looks like BTSync handles file moving (cut/paste) operations nicely. I might just use an "archive" folder within my parent sync folder, move old projects into that folder to consolidate the old ones that I want to archive, and then backup and remove that folder on each node. And once those are backed up, then I can remove the stuff from my laptop hard drive.

I'm really just trying to figure out the most efficient workflow for all of this. :P


  • 2 weeks later...
Hmm, just tested, and it looks like BTSync handles file moving (cut/paste) operations nicely. I might just use an "archive" folder within my parent sync folder, move old projects into that folder to consolidate the old ones that I want to archive, and then backup and remove that folder on each node. And once those are backed up, then I can remove the stuff from my laptop hard drive.

I'm really just trying to figure out the most efficient workflow for all of this. :P

 

I'm really interested in this archiving solution; the best would be an official solution.

 

If not, then I think the following scenario would be excellent: have a daemon on the device where btsync is running. The daemon would treat the "/btsync/archive" directory as special, and operate like this: move the file outside of the /btsync scope (i.e. to /archive), then create an identically named file inside the btsync/archive directory with something appended to the end (".archive" for example), and put the relevant info inside:

* original filename
* original file's date
* checksum of the original file
* size
* time of archiving
* some metadata, tags (future development, useful for searching)
* a comment field, where the user can add some useful text, not affecting the operation at all

 

Example:

/btsync/archive/family_video_april/mts001.avi -> /archive/family_video_april/mts001.avi

-> /btsync/archive/family_video_april/mts001.avi.archive

 

 

And one could restore the original file remotely by renaming the archive file to .restore; the daemon would then restore the file.

Example:

mv /btsync/archive/family_video_april/mts001.avi.archive /btsync/archive/family_video_april/mts001.avi.restore

 

It would resolve the disk space issue while leaving your directory structure intact.
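No such daemon exists in BTSync, but as a rough illustration, here's a sketch of the proposed "archive" step in shell. The directory roots, the .archive stub format, and the choice of sha256 are all assumptions for illustration, not anything BTSync provides:

```shell
#!/bin/bash
# Sketch of the proposed daemon's "archive" step. Nothing like this exists
# in BTSync; the roots, the .archive stub format, and the sha256 choice are
# assumptions for illustration (GNU stat/sha256sum assumed).
set -eu
SYNC_ROOT="${SYNC_ROOT:-/btsync/archive}"
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/archive}"

archive_file() {
    local src="$1"
    local rel="${src#"$SYNC_ROOT"/}"
    local dest="$ARCHIVE_ROOT/$rel"
    mkdir -p "$(dirname "$dest")"
    # write the metadata stub before moving the real file out of sync scope
    {
        echo "original: $rel"
        echo "mtime: $(stat -c %y "$src")"
        echo "sha256: $(sha256sum "$src" | cut -d' ' -f1)"
        echo "size: $(wc -c < "$src" | tr -d ' ')"
        echo "archived: $(date '+%Y-%m-%d %H:%M:%S')"
        echo "comment:"
    } > "$src.archive"
    mv "$src" "$dest"
}
```

The .archive stub stays inside the sync scope, so it propagates to every node, while the big file itself has been moved out of btsync's reach.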

 

One problem also needs to be solved, namely the problem of small files. Take a .git repository, where many files are only a few bytes each, so archiving them one by one would be too slow. In that case the daemon should compress the many files into a single file in the archive directory, then archive that one. Or some other solution.
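The packing step for the small-files case could be as simple as tarring the directory first, so only one large file needs archiving. A sketch (the helper name and paths are illustrative):

```shell
# Pack a directory of many small files (e.g. a .git repo) into one archive
# file, so only a single large file has to be moved and checksummed.
pack_dir() {
    local dir="$1"
    tar czf "${dir}.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")" \
        && rm -rf "$dir"
}
```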

 

But I would LOVE an official solution. The above method could be implemented inside btsync easily.

 

Best,

 arcol


Instead of cleaning your laptop of old projects, why not periodically create archives of old projects on one of the servers (outside the btsync folder), then clean up the btsync folder, which will then sync back to the laptop and 'clean' it too?

 

For example, create a "To Archive" folder in your btsync tree, then on the server you can do something like the following (I'm assuming it's a *nix based server):

#!/bin/bash
BTSYNC_ROOT='/btsync/To Archive'
ARCHIVE_ROOT='/archive'
set -e
set -u
cd "$BTSYNC_ROOT"
for fname in * ; do
   [[ ! -d "$fname" ]] && continue   # skip anything that isn't a directory
   archive_file="${ARCHIVE_ROOT}/${fname}.tar.gz"
   tar czpf "$archive_file" "$fname" && rm -Rf "$fname"
   echo "Archived $fname to $archive_file"
done

*** UNTESTED CODE; USE AT OWN RISK ***


@arcol: while that's an interesting idea, I honestly don't need anything that complex. My main reason for posting this question was to see if anyone else had a similar setup/workflow/etc., and if so, to see how they handle it. I'm thinking that an integrated solution with that level of complexity is probably out of btsync's development scope.

 

 

@fukawi2: that's a really good idea, and would probably work great if my setup were slightly different.

I have mixed OSes for nodes (linux, windows, etc.), which isn't actually that big of a deal. It'd be easy enough to script something similar for each platform.

The main problem with that approach though (for me and my setup at least), is that my syncs are one-way read-only syncs from my laptop to the nodes. Since my laptop is my main workhorse machine and the nodes are basically just a large mirrored backup, it's one-way. I wouldn't ever risk doing a two-way sync with the critical project data on my laptop. Don't need it for that kind of use, and way too much room for major issues. I can just imagine something going funky and deleting/corrupting/messing up all my projects on my laptop. :P

So everything else up to that point would work, it just wouldn't sync back the actual deletion of the old files.

Otherwise, like I said, it's a good idea. :)

 

 

Thanks everybody for the replies so far!

-Alex


FWIW, I did a similar setup (backing up a media collection to a certain popular low-power device with an external hard drive, using a read-only secret). Unfortunately I didn't even think about the fact that the external hard drive was significantly smaller than the collection, and despite the backup device having a "read only" secret, it still somehow managed to blat a significant amount of my media collection on the other devices once the disk was filled. The read-only stuff is still buggy IMHO.

 

Fortunately I found a previous manual backup and didn't lose much in the end.


  • 3 months later...

The answer to this problem is called 'hard links'. Hard links let you create snapshots of a filesystem at any given time, and are usually used for incremental backups.

Hard links are present by default: there is always at least one hard link to any file - imagine the filename being that hard link. If the user deletes a hard link, the system checks whether other hard links still point to the file, and deletes the file's data only if none remain; otherwise it just deletes the link.
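A quick demonstration of that link-count behaviour (GNU stat assumed; the temp paths are just for the demo):

```shell
# A file's data survives until the last hard link to it is removed.
tmp=$(mktemp -d)
echo "important" > "$tmp/a"
ln "$tmp/a" "$tmp/b"              # second name for the same inode
links=$(stat -c %h "$tmp/a")      # link count is now 2
rm "$tmp/a"                       # removes one name only
cat "$tmp/b"                      # prints: important
```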

 

So you would sync files into a folder on your server and periodically copy all the hardlinks to another one.

 

e.g.

# server's home folder (the archive)
/home/alexmeyer

# btsync folder
/home/btsync/alexmeyer

Periodically just do

$ cp -rlp /home/btsync/alexmeyer/. /home/alexmeyer/

 

The 'l' switch tells the command not to copy the actual files, but to create hard links instead. The amount of space taken by this copy operation is negligible, and the advantage is that your home folder always contains the truth - all the files you ever had. It is actually a little improvement on @fukawi2's workflow, which has the disadvantage that files are sometimes in the sync folder and sometimes in the archive; that makes it tricky to build indexing databases for images/music with programs like Picasa or Banshee on the server.
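A minimal demonstration of the `cp -rlp` snapshot idea (directory names shortened into a temp dir for the demo):

```shell
# The hard-link "snapshot" keeps a file's data alive in the archive folder
# even after btsync removes it from the sync folder (paths illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/btsync/alexmeyer" "$tmp/home"
echo "footage" > "$tmp/btsync/alexmeyer/clip.avi"
cp -rlp "$tmp/btsync/alexmeyer/." "$tmp/home/"   # hard links, not copies
rm "$tmp/btsync/alexmeyer/clip.avi"              # e.g. a synced deletion
cat "$tmp/home/clip.avi"                         # prints: footage
```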

 

A more advanced workflow would be to use the rsync command, which also has an option for creating hard links. I don't know exactly, but it should then detect renames (moves) in the sync folder. There is also an automation daemon called 'lsyncd' which will fire up rsync on changes in any specified folder.

 

 

I am planning to use exactly the same scenario, where the lsyncd daemon watches the sync folder for changes and backs it up to my home folder with rsync by creating hard links. I will report how this turns out.

Edited by dre

I've done something similar:

 

Laptop has a folder called BTSync

Desktop also has BTSync, and another called BTArchive

Other storage has BTSync (ro) and BTArchive (ro)

 

BTSync is 'live', and everything is duplicated from it to everywhere else. When I've finished with something in BTSync but still want to keep it, I move it on my Desktop to BTArchive, where it is still backed up but no longer filling up my laptop.


I checked the rsync man page and the command would be

$ rsync -av  --link-dest=../btsync/alexmeyer /home/btsync/alexmeyer/ /home/alexmeyer/

 

The --link-dest option is a path relative to the destination dir, which tells rsync to hard-link all the files from the source dir against it. As in this case source and link-dest are the same, all the files will be hard-linked instead of copied.

 

But there is a drawback: rsync has no way to detect file moves. So if you change a name in the source dir, you always end up with another hard link in the archive. Even the --fuzzy option is no help.

 

I actually reconsidered my syncing scenario and decided to go with git instead of BTSync. You'd be surprised how many features git actually has that you just don't know about, even after using it for years. What you cannot achieve with git is a completely automated workflow, but it gives you ultimate configuration power for your particular workflow.

 

The workflow for this scenario would be:

 

1. Add files to local git repo.

2. Commit and Push to the server.

3. Delete Files (when not needed locally)

4. Add them to your local ignore (or just never commit the deletes)

http://stackoverflow.com/questions/1753070/git-ignore-files-only-locally
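A sketch of those four steps with plain git commands. The push (step 2) is omitted because this sketch has no remote; the `--assume-unchanged` flag is the "local ignore" trick from that link, and the repo/file names are illustrative:

```shell
# Steps 1-4 above, minus the push (no remote configured in this sketch).
set -eu
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name you
echo footage > clip.avi
git add clip.avi && git commit -qm "archive clip.avi"   # steps 1-2
rm clip.avi                                             # step 3: free space
git update-index --assume-unchanged clip.avi            # step 4: hide delete
status=$(git status --short)                            # empty: delete ignored
```

The data stays recoverable from the commit (`git cat-file -p HEAD:clip.avi`) even though the working-tree file is gone.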

 

There is also the option to delete files from the git index completely but keep them locally, with $ git rm --cached

 

And you will actually need to clear the git history from time to time in order to get the files wiped out completely from your local/remote machine, because git keeps the objects in its history.

 

To wipe out git history up to a specific commit:

$ git rev-parse --verify [commit_hash] >> .git/info/grafts

$ git filter-branch -- --all

Edited by dre
