dreamcat4

OK. I'll give it a shot to explain. Details, details.

===

1st: To begin with, we may wish to process any symlinks in their own separate second pass, after the existing main part of the sync task has completed normally. That way, if symlinks aren't globally enabled in the settings, there is no performance penalty or run-time change whatsoever, and the existing program code stays unchanged, which avoids introducing new bugs into the existing (good, working) software.

Another early thing you can do (while still releasing betas) is to expose this feature only in the Advanced Settings tab, and always mark it underneath with red text saying that the symlink feature is currently experimental. That allows some users to begin using it and help with testing even before you are 100% done. Because obviously, from what you are saying, there may be many possible error cases when symlinks point to unsupported filesystems / network shares, and you simply cannot fully test or predict the resulting behaviour. Those are not clearly defined situations.

Anyway. So let us assume that in the btsync program (all platforms) there is, coming from the main() side, some recursion mechanism that examines each new / modified file and then decides what btsync is supposed to do with it. Eventually the symlink handling could become part of that main sync task. Or, for an initial beta version, as suggested above for stability, we might put all the new symlink work into a separate "after-task", processed as its own full recursion loop after everything else has been synced for that one bt-sync-folder. To save doing everything twice, the first main() loop could simply remember the existence of each symlink by adding it to a new, separate list or array to be processed again at the end (uses a little more memory); a rough sketch of that two-pass idea follows below. Note that at this stage no symlinks have been followed whatsoever, so no circular-symlink situation exists yet to worry about.

So that is the top-down angle; we already have some ideas about it. Of course, it is something else entirely to go into the existing source code and work out all of the existing loops and functions which need to be modified or added to. That is the main structural component needed to support the new symlinking task, drilling from the top downwards.
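To make that two-pass idea a bit more concrete, here is a minimal sketch in Python (btsync itself is of course not Python; `collect_entries` and the walk layout are purely illustrative placeholders, not the real sync loop). The point is only that the first pass records symlinks without following them, so nothing changes for users who leave the option off.

```python
import os

def collect_entries(root):
    """Walk a sync folder WITHOUT following links.  Regular entries go to the
    normal sync path exactly as before; symlinks are only recorded, so they
    can be handed to a separate "after-task" once the main sync has finished."""
    regular, symlinks = [], []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                symlinks.append(path)   # noted, but not followed yet
            else:
                regular.append(path)
    return regular, symlinks

# Usage idea: sync `regular` with the existing, unchanged code first; only if
# the experimental symlink option is enabled, process `symlinks` afterwards.
regular, symlinks = collect_entries("/path/to/sync/folder")
```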
===

2nd: The second component comes from the reverse side, a bottom-up perspective. Imagine we must write one small function that operates on one individual symlink. The function takes as its argument one symlink path, which we know resides inside the sync folder. Its purpose is to expand / resolve all of the paths and files that are represented behind that symlink.

This is where each symlink must be processed and resolved by some kind of dynamic criteria. Either the symlink is a full path (static), or it is relative (dynamic). If the symlink is relative and its path depth is not sufficient to break out of the enclosing sync folder (for example path/to/symlink --> ../../dir), then we should probably not alter the symlink in any way, since it is self-contained: just copy the symlink file verbatim ^1, or, since the file it points to is already synced, make a local copy of that file's blob / hash id on the target machine ^2. If the path resolves outside the enclosing sync folder, we could just do the same thing anyway, and treat any lack of resulting synced files entirely as user error ^3, as is done in many other kinds of situations on POSIX systems.

If the symlink is an absolute path, same question again: does it point to a file that has already been synced? If yes, then we may translate the symlink path to another absolute path on the target system ^4; then either the target file / blob / hash is already transferred and is made into a hard copy, or the symlink is altered to point to the correct file ^5. Or (for full-path symlinks) we could instead literally interpret all full-path symlinks to always resolve to the target file ^6, which is also useful in some cases and situations on POSIX-compliant operating systems.

So we either are clever about it and dynamically determine which symlink files to copy literally, using an if-already-synced algorithm ^1-6; or we always copy all symlinks literally as symlinks (no interpretation whatsoever) ^7; or we always copy all symlinks as files (no interpretation whatsoever) ^8.

So, if you want global symlinking options, I would suggest presenting the user with no more than 3 possible options: 1) "always copy symlinks as symlinks", 2) "automatically resolve symlinks as best as possible", and 3) "always copy symlinks as files". Option 2) is a combination of ^1 + ^3 + ^6 above (or, if you want to be really fancy, ^4 instead of ^6). However, for a Windows sync target, option 2) cannot be translated safely onto its native NTFS filesystem, so the sync must fall back to 3) instead. If you want to give the user any finer-grained control than the global application setting, then I would suggest presenting the same 3 options again, but on a per-sync-folder basis. Anything more than that may be too complex for users to understand. A rough sketch of how one such resolve function might classify a symlink and pick an action follows.
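Here is a rough Python sketch of that single-symlink function, assuming the three settings above; `classify_symlink`, `action_for` and the mode names are made-up placeholders for illustration, not anything taken from btsync itself.

```python
import os

def classify_symlink(link_path, sync_root):
    """Classify one symlink found inside `sync_root`.  Returns one of:
      'relative-inside' -- relative link whose target stays inside the sync folder (^1)
      'outside'         -- link whose target escapes the sync folder (^3)
      'absolute'        -- full-path link (^4 / ^6)
    """
    target = os.readlink(link_path)
    if os.path.isabs(target):
        return "absolute"
    # Resolve the relative target against the directory the symlink lives in.
    resolved = os.path.abspath(os.path.join(os.path.dirname(link_path), target))
    root = os.path.abspath(sync_root)
    inside = os.path.commonpath([resolved, root]) == root
    return "relative-inside" if inside else "outside"

def action_for(kind, mode):
    """Map a classification onto the three proposed global settings."""
    if mode == "always-symlink":           # option 1)
        return "copy the link verbatim"
    if mode == "always-file":              # option 3), and the NTFS fallback
        return "copy the resolved file"
    # mode == "auto": option 2) == ^1 + ^3 + ^6
    return {
        "relative-inside": "copy the link verbatim",                        # ^1
        "outside": "copy the link verbatim; dangling result = user error",  # ^3
        "absolute": "copy the resolved file",                               # ^6
    }[kind]
```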
===

3rd: We have to reconsider the mechanisms for how to ensure the system will not be confused by these different files, and will not sync back and overwrite the symlinks with regular files coming from the sync target. There are a variety of schemes that may be employed, depending entirely upon the mechanism(s) already in use by the existing implementation. For example, the last sync timestamp might have been used to determine staleness; or the blob / hash may need to be overlaid with a new mask that does not collide (to mark it as a symlink); or a separate symlink metadata attribute could be added, since the regular matching algorithm would no longer work for those files. It is hard to know what sort of work would be involved there, but this is also a very important part of the implementation.

===

4th: We have to reconsider how to resolve circular symlinks when hard-copying files. The situation only occurs for user settings 2) and 3): how to resolve the list of files contained in the path of a symlinked folder that itself contains other symlinked folders. Again there are a variety of solutions, depending on how you wish to deal with the matter. However, if your pre-existing file syncing mechanism was written properly and is blob / hash based, then at least no new or additional files should need to be transferred across the network, because a circular reference just contains the same files that were already synced previously. The best solution is probably to detect any cycles immediately and simply not copy them at all past depth 1; then there are no duplicate files whatsoever. If the target machine is POSIX, then (at the first repetition inside the cycle) you may just copy the literal symlink file; else copy nothing at all. However, I recommend instead placing a consistent error file: something like a small template text file that says inside it "This file could not be copied due to a circular dependency". You may, if you wish, include supplementary information in the text file for the user to be able to manually trace or reconstruct where the path would have pointed (because that original folder is already guaranteed to exist by the end of the same task). Again, you only need to put that at the first repeated depth=1 of the cycle, so it would only ever replace the symlink itself. A sketch of such a depth-limited copy is at the end of this post.

So that is a feature spec document. Assign relevant text labels to these different kinds / categories of features and problems to overcome: the group marked Nth = {1st, 2nd, 3rd, 4th}, the group marked ^N = {^1, ^2, ... ^8}, and the group marked N) = {1), 2), 3)}. All of that together would definitely count as one major feature. But when doing some early investigation of the code, it can first be broken down into several smaller chunks; no harm in that. It would probably make for some interesting technical challenges.
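For the circular-symlink point (4th), here is a rough Python sketch of a depth-limited hard copy; `copy_tree_resolving_links` and the `.circular.txt` placeholder name are only illustrative assumptions, not how btsync does or should name things. It follows symlinks, but the first time a real directory repeats on the current path it writes the small template text file instead of recursing again.

```python
import os, shutil

PLACEHOLDER = "This file could not be copied due to a circular dependency.\n"

def copy_tree_resolving_links(src, dst, _ancestors=frozenset()):
    """Hard-copy `src` into `dst`, following symlinks, but stop the first time
    a real directory repeats on the current path (depth 1 of the cycle) and
    leave a breadcrumb text file there instead."""
    real = os.path.realpath(src)
    if real in _ancestors:
        with open(dst + ".circular.txt", "w") as f:
            f.write(PLACEHOLDER + "It would have pointed to: " + real + "\n")
        return
    _ancestors = _ancestors | {real}

    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):                      # follows directory symlinks
            copy_tree_resolving_links(s, d, _ancestors)
        elif os.path.exists(s):                   # regular file, or link to one
            shutil.copy2(s, d)
        # dangling symlinks are skipped: treated as user error, per ^3 above
```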