This is "inspired" by the problem originally reported in #438 (comment) with a proposed fix (closed without merge) in #451 to just let tar ignore disappearing files.
Although we still do not know the underlying trigger (a lingering cleanup process or the like), this specific behavior reminded me that in many cases we would like to provide a path to some location (on the remote resource) which pipelines could use as scratch space.
In https://github.com/ReproNim/reproman/pull/438/files#diff-5b4aa18b79cf44a38ba925fff658fd8cR129 I just added that work/ directory to .gitignore. That should theoretically be sufficient (will try next) if I use the datalad-pair orchestrator, which runs datalad save remotely and datalad update to fetch the results.
In the case of datalad-pair-run, the content is first tar'ed on the remote side (hence the original "inspirational" issue of files disappearing in a work/ directory), including the not-really-needed work directory, which might be huge, so we should make it possible to avoid that.
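As a rough illustration of the direction this could take, here is a minimal sketch of skipping such a directory at archiving time, assuming the tarball is built with Python's tarfile module; the `work` name is just the example from above, and this is not ReproMan's actual packing code:

```python
import tarfile

SCRATCH = "work"  # hypothetical scratch-dir name inside the dataset

def skip_scratch(tarinfo):
    # tarfile calls this once per entry; returning None drops the entry,
    # so everything under work/ is left out of the archive.
    name = tarinfo.name.lstrip("./")
    if name == SCRATCH or name.startswith(SCRATCH + "/"):
        return None
    return tarinfo

with tarfile.open("outputs.tar.gz", "w:gz") as tar:
    tar.add("path/to/dataset", arcname=".", filter=skip_scratch)
```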
The easiest way is to specify some work directory outside of the dataset that gets datalad saved/transferred. But ideally:
- it should not be a fixed name,
- it should be allowed to be non-job-specific (e.g. if I am to rerun some failed computation, the same scratch space could be reused across jobs),
- it should be allowed to be job-specific (to avoid any side effects).
so I guess we should:
- allow defining variables per each resource (e.g. I could assign to smaug `scratchdir = /mnt/btrfs/scrap/tmp`)
- expose those, plus `jobid` and `datalad_dataset_id` (if `datalad-pair*`), as variables that could be used to format the command to be executed. Then I would specify `-w {scratchdir}/{jobid}` for the case avoiding side effects, and something like `-w {scratchdir}/myanal` if I want it to be shared (see the sketch below).
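To make the formatting idea concrete, here is a minimal sketch assuming plain str.format-style substitution; all variable names and values are illustrative placeholders, not actual ReproMan configuration or API:

```python
# Illustrative values only; none of this is actual ReproMan configuration.
resource_vars = {"scratchdir": "/mnt/btrfs/scrap/tmp"}  # per-resource, e.g. assigned to smaug
job_vars = {
    "jobid": "20200401T120000-abc",            # hypothetical job identifier
    "datalad_dataset_id": "placeholder-uuid",  # hypothetical, available with datalad-pair*
}

def expand(option_template):
    # Plain str.format substitution over the merged variable sets.
    return option_template.format(**resource_vars, **job_vars)

print(expand("-w {scratchdir}/{jobid}"))   # job-specific: avoids side effects
print(expand("-w {scratchdir}/myanal"))    # fixed suffix: shared across jobs/reruns
```

A job-specific layout keeps runs isolated from each other, while a fixed suffix like myanal lets reruns share the same scratch space.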
Also relates to #467 ("cleanup") on what to do with such directories upon success/failure.