
ssh support and a second round of abstract file operations #2

Open
richfitz wants to merge 30 commits into ben-gready:master from richfitz:ssh

Conversation

@richfitz commented May 3, 2018

This does four things:

  1. Introduces a new ssh driver, based on changes that I introduced in a set of storr changes (Remote file operations and driver, richfitz/storr#74). Testing is a bit of a pain because the easiest and most secure way I can think of involves docker. This introduces a dependency on https://github.com/ropensci/ssh, which is not yet on CRAN
  2. Blindly changes your s3 code over to the new framework. This reduces the amount of code considerably! But I have not run it. There are lots of comments throughout on things that I think you especially need to check
  3. Adds travis for automated testing. You'll need to set this up on your own account too and then update the README to point at your account, not mine
  4. Adds a licence - see dc0a7ba for further comments on this

There's a comment on richfitz/storr#72 that describes a process for resolving all the live work in a sensible way.

@richfitz (Author) commented May 3, 2018

Tagging you here @ben-gready mostly because I don't tend to get notified about PRs on github...

@richfitz (Author) commented May 3, 2018

Wahay - finally passing - see https://travis-ci.org/richfitz/storr.remote and https://codecov.io/gh/richfitz/storr.remote/branch/ssh

I'm offline from about now through until Tuesday morning

@@ -0,0 +1,2 @@
YEAR: 2018
COPYRIGHT HOLDER: Ben Gready
@ben-gready (Owner) commented:
Thanks for adding this and the license. It's something I've basically not thought of much before. If it's acceptable to add more than one person, happy for you to be on here as well.

## really get what is going on. It's quite likely that what you
## really want is
##
## unique(basename(files))
@ben-gready (Owner) commented:

I think you are correct, but need to test - I didn't know about the function basename!
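For illustration (in Python rather than the package's R, since the logic is the same), the `unique(basename(files))` idea reduces a list of full key paths to their distinct final path components:

```python
import os.path

def unique_basenames(files):
    """Return the distinct final path components, preserving first-seen order.

    Mirrors R's unique(basename(files)) for the common case of
    forward-slash-separated remote keys."""
    seen = []
    for f in files:
        name = os.path.basename(f)
        if name not in seen:
            seen.append(name)
    return seen

keys = ["data/keys/a", "data/keys/b", "data/values/a"]
print(unique_basenames(keys))  # ['a', 'b']
```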

prefix = path_root,
max = Inf)
keys <- files_table[files_table$Size > 0,]$Key
## TODO: this should really be stripped off the front rather
@ben-gready (Owner) commented:

yeah, that sounds a lot safer!
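A minimal sketch of the safer approach, stripping a known root prefix off the front of each key rather than taking the basename (Python for illustration; the prefix argument stands in for path_root):

```python
def strip_prefix(keys, prefix):
    """Remove a known root prefix from the front of each key.

    Safer than basename() when keys contain nested "directories",
    since it preserves the rest of the relative path."""
    if not prefix.endswith("/"):
        prefix = prefix + "/"
    out = []
    for k in keys:
        if k.startswith(prefix):
            out.append(k[len(prefix):])
        else:
            out.append(k)  # leave keys outside the root untouched
    return out

print(strip_prefix(["root/data/keys/a", "root/data/values/b"], "root"))
# ['data/keys/a', 'data/values/b']
```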

},

destroy = function() {
## TODO: not sure if this is right?
@ben-gready (Owner) commented:

This looks correct to me. Should delete the entire store from the root. Will test when I get it working with PR#74 on storr.


exists = function(path) {
path_remote <- file.path(self$root, path)
## QUESTION: What messages are produced by this?
@ben-gready (Owner) commented:

If the object does not exist, it messages the following: "Client error: (404) Not Found". There might be a more elegant way to test for file existence using a file listing etc.
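One common pattern for this is to treat a 200 response to a HEAD request as "exists" and a 404 as "does not exist", re-raising anything else rather than silently treating it as missing. A minimal sketch of that decision (Python for illustration; in the package this would wrap the aws.s3 head_object call):

```python
def object_exists(status_code):
    """Map an HTTP HEAD status to existence.

    200 -> the object exists; 404 -> it does not.
    Any other status is an unexpected error and is raised,
    rather than being silently treated as "missing"."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise RuntimeError("unexpected status: %d" % status_code)

print(object_exists(200))  # True
print(object_exists(404))  # False
```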

exists_dir = function(path) {
## TODO: If it is possible to test if the remote thing is a
## directory (e.g., one of the bits of head_object?) that would
## be nice to get here too...
@ben-gready (Owner) commented:

I think we'd need to do some logic with list files to achieve this - head_object doesn't seem to show anything along those lines. The file structure in S3 seems to be more convention than anything rigid - I can have an object named foo and another object named foo/bar (and maybe even foo/ - I'm not sure). Makes things fairly annoying!
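Since S3 "directories" are only a prefix convention, one way to sketch exists_dir is to list keys and check whether anything sits strictly under the path (Python for illustration, over an in-memory key list; in real code the keys would come from a prefix-limited list request). Note this happily handles an object named foo coexisting with foo/bar:

```python
def exists_dir(keys, path):
    """Treat 'path' as a directory if any key sits strictly under it.

    S3 has no real directories, so this is the listing-based logic:
    a "directory" exists iff some key starts with path + '/'."""
    prefix = path.rstrip("/") + "/"
    return any(k.startswith(prefix) for k in keys)

keys = ["foo", "foo/bar", "baz/qux/file"]
print(exists_dir(keys, "foo"))   # True  (foo/bar sits under it)
print(exists_dir(keys, "baz"))   # True
print(exists_dir(keys, "quux"))  # False
```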

## but if aws.s3 can delete multiple objects (the docs suggest
## it can) you can probably do better. This function must
## return a logical vector the same length as "path"
## indicating if deletion actually happened.
@ben-gready (Owner) commented:

Seems like a good way to have the functionality as a stopgap - I will try to take a look at this when I get a chance.
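The contract described in the comment above (return a logical vector the same length as path, indicating whether each deletion actually happened) can be sketched like this, with a set standing in for the remote bucket (Python for illustration):

```python
def delete_many(store, paths):
    """Delete each path; return one flag per path saying whether it existed.

    'store' is a set standing in for the remote bucket, so the
    return value has the same length and order as 'paths'."""
    deleted = []
    for p in paths:
        if p in store:
            store.discard(p)
            deleted.append(True)
        else:
            deleted.append(False)
    return deleted

bucket = {"a", "b"}
print(delete_many(bucket, ["a", "c", "b"]))  # [True, False, True]
```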

## TODO: I've copied the contents of your previous function
## here, but stripped it down to only take the if_dir ==
## "del_only_key" path because that looked like what
## was implemented. I have not tested this!
@ben-gready (Owner) commented:

Looks good to me. I still need to test though!

# storr.remote
Interface for plugging in remote storage (e.g. AWS S3) to [storr](https://github.com/richfitz/storr)

[![Travis-CI Build Status](https://travis-ci.org/richfitz/storr.remote.svg?branch=master)](https://travis-ci.org/richfitz/storr.remote)
@ben-gready (Owner) commented:

Thanks for adding the travis stuff - I have a lot to learn about all this stuff!

@ben-gready (Owner) left a comment:

Thanks so much for adding all of this. It looks great, and I think it's a massive improvement to get the generic remote_driver into storr. I'm having trouble getting this up and running for testing (still learning how to use GitHub for PRs from other repositories), so I will come back to it, probably on the weekend, to try and get it set up so I'm using this branch + #74 from storr. Once I have it running, I will test using the S3 bucket I have access to, and make any changes needed to get it all working.

As far as testing the S3 driver, I think we may be able to use a similar approach to how you've gone about ssh, using something like this docker image.

I'm really pressed for time these days, but I will try to get the s3 stuff to the point that it's working in the next few days, and then we can add testing etc after that.

Thanks again for all the help with this!
