Conversation
…rr.remote into abstract-file-ops
| hash_algorithm = NULL,
| traits = list(accept = "raw"),
| initialize = function(file_ops, path, compress, mangle_key, mangle_key_pad,
I think the file ops should be generated from the whatever handle holds the S3 credentials
I'm not too sure I understand this one. I will add documentation about how the AWS credentials are setup, and then let's discuss some more. See my comments further down about what I was hoping to achieve with file_ops...
I think the way you did this was right in the end!
| self$file_ops <- file_ops
| self$path <- path
| ## This is a bit of complicated dancing around to maintain
If you base this off one of the other drives the backward compatibility may be less problematic
I based this off the rds driver in storr - I wasn't sure how much of the backwards compatibility stuff is needed, so I included it - if it's not needed (which I'm guessing it's not), then I'd be up for removing it.
yeah, I recognise it - backward compatibility is hard 😢
I have an idea to at least centralise this, but it will probably not make it in the first round of code
| set_hash = function(key, namespace, hash) {
| self$file_ops$create_dir(path = self$name_key("", namespace))
| self$file_ops$writeLines(text = hash, path = self$name_key(key, namespace))
| #*** should be making use of (or making an equivalent version of) the
This is something that should be done in the file_ops class I think
My intention with the file ops object was to boil it down to the fewest file operations (list directory, read file, write file, delete file, delete directory, etc.) that are needed from whichever backend storage system is being used. That way, non-storr code specific to a backend could be entirely encapsulated within that fairly simple object. I was then using those minimal functions to recreate the other driver objects. My thinking is that other backends/storage types just need to replicate the object with a minimal set of file operations and plug it into the driver_rds_remote object. I still don't really know the storr codebase well enough to know if that is a good strategy or not, though... Any thoughts?
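The minimal interface described above might be sketched roughly like this. This is purely illustrative: the class and method names here are assumptions, not the ones in this branch, and a local-filesystem implementation is shown because it can be exercised without S3 credentials (it assumes the R6 package, which storr already uses):

```r
## Hypothetical sketch of a minimal file-ops object, backed by the
## local filesystem. An S3 (or ssh) implementation would expose the
## same methods but translate them to the remote API.
file_ops_local <- R6::R6Class(
  "file_ops_local",
  public = list(
    root = NULL,
    initialize = function(root) {
      self$root <- root
      dir.create(root, showWarnings = FALSE, recursive = TRUE)
    },
    full = function(path) file.path(self$root, path),
    create_dir = function(path) {
      dir.create(self$full(path), showWarnings = FALSE, recursive = TRUE)
    },
    writeLines = function(text, path) {
      base::writeLines(text, self$full(path))
    },
    readLines = function(path) base::readLines(self$full(path)),
    object_exists = function(path) file.exists(self$full(path)),
    list_dir = function(path) dir(self$full(path)),
    delete_object = function(path) unlink(self$full(path)),
    delete_dir = function(path) unlink(self$full(path), recursive = TRUE)
  )
)
```

The driver then only ever talks to these methods, so swapping the storage backend means swapping this one object.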
This is a fabulous strategy - it's absolutely the right way of doing things and something I'd not thought of. But these can still be made atomic as much as possible
Can I just re-iterate how much I like your idea
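For a filesystem-style backend, "as atomic as possible" usually means the write-to-temp-then-rename trick, which might be sketched like this (a hypothetical helper, not code from this branch; S3 itself would need a different strategy since it has no rename):

```r
## Hypothetical sketch of an atomic-ish write: serialise to a temporary
## file in the same directory, then rename into place, so a reader never
## observes a partially written file. The temp file is cleaned up if the
## write fails partway through.
write_atomic <- function(value, path) {
  tmp <- tempfile(tmpdir = dirname(path))
  on.exit(unlink(tmp))  # no-op once the rename has succeeded
  saveRDS(value, tmp)
  file.rename(tmp, path)
}
```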
| #file if the write fails)
| },
| get_object = function(hash) {
It will be worth thinking about caching these - there is no need to ever move a single data object twice
Sounds good, but again, I will hold off on changing internals for now
Yup, I've centralised this into a helper class now
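The cache suggested above can be very simple because objects are content-addressed: a value stored under a hash never changes, so a cached copy can never go stale. A minimal sketch (the `fetch` argument stands in for the real remote read; names are hypothetical):

```r
## Hypothetical sketch: memoise remote fetches by hash in an
## environment, so each data object is moved over the network at most
## once per session.
make_cached_getter <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(hash) {
    if (!exists(hash, envir = cache, inherits = FALSE)) {
      cache[[hash]] <- fetch(hash)  # only ever hit the remote once
    }
    cache[[hash]]
  }
}
```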
| self$file_ops$object_exists(self$name_hash(hash))
| },
| del_hash = function(key, namespace) {
There's no need to use an actual directory. In the redis driver there is no directory either; it's just a schema over the keys. There I use (I think) something like data:<hash> and keys:<namespace>:<key>. You could do the same thing here, storing things as data_<hash>.rds and keys_namespace_value, though you have to work out what you'll do with names that include an underscore! Alternatively, you could use keys/namespace/value but then base64 encode that and store it as a file on S3.
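The encoding idea mentioned above might look something like this. Base64 is the suggestion in the comment; hex encoding is shown instead because base R has no built-in base64 encoder, but the principle (make arbitrary names safe inside a flat `keys_<namespace>_<key>` schema, including names containing `_`) is the same. All names here are hypothetical:

```r
## Hypothetical sketch: hex-encode namespace and key so that arbitrary
## names, including ones containing '_', cannot clash with the '_'
## separators in a flat object-name schema.
encode_hex <- function(x) {
  paste(sprintf("%02x", as.integer(charToRaw(x))), collapse = "")
}
name_key <- function(key, namespace) {
  sprintf("keys_%s_%s", encode_hex(namespace), encode_hex(key))
}
name_hash <- function(hash) {
  sprintf("data_%s.rds", hash)
}
```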
The only place where this gets really nasty is deletion, because you lose the ability to remove a directory recursively, and you can get race conditions with a list/del pair.
I did think about not using an actual directory, but (I think) that would mean an entire S3 bucket would need to be dedicated to each store, so I leaned in favor of trying to mimic a directory structure. This means that multiple storrs can be created in a bucket, and you can browse the file structure using an S3 client. I tried to create delete functions that work, but I'm not sure of a good strategy for testing them out.
It'd be great if this could be documented somewhere (the layout on S3)
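For the record, a layout along the lines described above (one storr per key prefix rather than one per bucket) might look like the following. This is a hypothetical sketch to illustrate the idea, not necessarily the exact layout in this branch:

```
s3://my-bucket/
  my_storr/                   # the storr's `path` prefix
    config/version            # driver version, used for compatibility checks
    data/<hash>.rds           # one serialised object per content hash
    keys/<namespace>/<key>    # each file holds the hash it points to
  another_storr/              # a second, independent storr in the same bucket
    ...
```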
| ## future versions. I'm writing out a version number here that
| ## future versions of driver_rds can use to patch, warn or
| ## change behaviour with older versions of the storr.
| if (!is_new && !file_ops$object_exists(path = storr:::driver_rds_config_file(path, "version"))) {
You almost certainly want to get all the options into a single object and do that in one read/write operation. I think I do that with the redis or DBI driver?
I'll take a look and try to understand it better myself, but will hold off changing anything internal for now, given your other comments
Hold off on this and I'll see if I can sort something out - this is going to be a headache for all the remote rds storrs
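The single-read/write config idea might be sketched like this: all driver options go into one object that is serialised in a single operation, rather than one remote round trip per option (the function names and `config.rds` filename are assumptions for illustration):

```r
## Hypothetical sketch: keep every driver option (version, compress,
## mangle_key, hash_algorithm, ...) in a single config object so the
## remote backend is hit with one read or one write, not one per option.
write_config <- function(path, config) {
  saveRDS(config, file.path(path, "config.rds"))
}
read_config <- function(path) {
  readRDS(file.path(path, "config.rds"))
}
```

On a remote backend this matters doubly, since each extra config file is an extra network round trip on every store open.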
Just to loop you in on the radio silence here; I am trying to work out a minimal set of remote file ops to work over an ssh server. But there are some comments on the storr half of what you have above. If you'd like, I can do a PR to your PR with all the boring style changes I might make.
Update: got most of the work to generalise this done. I'm taking a long weekend, but will get you some code mid next week; perhaps hold off on changing any internals in what you have, but if you could document how to test this (setting up keys etc.) that would be awesome. No stress though.
Thanks for the comments and suggestions here. You'll have to excuse my lack of knowledge in places, and my questions. Sounds great with the PR - thanks! I will make sure to get some documentation in around setting up keys in the next few days. I will also try to find time to get some of your tests from storr working on this if I can, although I'm juggling lots of work stuff right now as well :) Thanks again for all the comments and help with this. I hope it can be a useful contribution eventually! Enjoy the long weekend!
The solution I am circling in on depends a little too closely on some storr internals, so I am going to make some changes to storr itself, and then the solution becomes pretty much just your file ops idea.
@richfitz Apologies for the radio silence here recently. I have not lost interest in this, but rather have been overwhelmed with work stuff the last few days. It'll likely be towards the weekend before I get a chance to get my head back into this. Cheers,
Functionality seems to be equivalent to master branch now, so I'm putting up a PR for discussion purposes. @richfitz I would be keen to know your thoughts on the approach I have taken in this branch.