Conversation
…rr.remote into abstract-file-ops
| hash_algorithm = NULL,
| traits = list(accept = "raw"),
| initialize = function(file_ops, path, compress, mangle_key, mangle_key_pad,
I think the file ops should be generated from the whatever handle holds the S3 credentials
I'm not too sure I understand this one. I will add documentation about how the AWS credentials are setup, and then let's discuss some more. See my comments further down about what I was hoping to achieve with file_ops...
I think the way you did this was right in the end!
| self$file_ops <- file_ops
| self$path <- path
| ## This is a bit of complicated dancing around to maintain
If you base this off one of the other drives the backward compatibility may be less problematic
I based this off the rds driver in storr - I wasn't sure how much of the backwards compatibility stuff is needed, so I included it - if it's not needed (which I'm guessing it's not), then I'd be up for removing it.
yeah, I recognise it - backward compatibility is hard 😢
I have an idea to at least centralise this, but it will probably not make it in the first round of code
| set_hash = function(key, namespace, hash) {
| self$file_ops$create_dir(path = self$name_key("", namespace))
| self$file_ops$writeLines(text = hash, path = self$name_key(key, namespace))
| #*** should be making use of (or making an equivalent version of) the
This is something that should be done in the file_ops class I think
My intention with the file ops object was to boil it down to the fewest file operations (list directory, read file, write file, delete file, delete directory, etc.) that are needed from whichever backend storage system is being used. That way, non-storr code specific to a backend could be entirely encapsulated within that fairly simple object. I was then using those minimal functions to recreate the other driver objects. My thinking is that other backends/storage types just need to replicate the object with a minimal set of file operations and plug it into the driver_rds_remote object. I still don't really know the storr codebase well enough to know if that is a good strategy or not, though... Any thoughts?
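The minimal interface described above might be sketched roughly like this. This is purely illustrative: the class and method names here are assumptions, not the ones in this branch, and a local-filesystem implementation is shown because it can be exercised without S3 credentials (it assumes the R6 package, which storr already uses):

```r
## Hypothetical sketch of a minimal file-ops object, backed by the
## local filesystem. An S3 (or ssh) implementation would expose the
## same methods but translate them to the remote API.
file_ops_local <- R6::R6Class(
  "file_ops_local",
  public = list(
    root = NULL,
    initialize = function(root) {
      self$root <- root
      dir.create(root, showWarnings = FALSE, recursive = TRUE)
    },
    full = function(path) file.path(self$root, path),
    create_dir = function(path) {
      dir.create(self$full(path), showWarnings = FALSE, recursive = TRUE)
    },
    writeLines = function(text, path) {
      base::writeLines(text, self$full(path))
    },
    readLines = function(path) base::readLines(self$full(path)),
    object_exists = function(path) file.exists(self$full(path)),
    list_dir = function(path) dir(self$full(path)),
    delete_object = function(path) unlink(self$full(path)),
    delete_dir = function(path) unlink(self$full(path), recursive = TRUE)
  )
)
```

The driver then only ever talks to these methods, so swapping the storage backend means swapping this one object.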
This is a fabulous strategy - it's absolutely the right way of doing things and something I'd not thought of. But these can still be made atomic as much as possible
Can I just re-iterate how much I like your idea
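For a filesystem-style backend, "as atomic as possible" usually means the write-to-temp-then-rename trick, which might be sketched like this (a hypothetical helper, not code from this branch; S3 itself would need a different strategy since it has no rename):

```r
## Hypothetical sketch of an atomic-ish write: serialise to a temporary
## file in the same directory, then rename into place, so a reader never
## observes a partially written file. The temp file is cleaned up if the
## write fails partway through.
write_atomic <- function(value, path) {
  tmp <- tempfile(tmpdir = dirname(path))
  on.exit(unlink(tmp))  # no-op once the rename has succeeded
  saveRDS(value, tmp)
  file.rename(tmp, path)
}
```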
| #file if the write fails)
| },
| get_object = function(hash) {
It will be worth thinking about caching these - there is no need to ever move a single data object twice
Sounds good, but again, I will hold off on changing internals for now
Yup, I've centralised this into a helper class now
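The cache suggested above can be very simple because objects are content-addressed: a value stored under a hash never changes, so a cached copy can never go stale. A minimal sketch (the `fetch` argument stands in for the real remote read; names are hypothetical):

```r
## Hypothetical sketch: memoise remote fetches by hash in an
## environment, so each data object is moved over the network at most
## once per session.
make_cached_getter <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(hash) {
    if (!exists(hash, envir = cache, inherits = FALSE)) {
      cache[[hash]] <- fetch(hash)  # only ever hit the remote once
    }
    cache[[hash]]
  }
}
```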
| self$file_ops$object_exists(self$name_hash(hash))
| },
| del_hash = function(key, namespace) {
There's no need to use an actual directory. In the redis driver there is no directory either; it's just a schema over the keys. There I use (I think) something like data:<hash> and keys:<namespace>:<key>. You could do the same thing here, storing things as data_<hash>.rds and keys_namespace_value, though you have to work out what you'll do with names that include an underscore! Alternatively, you could use keys/namespace/value but then base64 encode that and store it as a file on S3.
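The encoding idea mentioned above might look something like this. Base64 is the suggestion in the comment; hex encoding is shown instead because base R has no built-in base64 encoder, but the principle (make arbitrary names safe inside a flat `keys_<namespace>_<key>` schema, including names containing `_`) is the same. All names here are hypothetical:

```r
## Hypothetical sketch: hex-encode namespace and key so that arbitrary
## names, including ones containing '_', cannot clash with the '_'
## separators in a flat object-name schema.
encode_hex <- function(x) {
  paste(sprintf("%02x", as.integer(charToRaw(x))), collapse = "")
}
name_key <- function(key, namespace) {
  sprintf("keys_%s_%s", encode_hex(namespace), encode_hex(key))
}
name_hash <- function(hash) {
  sprintf("data_%s.rds", hash)
}
```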
The only place where this gets really nasty is deletion, because you lose the ability to remove a directory recursively, and you can get race conditions with a list/del pair.
I did think about not using an actual directory, but (I think) that would mean an entire S3 bucket would need to be dedicated to each store, so I leaned in favor of trying to mimic a directory structure. This means that multiple storrs can be created in a bucket, and you can browse the file structure using an S3 client. I tried to create delete functions that work, but I'm not sure of a good strategy for testing them out.
It'd be great if this could be documented somewhere (the layout on S3)
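For the record, a layout along the lines described above (one storr per key prefix rather than one per bucket) might look like the following. This is a hypothetical sketch to illustrate the idea, not necessarily the exact layout in this branch:

```
s3://my-bucket/
  my_storr/                   # the storr's `path` prefix
    config/version            # driver version, used for compatibility checks
    data/<hash>.rds           # one serialised object per content hash
    keys/<namespace>/<key>    # each file holds the hash it points to
  another_storr/              # a second, independent storr in the same bucket
    ...
```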
| ## future versions. I'm writing out a version number here that
| ## future versions of driver_rds can use to patch, warn or
| ## change behaviour with older versions of the storr.
| if (!is_new && !file_ops$object_exists(path = storr:::driver_rds_config_file(path, "version"))) {
You almost certainly want to get all the options into a single object and do that in one read/write operation. I think I do that with the redis or DBI driver?
I'll take a look and try to understand it better myself, but will hold off changing anything internal for now, given your other comments
Hold off on this and I'll see if I can sort something out - this is going to be a headache for all the remote rds storrs
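The single-read/write config idea might be sketched like this: all driver options go into one object that is serialised in a single operation, rather than one remote round trip per option (the function names and `config.rds` filename are assumptions for illustration):

```r
## Hypothetical sketch: keep every driver option (version, compress,
## mangle_key, hash_algorithm, ...) in a single config object so the
## remote backend is hit with one read or one write, not one per option.
write_config <- function(path, config) {
  saveRDS(config, file.path(path, "config.rds"))
}
read_config <- function(path) {
  readRDS(file.path(path, "config.rds"))
}
```

On a remote backend this matters doubly, since each extra config file is an extra network round trip on every store open.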
Just to loop you in on the radio silence here; I am trying to work out a minimal set of remote file ops to work over an ssh server. But there are some comments on the storr half of what you have above. If you'd like, I can do a PR to your PR with all the boring style changes I might make.
Update: got most of the work to generalise this done. I'm taking a long weekend, but will get you some code mid next week; perhaps hold off on changing any internals in what you have, but if you could document how to test this (setting up keys etc.) that would be awesome. No stress though.
Thanks for the comments and suggestions here. You'll have to excuse my lack of knowledge in places, and my questions. Sounds great with the PR - thanks! I will make sure to get some documentation in around setting up keys in the next few days. I will also try to find time to get some of your tests from storr working on this if I can, although I'm juggling lots of work stuff right now as well :) Thanks again for all the comments and help with this. I hope it can be a useful contribution eventually! Enjoy the long weekend!
The solution I am circling in on depends a little too closely on some storr internals, so I am going to make some changes to storr itself, and then the solution becomes pretty much just your file ops idea.
@richfitz Apologies for the radio silence here recently. I have not lost interest in this, but rather have been overwhelmed with work stuff the last few days. It'll likely be towards the weekend before I get a chance to get my head back into this. Cheers,
Functionality seems to be equivalent to master branch now, so I'm putting up a PR for discussion purposes. @richfitz I would be keen to know your thoughts on the approach I have taken in this branch.