Description
Re #77 (comment), I would like to propose a new driver that handles this slightly differently. It requires more work up front, but I think it could allow for more customization and future-proofing in the long run. @richfitz, if you like the idea, please let me know and I will write a more thorough design document.
Initialization
The proposed multiformat driver accepts a custom read/write protocol on initialization. The default format is RDS, and storr_multiformat() with an empty formats argument should behave like storr_rds().
s <- storr_multiformat(
  path,
  formats = storr_format_protocol(
    storr_format(
      class = c("keras.engine.sequential.Sequential"),
      extension = "keras",
      hash = "object", # Serialize and hash the in-memory object.
      serialize = function(object) {
        keras::serialize_model(object)
      },
      unserialize = function(raw) {
        keras::unserialize_model(raw)
      },
      read = function(path) {
        readRDS(file = path)
      },
      write = function(object, path) {
        saveRDS(object = object, file = path)
      }
    ),
    storr_format(
      class = "data.frame",
      extension = "fst",
      hash = "file", # Hash the file, not the in-memory data. Avoids serialization.
      read = function(path) {
        # Read in fst format.
        fst::read_fst(path)
      },
      write = function(object, path) {
        # Write in fst format.
        fst::write_fst(object, path)
      }
    )
  )
)
We could store the format protocol in an R script that gets source()d when we call storr_multiformat() on an existing storr.
path/
├── config/
│   ├── formats.R
│   ├── hash_algorithm
│   ├── mangle_key
│   └── version
├── data/
├── keys/
└── scratch/
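A minimal sketch of how the protocol could be written to and restored from config/formats.R, assuming the protocol is a plain list of lists. The helpers save_formats() and load_formats() are hypothetical names, not part of storr, and a CSV format stands in for fst to keep the example dependency-free.

```r
# Hypothetical helpers for illustration; storr does not provide these.
save_formats <- function(formats, path) {
  config <- file.path(path, "config")
  dir.create(config, recursive = TRUE, showWarnings = FALSE)
  # dput() deparses the list, including function bodies, into R source code.
  dput(formats, file = file.path(config, "formats.R"))
  invisible(formats)
}

load_formats <- function(path) {
  # source() returns a list; its `value` element is the last evaluated expression.
  source(file.path(path, "config", "formats.R"), local = TRUE)$value
}

path <- tempfile("storr_")
formats <- list(
  list(
    class = "data.frame",
    extension = "csv",
    hash = "file",
    read = function(path) utils::read.csv(path),
    write = function(object, path) {
      utils::write.csv(object, path, row.names = FALSE)
    }
  )
)
save_formats(formats, path)
restored <- load_formats(path)
```

One caveat of this approach: deparsing keeps the function bodies but not their enclosing environments, so format functions would need to be self-contained and refer to packages with `::`.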
If a multiformat storr already exists at the given path, the user should not be allowed to set the formats argument.
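One way to enforce this would be a small check at the top of the constructor. The sketch below assumes that the presence of config/formats.R marks an existing multiformat storr; check_formats_arg() is a hypothetical helper, not part of storr.

```r
# Hypothetical guard for illustration; not part of storr.
check_formats_arg <- function(path, formats = NULL) {
  # Assumption: an existing multiformat storr always has config/formats.R.
  already_exists <- file.exists(file.path(path, "config", "formats.R"))
  if (already_exists && !is.null(formats)) {
    stop("cannot set formats of an existing multiformat storr.", call. = FALSE)
  }
  invisible(TRUE)
}
```

From the user's side, the failure would look like this: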
s <- storr_multiformat(
  path,
  formats = storr_format_protocol(storr_format(...))
)
#> Error: cannot set formats of an existing multiformat storr.
Storage
s$set(key, value) could
- Choose the most appropriate format for value given its S3 class.
- If hash is equal to "object" for the given format, serialize and hash value in memory.
- Save the object to a temporary file in scratch/.
- If hash is equal to "file", hash the temporary file without having serialized anything.
- Move the file to HASH.EXT, where EXT is the file extension we gave in the protocol.
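The steps above can be sketched in base R as follows. This is an illustration under several assumptions, not storr's implementation: formats are plain lists, MD5 via tools::md5sum() stands in for the storr's configured hash algorithm, the key file stores only the hash, and the names multiformat_set(), choose_format(), hash_raw(), rds_format and csv_format are all hypothetical.

```r
# Hash a raw vector by writing it to a throwaway file (base R only).
hash_raw <- function(bytes) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(bytes, tmp)
  unname(tools::md5sum(tmp))
}

# Pick the first format whose class matches; otherwise use the default.
choose_format <- function(value, formats, default) {
  for (fmt in formats) {
    if (inherits(value, fmt$class)) return(fmt)
  }
  default
}

# Fallback format, mirroring the "default is RDS" behaviour.
rds_format <- list(
  extension = "rds", hash = "object",
  serialize = function(object) serialize(object, NULL),
  write = function(object, path) saveRDS(object, path)
)

# Stand-in for a file-hashed format such as fst.
csv_format <- list(
  class = "data.frame", extension = "csv", hash = "file",
  write = function(object, path) {
    utils::write.csv(object, path, row.names = FALSE)
  }
)

multiformat_set <- function(path, key, value, formats, default = rds_format) {
  dirs <- file.path(path, c("scratch", "data", "keys"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  fmt <- choose_format(value, formats, default)
  hash <- NULL
  if (identical(fmt$hash, "object")) {
    hash <- hash_raw(fmt$serialize(value)) # hash the in-memory serialization
  }
  tmp <- tempfile(tmpdir = dirs[1])
  fmt$write(value, tmp) # save to a temporary file in scratch/
  if (identical(fmt$hash, "file")) {
    hash <- unname(tools::md5sum(tmp)) # hash the file; nothing was serialized
  }
  dest <- file.path(dirs[2], paste0(hash, ".", fmt$extension))
  file.rename(tmp, dest) # move to data/HASH.EXT
  writeLines(hash, file.path(dirs[3], key))
  invisible(hash)
}

path <- tempfile("storr_")
h_df <- multiformat_set(path, "a", data.frame(x = 1:3), list(csv_format))
h_int <- multiformat_set(path, "b", 1:10, list(csv_format))
```

Here the data frame is written as data/HASH.csv and hashed from the file, while the integer vector falls through to the RDS default and is hashed from its in-memory serialization.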
Retrieval
s$get(key) could
- Get the file extension of the data file.
- Identify the format in which it was originally saved.
- Read the data using the read function in the protocol.
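A matching sketch of retrieval, under the same assumptions as the proposal's layout: the key file holds the hash, the data file is named HASH.EXT, and formats are plain lists. multiformat_get() and the two format lists are hypothetical, and the storr contents are created by hand so the example runs on its own.

```r
multiformat_get <- function(path, key, formats, default) {
  hash <- readLines(file.path(path, "keys", key))
  # Locate data/HASH.EXT without knowing EXT in advance.
  data_file <- list.files(
    file.path(path, "data"),
    pattern = paste0("^", hash, "\\."),
    full.names = TRUE
  )
  stopifnot(length(data_file) == 1)
  ext <- tools::file_ext(data_file) # 1. get the file extension
  fmt <- default
  for (candidate in formats) { # 2. identify the original format
    if (identical(candidate$extension, ext)) fmt <- candidate
  }
  fmt$read(data_file) # 3. read with the protocol's read function
}

# Hypothetical formats: CSV stands in for fst; RDS is the default.
csv_format <- list(extension = "csv", read = function(path) utils::read.csv(path))
rds_format <- list(extension = "rds", read = function(path) readRDS(path))

# Build a tiny storr-like directory by hand.
path <- tempfile("storr_")
dir.create(file.path(path, "data"), recursive = TRUE)
dir.create(file.path(path, "keys"))
utils::write.csv(
  data.frame(x = 1:3),
  file.path(path, "data", "abc123.csv"),
  row.names = FALSE
)
writeLines("abc123", file.path(path, "keys", "a"))

out <- multiformat_get(path, "a", list(csv_format), rds_format)
```

Because the extension alone selects the reader, each extension would need to map to exactly one format in the protocol.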