
Multiformat driver #103

@wlandau


Re #77 (comment), I would like to propose a new driver that handles this slightly differently. It requires more work up front, but I think it could allow for more customization and future-proofing in the long run. @richfitz, if you like the idea, please let me know and I will write a more thorough design document.

Initialization

The proposed multiformat driver accepts a custom read/write protocol on initialization. The default format is RDS, and storr_multiformat() with an empty formats argument should behave like storr_rds().

s <- storr_multiformat(
  path,
  formats = storr_format_protocol(
    storr_format(
      class = c("keras.engine.sequential.Sequential"),
      extension = "keras",
      hash = "object",
      serialize = function(object) {
        keras::serialize_model(object)
      },
      unserialize = function(raw) {
        keras::unserialize_model(raw)
      },
      read = function(path) {
        readRDS(file = path)
      },
      write = function(object, path) {
        saveRDS(object = object, file = path)
      }
    ),
    storr_format(
      class = "data.frame",
      extension = "fst",
      hash = "file", # Hash the file, not the in-memory data. Avoids serialization.
      read = function(path) {
        fst::read_fst(path) # Read in fst format.
      },
      write = function(object, path) {
        fst::write_fst(object, path) # Write in fst format.
      }
    )
  )
)
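To make the proposed protocol more concrete, here is a minimal sketch of what storr_format() and storr_format_protocol() might look like. None of these functions exist in storr today; the names, the "ANY" fallback class, and the helper storr_format_for() are hypothetical. To stay base-R only, the example swaps in a CSV format where the proposal uses fst. The RDS format is appended last, so an empty protocol reduces to plain RDS storage, matching the storr_rds() behavior described above.

```r
# Hypothetical constructor for one format; not part of storr's API.
storr_format <- function(class, extension, read, write,
                         hash = c("object", "file"),
                         serialize = NULL, unserialize = NULL) {
  hash <- match.arg(hash)
  stopifnot(is.character(class), is.character(extension),
            length(extension) == 1L, is.function(read), is.function(write))
  structure(
    list(class = class, extension = extension, hash = hash,
         serialize = serialize, unserialize = unserialize,
         read = read, write = write),
    class = "storr_format"
  )
}

# Hypothetical default format: plain RDS, matching any class.
storr_format_rds <- function() {
  storr_format(
    class = "ANY",
    extension = "rds",
    hash = "object",
    read = function(path) readRDS(file = path),
    write = function(object, path) saveRDS(object = object, file = path)
  )
}

# Collect user formats and append the RDS fallback last,
# so user-supplied formats take precedence.
storr_format_protocol <- function(...) {
  formats <- list(...)
  stopifnot(all(vapply(formats, inherits, logical(1), what = "storr_format")))
  c(formats, list(storr_format_rds()))
}

# Hypothetical helper: return the first format whose class matches the object.
storr_format_for <- function(protocol, object) {
  for (fmt in protocol) {
    if (identical(fmt$class, "ANY") || inherits(object, fmt$class)) {
      return(fmt)
    }
  }
  stop("no matching format")
}

# Usage: a CSV format for data frames (standing in for fst).
p <- storr_format_protocol(
  storr_format(
    class = "data.frame",
    extension = "csv",
    hash = "file",
    read = function(path) utils::read.csv(path),
    write = function(object, path) utils::write.csv(object, path, row.names = FALSE)
  )
)
storr_format_for(p, mtcars)$extension
storr_format_for(p, 1:3)$extension
```

Ordering the protocol list and taking the first match keeps dispatch predictable: a user who registers both a data.frame format and a subclass format controls precedence simply by the order of the arguments.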

We could store the format protocol in an R script that gets source()d when we call storr_multiformat() on an existing storr.
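One way the reload step could work is sketched below. It assumes (hypothetically) that config/formats.R assigns the protocol to a variable named formats; the helper read_format_protocol() is illustrative only. The script is sourced into a fresh environment whose parent is the global environment, so a real formats.R could call storr_format_protocol() as long as that function is loaded.

```r
# Hypothetical helper: load the protocol from path/config/formats.R.
# Assumes the script assigns the protocol to a variable named `formats`.
read_format_protocol <- function(path) {
  script <- file.path(path, "config", "formats.R")
  env <- new.env(parent = globalenv())
  source(script, local = env)  # evaluate the script inside `env`
  get("formats", envir = env, inherits = FALSE)
}

# Example: write a tiny formats.R and read it back.
path <- file.path(tempdir(), "multiformat_demo")
unlink(path, recursive = TRUE)
dir.create(file.path(path, "config"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "formats <- list(",
  "  list(class = 'data.frame', extension = 'csv')",
  ")"
), file.path(path, "config", "formats.R"))

loaded <- read_format_protocol(path)
str(loaded)
```

Keeping the protocol as an R script means the read/write functions survive a round trip as readable source code, at the cost of executing that script every time the storr is opened.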

path/
├── config/
│   ├── formats.R
│   ├── hash_algorithm
│   ├── mangle_key
│   └── version
├── data/
├── keys/
└── scratch/

If a multiformat storr already exists at the given path, the user should not be allowed to set the formats argument.

s <- storr_multiformat(
  path,
  formats = storr_format_protocol(storr_format(...))
)
#> Error: cannot set formats of an existing multiformat storr.
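The guard could be a simple check at the top of storr_multiformat(). The sketch below uses a hypothetical helper, check_formats_arg(), and treats the presence of config/formats.R as the signal that a multiformat storr already exists; passing formats = NULL is always allowed so an existing storr can be reopened.

```r
# Hypothetical guard called at the start of storr_multiformat().
check_formats_arg <- function(path, formats) {
  existing <- file.exists(file.path(path, "config", "formats.R"))
  if (existing && !is.null(formats)) {
    stop("cannot set formats of an existing multiformat storr.", call. = FALSE)
  }
  invisible(existing)
}

path <- file.path(tempdir(), "guard_demo")
unlink(path, recursive = TRUE)
dir.create(file.path(path, "config"), recursive = TRUE, showWarnings = FALSE)

before <- check_formats_arg(path, list())  # new storr: setting formats is fine
file.create(file.path(path, "config", "formats.R"))
reopen <- check_formats_arg(path, NULL)    # existing storr, no formats: fine
msg <- tryCatch(check_formats_arg(path, list()), error = conditionMessage)
msg
```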

Storage

s$set(key, value) could

  1. Choose the most appropriate format for value given its S3 class.
  2. If hash is equal to "object" for the given format, serialize and hash value in memory.
  3. Save the object to a temporary file in scratch/.
  4. If hash is equal to "file", hash the temporary file without having serialized anything.
  5. Move the file to HASH.EXT, where EXT is the file extension we gave in the protocol.
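The five steps above can be sketched in base R as follows. Everything here is hypothetical: multiformat_set() is not a storr function, formats are plain lists rather than storr_format() objects, and the key-to-hash mapping (one file per key in keys/, holding the hash) is an assumption the proposal does not spell out. Because base R has no in-memory hash function, this sketch stands in tools::md5sum() for storr's digest-based hashing, and for "object" hashing it writes the serialized bytes to a scratch file first; a real driver would presumably hash the raw vector directly.

```r
# Hypothetical sketch of s$set() for the multiformat driver.
hash_file <- function(file) unname(tools::md5sum(file))

multiformat_set <- function(path, key, value, formats) {
  scratch_dir <- file.path(path, "scratch")
  data_dir <- file.path(path, "data")
  keys_dir <- file.path(path, "keys")
  for (d in c(scratch_dir, data_dir, keys_dir)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # 1. Choose the first format matching the S3 class; fall back to RDS.
  fmt <- Find(function(f) inherits(value, f$class), formats)
  if (is.null(fmt)) {
    fmt <- list(extension = "rds", hash = "object",
                write = function(object, path) saveRDS(object, path))
  }
  # 2. "object" hashing: hash the serialized value (via a scratch file here).
  if (fmt$hash == "object") {
    raw_file <- tempfile(tmpdir = scratch_dir)
    writeBin(serialize(value, NULL), raw_file)
    hash <- hash_file(raw_file)
    unlink(raw_file)
  }
  # 3. Save the object to a temporary file in scratch/.
  tmp <- tempfile(tmpdir = scratch_dir, fileext = paste0(".", fmt$extension))
  fmt$write(value, tmp)
  # 4. "file" hashing: hash the written file, with no serialization.
  if (fmt$hash == "file") hash <- hash_file(tmp)
  # 5. Move the file to data/HASH.EXT.
  file.rename(tmp, file.path(data_dir, paste0(hash, ".", fmt$extension)))
  # Assumed key -> hash mapping: keys/KEY contains the hash.
  writeLines(hash, file.path(keys_dir, key))
  invisible(hash)
}

# Usage: data frames go to CSV (standing in for fst), everything else to RDS.
path <- file.path(tempdir(), "set_demo")
unlink(path, recursive = TRUE)
csv <- list(class = "data.frame", extension = "csv", hash = "file",
            write = function(object, path) utils::write.csv(object, path, row.names = FALSE))
h1 <- multiformat_set(path, "cars", mtcars, list(csv))
h2 <- multiformat_set(path, "nums", 1:10, list(csv))
list.files(file.path(path, "data"))
```

Writing to scratch/ first and then renaming means a crash mid-write never leaves a half-written file under a valid HASH.EXT name in data/.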

Retrieval

s$get(key) could

  1. Get the file extension of the data file.
  2. Identify the format in which it was originally saved.
  3. Read the data using the read function in the protocol.
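A matching sketch of retrieval, under the same hypothetical layout (keys/KEY holds the hash, data/ holds one HASH.EXT file per hash). The function multiformat_get() and the hashes "abc123" and "def456" are illustrative placeholders, and a CSV format again stands in for fst. Files with the "rds" extension fall back to readRDS() when no registered format claims them.

```r
# Hypothetical sketch of s$get() for the multiformat driver.
multiformat_get <- function(path, key, formats) {
  hash <- readLines(file.path(path, "keys", key), n = 1L)
  # 1. Find the data file for this hash and get its extension.
  file <- list.files(file.path(path, "data"),
                     pattern = paste0("^", hash, "\\."), full.names = TRUE)
  if (length(file) != 1L) stop("expected exactly one data file for hash ", hash)
  ext <- tools::file_ext(file)
  # 2. Identify the format by extension, falling back to RDS.
  fmt <- Find(function(f) identical(f$extension, ext), formats)
  if (is.null(fmt)) {
    if (ext != "rds") stop("no format registered for extension ", ext)
    fmt <- list(read = function(path) readRDS(path))
  }
  # 3. Read the data with the protocol's read function.
  fmt$read(file)
}

# Set up a tiny store by hand (placeholder hashes).
path <- file.path(tempdir(), "get_demo")
unlink(path, recursive = TRUE)
dir.create(file.path(path, "data"), recursive = TRUE)
dir.create(file.path(path, "keys"), recursive = TRUE)
utils::write.csv(data.frame(x = 1:3, y = c("a", "b", "c")),
                 file.path(path, "data", "abc123.csv"), row.names = FALSE)
writeLines("abc123", file.path(path, "keys", "small"))
saveRDS(letters, file.path(path, "data", "def456.rds"))
writeLines("def456", file.path(path, "keys", "letters"))

csv <- list(class = "data.frame", extension = "csv",
            read = function(path) utils::read.csv(path))
df <- multiformat_get(path, "small", list(csv))
lt <- multiformat_get(path, "letters", list(csv))
```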
