Permit file inventory of packaged content (TAR/ZIP) - E-ARK AIP format requirement #68

@shsdev

The following issue description is part of a requirement to define AIP storage recommendations for E-ARK AIPs.

The inventory.json could allow listing the files contained in a packaged archive file, such as a TAR or ZIP.

This would allow documenting the changes (updates/additions/deletions) between different versions of packaged archive files.
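As a rough illustration of the kind of change tracking meant here, the following sketch compares two file listings and classifies the differences. It assumes that each version of the archive has already been reduced to a mapping from file path to digest; the function name `diff_listings` and the shape of its result are hypothetical and are not part of OCFL or the E-ARK specifications.

```python
def diff_listings(old, new):
    """Classify paths as added, deleted, or updated between two listings.

    Both arguments are assumed to map a file path to its digest string.
    """
    added = sorted(path for path in new if path not in old)
    deleted = sorted(path for path in old if path not in new)
    # A path present in both listings with a different digest was updated.
    updated = sorted(
        path for path in new if path in old and new[path] != old[path]
    )
    return {"added": added, "deleted": deleted, "updated": updated}
```

A renamed file would show up as one deletion plus one addition in this sketch; detecting renames would require comparing digests across paths as well.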

To give an example, the following directory contains an archive file mydataobject_v001.tar:

mydataobject/data
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
└── v001
    └── content
        └── mydataobject_v001.tar

The OCFL inventory would allow documenting the versions of this data object:

{
    "digestAlgorithm": "sha512",
    "fixity": {
        "md5": {
            "e5ad509db4ddb4cef0de4c1c19c7988b": [
                "v001/content/mydataobject_v001.tar"
            ]
        },
        "sha256": {
            "68a5b60ddef62758389f6894a1e7df28c1d228a5d56d2eec3ce2f74e80c27910": [
                "v001/content/mydataobject_v001.tar"
            ]
        }
    },
    "head": "v001",
    "id": "urn:uuid:1017cc9b-eaed-4064-947e-a07c752d3760",
    "manifest": {
        "24db03a2a7d9c7e2e7ea533e2ac84b7274f937eaff31e95f508cd9c5418a902adf5c18d2f67fa80aa25b7d72ce829951e79ea66210959c86aab33b5ef0c8b8bc": [
            "v001/content/mydataobject_v001.tar"
        ]
    },
    "type": "https://ocfl.io/1.0/spec/#inventory",
    "versions": {
        "v001": {
            "created": "2021-03-27T18:49:22Z",
            "message": "Initial data object",
            "state": {
                "24db03a2a7d9c7e2e7ea533e2ac84b7274f937eaff31e95f508cd9c5418a902adf5c18d2f67fa80aa25b7d72ce829951e79ea66210959c86aab33b5ef0c8b8bc": [
                    "v001/content/mydataobject_v001.tar"
                ]
            }
        }
    }
}

It would be desirable to have an option to create an inventory of the packaged archive file as if it were unpacked. Instead of listing the TAR file as a single entry, as in the directory listing above, the inventory would treat the contents of the TAR file as if they had been extracted, for example as follows (in this case with a BagIt container inside):

mydataobject/data
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
└── v001
    └── content
        └── mydataobject_v001
            ├── bag-info.txt
            ├── bagit.txt
            ├── data
            │   ├── data_file1.pdf
            │   ├── data_file2.pdf
            │   └── ...
            ├── manifest-sha256.txt
            ├── manifest-sha512.txt
            ├── tagmanifest-sha256.txt
            └── tagmanifest-sha512.txt

This would allow using OCFL to document updates/additions/deletions in archived container files.
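A minimal sketch of how such a listing could be produced without unpacking the archive to disk is shown below. It uses Python's standard `tarfile` and `hashlib` modules; the function names `tar_member_digests` and `as_manifest`, and the idea of prefixing member paths with the version's content directory, are assumptions made for illustration and are not defined by OCFL.

```python
import hashlib
import tarfile


def tar_member_digests(tar_path, algorithm="sha512"):
    """Map each regular file inside a TAR archive to its hex digest."""
    digests = {}
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and special files
            stream = archive.extractfile(member)
            hasher = hashlib.new(algorithm)
            # Read in chunks so large members are not loaded into memory.
            for chunk in iter(lambda: stream.read(1024 * 1024), b""):
                hasher.update(chunk)
            digests[member.name] = hasher.hexdigest()
    return digests


def as_manifest(digests, prefix):
    """Group paths by digest, mirroring the shape of the manifest block."""
    manifest = {}
    for path, digest in sorted(digests.items()):
        manifest.setdefault(digest, []).append(f"{prefix}/{path}")
    return manifest
```

For the example above, `as_manifest(tar_member_digests("mydataobject_v001.tar"), "v001/content")` would yield entries such as `v001/content/mydataobject_v001/bagit.txt`, assuming the archive's top-level directory is `mydataobject_v001`. Whether such paths may appear in the inventory when the files exist only inside the TAR is exactly the question this issue raises.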
