
KType/KItem export/import #44

@MBueschelberger

Description


In the future, we want to be able to dump the entire content of a KItem to various formats, e.g. JSON, YAML, AASX, HDF5.

As an initial step, we would need to dump the model with all of its fields to a dict, which can then be serialized to JSON or YAML.

Let us consider the following dict:

data = {
    "name": "MD_data_epoxy_resin",
    "id": "971c0409-fb46-4e22-822e-108f5300efe8",
    "slug": "md_data_epoxy_resin-971c0409",
    "ktype_id": "dataset",
    "created_at": "2024-12-19T05:52:05.915662",
    "updated_at": "2024-12-19T05:52:05.915662",
    "summary": None,
    "avatar_exists": False,
    "annotations": [
        {
            "iri": "https://w3id.org/dimat/BatchUploaded",
            "namespace": "https://w3id.org/dimat",
            "label": "BatchUploaded",
            "id": "971c0409-fb46-4e22-822e-108f5300efe8"
        },
        {
            "iri": "https://w3id.org/dimat/PropertyData",
            "namespace": "https://w3id.org/dimat",
            "label": "PropertyData",
            "id": "971c0409-fb46-4e22-822e-108f5300efe8"
        }
    ],
    "linked_kitems": [],
    "external_links": [],
    "contacts": [],
    "authors": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "user_id": "ce234115-f053-4884-b9ca-ae7761eb1131"
        }
    ],
    "affiliations": [],
    "attachments": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "name": "MD_data_epoxy_resin.xlsx"
        },
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "name": "subgraph.ttl"
        }
    ],
    "kitem_apps": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "title": "pilot-2_upload_carbon_epoxy",
            "kitem_app_id": 207,
            "executable": "pilot-2_upload_carbon_epoxy",
            "description": None,
            "tags": None,
            "additional_properties": {
                "triggerUponUpload": True,
                "triggerUponUploadFileExtensions": [".xlsx"]
            }
        }
    ],
    "custom_properties": {
        "id": "971c0409-fb46-4e22-822e-108f5300efe8",
        "content": {
            "sections": [
                {
                    "id": "id17345875421574uce0p",
                    "name": "General",
                    "entries": [
                        {"id": "id1734587542157e3d8ip", "type": "Text", "label": "Name", "value": "Expoxy Resin"},
                        {"id": "id1734587542157xggcen", "type": "Text", "label": "Version", "value": "v01"},
                        {"id": "id1734587542157p18gqp", "type": "Text", "label": "Description", "value": "Resin for aeronautical application"},
                        {"id": "id1734587542157nmuw93", "type": "Text", "label": "Keywords", "value": "resin, epoxy"},
                        {"id": "id1734587542157xk8ytt", "type": "Number", "label": "MassDensity", "value": 1200},
                        {"id": "id1734587542157etvvgt", "type": "Text", "label": "Anisotropy", "value": "Isostropic"}
                    ]
                }
            ]
        }
    },
    "rdf_exists": True,
    "access_url": "https://cmdb.materials-data.space/knowledge/dataset/md_data_epoxy_resin-971c0409"
}
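Serializing such a dict to JSON is then a one-liner with the standard library; a sketch on a minimal subset of the dict above (YAML would work the same way via PyYAML's `safe_dump`):

```python
import json

# A minimal subset of the kitem dict from above; enough to show the
# round trip (assumption: the full dict serializes the same way).
data = {"name": "MD_data_epoxy_resin", "ktype_id": "dataset", "summary": None}

# None becomes null, booleans become true/false.
serialized = json.dumps(data, indent=2)

# Reading it back yields the same plain dict.
restored = json.loads(serialized)
```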

NOTE: The content of the attachments is not downloaded: the dataframe and the subgraph are not present, and the avatar (an image represented as binary) is missing. This needs to be added to a potential export function.
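One option for the missing binary content (avatar, attachment bytes) in text-based formats like JSON/YAML is inline base64 encoding; this is an assumption for illustration, not the decided design:

```python
import base64

# Placeholder bytes standing in for a real avatar image or attachment.
avatar_bytes = b"\x89PNG\r\n\x1a\n"

# Encode for embedding in JSON/YAML as a plain string; decode on import.
encoded = base64.b64encode(avatar_bytes).decode("ascii")
decoded = base64.b64decode(encoded)
```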

The export to HDF5 and AASX requires additional effort, since we would need a wrapper for both formats.

The easiest case is to start with HDF5.

A potential HDF5 KItem wrapper could look like this (not tested, just AI-prompted):

import h5py

def _h5_value(value):
    # h5py cannot store None directly (e.g. 'summary', 'description',
    # 'tags'); fall back to an empty string.
    return "" if value is None else value

with h5py.File('data.h5', 'w') as hdf:
    # Store top-level attributes
    for key in ['name', 'id', 'slug', 'ktype_id', 'created_at', 'updated_at', 'summary', 'avatar_exists', 'rdf_exists', 'access_url']:
        hdf.create_dataset(key, data=_h5_value(data[key]))

    # Store annotations
    annotations_group = hdf.create_group('annotations')
    for i, annotation in enumerate(data['annotations']):
        annotation_group = annotations_group.create_group(f'annotation_{i}')
        for key, value in annotation.items():
            annotation_group.create_dataset(key, data=_h5_value(value))

    # Store linked_kitems if any
    linked_kitems_group = hdf.create_group('linked_kitems')
    for i, linked_kitem in enumerate(data['linked_kitems']):
        linked_kitem_group = linked_kitems_group.create_group(f'linked_kitem_{i}')
        for key, value in linked_kitem.items():
            linked_kitem_group.create_dataset(key, data=_h5_value(value))

    # Store attachments
    attachments_group = hdf.create_group('attachments')
    for i, attachment in enumerate(data['attachments']):
        attachment_group = attachments_group.create_group(f'attachment_{i}')
        for key, value in attachment.items():
            attachment_group.create_dataset(key, data=_h5_value(value))

    # Store kitem_apps
    kitem_apps_group = hdf.create_group('kitem_apps')
    for i, app in enumerate(data['kitem_apps']):
        app_group = kitem_apps_group.create_group(f'app_{i}')
        for key, value in app.items():
            if key == 'additional_properties':
                for prop_key, prop_value in value.items():
                    app_group.create_dataset(f'additional_properties/{prop_key}', data=_h5_value(prop_value))
            else:
                app_group.create_dataset(key, data=_h5_value(value))

    # Store custom_properties
    custom_properties_group = hdf.create_group('custom_properties')
    for key, value in data['custom_properties'].items():
        if key == 'content':
            content_group = custom_properties_group.create_group('content')
            for section in value['sections']:
                section_group = content_group.create_group(section['id'])
                for entry in section['entries']:
                    section_group.create_dataset(entry['label'], data=_h5_value(entry['value']))
        else:
            custom_properties_group.create_dataset(key, data=_h5_value(value))
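The wrapper above effectively maps a nested dict onto slash-separated HDF5 paths. A small pure-Python helper (hypothetical name `flatten`) makes that mapping explicit and is easy to unit-test without h5py:

```python
def flatten(d, prefix=""):
    """Flatten a nested dict into HDF5-style slash-separated paths."""
    items = {}
    for key, value in d.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse: nested dicts become sub-groups in HDF5 terms.
            items.update(flatten(value, path))
        else:
            items[path] = value
    return items

paths = flatten({"kitem_apps": {"app_0": {"executable": "pilot-2_upload_carbon_epoxy"}}})
```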

The KItem hence needs to be extended with an export function. This export function can receive an enum of all supported export formats:

from enum import Enum

class Format(str, Enum):

    JSON = "json"
    YAML = "yaml"
    HDF5 = "hdf5"


class KItem(BaseModel):

    [...]
    
    def export(self, format: Format) -> Any:
        [...]
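A minimal, hypothetical sketch of how export could dispatch on the enum (a plain class stands in for the real pydantic KItem, and only the JSON branch is filled in):

```python
import json
from enum import Enum

class Format(str, Enum):
    JSON = "json"
    YAML = "yaml"
    HDF5 = "hdf5"

class KItemSketch:
    """Stand-in for the real KItem; holds the exported dict."""
    def __init__(self, data: dict):
        self.data = data

    def export(self, format: Format):
        if format is Format.JSON:
            return json.dumps(self.data, default=str)
        # YAML/HDF5 would dispatch to PyYAML / an h5py wrapper instead.
        raise NotImplementedError(format.value)

dumped = KItemSketch({"name": "MD_data_epoxy_resin"}).export(Format.JSON)
```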

Same would need to be implemented for the KTypes.

Later, a corresponding import-function needs to be implemented:

from dsms import import_kitem, import_ktype, Format, DSMS

dsms = DSMS(env=".env")

kitem = import_kitem("path/to/file", format=Format.HDF5)
ktype = import_ktype("path/to/file", format=Format.HDF5)

dsms.commit()
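The JSON half of such an import function could look like this (hypothetical helper; the real import_kitem would build a KItem instance and attach it to the DSMS session):

```python
import json
import tempfile

def import_kitem_dict(path: str) -> dict:
    """Hypothetical: load a previously exported kitem dict from JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Round-trip demo using a throwaway export file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"name": "MD_data_epoxy_resin"}, tmp)
    path = tmp.name

kitem = import_kitem_dict(path)
```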
