Migration of CERN Digitized Videos

## Useful links
- The current data model of CDS Videos is available [here](https://github.com/CERNDocumentServer/cds-videos/blob/main/cds/modules/records/schemas/records/videos/video/video-v1.0.0.json).
- The collection of digitized videos to be migrated in CDS is [here](https://cds.cern.ch/collection/CERN%20Moving%20Images).
- The documentation of what has been done during digitization, including the list of metadata fields, is [here](https://digital-repositories.web.cern.ch/dm/digitisation/digitisation-video/#curated-fields).

## Data model changes
The first step is to analyze the CDS Videos data model and identify the changes needed. Since CDS Videos will be migrated to CDS in the future, these changes should be compatible with the InvenioRDM data model (including its custom fields).
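For illustration only: in InvenioRDM, fields outside the core data model are declared as custom fields in the instance configuration. The fragment below follows the pattern of the InvenioRDM custom fields documentation; the namespace URL and the field name `cern:digitization_tape_format` are hypothetical placeholders, not agreed names.

```python
# invenio.cfg (fragment): declare a namespace and one custom field.
from invenio_records_resources.services.custom_fields import TextCF

# Hypothetical namespace URL (placeholder); the real one is still to be agreed.
RDM_NAMESPACES = {
    "cern": "https://example.org/cern/",
}

# Hypothetical custom field for a digitization-specific attribute.
RDM_CUSTOM_FIELDS = [
    TextCF(name="cern:digitization_tape_format"),
]
```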

### Extra fields
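The digitization metadata contains fields that the current CDS Videos data model does not have. We should evaluate whether these extra fields could go into a JSON blob field of key/value pairs, and what impact this solution would have on search capabilities.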
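A minimal sketch of the key/value option. It assumes a hypothetical `KNOWN_FIELDS` set standing in for the top-level properties of the video schema and a hypothetical `extra_metadata` key. Fields with no schema counterpart are stored as a list of `{"key", "value"}` string pairs rather than as arbitrary object keys.

```python
import json

# Hypothetical stand-in for the top-level properties of video-v1.0.0.json.
KNOWN_FIELDS = {"title", "description", "date", "category", "type", "keywords"}


def split_extra_fields(metadata, known_fields=KNOWN_FIELDS, blob_key="extra_metadata"):
    """Keep known fields at the top level and move the rest into `blob_key`.

    Extra values are stored as {"key": ..., "value": ...} pairs of strings,
    so the search mapping does not grow with every new key.
    """
    record = {}
    extra = []
    for key, value in metadata.items():
        if key in known_fields:
            record[key] = value
        else:
            # Non-string values (numbers, lists, dicts) are serialized to JSON text.
            text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
            extra.append({"key": key, "value": text})
    if extra:
        # Sort by key so the output is stable regardless of input order.
        record[blob_key] = sorted(extra, key=lambda pair: pair["key"])
    return record


if __name__ == "__main__":
    # Hypothetical digitization fields, for illustration only.
    sample = {"title": "Example", "tape_format": "U-matic", "reel_count": 2}
    print(split_extra_fields(sample))
```

Storing pairs keeps the search mapping fixed however many distinct keys appear, at the cost of less precise queries: matching a given key together with its value requires a `nested` mapping in Elasticsearch, because arrays of plain objects are flattened at index time. Storing the blob as an object with dynamic keys would instead add one mapping entry per new key.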

### Category
It makes sense to import the records with `category:CERN`, given that these are official CERN videos.

### Owners
It is not yet clear who should own these records and who can edit their metadata; this is to be discussed and decided.
For curation, we should probably create a "multimedia curators" group and decide who belongs to it.
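A sketch of how edit rights could be granted to the curators group, assuming records keep an `_access` object with `read`/`update` lists as in the current CDS Videos records (to be checked against the schema). The group name `multimedia-curators@cern.ch` is hypothetical, since the group does not exist yet.

```python
# Hypothetical e-group; the curators group and its members are still to be decided.
CURATORS_GROUP = "multimedia-curators@cern.ch"


def grant_curator_access(record, group=CURATORS_GROUP):
    """Add `group` to the record's `_access.update` list, without duplicates.

    Assumes the `_access` layout ({"read": [...], "update": [...]}) of the
    current CDS Videos records; the record is modified in place and returned.
    """
    access = record.setdefault("_access", {})
    update = access.setdefault("update", [])
    if group not in update:
        update.append(group)
    return record


if __name__ == "__main__":
    # Hypothetical owner address, for illustration only.
    record = {"title": "Example", "_access": {"update": ["owner@cern.ch"]}}
    grant_curator_access(record)
    grant_curator_access(record)  # idempotent: the group is added only once
    print(record["_access"]["update"])
```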

### Considerations
Some videos are duplicates: videos already in CDS Videos have been re-digitized and share the same recid. Both videos, old and new, should be kept; we need to check whether the data model supports this.
Likewise, the metadata of existing videos should be enriched with the metadata of the newly digitized ones.
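One possible enrichment policy, sketched as a hypothetical `enrich_metadata` helper: non-empty values already in CDS Videos take precedence, missing or empty fields are filled from the digitized record, and list fields are merged without duplicates. It assumes both records are plain dicts carrying the same `recid`; the precedence rule itself is an assumption to be validated with the curators.

```python
import copy

# Values treated as "empty" and therefore safe to overwrite.
_EMPTY = (None, "", [], {})


def enrich_metadata(existing, digitized):
    """Return a copy of `existing` enriched with values from `digitized`.

    Hypothetical policy: existing non-empty values win, missing or empty
    fields are filled from the digitized record, and lists are merged
    without duplicates (existing items first). Inputs are not modified.
    """
    if existing.get("recid") != digitized.get("recid"):
        raise ValueError("records do not share the same recid")
    merged = copy.deepcopy(existing)
    for key, new_value in digitized.items():
        current = merged.get(key)
        if key not in merged or current in _EMPTY:
            merged[key] = copy.deepcopy(new_value)
        elif isinstance(current, list) and isinstance(new_value, list):
            for item in new_value:
                if item not in current:
                    current.append(copy.deepcopy(item))
    return merged


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    old = {"recid": 1, "title": "Old title", "keywords": ["cern"], "description": ""}
    new = {"recid": 1, "title": "New title", "keywords": ["cern", "lhc"], "description": "Digitized copy"}
    print(enrich_metadata(old, new))
```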

### Relevant code

We should reuse the [cds-dojson](https://github.com/CERNDocumentServer/cds-dojson) module and its field rules for CDS Videos.
See the `dojson` documentation for usage examples: https://dojson.readthedocs.io/en/latest/usage.html.

We will create a branch (e.g. `digitization-2023`) in cds-dojson, where we can apply the modifications to the CDS Videos schema and add new conversion rules.
We should update the README to explain the purpose of the new branch, with relevant links to the digitization project/process.

`cds-dojson` usage example:

```python
from cds_dojson.marc21.utils import create_record
from cds_dojson.marc21 import marc21
from cds_dojson.marc21.models.videos.video import model as video_model

# Make sure that the XML does not start with the XML declaration:
# <?xml version="1.0" encoding="UTF-8"?>
# otherwise you might get the error:
# ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

PATH = '/tmp/2256680.xml'
with open(PATH, 'rb') as fp:
    marcxml = fp.read()
blob = create_record(marcxml)

# ALTERNATIVES
# let the generic model guess video/project by __query__
record = marc21.do(blob)
# or expect a video directly
record = video_model.do(blob)
```

Migration of CERN Digitized Videos #1926

Useful links

Data model changes

Extra fields

Category

Owners

Considerations

Relevant code

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Migration of CERN Digitized Videos #1926

Description

Useful links

Data model changes

Extra fields

Category

Owners

Considerations

Relevant code

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions