diff --git a/api/TimeAddressableMediaStore.yaml b/api/TimeAddressableMediaStore.yaml index 31a918c3..0146237c 100644 --- a/api/TimeAddressableMediaStore.yaml +++ b/api/TimeAddressableMediaStore.yaml @@ -1286,7 +1286,7 @@ paths: $ref: '#/components/responses/trait_resource_info_head_404' get: summary: Flow Read-Only - description: Returns the Flow read_only property. If set to 'true', service implementations SHOULD reject client requests to update Flow metadata (other than the read_only property), Flow Segments and Media Objects. + description: Returns the Flow read_only property. If set to 'true', service implementations SHOULD reject client requests to update Flow metadata (other than the read_only property) and Flow Segments. Service implementations should also reject requests to the [`/flows/{flowId}/storage`](#/operations/POST_flows-flowId-storage) endpoint for the Flow and requests to delete the Flow. operationId: GET_flows-flowId-read-only tags: - Flows @@ -1302,7 +1302,7 @@ paths: description: The requested Flow does not exist. put: summary: Set Flow Read-Only - description: Set the read-only property. If set to 'true', service implementations SHOULD reject client requests to update Flow metadata (other than the read_only property), Flow Segments and Media Objects. + description: Set the read-only property. If set to 'true', service implementations SHOULD reject client requests to update Flow metadata (other than the read_only property) and Flow Segments. Service implementations should also reject requests to the [`/flows/{flowId}/storage`](#/operations/POST_flows-flowId-storage) endpoint for the Flow and requests to delete the Flow. operationId: PUT_flows-flowId-read-only tags: - Flows @@ -1826,6 +1826,10 @@ paths: A service instance MAY support Media Objects that are held in external storage in another TAMS or other media storage system.
The Flow Segment may in that case require the `get_urls` property to provide the information needed by clients to access the Media Object. + Clients MAY modify Flow Segments, but this should only be done in exceptional circumstances to correct metadata, as such operations will likely break the idempotency of Segments. + Properties of Media Objects, such as `get_urls`, SHOULD be modified via the [`/objects`](#/operations/GET_objects) endpoints. + They SHOULD NOT be modified at this endpoint, and TAMS instances SHOULD reject such requests with a `400` error. + If a client needs to modify a Flow Segment, e.g. to correct metadata such as the `key_frame_count` or add additional URLs to `get_urls`, then the client SHOULD first delete the existing Segment and then write a new one. The behaviour is undefined if the Segment exists and the service may return a 400 error response. @@ -1922,9 +1926,9 @@ paths: $ref: 'schemas/uuid.json' description: The Flow identifier. post: - summary: Allocate Flow Storage + summary: Allocate Initial Flow Storage description: | - Allocate storage locations for writing Media Objects. + Allocate initial storage locations for writing Media Objects. The Storage Backend type, which is indicated in the [/service](#/operations/GET_service) resource, determines the information provided in the response. The examples and description below are for the "http_object_store" Storage Backend type. @@ -1934,10 +1938,6 @@ paths: The client is expected to register the Flow Segment using the [/flows/{flowId}/segments](#/operations/POST_flows-flowId-segments) endpoint once the upload is complete. Service implementations need to handle situations where Objects were uploaded but no Flow Segment was registered successfully. - The response may include PUT URLs for creating buckets for the Media Objects. - These PUT URLs should be used before uploading Media Objects. - The object_id associated with each storage location has the bucket name as its prefix.
- When making requests to the provided `put_url`, clients should include credentials if the provided URL is on the same origin as the API itself, akin to the `same-origin` mode in the [WhatWG Fetch Standard](https://fetch.spec.whatwg.org/#concept-request-credentials-mode). operationId: POST_flows-flowId-storage tags: @@ -1981,10 +1981,52 @@ paths: parameters: - name: objectId in: path - description: The Media Object identifier. + description: The Media Object identifier. The Object ID may include special characters such as `/`, which should be URL-encoded. required: true schema: type: string + - name: verbose_storage + in: query + description: | + Include storage metadata in `get_urls`. + When `verbose_storage` is `false`, only `url`, `presigned`, and `label` will be included in `get_urls`. + schema: + default: false + type: boolean + - name: accept_get_urls + in: query + description: | + A comma-separated list of labels of Media Object `get_urls` to include in the response. + Omitting `accept_get_urls` will result in no filtering of `get_urls`. + An empty `accept_get_urls` results in an empty or no `get_urls` in the response. + Media Object `get_urls` with no label or storage ID cannot be filtered for; they will only be returned if `accept_get_urls` is omitted and `accept_storage_ids` is omitted or empty. + Without `get_urls`, the response from the service could be substantially faster, for example if the service would otherwise be required to + generate a large number of pre-signed URLs. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + $ref: 'schemas/url-label-list.json' + - name: accept_storage_ids + in: query + description: | + A comma-separated list of `storage_id`s of Media Object `get_urls` to include in the response. + Omitting `accept_storage_ids` or providing an empty `accept_storage_ids` will result in no filtering of `get_urls`.
+ Media Object `get_urls` with no label or storage ID cannot be filtered for; they will only be returned if `accept_get_urls` is omitted and `accept_storage_ids` is omitted or empty. + A full list of available `storage_id`s may be found at the `service/storage-backends` endpoint. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + type: string + pattern: ^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})(,[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})*$ + - name: presigned + in: query + description: | + If set to `true`, only presigned URLs (i.e. those whose `presigned` property is `true`) will be returned in `get_urls`. + If set to `false`, only non-presigned URLs (i.e. those whose `presigned` property is `false`) will be returned in `get_urls`. + If omitted, both presigned and non-presigned URLs will be returned. + If `presigned` is set to `false`, the response from the service could be substantially faster, as the service is not required to + generate a large number of pre-signed URLs. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + type: boolean - $ref: '#/components/parameters/trait_resource_paged_key' - $ref: '#/components/parameters/trait_paged_limit' responses: @@ -2024,10 +2066,52 @@ paths: parameters: - name: objectId in: path - description: The Media Object identifier. + description: The Media Object identifier. The Object ID may include special characters such as `/`, which should be URL-encoded. required: true schema: type: string + - name: verbose_storage + in: query + description: | + Include storage metadata in `get_urls`. + When `verbose_storage` is `false`, only `url`, `presigned`, and `label` will be included in `get_urls`.
+ schema: + default: false + type: boolean + - name: accept_get_urls + in: query + description: | + A comma-separated list of labels of Media Object `get_urls` to include in the response. + Omitting `accept_get_urls` will result in no filtering of `get_urls`. + An empty `accept_get_urls` results in an empty or no `get_urls` in the response. + Media Object `get_urls` with no label or storage ID cannot be filtered for; they will only be returned if `accept_get_urls` is omitted and `accept_storage_ids` is omitted or empty. + Without `get_urls`, the response from the service could be substantially faster, for example if the service would otherwise be required to + generate a large number of pre-signed URLs. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + $ref: 'schemas/url-label-list.json' + - name: accept_storage_ids + in: query + description: | + A comma-separated list of `storage_id`s of Media Object `get_urls` to include in the response. + Omitting `accept_storage_ids` or providing an empty `accept_storage_ids` will result in no filtering of `get_urls`. + Media Object `get_urls` with no label or storage ID cannot be filtered for; they will only be returned if `accept_get_urls` is omitted and `accept_storage_ids` is omitted or empty. + A full list of available `storage_id`s may be found at the `service/storage-backends` endpoint. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + type: string + pattern: ^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})(,[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})*$ + - name: presigned + in: query + description: | + If set to `true`, only presigned URLs (i.e. those whose `presigned` property is `true`) will be returned in `get_urls`. + If set to `false`, only non-presigned URLs (i.e. those whose `presigned` property is `false`) will be returned in `get_urls`.
+ If omitted, both presigned and non-presigned URLs will be returned. + If `presigned` is set to `false`, the response from the service could be substantially faster, as the service is not required to + generate a large number of pre-signed URLs. + Where multiple filter query parameters are provided, the returned `get_urls` will match all filters. + schema: + type: boolean - $ref: '#/components/parameters/trait_resource_paged_key' - $ref: '#/components/parameters/trait_paged_limit' responses: @@ -2055,7 +2139,98 @@ paths: "400": description: Bad request. Invalid query options. "404": - description: The requested Media Object does not exist. + description: The requested Media Object does not exist. + /objects/{objectId}/instances: + post: + summary: Register a Media Object instance + description: | + Request the service to create an Object instance on a new Storage Backend, or add a new uncontrolled URL to `get_urls`. + + To request the duplication of the Object to a new Storage Backend, clients POST to this endpoint a `storage_id` that does not currently hold an instance of the Object. The API will then: + + - Allocate storage for Media Object `objectId` on Storage Backend `storage_id` + - Copy the Media Object from an existing location to the newly allocated storage + - Start advertising the new copy in `get_urls` once ready + + API instances SHOULD be capable of handling the case where the only existing instances are uncontrolled. + + Where a client has written a new uncontrolled Object instance, the client is responsible for ensuring that the Object written is complete and correct before registering it with this method. + + All instances of an Object MUST be identical. + operationId: POST_objects-instances + tags: + - Objects + parameters: + - name: objectId + in: path + description: The Media Object identifier. The Object ID may include special characters such as `/`, which should be URL-encoded.
+ required: true + schema: + type: string + requestBody: + content: + application/json: + examples: + controlled: + summary: Registering a controlled instance + value: + $ref: examples/objects-instances-controlled-post.json + uncontrolled: + summary: Registering an uncontrolled instance + value: + $ref: examples/objects-instances-uncontrolled-post.json + schema: + $ref: schemas/objects-instances-post.json + required: true + responses: + "201": + description: Object instance successfully registered. + "400": + description: Bad request. Invalid request JSON. + "403": + description: Forbidden. You do not have permission to modify this Media Object. + "404": + description: The Media Object does not exist. + delete: + summary: Delete a Media Object instance + description: | + Delete an instance of a Media Object. + + One of `storage_id` or `label` MUST be specified in the query parameters. `storage_id` SHOULD be used where `controlled` is `true` for the instance. + + API instances should remove the Media Object instance from the `get_urls` list and then, if the instance is controlled, delete the Object instance from storage. + + API instances SHOULD prevent clients from deleting all Object instances. Additionally, API instances MAY prevent clients from deleting all controlled Object instances. Where clients wish to remove all copies of an Object from the store, they should do so by deleting all Flows or Flow Segments which reference the Object. + operationId: DELETE_objects-instances + tags: + - Objects + parameters: + - name: objectId + in: path + description: The Media Object identifier. The Object ID may include special characters such as `/`, which should be URL-encoded. + required: true + schema: + type: string + - name: storage_id + in: query + description: The `storage_id` identifying the Media Object instance to be deleted. + schema: + type: string + - name: label + in: query + description: The label identifying the Media Object instance to be deleted.
+ schema: + type: string + responses: + "204": + description: No content. The Media Object instance has been deleted. + "400": + description: Bad request. Invalid query options. + "403": + description: Forbidden. You do not have permission to modify this Media Object. + "404": + description: The requested Object ID in the path is invalid. + /flow-delete-requests: head: summary: List Flow Delete Requests diff --git a/api/examples/objects-get-200.json b/api/examples/objects-get-200.json index 84b38102..3fd00264 100644 --- a/api/examples/objects-get-200.json +++ b/api/examples/objects-get-200.json @@ -5,5 +5,10 @@ "4f79cfd1-c057-47f4-8e4d-1b126ca7bf34", "0fde9c11-da9d-434a-a113-d3b20a2cf251" ], - "first_referenced_by_flow": "4f79cfd1-c057-47f4-8e4d-1b126ca7bf34" -} \ No newline at end of file + "first_referenced_by_flow": "4f79cfd1-c057-47f4-8e4d-1b126ca7bf34", + "get_urls": [ + { + "url": "https://store.example.com/tams-e2b89b02-21e7-5f9d-aa2d-db38b01453c9/846023d3-612d-5014-bc47-88f6eb2d04bb" + } + ] +} diff --git a/api/examples/objects-instances-controlled-post.json b/api/examples/objects-instances-controlled-post.json new file mode 100644 index 00000000..07f74050 --- /dev/null +++ b/api/examples/objects-instances-controlled-post.json @@ -0,0 +1,3 @@ +{ + "storage_id": "60af2ab4-e8a5-4c65-a09b-d35983680315" +} diff --git a/api/examples/objects-instances-uncontrolled-post.json b/api/examples/objects-instances-uncontrolled-post.json new file mode 100644 index 00000000..3f484127 --- /dev/null +++ b/api/examples/objects-instances-uncontrolled-post.json @@ -0,0 +1,4 @@ +{ + "url": "https://tams-b.s3.eu-west-1.amazonaws.com/35e9b447-be10-43e0-ab85-e1e9fe15d354?X-Amz-Security-Token=signature...", + "label": "pipeline-b" +} diff --git a/api/schemas/flow-core.json b/api/schemas/flow-core.json index dee6badf..2ad77c37 100644 --- a/api/schemas/flow-core.json +++ b/api/schemas/flow-core.json @@ -75,7 +75,7 @@ }, "read_only": { - "description": "If set to 'true', service 
implementations SHOULD reject client requests to update Flow metadata (other than the read_only property), Flow Segments and Media Objects", + "description": "If set to 'true', service implementations SHOULD reject client requests to update Flow metadata (other than the read_only property) and Flow Segments. Service implementations should also reject requests to the [`/flows/{flowId}/storage`](#/operations/POST_flows-flowId-storage) endpoint for the Flow and requests to delete the Flow.", "type": "boolean" }, "codec": diff --git a/api/schemas/flow-segment-post.json b/api/schemas/flow-segment-post.json index 2dbf17b1..bac1a5a9 100644 --- a/api/schemas/flow-segment-post.json +++ b/api/schemas/flow-segment-post.json @@ -41,7 +41,7 @@ }, "get_urls": { - "description": "A list of URLs to which a GET request can be made to directly retrieve the contents of the Segment. This is required by the `http_object_store` Storage Backend type, which is the only one currently described. Clients may choose any URL in the list and treat them as identical, however service instances may sort the list such that the preferred URL is first. `get_urls` should only be used to add uncontrolled URLs. URLs for the provided object_id controlled by the service instance will be populated automatically by the service instance.", + "description": "A list of URLs to which a GET request can be made to directly retrieve the contents of the Media Object. This is required by the `http_object_store` Storage Backend type, which is the only one currently described. Clients may choose any URL in the list and treat them as identical; however, service instances may sort the list such that the preferred URL is first. `get_urls` should only be used to add uncontrolled URLs.
URLs for the provided object_id controlled by the service instance will be populated automatically by the service instance.", "type": "array", "items": { @@ -54,7 +54,7 @@ { "url": { - "description": "A URL to which a GET request can be made to directly retrieve the contents of the Segment. Clients should include credentials if the provide URL is on the same origin as the API endpoint", + "description": "A URL to which a GET request can be made to directly retrieve the contents of the Media Object. Clients should include credentials if the provided URL is on the same origin as the API endpoint", "type": "string" }, "label": @@ -67,7 +67,7 @@ }, "key_frame_count": { - "description": "The number of key frames in the Segment. This should be set greater than zero when the Segment contains key frames that serve as a stream access point", + "description": "The number of key frames in the Media Object. This should be set greater than zero when the Media Object contains key frames that serve as a stream access point", "type": "integer" } } diff --git a/api/schemas/flow-segment.json b/api/schemas/flow-segment.json index b6921dad..cc49fb56 100644 --- a/api/schemas/flow-segment.json +++ b/api/schemas/flow-segment.json @@ -1,99 +1,52 @@ { - "title": "Flow Segment", - "description": "Provides the location and metadata of the media files corresponding to timerange Segments of a Flow.", "type": "object", - "required": + "description": "Provides the location and metadata of the media files corresponding to timerange Segments of a Flow.", + "title": "Flow Segment", + "allOf": [ - "object_id", - "timerange" - ], - "properties": - { - "object_id": - { - "description": "The Object identifier for the Media Object.", - "type": "string" - }, - "ts_offset": - { - "description": "The timestamp offset between the sample timestamps stored in, or inferred from, the media file and the corresponding timestamp in the Segment, ie. ts_offset = segment ts - media object ts.
Assumed to be 0:0 if not set. Format as described by the [Timestamp](#/schemas/timestamp) type", - "$ref": "timestamp.json" - }, - "timerange": - { - "description": "The timerange for the samples contained in the Segment. The timerange start is always inclusive. If samples have a duration then the timerange end is exclusive and covers at least the duration of the last sample. The exclusive timerange end will typically be set to the timestamp of the next sample. If the samples don't have a duration then the timerange end is inclusive. Format is described by the [TimeRange](#/schemas/timerange) type. Note that where temporal re-ordering is used, the timerange and samples refers to the presentation timeline.", - "$ref": "timerange.json" - }, - "last_duration": - { - "description": "The difference between the exclusive end of the `timerange` and the last sample timestamp. Format as described by the [Timestamp](#/schemas/timestamp) type, but cannot be negative", - "$ref": "timestamp.json" - }, - "sample_offset": - { - "description": "The start of the Segment represented as a count of samples from the start of the Object. Note that a sample is a video frame or audio sample. A (coded) audio frame has multiple audio samples. Assumed to be 0 if not set. Must be set if the Flow Segment doesn't start at the beginning of the Media Object.", - "type": "integer" - }, - "sample_count": - { - "description": "The count of samples in the Segment (which may be fewer than in the Object). The count could be less than expected given the Segment duration and rate if there are gaps. If not set, every sample from sample_offset onwards is used. Must be set if the Flow Segment doesn't use the entire Media Object. Note that a sample is a video frame or audio sample. A (coded) audio frame has multiple audio samples", - "type": "integer" - }, - "get_urls": { - "description": "A list of URLs to which a GET request can be made to directly retrieve the contents of the Segment. 
This is required by the `http_object_store` Storage Backend type, which is the only one currently described. Clients may choose any URL in the list and treat the content returned as identical, however service instances may sort the list such that the preferred URL is first. Storage backend metadata for controlled URLs should be populated by the service instance based on the storage backend the Media Object copy resides in.", - "type": "array", - "items": + "type": "object", + "required": + [ + "object_id", + "timerange" + ], + "properties": { - "type": "object", - "unevaluatedProperties": false, - "allOf": - [ - { - "$ref": "storage-backend.json" - }, - { - "type": "object", - "required": - [ - "url" - ], - "properties": - { - "storage_id": - { - "description": "Storage backend identifier", - "$ref": "uuid.json" - }, - "url": - { - "description": "A URL to which a GET request can be made to directly retrieve the contents of the Segment. Clients should include credentials if the provide URL is on the same origin as the API endpoint", - "type": "string" - }, - "presigned": - { - "description": "If `true`, this URL is pre-signed. If this parameter is unset, the URL is NOT pre-signed.", - "type": "boolean" - }, - "label": - { - "description": "Label identifying this URL. If the URL is controlled by the service instance, this is the Storage Backend's label. If the URL is uncontrolled, this is the label provided when a client registered the URL. If the 'label' is not set then this URL can't be filtered for using the 'accept_get_urls' API query parameter.", - "type": "string" - }, - "controlled": - { - "description": "If `true`, this URL is on a storage backend controlled by this service instance. If `false`, this URL is uncontrolled and does not have it's lifecycle managed by this instance. 
If this parameter is unset, assume `true`.", - "type": "boolean" - } - } - } - ] + "object_id": + { + "description": "The object store identifier for the Media Object.", + "type": "string" + }, + "ts_offset": + { + "description": "The timestamp offset between the sample timestamps stored in the media file and the corresponding timestamp in the Segment, ie. ts_offset = segment ts - media object ts. Assumed to be 0:0 if not set. Format as described by the [Timestamp](../schemas/timestamp#top) type", + "$ref": "timestamp.json" + }, + "timerange": + { + "description": "The timerange for the samples contained in the Segment. The timerange start is always inclusive. If samples have a duration then the timerange end is exclusive and covers at least the duration of the last sample. The exclusive timerange end will typically be set to the timestamp of the next sample. If the samples don't have a duration then the timerange end is inclusive. Format is described by the [TimeRange](../schemas/timerange#top) type. Note that where temporal re-ordering is used, the timerange and samples refers to the presentation timeline.", + "$ref": "timerange.json" + }, + "last_duration": + { + "description": "The difference between the exclusive end of the `timerange` and the last sample timestamp. Format as described by the [Timestamp](../schemas/timestamp#top) type, but cannot be negative", + "$ref": "timestamp.json" + }, + "sample_offset": + { + "description": "The start of the Segment represented as a count of samples from the start of the Media Object. Note that a sample is a video frame or audio sample. A (coded) audio frame has multiple audio samples. Assumed to be 0 if not set.", + "type": "integer" + }, + "sample_count": + { + "description": "The count of samples in the Segment (which may be fewer than in the Media Object). The count could be less than expected given the Segment duration and rate if there are gaps. If not set, every sample from sample_offset onwards is used. 
Note that a sample is a video frame or audio sample. A (coded) audio frame has multiple audio samples", + "type": "integer" + } } }, - "key_frame_count": { - "description": "The number of key frames in the Segment. This should be set greater than zero when the Segment contains key frames that serve as a stream access point", - "type": "integer" + "$ref": "object-core.json" } - } -} + ] +} \ No newline at end of file diff --git a/api/schemas/object-core.json b/api/schemas/object-core.json new file mode 100644 index 00000000..77cfb42a --- /dev/null +++ b/api/schemas/object-core.json @@ -0,0 +1,52 @@ +{ + "type": "object", + "description": "Provides the location and metadata of the media files corresponding to a Media Object.", + "title": "Object", + "properties": { + "get_urls": { + "description": "A list of URLs to which a GET request can be made to directly retrieve the contents of the Media Object. This is required by the `http_object_store` Storage Backend type, which is the only one currently described. Clients may choose any URL in the list and treat the content returned as identical; however, service instances may sort the list such that the preferred URL is first. Storage Backend metadata for controlled URLs should be populated by the TAMS instance based on the Storage Backend the Media Object instance resides in.", + "type": "array", + "items": { + "type": "object", + "unevaluatedProperties": false, + "allOf": [ + { + "$ref": "storage-backend.json" + }, + { + "type": "object", + "required": [ + "url" + ], + "properties": { + "storage_id": { + "description": "Storage Backend identifier", + "$ref": "uuid.json" + }, + "url": { + "description": "A URL to which a GET request can be made to directly retrieve the contents of the Media Object. Clients should include credentials if the provided URL is on the same origin as the API endpoint", + "type": "string" + }, + "presigned": { + "description": "If `true`, this URL is pre-signed.
If this parameter is unset, the URL is NOT pre-signed.", + "type": "boolean" + }, + "label": { + "description": "Label identifying this URL. If the URL is controlled by the service instance, this is the Storage Backend's label. If the URL is uncontrolled, this is the label provided when a client registered the URL. If the 'label' is not set then this URL can't be filtered for using the 'accept_get_urls' API query parameter.", + "type": "string" + }, + "controlled": { + "description": "If `true`, this URL is on a Storage Backend controlled by this service instance. If `false`, this URL is uncontrolled and does not have its lifecycle managed by this instance. If this parameter is unset, assume `true`.", + "type": "boolean" + } + } + } + ] + } + }, + "key_frame_count": { + "description": "The number of key frames in the Media Object. This should be set greater than zero when the Media Object contains key frames that serve as a stream access point", + "type": "integer" + } + } +} diff --git a/api/schemas/object.json b/api/schemas/object.json index 643c97c0..6ee07abb 100644 --- a/api/schemas/object.json +++ b/api/schemas/object.json @@ -1,26 +1,32 @@ { "title": "Object", - "description": "Describes a Media Object in the service instance.", - "type": "object", - "required": [ - "id", - "referenced_by_flows" - ], - "properties": { - "id": { - "description": "The Media Object identifier.", - "type": "string" - }, - "referenced_by_flows": { - "description": "List of Flows that reference this Media Object via Flow Segments in this service instance.", - "type": "array", - "items": { - "$ref": "uuid.json" + "allOf": [ + { + "type": "object", + "required": [ + "id", + "referenced_by_flows" + ], + "properties": { + "id": { + "description": "The Media Object identifier.", + "type": "string" + }, + "referenced_by_flows": { + "type": "array", + "description": "List of Flows that reference this Media Object via Flow Segments in this service instance.", + "items": { + "$ref":
"uuid.json" + } + }, + "first_referenced_by_flow": { + "description": "The first Flow that had a Flow Segment reference the Media Object in this service instance. This Flow is also present in 'referenced_by_flows' if it is still referenced by the Flow. This property is optional and may, in some service implementations, become unset if the Flow no longer references the Media Object, e.g. because it was deleted.", + "$ref": "uuid.json" + } } }, - "first_referenced_by_flow": { - "description": "The first Flow that had a Flow Segment reference the Media Object in this service instance. This Flow is also present in 'referenced_by_flows' if it is still referenced by the Flow. This property is optional and may, in some service implementations, become unset if the Flow no longer references the Media Object, e.g. because it was deleted.", - "$ref": "uuid.json" + { + "$ref": "object-core.json" } - } -} \ No newline at end of file + ] +} diff --git a/api/schemas/objects-instances-post.json b/api/schemas/objects-instances-post.json new file mode 100644 index 00000000..74cc4f46 --- /dev/null +++ b/api/schemas/objects-instances-post.json @@ -0,0 +1,39 @@ +{ + "type": "object", + "description": "Register a Media Object instance in the store.", + "title": "Media Object registration", + "oneOf": [ + { + "description": "Request the duplication of a Media Object instance to a new Storage Backend via its `storage_id`.", + "title": "Controlled instance", + "type": "object", + "required": [ + "storage_id" + ], + "properties": { + "storage_id": { + "description": "Storage Backend identifier", + "$ref": "uuid.json" + } + } + }, + { + "description": "Register an uncontrolled Media Object instance via its `url`.", + "title": "Uncontrolled instance", + "type": "object", + "required": [ + "url" + ], + "properties": { + "url": { + "description": "A URL to which a GET request can be made to directly retrieve the contents of the Media Object.
Clients should include credentials if the provided URL is on the same origin as the API endpoint", + "type": "string" + }, + "label": { + "description": "Label identifying this Media Object instance. Service implementations should reject any requests using labels that are already associated with Storage Backends. If the 'label' is not set then this instance can't be filtered for using the 'accept_get_urls' API query parameter.", + "type": "string" + } + } + } + ] +} \ No newline at end of file diff --git a/docs/README.md b/docs/README.md index b27fbd59..61a20e4f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -23,6 +23,7 @@ For more information on how we use application notes, see [here](./appnotes/READ | [0014](./appnotes/0014-referencing-tams-content-in-other-systems.md) | Referencing TAMS content in other systems | | [0015](./appnotes/0015-using-tams-in-opentimelineio.md) | Using TAMS in OpenTimelineIO | | [0017](./appnotes/0017-reuse-of-ids.md) | When to re-use IDs in TAMS and compatible systems | +| [0018](./appnotes/0018-managing-multiple-object-instances.md) | Managing Multiple Object Instances | ## ADRs @@ -67,6 +68,7 @@ For more information on how we use ADRs, see [here](./adr/README.md).
| [0031](./adr/0031-flow-image-support.md) | Add new flow type to support still images | | [0032](./adr/0032-specifying-storage-backend.md) | Specifying storage backend when requesting storage allocation | | [0034](./adr/0034-storage-allow-object_ids.md) | Add object_ids option to Flow Storage request | +| [0038](./adr/0038-improved-storage-management.md) | Improved Storage Management | | [0039](./adr/0039-remove-pre-actions.md) | Proposal to remove pre-actions from storage allocation response | \* Note: ADR 0004a was the unintended result of a number clash in the early development of TAMS which wasn't caught before publication diff --git a/docs/adr/0038-improved-storage-management.md b/docs/adr/0038-improved-storage-management.md new file mode 100644 index 00000000..792a1fa5 --- /dev/null +++ b/docs/adr/0038-improved-storage-management.md @@ -0,0 +1,152 @@ +--- +status: "accepted" +--- +# Improved Storage Management + +## Context and Problem Statement + +In [ADR0032](./0032-specifying-storage-backend.md), support was added for advertising multiple storage backends, and selecting one when allocating storage against Flows. +The TAMS specification has always had the ability to advertise multiple URLs for retrieving Media Objects. +But, so far, there has not been direct support for creating and managing duplicates of Media Objects under the control of a TAMS instance. + +An Objects endpoint was added in [ADR0027](./0027-add-objects-api-endpoint.md) that advertises the Flows where a given Object is referenced. +This has begun a transition away from thinking of Objects as heavily tied to a Segment, +and a move from "Segment reuse" towards "Object reuse". +Until now, the TAMS specification has been unclear on whether `get_urls` are owned by the Segment or by the Object. +In particular, it has been unclear whether, when an Object is re-used, its `get_urls` (and any changes to them) should be reflected across all Segments that use it.
+ +This ADR proposes explicitly linking ownership of `get_urls` to Objects, and providing mechanisms for adding controlled and uncontrolled instances of Objects to, and removing them from, their `get_urls` list. +This separation of Objects and Segments does not require breaking changes, but does provide greater clarity in how the specification should be implemented. + +## Considered Options + +* Option 1a: Manage `get_urls` via the Flows endpoints +* Option 1b: Add `get_urls` management to the Objects endpoint +* Option 2a: Manage additional Object storage via Flow storage endpoint +* Option 2b: Manage additional Object storage via an Objects storage endpoint +* Option 2c: Manage additional Object storage AND initial Object storage via an Objects storage endpoint +* Option 3a: Call Object Instance management endpoint `get_urls` +* Option 3b: Call Object Instance management endpoint `instances` +* Option 4a: Duplication of Object Instances is managed by the Server +* Option 4b: Duplication of Object Instances is managed by the Client + +## Decision Outcome + +Chosen options: + +* Option 1b: Add `get_urls` management to the Objects endpoint +* Option 2b: Manage additional Object storage via an Objects storage endpoint +* Option 3b: Call Object Instance management endpoint `instances` +* Option 4a: Duplication of Object Instances is managed by the Server + +These options have been chosen because they provide clearer boundaries between Media Objects and Segments in the data model and its implementation. +They should avoid confusion arising from changes to one Flow impacting another. +They also minimise unneeded changes to the API and common workflows. +Option 4a also minimises potential new attack vectors.
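As a minimal sketch of how a client might build requests for the chosen `instances` endpoint (Options 3b and 4a), the helpers below mirror the two request shapes in `objects-instances-post.json`: a controlled instance identified by `storage_id`, and an uncontrolled instance identified by `url` with an optional `label`. The helper names are hypothetical, only the standard library is used, no HTTP requests are made, and the Object ID is assumed to be placed in the path as-is.

```python
import uuid
from typing import Optional
from urllib.parse import urlencode


def controlled_instance_body(storage_id: str) -> dict:
    """Body asking the service to duplicate the Object onto another Storage Backend.

    Mirrors the "Controlled instance" branch of objects-instances-post.json.
    """
    uuid.UUID(storage_id)  # raises ValueError if storage_id is not a valid UUID
    return {"storage_id": storage_id}


def uncontrolled_instance_body(url: str, label: Optional[str] = None) -> dict:
    """Body registering an externally managed copy of the Object by its URL.

    Mirrors the "Uncontrolled instance" branch of objects-instances-post.json.
    """
    if not url:
        raise ValueError("url is required for an uncontrolled instance")
    body = {"url": url}
    if label is not None:
        # Without a label, this instance cannot be selected via `accept_get_urls`.
        body["label"] = label
    return body


def instances_path(object_id: str, storage_id: Optional[str] = None) -> str:
    """Path for POSTing a new instance, or for DELETEing one by storage_id.

    Assumption (hypothetical): object_id is used in the path without extra encoding.
    """
    path = f"/objects/{object_id}/instances"
    if storage_id is not None:
        path += "?" + urlencode({"storage_id": storage_id})
    return path


if __name__ == "__main__":
    oid = "846023d3-612d-5014-bc47-88f6eb2d04bb"
    sid = "323367fd-21bb-4f2e-ad38-faf048c4ccfc"
    print("POST", instances_path(oid), controlled_instance_body(sid))
    print("DELETE", instances_path(oid, sid))
```

Validating `storage_id` as a UUID on the client side is only a convenience; the service remains responsible for rejecting unknown Storage Backends.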
+ +### Implementation + +Implemented by + +## Pros and Cons of the Options + +### Option 1a: Manage `get_urls` via the Flows endpoints + +This option would see us add support for in-place editing of `get_urls`, with the edits propagated to other Segments making use of the same Media Object. + +* Good, because it somewhat matches existing patterns for updating `get_urls` +* Good, because it would remove a race condition of the delete & re-create pattern with Object garbage-collection in implementations +* Bad, because changes to one Flow's Segment may have an impact on others +* Bad, because it persists a blurring of Segments and Media Objects in the TAMS data model + +### Option 1b: Add `get_urls` management to the Objects endpoint + +This option would see HTTP methods added to/under the `/objects` endpoint to facilitate management of `get_urls`. + +* Good, because it would remove a race condition of the delete & re-create pattern with Object garbage-collection in implementations +* Good, because it provides a clearer boundary between Media Objects and Segments +* Good, because edits are performed on the shared Media Objects, rather than as side effects between Flows +* Neutral, because it requires the replacement of a "delete & re-create" pattern with purpose-built endpoints + +### Option 2a: Manage additional Object storage via Flow storage endpoint + +The existing endpoint for allocating the storage that a client uploads media to is `/flows/{flowId}/storage`, under the `/flows/{flowId}` endpoint. +This is because storage needs to be tied to a specific Flow initially so it can inherit permissions from that Flow, and so the correct MIME type may be obtained and applied to the object on the object storage backend. + +This option would see storage for additional instances of a Media Object be allocated via the existing endpoint.
+ +* Good, because it makes use of existing endpoints and workflows +* Good, because it reduces required changes to the API +* Bad, because it persists a blurring of Segments and Media Objects in the TAMS data model +* Bad, because allocating storage against one Flow, for Objects that may then be used by many Flows, could be confusing +* Bad, because it poorly communicates the shared management of Objects + * It may result in confusion where a client can edit properties of the Object in one location (e.g. Flow A's Segments), but not another (e.g. Flow B's Segments) + +### Option 2b: Manage additional Object storage via an Objects storage endpoint + +This option would see storage for additional instances of a Media Object be allocated via a new endpoint under the `/objects` endpoint. +Initial allocation would still be performed at `/flows/{flowId}/storage` for the reasons stated above. + +* Good, because it provides a clearer boundary between Media Objects and Segments +* Good, because addition of further Object instances is performed on the shared Media Objects, rather than as side effects between Flows +* Good, because it better communicates the shared management of Objects +* Neutral, because it requires new endpoints on the API +* Neutral, because it results in two endpoints for allocating storage, but for different purposes + +### Option 2c: Manage additional Object storage AND initial Object storage via an Objects storage endpoint + +This would see Option 2b extended such that all storage allocation happens under the `/objects` endpoint. +Allocation of storage via `/flows/{flowId}/storage` would be deprecated/removed.
+ +* Good, because it provides a clearer boundary between Media Objects and Segments +* Good, because it would provide a single endpoint for storage management +* Neutral, because it would require a new mechanism for conveying the initial Flow to inherit permissions and MIME type from +* Neutral, because it requires new endpoints on the API +* Bad, because it would be a breaking change to a core part of the API +* Bad, because it results in two endpoints for allocating storage for the same purpose +* Bad, because it poorly communicates the shared management of Objects + +### Option 3a: Call Object Instance management endpoint `get_urls` + +Where Option 1b is chosen, we would need to decide on a name for the new Objects endpoint. +As the purpose of this endpoint is to add/remove instances in the `get_urls` list, one option is to title the endpoint `get_urls`. + +* Good, because it matches the name of the property it affects +* Bad, because some instances may map to multiple URLs + * e.g. pre-signed/non-pre-signed variants of URLs +* Bad, because URLs may be generated by instances when retrieved + * e.g. pre-signed URLs + * This means the property being PUT/POSTed to the new endpoint doesn't directly match those in the list + +### Option 3b: Call Object Instance management endpoint `instances` + +Another option is to title the endpoint `instances`. + +* Good, because it could avoid confusion over the one-to-many relationship of instances and URLs +* Good, because it more clearly conveys that the client manages the instance, but the service manages the URL +* Neutral, because it doesn't match the name of the property it affects + +### Option 4a: Duplication of Object Instances is managed by the Server + +This option would see clients request duplication of an Object to a new Storage Backend, with the duplication carried out by the Server.
+ +* Good, because it requires a minimal number of HTTP requests +* Good, because it ensures the copy is identical to the originating Instance +* Good, because it allows use of efficient copy mechanisms on storage backends +* Neutral, because it requires the server to carry out a task beyond processing metadata + * Given many object stores support duplication via a single request, it is likely to be simpler and more efficient to implement and process than creating multiple pre-signed URLs, verifying Objects have been allocated on a given Storage Backend when registering, etc. +* Neutral, because it doesn't follow existing patterns for Object upload + * Though those patterns are for a subtly different purpose + +### Option 4b: Duplication of Object Instances is managed by the Client + +This option would see the existing pattern for initial creation of Objects also used for duplication. +Clients would request storage allocation, upload the Media Object to that new location, and then register its availability with the server.
+ +* Good, because it follows existing patterns +* Neutral, because it only requires the server to carry out metadata management + * Though this may require a more complex implementation in practice +* Bad, because it requires more HTTP requests than Option 4a +* Bad, because it presents a potential attack vector + * A malicious actor could upload a crafted Object that doesn't match the original, which would then be advertised against existing Segments +* Bad, because it prevents the use of efficient object duplication methods present on some object stores diff --git a/docs/appnotes/0018-managing-multiple-object-instances.md b/docs/appnotes/0018-managing-multiple-object-instances.md new file mode 100644 index 00000000..b7a50850 --- /dev/null +++ b/docs/appnotes/0018-managing-multiple-object-instances.md @@ -0,0 +1,146 @@ +# 0018: Managing Multiple Object Instances + +## Abstract + +[ADR0038](../adr/0038-improved-storage-management.md) added the ability to create multiple managed copies of the same Media Object in the same TAMS instance. +This application note describes how a client may create, reference, duplicate, and delete instances of a Media Object. +It also describes potential security considerations for deployments. + +## Managing Multiple Object Instances + +### Initial Object Creation + +When a Media Object is initially created, it must be allocated storage against a specific Flow. +This is so that the Media Object may inherit permissions and its MIME Type from the Flow. + +A request is made to [`/flows/{flowId}/storage`](https://bbc.github.io/tams/7.0/index.html#/operations/POST_flows-flowId-storage) with the `limit` property set to the number of Media Object storage locations required. +If a specific Storage Backend is required, or if the service instance does not provide a default, a `storage_id` may also be specified.
+Available Storage Backends, and defaults, are advertised at the [`/service/storage-backends`](https://bbc.github.io/tams/7.0/index.html#/operations/GET_storage-backends) endpoint. + +Example POST body to `/flows/{flowId}/storage`: + +```json +{ + "limit": 1, + "storage_id": "60af2ab4-e8a5-4c65-a09b-d35983680315" +} +``` + +Example response: + +```json +{ + "media_objects": [ + { + "object_id": "tams-e2b89b02-21e7-5f9d-aa2d-db38b01453c9/846023d3-612d-5014-bc47-88f6eb2d04bb", + "put_url": { + "url": "https://example.store.com/tams-e2b89b02-21e7-5f9d-aa2d-db38b01453c9/846023d3-612d-5014-bc47-88f6eb2d04bb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0&X-Amz-Date=20230316T120329Z&X-Amz-Expires=300&X-Amz-SignedHeaders=content-type%3Bhost&X-Amz-Signature=0", + "content-type": "video/mp2t" + } + } + ] +} +``` + +There is no requirement for a client to use all the Media Object storage locations it requests. +This allows a client to request allocation of Objects in bulk, improving efficiency. + +In the example above, a client would PUT the Media Object's file to the `put_url` for one of the Media Objects in the list, with the `content-type` on the request set to the specified value. + +Once the Media Object has been uploaded, it should be registered on the Flow's timeline via a Segment. +The appropriate Object ID from the response above, and the Timerange it covers, should be registered via a POST request to the [`/flows/{flowId}/segments`](https://bbc.github.io/tams/7.0/index.html#/operations/POST_flows-flowId-segments) endpoint. + +Example POST body to `/flows/{flowId}/segments`: + +```json +{ + "object_id": "tams-e2b89b02-21e7-5f9d-aa2d-db38b01453c9/846023d3-612d-5014-bc47-88f6eb2d04bb", + "timerange": "[20:0_21:0)" +} +``` + +Note that the first time a Media Object is registered against a Flow Segment, the Flow ID of the Flow Segment MUST match the one the storage was allocated against. +i.e. the `flowId` MUST match in `/flows/{flowId}/storage` and `/flows/{flowId}/segments`.
+This is to enable the correct inheritance of permissions and content-type. + +The Flow Segment making use of the Media Object will now be available for reading at [`/flows/{flowId}/segments`](https://bbc.github.io/tams/7.0/index.html#/operations/GET_flows-flowId-segments). + +### Referencing an Existing Object + +After initial registration with a Flow, a Media Object may be referenced by other Flows. +The client adding a reference to the existing Object MUST have read permissions on a Flow which already references the Object, and write permissions on the destination Flow. +For example, a client with read access to a Flow with ID `{flowId}` and write permissions on a Flow with ID `{flowId2}` may re-use Objects from `{flowId}` in `{flowId2}`. + +Example POST body to `/flows/{flowId2}/segments`: + +```json +{ + "object_id": "tams-e2b89b02-21e7-5f9d-aa2d-db38b01453c9/846023d3-612d-5014-bc47-88f6eb2d04bb", + "timerange": "[165:0_166:0)", + "ts_offset": "145:0" +} +``` + +Note the `ts_offset`, which describes the difference between the timing internal to the media and the Flow timeline. +The Segment above in `flowId` used the default `ts_offset` of `0:0`. +As the Segment in `flowId` started at `20:0`, but in `flowId2` the Object is placed at `165:0`, a `ts_offset` of `145:0` (`165:0` minus `20:0`) must be set. +For more information on `ts_offset`, see [here](https://bbc.github.io/tams/7.0/index.html#/operations/GET_flows-flowId-segments). + +### Duplicating an Existing Object + +There are many reasons a client may want to create a duplicate instance of a Media Object, for example: +to create a backup, +to create copies that are physically or logically closer to other systems, +or to move content to archive storage. +In each case, the collection of duplicates can still be referred to by the same Media Object ID in Flow Segments. + +To initiate the duplication of a Media Object to a new Storage Backend, clients POST the required `storage_id` to `/objects/{objectId}/instances`.
+The TAMS instance will allocate storage and populate it from an existing copy of the Media Object. +It will then begin advertising the copy in `get_urls` lists. + +Example POST body to `/objects/{objectId}/instances`: + +```json +{ + "storage_id": "323367fd-21bb-4f2e-ad38-faf048c4ccfc" +} +``` + +### Deleting an Object Instance + +Specific instances of a Media Object can be deleted by a DELETE request to `/objects/{objectId}/instances` with the relevant `storage_id` in the query string. + +Example DELETE request: + +```text +http://tams.example.com/objects/846023d3-612d-5014-bc47-88f6eb2d04bb/instances?storage_id=323367fd-21bb-4f2e-ad38-faf048c4ccfc +``` + +Once deleted, this instance will no longer be advertised in `get_urls` on the `/objects/{objectId}` endpoint, or on the `/flows/{flowId}/segments` endpoint for Flow Segments which use the Media Object. + +## Deployment Considerations + +### Security + +The approach to supporting multiple Media Object instances in TAMS enables efficient re-use of media, changing ownership through the lifecycle of the media, and self-service re-location of media to meet the purposes of individual users. +With this increased flexibility comes the potential for new attack vectors for malicious actors. + +Consider a Flow A with its Media Objects. +A malicious actor has read access to Flow A, but not write access. +The actor creates a new Flow, Flow B, and re-uses Media Objects from Flow A in Flow B. +The write permissions they have on Flow B allow them to add new instances to the Objects. +The malicious actor creates new malicious uncontrolled instances with content different to the existing instances and adds them to the Objects. +Users of Flow A are now presented with the malicious instances, in addition to the original ones, on Flow A's Segments. + +This attack vector can be mitigated in multiple ways.
+ +The TAMS instance's authorisation logic may be configured to only allow those with specific permissions, such as write access to the original Flow A, to add new uncontrolled instances. +This mitigates the attack vector described, but places more of a burden on the original owner to manage creation/deletion of duplicate uncontrolled instances. + +The TAMS instance may be configured to only allow managed duplication. +This guarantees all instances will be identical and removes the ability for malicious instances to be uploaded by the actor. + +Clients may also wish to exercise extra caution when using uncontrolled instances. +They may wish to favour controlled URLs or check that the URL is on a trusted domain, for example. + +An organisation may, of course, also assess and accept the risk associated with allowing user-managed creation of uncontrolled instances of Media Objects.
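As a minimal sketch of the client-side caution described above, a client might choose a URL from a `get_urls` list by preferring controlled instances, and otherwise accepting only uncontrolled URLs whose host is on a list it trusts. This assumes each `get_urls` entry is a dict with a `url` string and a boolean `controlled` flag; the function name and the trusted-host set are hypothetical, and the real response shape should be checked against the TAMS API specification.

```python
from typing import AbstractSet, Iterable, Mapping, Optional
from urllib.parse import urlsplit


def choose_get_url(
    get_urls: Iterable[Mapping[str, object]],
    trusted_hosts: AbstractSet[str],
) -> Optional[str]:
    """Return a URL to read the Media Object from, or None if none is acceptable.

    Controlled instances are preferred. Uncontrolled instances are only used if
    their host is in `trusted_hosts`, which should hold lower-case host names.
    """
    controlled = []
    trusted_uncontrolled = []
    for entry in get_urls:
        url = entry.get("url")
        if not isinstance(url, str) or not url:
            continue  # skip malformed entries
        if entry.get("controlled") is True:
            controlled.append(url)
        elif urlsplit(url).hostname in trusted_hosts:
            # urlsplit lower-cases the hostname; None (no host) never matches.
            trusted_uncontrolled.append(url)
    # Keep the service's ordering within each group; controlled URLs come first.
    candidates = controlled + trusted_uncontrolled
    return candidates[0] if candidates else None
```

Treating a missing `controlled` flag as uncontrolled errs on the side of caution, at the cost of ignoring URLs from services that do not report the flag.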