ADR-0003: TrustyAI Service Deployment using Operator #5
ruivieira wants to merge 10 commits into trustyai-explainability:main from
cc: @anishasthana @Jooho @etirelli Your feedback would be most valuable, thank you!
adr/ADR-0003-trustyai-service-deployment-using-operator-pattern.md
If such configuration is not provided, the operator will use the default configuration.
### Route
Copying a comment from the Google Doc: we should think through the service mesh integration (if any) here too.
### Route
If deployed on OpenShift, the Operator will also create a `Route` object to expose the TrustyAI Service to external clients. The `Route` object will have the following configuration:
Does the service have auth enabled by default? If not, we will need to think about that too, since otherwise TrustyAI data would be available to the wider internet by default.
@anishasthana Good point, thanks! (No, it doesn't.)
adr/ADR-0003-trustyai-service-deployment-using-operator-pattern.md
elmiko left a comment
nice design doc. one thing i like to recommend when designing a new operator is considering what events you will emit as well. conditions, metrics, and logs are good to surface issues but also consider adding events to your operator to help understand what it is doing.
i recommend this article as a good primer about some of the differences.
## Proposal
We propose to use a stand-alone TrustyAI Kubernetes Operator which would create and manage the required Deployment, Service, ConfigMap, Route, and ServiceMonitor resources based on a simple Custom Resource while keeping the state consistent with the desired one [^1].
* `replicas` is an optional field that specifies the number of replicas of the TrustyAI service that you want to run. If not provided, the default is one replica.
* `storage` is a mandatory field that specifies the storage details. It has two nested fields:
  * `format` - the storage format (for example, a Persistent Volume Claim (PVC)).
Is a specific PVC access mode needed (RWX, RWO)?
- Is it using the default StorageClass?
- If so, will a default StorageClass be a prerequisite?
- If there is no default StorageClass, what will happen?
@Jooho Good point, I'll add this info.
RWO is what the manifests have been specifying so far, so I think we could stick with that.
Regarding the StorageClass, the initial implementation disables dynamic provisioning and binds to already existing PVs.
Along the lines of:
    spec:
      storage:
        pv: "mypv"
        size: 1Gi
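For illustration, binding to a pre-existing PV without dynamic provisioning could look roughly like the sketch below. This is not taken from the ADR or the operator: the claim name, the `hostPath` backend, and its path are hypothetical placeholders, and only `mypv`, `1Gi`, and RWO come from the discussion above. It relies on standard Kubernetes behavior, where an empty `storageClassName` on the claim opts out of dynamic provisioning and `volumeName` pins the claim to a specific volume.

```yaml
# Sketch only: names other than "mypv" are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv                       # matches the `pv` value in the snippet above
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce                # RWO, as the existing manifests specify
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""             # class-less volume, binds only to class-less claims
  hostPath:                        # placeholder backend, for illustration only
    path: /data/trustyai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trustyai-service-pvc       # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""             # empty string disables dynamic provisioning
  volumeName: mypv                 # bind explicitly to the pre-existing PV
  resources:
    requests:
      storage: 1Gi
```

With this shape, a cluster without a default StorageClass is not a problem, but the claim stays `Pending` if the named PV does not exist or is already bound.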
adr/ADR-0003-trustyai-service-deployment-using-operator-pattern.md
Note that TrustyAI isn't currently implementing HTTPS endpoints, so the `tls` field will be set to `null` for now. Once HTTPS is implemented, the `tls` field will be updated to include the TLS configuration.
We can use Edge TLS. Is there a reason to use reencrypt or passthrough TLS?
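As a rough sketch of the edge option raised here, a `Route` with edge termination might look like the following. The route name, Service name, and port name are hypothetical and not taken from the ADR; the `tls` fields are the standard OpenShift `Route` ones.

```yaml
# Sketch only: metadata and Service/port names are hypothetical.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: trustyai-service
spec:
  to:
    kind: Service
    name: trustyai-service         # hypothetical Service created by the operator
  port:
    targetPort: http               # hypothetical named port on that Service
  tls:
    termination: edge              # TLS ends at the router; plain HTTP to the pod
    insecureEdgeTerminationPolicy: Redirect   # redirect HTTP clients to HTTPS
```

Edge termination does not require the service itself to serve HTTPS, whereas reencrypt and passthrough both expect a TLS endpoint on the pod.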
### ModelMesh Serving Integration
The operator also ensures the correct configuration of the ModelMesh Serving component. Once the TrustyAI Service is deployed and reachable, the operator will patch the ModelMesh Serving configuration to include a custom payload processor that points to the consumer endpoint of the deployed TrustyAI Service.
I wonder if this is really OK to do.
When installing ModelMesh via operator, this patch may be rolled back, no?
@anishasthana @danielezonca Related to the question of custom images, a new section was added on how to provide custom service images.
This is a proposal for the deployment of the TrustyAI service using an Operator (ADR-0003).
Some questions (open for discussion) are: