Update plotter to use capabilities in template list and avoid api #6

revit13 · 2021-08-12T10:19:42Z

Signed-off-by: Revital Sur <eres@il.ibm.com>

shlomitk1 · 2021-08-12T12:16:24Z

manager/testdata/new-blueprints/scenario3/plotter.yaml

        name: ghcr.io/mesh-for-data/arrow-flight-config-module:0.1.0
+      capabilities:
+      - capability: read
+        scope: cluster


scope: workload

I copied it from arrow-flight-config-module.yaml. I will change it there also. Thanks

@revit13 Scenario 3 describes a multi-tenant flight server. The scope should be cluster.

@simanadler Scenario 3 describes a config module for a multi-tenant server. Its scope is workload as it configures the server per workload. We do not deploy the server itself in the plotter/blueprint so the cluster scope is irrelevant.

@shlomitk1 Suppose we have a pre-installed flight service running in one cluster, and we use it from different clusters in the same application (reading datasets that reside on different clusters; I am not sure this is a good example). Then in each cluster we will need to deploy the module that configures the flight server to read the dataset. I am trying to see whether such an example explains the cluster scope...

shlomitk1 · 2021-08-12T12:18:56Z

manager/testdata/new-blueprints/scenario4/plotter.yaml

        name: ghcr.io/mesh-for-data/m4d-arrow-flight-transform-conf:0.1.0
+      capabilities:
+      - capability: transform
+        scope: asset


scope: workload
I think the scope should be per template (the entire construct of a service + plugins). Plugins are always per service.

I copied it from the transform-config-module.yaml spec. Should it be changed there too? IIUC, enabling a plugin is per service, but applying it is per asset. @simanadler

Applying plugins is beyond the scope. It is done internally by the service.

@shlomitk1 From what I understood applying the plugin is done by the blueprint-controller.

Blueprint controller deploys the plugins. If 10 different assets need to have a column redacted, the redact plugin will be deployed only once and called 10 times by the service. A config module will contain the specifics per an asset (action to perform, arguments,...)
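To make the "deployed once, called 10 times" point concrete, here is a minimal sketch in Python. The `Service` class, its method names, and the `columns` argument are hypothetical stand-ins for illustration only; they are not the blueprint controller's or the service's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a service that hosts plugins."""
    deployed_plugins: set = field(default_factory=set)
    calls: list = field(default_factory=list)

    def deploy_plugin(self, name: str) -> None:
        # Deploying is idempotent: asking again for the same plugin is a no-op.
        self.deployed_plugins.add(name)

    def apply(self, plugin: str, asset_id: str, config: dict) -> None:
        # The per-asset specifics (action, arguments) arrive as configuration.
        if plugin not in self.deployed_plugins:
            raise ValueError(f"plugin {plugin!r} is not deployed")
        self.calls.append((plugin, asset_id, config))


service = Service()
assets = [f"asset-{i}" for i in range(10)]  # hypothetical asset IDs
for asset_id in assets:
    service.deploy_plugin("redact")
    service.apply("redact", asset_id, {"action": "redact", "columns": ["col"]})

print(len(service.deployed_plugins), len(service.calls))
```

The plugin set ends up with a single entry while the call log has one entry per asset, which is the split between deployment (per service) and configuration (per asset) described above.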

@shlomitk1 Regarding "A config module will contain the specifics per an asset (action to perform, arguments,...)": what would the config module be in this case? Is it what we send to arrow-flight-config-module? Thanks

Can we assume the following about what scope means?

Scope of service/deployment module:

Asset - deployment per asset
Cluster - deployment per cluster
Workload - deployment per workload

Scope of config module depends on the scope of the service it configures: if the service is per asset, then the config module will be per asset; otherwise the config module will be per workload for each cluster (assuming we do not do cross-workload deployment of config modules).

plugin: per service.
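The proposed rules can be sketched as two small functions. This is only an illustration of the proposal in this comment, under the assumption that scopes are the plain strings `asset`, `cluster`, and `workload`; the function names are hypothetical and nothing here reflects the manager's actual code.

```python
def deployments_for_service(scope: str, assets: int, clusters: int, workloads: int) -> int:
    """Number of deployments of a service/deployment module for a given scope."""
    counts = {"asset": assets, "cluster": clusters, "workload": workloads}
    if scope not in counts:
        raise ValueError(f"unknown scope: {scope!r}")
    return counts[scope]


def config_module_scope(service_scope: str) -> str:
    """Scope of a config module, derived from the scope of the service it configures.

    Per-asset services get per-asset config modules; everything else is
    configured per workload (in each cluster), since config modules are
    assumed not to be shared across workloads.
    """
    if service_scope not in ("asset", "cluster", "workload"):
        raise ValueError(f"unknown scope: {service_scope!r}")
    return "asset" if service_scope == "asset" else "workload"


print(config_module_scope("cluster"))
print(deployments_for_service("cluster", assets=10, clusters=2, workloads=5))
```

Under this reading, a config module for a cluster-scoped service still comes out as workload-scoped, which is the point under debate in the scenario 3 and scenario 4 threads.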

Can we assume the following about what scope means?

Scope of service/deployment module:

Asset - deployment per asset
Cluster - deployment per cluster
Workload - deployment per workload

Yes, agreed

Scope of config module depends on the scope of the service it configures: if the service is per asset, then the config module will be per asset; otherwise the config module will be per workload for each cluster (assuming we do not do cross-workload deployment of config modules).

So you are saying that a config module will never be used across workloads. I think it makes sense, certainly for now. We may want to revisit it in the future.

plugin: per service.

Makes sense. @roee88 @cdoron Please confirm

shlomitk1 · 2021-08-12T12:21:59Z

manager/testdata/new-blueprints/scenario4/plotter.yaml

        object: inventory.parq
    dataformat: parquet
+  - assetId: "m4d-notebook-sample/paysim-step2" # the sink of step 2, consumed by step 1
+    cluster: thegreendragon


cluster is per step or flow, not per asset.

I followed what was there before... @simanadler

@revit13 @shlomitk1 Originally there was no cluster in the assets. @revit13 I believe you suggested adding it as an indication of the cluster in which the data resides. In the steps the cluster is used to determine where the module should run. Same term different meanings. Maybe in assets we should call it storageCluster?

@simanadler For blueprints it is crucial to know the deployment cluster. But why would we need to know where an asset is located? It can be on some server outside the deployed clusters, for all we know.

I agree with @shlomitk1 here. The storage cluster is nice-to-have information in the plotter for debugging, and the data may not be in any cluster at all (or may be cross-regional, e.g. COS). This information is crucial for the optimizer etc., which should be done at this stage.

@shlomitk1 and @froesef I agree with you

OK, I will remove the cluster from the asset list.

simanadler

See my comments

simanadler · 2021-08-15T08:45:24Z

manager/testdata/new-blueprints/scenario2/plotter.yaml

          parameters:
            source:
              assetId: "m4d-notebook-sample/paysim"
-            api:


Not clear why you are removing the api from the plotter. We removed it from the blueprint because it's not necessary for deployment. However, the plotter should contain everything needed to understand the flow of data between the components. We need to include the means (data source, api, ...) by which the data may be accessed or written.

@simanadler can you please elaborate why it's not needed for the deployment? The deployment needs to know the protocol of the service that e.g. a module exposes. (Capabilities in a module are an array and thus multiple different interfaces could be supported by one module)

The deployment needs to know the chart image and the values to use. This information is passed via module arguments.

Agreed. A chart and its values are, in the end, what is needed for the deployment. My question is: where do these values (which say it's an arrow-flight service) come from? If they are not defined in the plotter, how should they be put into the helm chart?

These values have been defined in the asset list as a new type of connection. According to this approach, arrow-flight service does not declare how it can be accessed but rather any module that needs to access the arrow-flight service defines what it needs: "data source: format = arrow, protocol=arrow-flight, endpoint=arrow-flight-service:80, assetID=user-asset".
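As a rough illustration of the consumer-side description quoted above, the sketch below models the data source as a small record and wraps it as a `source` parameter. The `DataSource` class and the `module_arguments` helper are hypothetical; only the four field values come from the comment itself.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataSource:
    """What a consuming module needs in order to reach the data (hypothetical shape)."""
    format: str
    protocol: str
    endpoint: str
    asset_id: str


def module_arguments(source: DataSource) -> dict:
    # The consumer receives the connection details in its own parameters,
    # rather than the arrow-flight service declaring how it can be accessed.
    return {"source": asdict(source)}


source = DataSource(
    format="arrow",
    protocol="arrow-flight",
    endpoint="arrow-flight-service:80",
    asset_id="user-asset",
)
print(module_arguments(source))
```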

simanadler · 2021-08-15T08:50:52Z

manager/testdata/new-blueprints/scenario4/arrow-flight-config-module.yaml

  capabilities:
    - capability: read
-      scope: cluster
+      scope: workload


@revit13 This config module is configuring a cluster-level service. That's why its scope was cluster. It does, of course, pass data to that service that is for a particular workload. @froesef @roee88 Which do you think it should be?

After rethinking this, I am reverting this change... IIUC the config module is deployed per cluster.

As I understand it, cluster scope means that something is deployed once per cluster, i.e. cross-workloads. This is not our case.

@shlomitk1 It's not now. The question is should it be? If we have 15k workloads do we really need 15k of these config modules?

@simanadler We need to configure the service, and the configuration differs from one workload to another. It is not a "global" service that can be accessed by the workload.

simanadler · 2021-08-15T08:54:51Z

manager/testdata/new-blueprints/scenario4/plotter.yaml

        object: inventory.parq
    dataformat: parquet
+  - assetId: "m4d-notebook-sample/paysim-step2" # the sink of step 2, consumed by step 1
+    cluster: thegreendragon


@revit13 @shlomitk1 Originally there was no cluster in the assets. @revit13 I believe you suggested adding it as an indication of the cluster in which the data resides. In the steps the cluster is used to determine where the module should run. Same term different meanings. Maybe in assets we should call it storageCluster?

simanadler · 2021-08-15T08:58:05Z

manager/testdata/new-blueprints/scenario4/plotter.yaml

+    cluster: thegreendragon
+    connection: 
+      arrow-flight:
+        endpoint:  app1-ns1-arrow-transform.m4d-blueprints


@revit13 Do we know what the endpoint will be prior to deploying the blueprints? The endpoint will differ per cluster I assume. I don't see an indication of that in the example.

We know the endpoint prior to deploying the blueprint. It is based on 1) cluster 2) release name 3) module specifics defined in FybrikModule.
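A minimal sketch of why the endpoint can be computed before deployment: it is a pure function of inputs that are already known when the plotter is generated. The exact derivation is not given in this thread, so the naming pattern below (release name joined to a namespace) is an assumption chosen only because it reproduces the example endpoint from the diff; `ModuleEndpointSpec` and `predict_endpoint` are hypothetical names and not part of FybrikModule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleEndpointSpec:
    """Hypothetical module-level settings; not the actual FybrikModule schema."""
    namespace: str


def predict_endpoint(cluster: str, release_name: str, spec: ModuleEndpointSpec) -> dict:
    # ASSUMPTION: the service name follows "<release>.<namespace>".
    # The cluster is carried alongside because the same name resolves
    # to a different service in each cluster.
    return {"cluster": cluster, "endpoint": f"{release_name}.{spec.namespace}"}


result = predict_endpoint(
    cluster="thegreendragon",
    release_name="app1-ns1-arrow-transform",
    spec=ModuleEndpointSpec(namespace="m4d-blueprints"),
)
print(result)
```

Because nothing in the function depends on a running deployment, the same inputs always give the same endpoint, and a different cluster yields a distinct (cluster, endpoint) pair.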

froesef · 2021-08-16T16:03:45Z

manager/testdata/new-blueprints/scenario2/plotter.yaml

      kind: M4DModule
      chart:
        name: ghcr.io/mesh-for-data/m4d-arrow-flight-transform-conf:0.1.0
+      capabilities:


Why do we need to add the capabilities here? Capabilities are used in the module decision process which is finished at this stage because this is the output of it.

We need to know the scope in order to determine the number of instances to deploy. We need to know the context (this module is used for read although it can also write) in order to construct the arguments: source for the read module, sink for the write, etc.
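A small sketch of the "context" half of this argument: the capability chosen for a step decides which arguments are constructed. The `build_arguments` function is hypothetical; the `source`/`sink` split comes from the comment above and the `assetId` key from the plotter examples in this PR.

```python
def build_arguments(capability: str, asset_id: str) -> dict:
    """Construct step arguments from the capability a module is used for."""
    if capability == "read":
        # A module used for reading is told where the data comes from.
        return {"source": {"assetId": asset_id}}
    if capability == "write":
        # The same module used for writing is told where the data goes.
        return {"sink": {"assetId": asset_id}}
    raise ValueError(f"no argument rule sketched for capability {capability!r}")


print(build_arguments("read", "m4d-notebook-sample/paysim"))
print(build_arguments("write", "m4d-notebook-sample/paysim"))
```

A module that supports both read and write therefore needs the selected capability recorded in the plotter, since the module definition alone does not say which set of arguments to build.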

If we're using this to choose which capability of a module to use can this field be a struct and not an array?

Yes, it should be a struct.

If a chart has multiple capabilities and different capabilities are used in different steps, then the chart will appear several times, each time with the capability that the step needs. Is that correct?

froesef · 2021-08-16T16:04:40Z

manager/testdata/new-blueprints/scenario2/plotter.yaml

          parameters:
            source:
              assetId: "m4d-notebook-sample/inventory"
-            api:


Why is the api field removed? Where in the current plotter is it defined that the read service should offer the given asset using the arrow-flight protocol with a given asset id?

The read access here is done by the workload. Nothing is done in the deployment.
In a different scenario where another module reads the data from the read service, the module gets this information in its params (e.g. source).

froesef · 2021-08-16T16:17:53Z

manager/testdata/new-blueprints/scenario4/plotter.yaml

+      arrow-flight:
+        endpoint:  app1-ns1-arrow-transform.m4d-blueprints
+        assetId: "m4d-notebook-sample/paysim" # always the same as the assetId known to the user (assetId or advertisedAssetId)
+  - assetId: "m4d-notebook-sample/paysim-step1" # used by workload to read the data


I think having all the intermediate "virtual" assets in here is misleading. These are not assets that are specified by the user. Also, in order to know what the steps are really doing, one has to look up the assets at the top of the plotter. I specifically wanted to define everything within the step because it improves readability from the step point of view.

froesef · 2021-08-18T06:46:54Z

manager/testdata/new-blueprints/scenario2/blueprint.yaml

              bucket: srcbucket
              object: paysim.parq
          dataformat: parquet
+          actions:


Can you please add a comment explaining how the read-module should know that the ghcr.io/mesh-for-data/m4d-arrow-flight-transform-conf:0.1.0 chart was chosen to configure this action as a plugin?

Update tests

cacf23c

Signed-off-by: Revital Sur <eres@il.ibm.com>

revit13 marked this pull request as draft August 12, 2021 10:19

More changes

d95938c

Signed-off-by: Revital Sur <eres@il.ibm.com>

revit13 changed the title ~~Update tests~~ Update plotter to use capabilities in template list and avoid api Aug 12, 2021

revit13 added 2 commits August 12, 2021 14:02

Changes to scenario 6

fca8cce

Signed-off-by: Revital Sur <eres@il.ibm.com>

Additional changes

b7b0960

Signed-off-by: Revital Sur <eres@il.ibm.com>

shlomitk1 reviewed Aug 12, 2021

revit13 added 2 commits August 12, 2021 20:56

Change scenario 4 based on Shlomit's comment

7ef085f

Signed-off-by: Revital Sur <eres@il.ibm.com>

Fix plotter in scenario 5 according to Shlomit's comment

40c6d0a

Signed-off-by: Revital Sur <eres@il.ibm.com>

simanadler reviewed Aug 15, 2021

Retrieve cluster scope in scenario 4

fd22147

Signed-off-by: Revital Sur <eres@il.ibm.com>

froesef reviewed Aug 16, 2021

revit13 and others added 4 commits August 17, 2021 10:22

Remove cluster from assetId list

59db149

Signed-off-by: Revital Sur <eres@il.ibm.com>

Add status to blueprint and propose to have only one api per module

1cfccaa

Signed-off-by: Florian Froese <ffr@zurich.ibm.com>

Changes to plotter and blueprint when deploying transform plugin.

ced0519

Signed-off-by: Revital Sur <eres@il.ibm.com>

Remove virtual assets from plotter.assets list

2a6d9e3

Signed-off-by: Revital Sur <eres@il.ibm.com>

froesef reviewed Aug 18, 2021

Add endpoint details to plotter.

44e06d0

Signed-off-by: Revital Sur <eres@il.ibm.com>
