-
Notifications
You must be signed in to change notification settings - Fork 23
feat: enhance wait_for_ready() on policies. #847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
80700f6 to
24acf6d
Compare
24acf6d to
ddc603e
Compare
c3df06a to
4f11377
Compare
| if not kuadrant: | ||
| return None | ||
|
|
||
| route = OpenshiftRoute.create_instance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using OpenshiftRoute for every gateway will make the testsuite incompatible with Kind/Kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. I should use the exposer to expose that.
e304df5 to
5c2b364
Compare
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
9304fb3 to
2325864
Compare
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
|
/make smoke |
|
Test run completed ( Short Test SummaryFull Output |
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
|
/make smoke |
|
Test run completed ( Short Test SummaryFull Output |
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
Signed-off-by: Alexander Cristurean <acristur@redhat.com>
azgabur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has a lot of strange refactors I do not understand why are needed to achieve the PR's goal.
The suggested way to extend Policy class to add the wait for kuadrant_config update seems bit hack-y to me. The metrics are exposed in Gateway, so I recommend moving the code which handles the update check there. The Policies which are being updated/applied could have reference to their gateway stored as a field (similar how they store "cluster") and use it inside commit (to get current kuadrant_config generation number) and wait_for_* methods (to check the number was incremented).
I can see the problem in which we do commit fixture. In some situations (for example testsuite/tests/multicluster/coredns/two_clusters/conftest.py) there are multiple policies commited before their wait_for_ready methods are called. Meaning you cant expect that .commit() and .wait_for_ready() methods will be called right after each other so the generation field could be incremented by something else then the current policy waiting.
I do not know how the kuadrant_config structure looks like, but if it contains field about generation of specific policy being applied, you could use generation field of the policy similar how the current wait_for_ready works in Policy to wait until the gateway has the field in the metric.
In any case, this check seems like something that could be done on the Operator side and used to set the Ready/Enforced status more correctly, I would suggest that as a issue and have the implementation of such check in testsuite only if they reject implementing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why rename name -> _name ? And create duplicate name property?
| """Returns TLS cert bound to this Gateway, if the Gateway does not use TLS, returns None""" | ||
|
|
||
| @abstractmethod | ||
| def name(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already service_name
| class PlanPolicy(Policy): | ||
| """PlanPolicy object, used for applying plan-based policies to a Gateway/HTTPRoute""" | ||
|
|
||
| def __init__(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated refactor as well as in testsuite/kuadrant/extensions/telemetry_policy.py
| def gateway(request, authorino, cluster, blame, module_label, testconfig, keycloak): | ||
| """Deploys Envoy with additional JWT plain identity test setup""" | ||
| envoy = JwtEnvoy( | ||
| envoy = JwtEnvoy( # pylint: disable=abstract-class-instantiated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JwtEnvoy is not abstract class, why is this disable here?
| self._hostname = hostname | ||
| self.verify = verify | ||
| self.ip_getter = ip_getter | ||
| self.verify = verify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wasn't matching the parameters order.
|
|
||
| @pytest.fixture(scope="module", autouse=True) | ||
| def commit(request, commit, authorization2): # pylint: disable=unused-argument | ||
| def commit(request, authorization2): # pylint: disable=unused-argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove parent commit fixture here?
| def gateway(request, authorino, cluster, blame, label, testconfig) -> Envoy: | ||
| """Deploys Envoy that wires up the Backend behind the reverse-proxy and Authorino instance""" | ||
| gw = Envoy( | ||
| gw = Envoy( # pylint: disable=abstract-class-instantiated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Envoy is not abstract class, why is this disable here?
|
|
||
| @pytest.fixture(scope="module", autouse=True) | ||
| def commit(request, commit, wristband_authorization): # pylint: disable=unused-argument | ||
| def commit(request, wristband_authorization): # pylint: disable=unused-argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove parent commit fixture here?
| metrics_service.commit() | ||
| metrics_service.wait_for_ready() | ||
|
|
||
| return exposer.expose_hostname(blame("metrics"), metrics_service) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You dont need to create whole class MetricsServiceGateway just to run exposer on it. In the worst case this will create loadbalanced service targeting loadbalanced service doubling infrastructure strain which now mostly causes the sporadic failures.
I suggest using the Gateway's loadbalanced service which already exists to expose the port you need (15020) for the metrics. See https://istio.io/latest/docs/tasks/observability/metrics/secure-metrics/#secure-metrics-for-gateways
This will avoid whole exposing hassle.
| hostname, | ||
| backend, | ||
| module_label, | ||
| gateway_metrics_service, # pylint: disable=unused-argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add it, if its not used
Description
This PR introduces EnvoyWaitMixin which leverages the kuadrant_configs metric exposed by the WASM shim. This metric increments each time the WASM
configuration changes, providing a reliable indicator of actual data plane readiness.
Two-phase wait mechanism:
Changes
Closes: #846