Conversation

@mdbooth
Contributor

@mdbooth mdbooth commented Feb 5, 2026

Updates NetworkPolicy to reflect changes made in #447

Summary by CodeRabbit

  • Security
    • Allow ingress to operator metrics endpoints: ports 8443 (operator and controllers) and 8442 (controllers) in their respective namespaces.
    • Add default-deny network policies (block all ingress and egress) in both operator and controller namespaces.
    • Add explicit egress-allow policies for operator and controller pods, splitting egress permissions across the two namespaces.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 5, 2026
@openshift-ci-robot

@mdbooth: This pull request explicitly references no jira issue.

Details

In response to this:

Updates NetworkPolicy to reflect changes made in #447

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Adds and updates Kubernetes NetworkPolicy manifests: two ingress policies for metrics access (including ports 8443 and 8442 and podSelector adjustments), two default-deny policies (ingress+egress) in both namespaces, and two egress-allow policies split across operator and controller namespaces.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metrics Ingress Policies: `manifests/0000_30_cluster-api_14_allow-ingress-to-metrics-operators.yaml` | Adds a second NetworkPolicy document and refines the first: explicit metadata/annotations for the operator namespace policy (port 8443, podSelector k8s-app=capi-operator) and a second policy in openshift-cluster-api (podSelector k8s-app=capi-controllers) permitting TCP ports 8443 and 8442. |
| Default-Deny Policies: `manifests/0000_30_cluster-api_17_default-deny.yaml` | Adds default-deny NetworkPolicy resources to both openshift-cluster-api-operator and openshift-cluster-api namespaces with empty pod selectors and policyTypes [Ingress, Egress]; updates comments/labels to reference CAPI operator and controllers. |
| Egress-Allow Split: `manifests/0000_30_cluster-api_16_allow-egress-operators.yaml` | Splits prior egress allowance into two NetworkPolicy resources: one for openshift-cluster-api-operator targeting k8s-app=capi-operator and one for openshift-cluster-api targeting k8s-app=capi-controllers, both permitting egress (empty ingress rule retained) and policyTypes [Egress]. |
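
To make the walkthrough concrete, here is a minimal sketch of what a default-deny policy and a matching egress-allow policy might look like for the operator namespace. The namespace and label values come from the summary above, and the policy names follow the `oc get networkpolicy` output that appears later in this conversation. Annotations and comments from the real manifests are omitted, so treat this as an illustration rather than a copy of the merged files.

```yaml
# Sketch only: not copied from the merged manifests.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: openshift-cluster-api-operator
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
  - Ingress
  - Egress               # no ingress/egress rules are listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-operators
  namespace: openshift-cluster-api-operator
spec:
  podSelector:
    matchLabels:
      k8s-app: capi-operator
  policyTypes:
  - Egress
  egress:
  - {}                   # a single empty rule allows all egress
```

NetworkPolicies are additive: the traffic allowed for a pod is the union of every policy that selects it. The allow policy therefore opens egress for the operator pods only, while any other pod in the namespace stays covered by the default-deny policy alone.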

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled through YAML, lines so neat,

Opened ports for metrics, kept defenses complete.
Two namespaces guarded, two policies strong,
Hop, hop—secure the network all day long! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title references fixing NetworkPolicy for 'openshift-capi-operator namespace', but the changes affect three files with network policies in both the openshift-cluster-api-operator and openshift-cluster-api namespaces, adding ingress rules, egress rules, and default-deny policies across both namespaces. | Revise the title to accurately reflect the full scope, such as 'Fix NetworkPolicy for openshift-cluster-api and openshift-cluster-api-operator namespaces' or 'Add NetworkPolicy rules for operator and controller ingress/egress'. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
manifests/0000_30_cluster-api_17_default-deny.yaml (1)

50-53: ⚠️ Potential issue | 🟡 Minor

Comment references incorrect label value for this namespace.

The comment mentions k8s-app=cluster-capi-operator, but the corresponding allow-ingress policy for the openshift-cluster-api namespace uses k8s-app: capi-controllers. This is a different label entirely from what the comment states.

📝 Suggested fix
   # Exclude CAPI pods that need network access: This is done by the other policies
   # - control-plane: CAPI controller manager pods (capg, capi, capa, capz, etc.)
-  # - k8s-app=cluster-capi-operator: Main cluster-capi-operator pod
+  # - k8s-app=capi-controllers: CAPI controller pods
   # This ensures these pods can communicate while still denying traffic to other pods
🤖 Fix all issues with AI agents
In `@manifests/0000_30_cluster-api_17_default-deny.yaml`:
- Around line 28-31: Update the explanatory comment to use the correct label
value used by the allow-ingress policy: replace "k8s-app=cluster-capi-operator"
with "k8s-app=capi-operator" so it matches the label in
0000_30_cluster-api_14_allow-ingress-to-metrics-operators.yaml; ensure the
comment and the allow-ingress policy both reference the exact string "k8s-app:
capi-operator" to avoid future confusion.
🧹 Nitpick comments (1)
manifests/0000_30_cluster-api_14_allow-ingress-to-metrics-operators.yaml (1)

27-33: Redundant podSelector in from clause.

The namespaceSelector: {} already matches all pods in all namespaces, making the separate podSelector: {} entry redundant. When listed as separate items in from, they're ORed together—but the first rule already covers everything.

If the intent is to allow from any pod in any namespace, only namespaceSelector: {} is needed. Alternatively, if you want a single combined rule, both selectors should be in the same list item.

♻️ Simplified version (either option works)

Option 1: Just namespaceSelector (allows all pods from all namespaces)

   ingress:
   - from:
     - namespaceSelector: {}
-    - podSelector: {}
     ports:
     - port: 8443
       protocol: TCP

Option 2: Combined selector (explicit AND, same effect)

   ingress:
   - from:
     - namespaceSelector: {}
+      podSelector: {}
-    - podSelector: {}
     ports:
     - port: 8443
       protocol: TCP

Comment on lines 28 to 31
# Exclude CAPI pods that need network access: This is done by the other policies
# - control-plane: CAPI controller manager pods (capg, capi, capa, capz, etc.)
# - k8s-app=cluster-capi-operator: Main cluster-capi-operator pod
# This ensures these pods can communicate while still denying traffic to other pods

@coderabbitai coderabbitai bot Feb 5, 2026

⚠️ Potential issue | 🟡 Minor

Comment references incorrect label value.

The comment mentions k8s-app=cluster-capi-operator, but the corresponding allow-ingress policy in 0000_30_cluster-api_14_allow-ingress-to-metrics-operators.yaml uses k8s-app: capi-operator (without "cluster-" prefix). This inconsistency could confuse future maintainers trying to understand how the policies work together.

📝 Suggested fix
   # Exclude CAPI pods that need network access: This is done by the other policies
   # - control-plane: CAPI controller manager pods (capg, capi, capa, capz, etc.)
-  # - k8s-app=cluster-capi-operator: Main cluster-capi-operator pod
+  # - k8s-app=capi-operator: Main cluster-capi-operator pod
   # This ensures these pods can communicate while still denying traffic to other pods

Contributor

@theobarberbany theobarberbany Feb 5, 2026

@mdbooth ^ worth keeping consistent :D

@theobarberbany
Contributor

/lgtm

We'll have to remember about the default-deny network policy if we're adding stuff to the operator namespace.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2026
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

Member

@damdo damdo left a comment

/approve

Thanks!

@damdo
Member

damdo commented Feb 5, 2026

/assign @miyadav

@openshift-ci
Contributor

openshift-ci bot commented Feb 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 5, 2026
@damdo
Member

damdo commented Feb 5, 2026

/hold

@mdbooth I'm seeing these errors in the AWS TP job:

event [namespace/openshift-cluster-api-operator node/ip-10-0-18-135.us-west-2.compute.internal pod/capi-operator-57795fdd4-pzfv2 hmsg/37b9101a12 - Back-off restarting failed container capi-operator in pod capi-operator-57795fdd4-pzfv2_openshift-cluster-api-operator(bd06009f-8be9-47da-a3ad-7c025be9cc37)] happened 138 times
event [namespace/openshift-cluster-api-operator node/ip-10-0-18-135.us-west-2.compute.internal pod/capi-operator-57795fdd4-pzfv2 hmsg/37b9101a12 - Back-off restarting failed container capi-operator in pod capi-operator-57795fdd4-pzfv2_openshift-cluster-api-operator(bd06009f-8be9-47da-a3ad-7c025be9cc37)] happened 159 times
event [namespace/openshift-cluster-api-operator node/ip-10-0-18-135.us-west-2.compute.internal pod/capi-operator-57795fdd4-pzfv2 hmsg/37b9101a12 - Back-off restarting failed container capi-operator in pod capi-operator-57795fdd4-pzfv2_openshift-cluster-api-operator(bd06009f-8be9-47da-a3ad-7c025be9cc37)] happened 178 times
...
E0205 14:04:07.265245       1 main.go:115] failed to get infrastructure "": failed to get server groups: Get "https://172.30.0.1:443/api": dial tcp 172.30.0.1:443: i/o timeoutunable to get infrastructure

Looks like we might be doing a full network partition here?

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2026
@miyadav
Member

miyadav commented Feb 5, 2026

I created a cluster with this PR and saw the following:

`miyadav@miyadav-mac ~ % oc project openshift-cluster-api-operator 
Now using project "openshift-cluster-api-operator" on server "https://api.miyadav-0502.qe.devcluster.openshift.com:6443".
miyadav@miyadav-mac ~ % oc get pods
NAME                             READY   STATUS             RESTARTS       AGE
capi-operator-564d4798f9-q9q7q   0/1     CrashLoopBackOff   12 (32s ago)   37m
miyadav@miyadav-mac ~ % oc get networkpolicy
NAME                                 POD-SELECTOR                 AGE
allow-ingress-to-metrics-operators   k8s-app in (capi-operator)   34m
default-deny                         <none>                       34m`

The cluster installation still succeeds, though (I think we already know about that: the cluster operator (co) is not reported as degraded).
It works fine if I scale down the deployment after creating a network policy that allows all egress.

`miyadav@miyadav-mac debugnp % oc get pods
NAME                             READY   STATUS    RESTARTS   AGE
capi-operator-564d4798f9-ms78j   1/1     Running   0          4m41s
miyadav@miyadav-mac debugnp % oc get networkpolicy
NAME                                 POD-SELECTOR                 AGE
allow-egress-capi-operator           k8s-app=capi-operator        5m43s
allow-ingress-to-metrics-operators   k8s-app in (capi-operator)   42m
default-deny                         <none>                       42m
miyadav@miyadav-mac debugnp % cat networkpolicy.yaml 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
    name: allow-egress-capi-operator
    namespace: openshift-cluster-api-operator
spec:
    podSelector:
      matchLabels:
        k8s-app: capi-operator
    policyTypes:
    - Egress
    egress:
    - {}  # Allow ALL egress (outbound connections)`

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2026
@openshift-ci-robot

@mdbooth: This pull request explicitly references no jira issue.

Details

In response to this:

Updates NetworkPolicy to reflect changes made in #447

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
manifests/0000_30_cluster-api_14_allow-ingress-to-metrics-operators.yaml (1)

42-71: ⚠️ Potential issue | 🟡 Minor

Update the NetworkPolicy comment to accurately reflect the diagnostics endpoint, not metrics.

Port 8442 is correctly exposed by the machine-api-migration container as shown in the deployment (lines 62, 68-69), but the NetworkPolicy comment is misleading—it describes 8442 as a "metrics endpoint" when it's actually a diagnostics endpoint (--diagnostics-address=:8442). Similarly, port 8443 used by capi-controllers is also a diagnostics endpoint, not metrics. Update the comment to accurately reflect this.
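
For illustration, a controllers ingress policy with comments that name the endpoints the way this review describes them might look like the sketch below. The namespace, label, and ports come from the walkthrough earlier in this conversation. The policy name is a placeholder, and the `from` clause is an assumption based on the simplified form suggested in the earlier nitpick.

```yaml
# Sketch only: the policy name is a placeholder, and the real manifest carries
# additional metadata and annotations that are omitted here.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-diagnostics-controllers   # hypothetical name
  namespace: openshift-cluster-api
spec:
  podSelector:
    matchLabels:
      k8s-app: capi-controllers
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector: {}   # assumption: any pod in any namespace may connect
    ports:
    # Diagnostics endpoint used by capi-controllers
    - port: 8443
      protocol: TCP
    # Diagnostics endpoint of the machine-api-migration container
    # (--diagnostics-address=:8442)
    - port: 8442
      protocol: TCP
```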

@miyadav
Member

miyadav commented Feb 6, 2026

This looks good in manual testing for now:

`miyadav@miyadav-mac ~ % oc project openshift-cluster-api-operator
Now using project "openshift-cluster-api-operator" on server "https://api.miyadav-0602.qe.devcluster.openshift.com:6443".
miyadav@miyadav-mac ~ % oc get pods
NAME                             READY   STATUS    RESTARTS   AGE
capi-operator-676b9f66fb-5244c   1/1     Running   0          36m
miyadav@miyadav-mac ~ % oc get networkpolicy
NAME                                 POD-SELECTOR                 AGE
allow-egress-operators               k8s-app in (capi-operator)   35m
allow-ingress-to-metrics-operators   k8s-app in (capi-operator)   35m
default-deny                         <none>                       35m`

As suggested, automation is a work in progress.

Member

@damdo damdo left a comment

/hold
/lgtm

^ to trigger testing

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2026
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@sunzhaohua2
Contributor

/retest

@damdo
Member

damdo commented Feb 9, 2026

Hey @miyadav did you want to run other checks aside from: #453 (comment)

or is this considered verified? Thanks :)

@miyadav
Member

miyadav commented Feb 9, 2026

> Hey @miyadav did you want to run other checks aside from: #453 (comment)
>
> or is this considered verified? Thanks :)

Hey @damdo, the remaining tests are here; I used this same build. The test here was just a smoke test to make sure the pod does not crash because of the added network policy.

So consider it VERIFIED if the other PR looks good. If there are any other checks you want us to add, we can do that.
From my understanding, regression plus the tests in the other PR should be good enough to let this merge.

@mdbooth
Contributor Author

mdbooth commented Feb 10, 2026

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2026
@damdo
Member

damdo commented Feb 10, 2026

@miyadav @mdbooth do we have a way to check in the CI job run if metrics were collected for this at all?
Prometheus maybe?

@miyadav
Member

miyadav commented Feb 10, 2026

Thanks @damdo for the pointers. I checked here for the techpreview job, and it seems there are no metrics for the openshift-cluster-api-operator namespace.
So do we need ServiceMonitors created for the namespace first?
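
For context, a ServiceMonitor for the operator namespace could look roughly like the sketch below. Nothing in this conversation shows that such a resource exists: the resource name, the Service label selector, and the port name are all hypothetical, and the sketch assumes a Service fronts the operator's 8443 endpoint.

```yaml
# Hypothetical sketch: no ServiceMonitor is defined in this PR.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: capi-operator                 # hypothetical name
  namespace: openshift-cluster-api-operator
spec:
  selector:
    matchLabels:
      k8s-app: capi-operator          # assumes the Service carries this label
  endpoints:
  - port: metrics                     # hypothetical name of the Service port mapped to 8443
    scheme: https
    interval: 30s
```

A real definition would also need TLS and authorization settings suited to the cluster. With default-deny in place, scrapes would reach the pod only through the ingress rule for port 8443.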

@damdo
Member

damdo commented Feb 10, 2026

@miyadav Yeah, it looks like we never had metrics collected/exposed, so I think we are good to merge!
Can you add your verified label? Thanks!

@miyadav
Member

miyadav commented Feb 10, 2026

/verified by @miyadav

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Feb 10, 2026
@openshift-ci-robot

@miyadav: This PR has been marked as verified by @miyadav.

Details

In response to this:

/verified by @miyadav

@damdo
Member

damdo commented Feb 11, 2026

/override ci/prow/e2e-openstack-ovn-techpreview

Unrelated failure

@openshift-ci
Contributor

openshift-ci bot commented Feb 11, 2026

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-openstack-ovn-techpreview

Details

In response to this:

/override ci/prow/e2e-openstack-ovn-techpreview

Unrelated failure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Feb 11, 2026

@mdbooth: all tests passed!

Full PR test history. Your PR dashboard.

Details

@openshift-merge-bot openshift-merge-bot bot merged commit 11ebf5b into openshift:main Feb 11, 2026
25 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


6 participants