CASMTRIAGE-9035/CASMTRIAGE-8987/CASMNET-2387 - NMN Isolation bugfixes#751
Open
spillerc-hpe wants to merge 8 commits into main
…ror as the switch always renders it by name not port
…ASMNET-2387)

NCNs route the HMN cabinet network (10.104.0.0/22) via bond0.hmn0, so all SSH/UDP/ICMP traffic from NCNs to BMCs arrives at CDU switches with a source address in 10.254.0.0/17 (HMN), not the NMN /32s that make up the NCN object-group. Under NMN isolation the MANAGED_NODE_ISOLATION ACL on vlan2 dropped these packets before they could reach any routed-in ACL, causing SSH and UDP traceroute to fail.

Changes:
- services_acl.j2: add 'permit any HMN HMN_MTN' ACEs (rules 300/310) inside the HMN_MTN guard so HMN-sourced traffic bidirectionally reaches cabinet BMCs; NMN-sourced rules retained as dead-code intent
- cdu_hmn_cabinet_acl.j2 (new): ACL for vlan3000 (cabinet HMN); permits full HMN<->HMN_MTN traffic, denies NMN<->HMN_MTN, then permits all
- cdu_nmn_routed_acl.j2 (new): ACL for CDU routed-in/out on vlan2/2000; prepends HMN<->HMN_MTN permit-any rules before inter-zone deny rules
- mtn_hmn_vlan.j2: apply cdu-hmn-cabinet on vlan3000 when NMN isolation is active (previously nmn-hmn, which had no HMN-sourced exceptions)
- mtn_nmn_vlan.j2: use cdu-nmn-routed for routed-in/out ACL on NMN VLANs
- sw-cdu.primary.j2, sw-cdu.secondary.j2: include new ACL templates and apply cdu-nmn-routed on interface vlan2/2000; fix HMN_MTN guard from HMN_MTN_NETWORK_IP to HMN_MTN
- services_objects.j2: add HMN object-group for use in ACL rules
- canu/config/network/network.py: include HMN_MTN, HMN_MTN_NETWORK_IP, HMN_MTN_NETMASK, HMN_NETWORK_IP, HMN_NETMASK in the variables dict passed to templates (parse_sls_for_config already computed them but they were omitted, causing UndefinedError in test_config_network_dry_run)
- golden configs updated to reflect new ACL content

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
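The source-address reasoning in this commit message can be checked with a short Python sketch. It is only an illustration under stated assumptions: the two prefixes come from the commit message, while the `NCN` group members and the two host addresses are hypothetical placeholders, and `hmn_rules_permit` is a simplified stand-in for the pair of HMN/HMN_MTN ACEs rather than the switch's actual matching logic.

```python
import ipaddress

# Prefixes quoted in the commit message.
HMN = ipaddress.ip_network("10.254.0.0/17")
HMN_MTN = ipaddress.ip_network("10.104.0.0/22")

# Hypothetical NCN object-group: illustrative NMN /32 host entries only.
NCN = [
    ipaddress.ip_network("10.252.1.4/32"),
    ipaddress.ip_network("10.252.1.5/32"),
]


def in_ncn_group(addr: str) -> bool:
    """True if addr is one of the NMN /32s in the (hypothetical) NCN group."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in NCN)


def hmn_rules_permit(src: str, dst: str) -> bool:
    """Simplified model of the two new ACEs: HMN -> HMN_MTN and HMN_MTN -> HMN."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return (s in HMN and d in HMN_MTN) or (s in HMN_MTN and d in HMN)


# Illustrative addresses, not taken from a real system.
ncn_hmn_addr = "10.254.1.10"  # NCN source address when routing via bond0.hmn0
bmc_addr = "10.104.0.20"      # cabinet BMC

print(in_ncn_group(ncn_hmn_addr))              # NCN-sourced rules cannot match
print(hmn_rules_permit(ncn_hmn_addr, bmc_addr))  # request direction
print(hmn_rules_permit(bmc_addr, ncn_hmn_addr))  # reply direction
```

Because the NCN's HMN address is not a member of the NMN-based group, any rule keyed on that group cannot match this traffic, which is why the change keys the new ACEs on the HMN prefix instead.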
Summary and Scope
Fix three bugs in the `MANAGED_NODE_ISOLATION` ACL and CDU switch templates affecting NMN-isolated systems.

Fix 1 - CASMTRIAGE-9035: BMC nodes cannot resolve DNS, FAS firmware updates fail

Root cause: OSPF advertises the HMNLB DNS VIP (`10.94.100.x`) with a nexthop of `10.252.0.x` on vlan 2 (NMN). This means BMC DNS queries enter the spine via vlan 2 and are evaluated by `MANAGED_NODE_ISOLATION`. The previous DNS rules only matched:

- `permit udp any eq dns any`: traffic from port 53 (DNS replies), not queries
- `permit udp MANAGED_NODES NMN_K8S_SERVICE eq dns`: DNS queries to the NMN K8S service range only; the BMC source (`10.104.0.x`) is not in `MANAGED_NODES` and the HMNLB DNS is not in `NMN_K8S_SERVICE`

Fix: Replace the three narrow DNS rules with four unrestricted rules covering DNS queries (dst-port 53) and replies (src-port 53) for both UDP and TCP, regardless of source or destination. The over-specific `permit udp MANAGED_NODES NMN_K8S_SERVICE eq dns` is now subsumed and removed (net +1 TCAM entry).

Fix 2 - CASMTRIAGE-8987: IMS remote build node inaccessible to k8s containers

Root cause: IMS remote build jobs map containerised SSH servers to host ports starting at 2022 (incrementing per concurrent job). The ACL only permitted `eq ssh` (port 22) between NCNs and managed nodes, blocking the non-standard ports.

Fix: Add an `SSH_ALTERNATE` port object group (`eq ssh` + `range 2022 2040`) to `services_objects.j2` and replace the two `eq ssh` rules in `services_acl.j2` with `group SSH_ALTERNATE`. The range 2022–2040 supports up to 19 concurrent IMS remote build jobs per node. Net cost: +2 TCAM entries.

Fix 3 - CASMNET-2387: NCN cannot SSH or run UDP traceroute to HMN cabinet BMCs under NMN isolation

Root cause: NCNs route the HMN cabinet network (`10.104.0.0/22`, vlan3000) via `bond0.hmn0`, so all traffic from NCNs to BMCs arrives at CDU switches with source `10.254.1.x` (HMN range), not the NMN `/32`s that make up the `NCN` object-group. Under NMN isolation, the `MANAGED_NODE_ISOLATION` ACL applied inbound on vlan 2 dropped this traffic before it could reach any routed-in ACL: the `NCN`-sourced SSH and established rules (rules 280/290) never matched, and rule 430 (implicit deny) caught everything. ICMP traceroute worked because a separate `permit icmp any any` rule existed; UDP traceroute did not.

Fix:

- Add `permit any HMN HMN_MTN` / `permit any HMN_MTN HMN` ACEs (rules 300/310) to `MANAGED_NODE_ISOLATION` in `services_acl.j2`, guarded by `HMN_MTN`. NCN HMN addresses need unrestricted management access to BMCs (SSH, HTTPS, IPMI/RMCP, traceroute, etc.) so `permit any` is intentional.
- `cdu_hmn_cabinet_acl.j2` (new): CDU-specific ACL for vlan3000 (cabinet HMN). Permits full HMN↔HMN_MTN traffic, denies NMN↔HMN_MTN, then `permit any any any`.
- `cdu_nmn_routed_acl.j2` (new): CDU routed-in/out ACL for interface vlan2/2000. Prepends HMN↔HMN_MTN `permit any` rules before the inter-zone deny rules, then `permit any any any`.
- `mtn_hmn_vlan.j2`: apply `cdu-hmn-cabinet` on vlan3000 when NMN isolation is active (previously `nmn-hmn`, which had no HMN-sourced exceptions).
- `mtn_nmn_vlan.j2`, `sw-cdu.primary.j2`, `sw-cdu.secondary.j2`: use `cdu-nmn-routed` for routed-in/out; fix `HMN_MTN` Jinja2 guard from `HMN_MTN_NETWORK_IP` (not always present in the variables dict) to `HMN_MTN` (always initialised).
- `canu/config/network/network.py`: `HMN_MTN`, `HMN_MTN_NETWORK_IP`, `HMN_MTN_NETMASK`, `HMN_NETWORK_IP`, and `HMN_NETMASK` were computed by `parse_sls_for_config` but omitted from the `variables` dict passed to templates, causing `UndefinedError` in `test_config_network_dry_run`.

The two new utility scripts and their README are also included to help future developers regenerate golden configs after ACL changes.
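The rule changes in Fix 1 and Fix 2 can be pictured with a small Python model of permit rules followed by an implicit deny. This is a sketch under stated assumptions, not the switch's implementation: the `Permit` class and `permitted` helper are hypothetical, the `MANAGED_NODES` and `NMN_K8S_SERVICE` prefixes and all host addresses are placeholders, and only the two old DNS rules quoted in this description are modelled (the third is not shown here, so it is left out).

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

ANY = ipaddress.ip_network("0.0.0.0/0")
# Hypothetical placeholders: the PR does not list these groups' members.
MANAGED_NODES = ipaddress.ip_network("10.252.8.0/24")
NMN_K8S_SERVICE = ipaddress.ip_network("10.92.100.0/24")

DNS = frozenset({53})
SSH = frozenset({22})
# eq ssh + range 2022 2040, as described in Fix 2.
SSH_ALTERNATE = SSH | frozenset(range(2022, 2041))


@dataclass(frozen=True)
class Permit:
    """One permit rule; a port set of None means 'any port'."""
    proto: str
    src: ipaddress.IPv4Network = ANY
    dst: ipaddress.IPv4Network = ANY
    sports: Optional[FrozenSet[int]] = None
    dports: Optional[FrozenSet[int]] = None

    def matches(self, proto: str, src: str, sport: int, dst: str, dport: int) -> bool:
        return (
            proto == self.proto
            and ipaddress.ip_address(src) in self.src
            and ipaddress.ip_address(dst) in self.dst
            and (self.sports is None or sport in self.sports)
            and (self.dports is None or dport in self.dports)
        )


def permitted(rules: Sequence[Permit], proto: str, src: str, sport: int,
              dst: str, dport: int) -> bool:
    """Permit if any rule matches; otherwise the implicit deny applies."""
    return any(r.matches(proto, src, sport, dst, dport) for r in rules)


# Fix 1: the two old DNS rules quoted above versus the four unrestricted ones.
OLD_DNS = [
    Permit("udp", sports=DNS),                                          # replies only
    Permit("udp", src=MANAGED_NODES, dst=NMN_K8S_SERVICE, dports=DNS),  # narrow queries
]
NEW_DNS = [Permit(p, dports=DNS) for p in ("udp", "tcp")] + \
          [Permit(p, sports=DNS) for p in ("udp", "tcp")]

# Illustrative BMC query to the HMNLB DNS VIP (host parts are made up).
bmc_query = ("udp", "10.104.0.10", 40000, "10.94.100.225", 53)
print(permitted(OLD_DNS, *bmc_query), permitted(NEW_DNS, *bmc_query))

# Fix 2: eq ssh versus group SSH_ALTERNATE for an IMS job on host port 2022.
ims_job = ("tcp", "10.252.1.4", 50000, "10.252.8.20", 2022)
print(permitted([Permit("tcp", dports=SSH)], *ims_job),
      permitted([Permit("tcp", dports=SSH_ALTERNATE)], *ims_job))
print(len(SSH_ALTERNATE - SSH))  # ports available to concurrent IMS jobs
```

In this model the BMC query misses both old rules (its source port is not 53 and its source is outside the placeholder `MANAGED_NODES` prefix) but matches the new destination-port rule, and the inclusive range 2022 to 2040 contains 19 ports, matching the stated limit of 19 concurrent jobs.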
- If adding a new file, I have updated `pyinstaller.py` (scripts in `tests/` are not bundled)
- I have added entries in `CHANGELOG.md` for the changes in this PR (no CHANGELOG.md exists)

Issues and Related PRs

- `CASMTRIAGE-9035` - Odin: DNS servers are not reachable from BMCs. FAS update fails.
- `CASMTRIAGE-8987` - Remote build node inaccessible to k8s container.
- `CASMNET-2387` - NCN cannot SSH or traceroute to HMN cabinet BMCs under NMN isolation.

Testing
I will fill out the manual testing done by hand.
Automated: All golden configs updated across full, TDS, and custom 1.7 architectures; isolation configs regenerated and unit tests pass.
Changed files:
- `network_modeling/configs/templates/1.7/aruba/common/services_acl.j2`: … `eq ssh` with `group SSH_ALTERNATE`; add `permit any` HMN↔HMN_MTN rules (Fix 3)
- `network_modeling/configs/templates/1.7/aruba/common/services_objects.j2`: … `SSH_ALTERNATE` port group; add `HMN` object-group; remove redundant `dns` from service groups; consolidate ports 6817–6819 to a range
- `network_modeling/configs/templates/1.7/aruba/common/cdu_hmn_cabinet_acl.j2`
- `network_modeling/configs/templates/1.7/aruba/common/cdu_nmn_routed_acl.j2`
- `network_modeling/configs/templates/1.7/aruba/common/mtn_hmn_vlan.j2`: … `cdu-hmn-cabinet` on vlan3000 under NMN isolation
- `network_modeling/configs/templates/1.7/aruba/common/mtn_nmn_vlan.j2`: … `cdu-nmn-routed` for routed-in/out; fix `HMN_MTN` Jinja2 guard
- `network_modeling/configs/templates/1.7/aruba/common/sw-cdu.primary.j2`: … `cdu-nmn-routed` on vlan2/2000; fix `HMN_MTN` guard
- `network_modeling/configs/templates/1.7/aruba/common/sw-cdu.secondary.j2`
- `canu/config/network/network.py`: … `HMN_MTN*` and `HMN_NETWORK*` variables to template variables dict
- `tests/data/golden_configs/**/*-isolation.cfg` (7 files)
- `tests/data/golden_configs/**/*.cfg` (remaining files)
- `tests/data/golden_configs/individual_templates_1.7/services_acl.j2.cfg`
- `tests/data/golden_configs/individual_templates_1.7/services_objects.j2.cfg`
- `tests/scripts/regenerate_golden_configs_1.7.sh`
- `tests/scripts/regenerate_individual_templates_1.7.py`
- `tests/scripts/README.md`