@mharding-hpe mharding-hpe commented Feb 4, 2026

CASMCMS-9623

Prior to this PR, the session-completion operator worked as follows:

For each running BOS session (call it S):

  1. Get a list of every BOS component which meets either of the following criteria:
    • It is enabled and its session field has the name of session S.
    • Its staged_state.session field has the name of session S.
  2. If there are no such components, mark S complete.

This breaks down when two running sessions have the same name, which is permitted because session names are namespaced by tenant. In that case, none of the same-named sessions is marked complete until ALL of them are ready to be marked complete.
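As an illustration only, the name-based check and the same-name problem can be sketched in Python. The `Component` and `Session` classes, the `staged_session` field (standing in for `staged_state.session`), and the function names are hypothetical and are not taken from the BOS source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    """Hypothetical stand-in for a BOS component record."""
    id: str
    enabled: bool
    session: str = ""
    staged_session: str = ""  # stands in for staged_state.session


@dataclass
class Session:
    """Hypothetical stand-in for a running BOS session."""
    name: str
    tenant: Optional[str] = None


def components_by_name(session: Session, components: List[Component]) -> List[Component]:
    """Step 1: match components purely on the session name."""
    return [
        c for c in components
        if (c.enabled and c.session == session.name)
        or c.staged_session == session.name
    ]


def is_complete_by_name(session: Session, components: List[Component]) -> bool:
    """Step 2: the session is complete when no components match."""
    return not components_by_name(session, components)


# Two tenants run sessions that share the name "boot".
session_a = Session(name="boot", tenant="tenant-a")
session_b = Session(name="boot", tenant="tenant-b")
components = [
    Component(id="x1", enabled=False, session="boot"),  # tenant-a's node, finished
    Component(id="x2", enabled=True, session="boot"),   # tenant-b's node, still busy
]

# tenant-b's component matches by name, so tenant-a's session is held open too.
print(is_complete_by_name(session_a, components))
print(is_complete_by_name(session_b, components))
```

Because the match uses only the session name, tenant-a's session cannot complete while tenant-b's component is still enabled.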

This PR modifies the operator so that it handles components and tenancy the same way as the rest of BOS. Specifically, it adds a new step between steps 1 and 2 above:

1.5. If S is on behalf of a tenant, query TAPMS for all components owned by that tenant, and intersect that set with the list from step 1.
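A minimal sketch of the tenant-aware check, assuming the TAPMS query result is already available as a mapping from tenant name to owned component IDs. The classes, the `tenant_components` mapping, and the function names are hypothetical. The real operator queries TAPMS and works on BOS records, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Component:
    """Hypothetical stand-in for a BOS component record."""
    id: str
    enabled: bool
    session: str = ""
    staged_session: str = ""  # stands in for staged_state.session


@dataclass
class Session:
    """Hypothetical stand-in for a running BOS session."""
    name: str
    tenant: Optional[str] = None


def pending_components(
    session: Session,
    components: List[Component],
    tenant_components: Dict[str, Set[str]],
) -> List[Component]:
    # Step 1: match on the session name.
    matched = [
        c for c in components
        if (c.enabled and c.session == session.name)
        or c.staged_session == session.name
    ]
    # Step 1.5: for a tenanted session, keep only components the tenant owns.
    # tenant_components stands in for the result of the TAPMS query.
    if session.tenant:
        owned = tenant_components.get(session.tenant, set())
        matched = [c for c in matched if c.id in owned]
    return matched


def is_complete(
    session: Session,
    components: List[Component],
    tenant_components: Dict[str, Set[str]],
) -> bool:
    # Step 2: the session is complete when nothing is left after filtering.
    return not pending_components(session, components, tenant_components)


ownership = {"tenant-a": {"x1"}, "tenant-b": {"x2"}}
components = [
    Component(id="x1", enabled=False, session="boot"),  # tenant-a's node, finished
    Component(id="x2", enabled=True, session="boot"),   # tenant-b's node, still busy
]
session_a = Session(name="boot", tenant="tenant-a")
session_b = Session(name="boot", tenant="tenant-b")
untenanted = Session(name="boot")

print(is_complete(session_a, components, ownership))   # tenant-a no longer waits on x2
print(is_complete(session_b, components, ownership))
print(is_complete(untenanted, components, ownership))  # no filter applies here
```

In this sketch an untenanted session skips the filter entirely, so it still matches same-named components that belong to tenants.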

This fully resolves the problem for multiple running sessions with the same name, PROVIDED THAT all of them are on behalf of tenants. If one of them is untenanted, problems remain. However, those problems are not limited to this operator, and the situation is trickier. See CASMCMS-9622 for details.

This PR also refactors the operator slightly, mainly for ease of readability.

Testing

I tested this on mug. The test scenario was as follows:

  1. Create two tenants, with each owning a different compute node.
  2. For each, create session templates, and then create sessions with the same name to boot their compute node.
  3. After both sessions are running, manually disable (in BOS) the node for one of the tenants (but NOT the other).
  4. Wait long enough to ensure that the session-completion operator has had time to run; I waited 3x the polling interval.
  5. Check to see if either session has completed.
  6. Disable the node for the other tenant.
  7. Wait again long enough to ensure that the session-completion operator has had time to run; I again waited 3x the polling interval.
  8. Check to see if either session has completed.

First, I ran this test on mug without my fix applied. As expected, neither session had been marked complete in step 5, and both had been marked complete in step 8.

I then tested with my fix applied. This time, in step 5, the session whose only component had been disabled was marked complete, while the other was still running. In step 8, both had been marked complete. This is the correct behavior.

@mharding-hpe mharding-hpe merged commit 88e8577 into develop Feb 6, 2026
7 of 8 checks passed
@mharding-hpe mharding-hpe deleted the casmcms-9623 branch February 6, 2026 19:29