@janosh janosh commented Nov 10, 2025

Fix CI errors that appeared after merging #555 (which had green CI). It looks like there are some flaky tests in test_launchpad.py, which this PR aims to stabilize. Breakdown of changes:

test_launchpad.py fixes

1. Fixed MongoDB Command API Compatibility (Lines 60, 63)

Issue: TypeError: Database.command() takes 2 positional arguments but 3 were given

Root Cause: pymongo 4.x changed the Database.command() API from accepting multiple positional arguments to requiring a single dictionary argument.

Fix: Updated command calls to use dictionary format:

# Before
client.db.command("dropUser", "my-user")
client.db.command("createUser", "my-user", pwd="...", roles=[...])

# After
client.db.command({"dropUser": "my-user"})
client.db.command({"createUser": "my-user", "pwd": "...", "roles": [...]})
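As a minimal sketch of the dictionary form above: MongoDB expects the command name to be the first key of the command document, and Python dicts preserve insertion order, so a small helper can build the document safely. The helper name `command_doc`, the password, and the role are hypothetical and not part of this PR.

```python
def command_doc(name, value, **options):
    """Build a single command document with the command name as the first key.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    doc = {name: value}  # command name must come first
    doc.update(options)  # remaining options follow in call order
    return doc


# Placeholder credentials and role, chosen only for the example.
create_user = command_doc(
    "createUser", "my-user", pwd="example-password", roles=["readWrite"]
)
drop_user = command_doc("dropUser", "my-user")
```

The resulting dicts can then be passed as the single argument, e.g. `client.db.command(create_user)`.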

2. Fixed WFLockTest Race Condition (Lines 1147-1154)

Issue: test_fix_db_inconsistencies_completed failed intermittently when fw_id=2 timed out waiting for the workflow lock.

Root Cause: The fixed 1-second sleep was not enough to guarantee that fw_id=1 had acquired the workflow lock before fw_id=2 started.

Fix: Added active polling to wait for fw_id=1 to reach RUNNING state before launching fw_id=2:

timeout = 10
while timeout > 0:
    fw1 = self.lp.get_fw_by_id(1)
    if fw1.state == "RUNNING":
        break
    time.sleep(0.5)
    timeout -= 0.5
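The polling loop above could also be factored into a reusable helper. The following is a sketch only, not what this PR implements: `wait_until` is a hypothetical name, and it uses a monotonic deadline and returns a boolean so the caller can fail loudly if the state is never reached (the inline loop simply falls through on timeout).

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    Hypothetical helper for illustration; not part of this PR.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible use inside the test (assumes self.lp is the test's LaunchPad):
# assert wait_until(lambda: self.lp.get_fw_by_id(1).state == "RUNNING")
```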

3. Fixed WorkflowFireworkStatesTest Cache Inconsistency (Lines 967-970, 1001-1004)

Issue: test_rerun_timed_fws failed with AssertionError: assert 'RESERVED' == 'READY'

Root Cause: When processes are terminated, fireworks can be left in the RESERVED state while the workflow cache still shows READY. The cache is not updated automatically when reservations occur.

Fix: Before each state sync assertion, unreserve any RESERVED fireworks and refresh workflows:

# Unreserve any RESERVED fireworks left by terminated process
reserved_ids = self.lp.get_fw_ids({"state": "RESERVED"})
for fw_id in reserved_ids:
    self.lp.rerun_fw(fw_id)

# Detect lost runs and refresh workflows
detect_lostruns(expiration_secs=0.5, refresh=True)

- DataServer: renamed `_register_launchpad` to `register_launchpad`.
- Enhanced argument handling in the `mlaunch` and `rlaunch` functions, including function docstrings.
- Added module-level error handling for a missing `argcomplete` import.
- Catch `NotImplementedError` from mongomock's unsupported `command()` method and skip the authentication tests cleanly instead of having them show as ERROR in CI.
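The skip-on-`NotImplementedError` behavior can be illustrated with a self-contained sketch. Everything here is an assumption for illustration: `FakeDatabase` stands in for a mock backend without `command()` support, and the test class, credentials, and `run_suite` helper are hypothetical rather than the code in test_launchpad.py.

```python
import unittest


class FakeDatabase:
    """Hypothetical stand-in for a mock database that lacks command() support."""

    def command(self, command):
        raise NotImplementedError("command() is not supported by this mock")


class AuthenticationTest(unittest.TestCase):
    def setUp(self):
        self.db = FakeDatabase()
        try:
            # Placeholder credentials, used only for this example.
            self.db.command(
                {"createUser": "my-user", "pwd": "example-password", "roles": ["readWrite"]}
            )
        except NotImplementedError as exc:
            # Report the test as skipped rather than as an error.
            self.skipTest(f"database backend does not support command(): {exc}")

    def test_authenticated_access(self):
        self.fail("not reached when setUp skips")


def run_suite():
    """Run the test case and return the collected unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AuthenticationTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this pattern the run records one skipped test and no errors, which is the outcome the change aims for in CI.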
@computron computron merged commit aa6a40f into materialsproject:main Nov 10, 2025
4 checks passed
@janosh janosh deleted the fix-ci branch November 10, 2025 18:14