Skip to content

Add kernel execution cancellation via SIGUSR1 signal#339

Open
Edwardvaneechoud wants to merge 2 commits intofeauture/kernel-implementationfrom
claude/add-cancel-method-h35xI
Open

Add kernel execution cancellation via SIGUSR1 signal#339
Edwardvaneechoud wants to merge 2 commits intofeauture/kernel-implementationfrom
claude/add-cancel-method-h35xI

Conversation

@Edwardvaneechoud
Copy link
Owner

Summary

This PR implements execution cancellation support for kernel-based code execution by sending SIGUSR1 signals to running containers. When a user cancels a node execution, the system now interrupts the kernel's running code instead of waiting for it to complete.

Key Changes

  • Kernel Manager: Added interrupt_execution_sync() and interrupt_execution() methods to send SIGUSR1 signals to kernel containers, with proper state validation and error handling.

  • Kernel Runtime: Implemented signal handler infrastructure:

    • Added _cancel_signal_handler() that raises KeyboardInterrupt when SIGUSR1 is received during code execution
    • Registered SIGUSR1 handler in the lifespan startup
    • Wrapped exec() calls with _is_executing flag to track execution state
    • Added KeyboardInterrupt exception handling in the /execute endpoint to return a cancellation response
  • FlowNode: Enhanced cancel() method to:

    • Support kernel cancellation via _kernel_cancel_context attribute
    • Prioritize cancelling cached data fetches over kernel interrupts
    • Handle exceptions gracefully during kernel interrupt attempts
  • FlowGraph: Set and clear _kernel_cancel_context on nodes during kernel execution to enable cancellation.

  • Tests: Added comprehensive test coverage for:

    • Kernel manager interrupt functionality (signal delivery, state validation, error cases)
    • FlowNode cancellation with kernel context
    • Signal handler behavior and execution state tracking
    • End-to-end cancellation response handling

Implementation Details

The cancellation mechanism works by:

  1. When a user cancels a node, FlowNode.cancel() calls KernelManager.interrupt_execution_sync()
  2. The manager sends SIGUSR1 to the kernel container via Docker API
  3. The kernel's signal handler checks if code is executing (_is_executing flag)
  4. If executing, it raises KeyboardInterrupt to abort the running code
  5. The /execute endpoint catches KeyboardInterrupt and returns a cancellation response

This approach is non-blocking and works within the constraints of Python's signal handling (main thread only in production, tested component-wise in test environments).

When a user cancels a flow with a running python_script node, the kernel
code previously kept running until completion or the 300s httpx timeout.
This adds SIGUSR1-based interruption so kernel executions can be stopped
promptly.

- kernel_runtime: Register SIGUSR1 handler that raises KeyboardInterrupt
  during exec(), with _is_executing guard to ignore signals outside of
  code execution. The /execute endpoint now catches KeyboardInterrupt
  and returns a cancellation response.
- KernelManager: Add interrupt_execution_sync() that sends SIGUSR1 to
  the kernel container via Docker API (container.kill).
- FlowNode: Add _kernel_cancel_context attribute set during kernel
  execution. cancel() now checks this context and triggers the kernel
  interrupt alongside the existing worker fetcher cancellation.
- FlowGraph: Set/clear _kernel_cancel_context around execute_sync() in
  add_python_script._func().

https://claude.ai/code/session_018zriRonXcPshWgksMcZeCY
- Remove test that raised KeyboardInterrupt through TestClient (leaked
  through the ASGI thread boundary causing pytest exit code 130)
- Reset _is_executing flag in conftest between tests
- Trim verbose comments and docstrings

https://claude.ai/code/session_018zriRonXcPshWgksMcZeCY
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 72.72727% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
flowfile_core/flowfile_core/flowfile/flow_graph.py 0.00% 5 Missing ⚠️
flowfile_core/flowfile_core/kernel/manager.py 80.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants