Fix "Device or resource busy" error when switching audio inputs with automatic retry and orphaned process cleanup #11

Draft
Copilot wants to merge 3 commits into master from copilot/handle-arecord-busy-error

Copilot AI commented Oct 22, 2025

Problem

When an iOS client switches audio input to a USB device (e.g., plughw:0,0), the server frequently reports an error:

arecord: main:850: audio open error: Device or resource busy

This occurs due to two issues:

  1. Race condition: When switching inputs, the old arecord process is killed and a new one is spawned immediately. However, the OS hasn't fully released the audio device yet, causing the new process to fail with EBUSY.
  2. Orphaned processes: Some arecord processes don't terminate properly and remain running even after babelpod attempts to kill them, holding the audio device.

Solution

This PR implements a three-layer approach to handle these issues:

1. Prevention: Cleanup Delay

Added a 500ms delay after killing the old arecord process before starting the new one. This gives the OS sufficient time to release the audio device in most cases, significantly reducing the frequency of busy errors.
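
The delay step could be sketched roughly as follows. This is a minimal illustration, not the PR's code: switchWithCleanupDelay, sleep, and CLEANUP_DELAY_MS are hypothetical names, and the two callbacks stand in for whatever babelpod uses to stop and start arecord. Only the 500ms value comes from the description above.

```javascript
// Hypothetical names throughout; only the 500ms value is taken from the PR text.
const CLEANUP_DELAY_MS = 500;

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stop the old capture, give the OS time to release the device, then start
// the new capture. `stopOld` and `startNew` are caller-supplied callbacks.
async function switchWithCleanupDelay(stopOld, startNew, delayMs = CLEANUP_DELAY_MS) {
  stopOld();
  await sleep(delayMs);
  return startNew();
}
```

A fixed delay is simple but only reduces the chance of a busy error; it cannot guarantee the device has been released, which is why the retry layer is still needed.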

2. Orphaned Process Cleanup

  • Enhanced process termination: Modified cleanupCurrentInput() to use SIGTERM first for graceful termination, then SIGKILL after 100ms if the process doesn't die
  • Orphaned process detection: New killOrphanedArecord() function that runs before starting a new arecord:
    • Uses pgrep -f "arecord.*${devId}" to find any existing processes for the device
    • Terminates them with pkill -9 (SIGKILL) to ensure they die
    • Handles the case where no processes are found gracefully
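
A rough sketch of the orphan cleanup described above, under stated assumptions. The function and helper names here are hypothetical, and this version signals the PIDs returned by pgrep with process.kill instead of running pkill -9 as the PR does. It also escapes the device id before building the pattern, which the PR text does not mention. pgrep exits with status 1 when nothing matches, which is treated as "no orphans" and not as an error.

```javascript
const { spawnSync } = require('child_process');

// Escape regex metacharacters so the device id is matched literally by
// pgrep's extended regular expression. (Assumption: not stated in the PR.)
function escapeForPgrep(devId) {
  return devId.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Interpret a pgrep result: status 0 means matches were printed,
// status 1 means no process matched, anything else is a real failure.
function parsePgrepResult(status, stdout) {
  if (status === 1) return [];
  if (status !== 0) throw new Error(`pgrep failed with status ${status}`);
  return stdout.split('\n').filter(Boolean).map(Number);
}

// Find arecord processes whose command line mentions the device and
// force-kill them. Returns the PIDs that were signalled.
function killOrphanedArecordSketch(devId) {
  const pattern = `arecord.*${escapeForPgrep(devId)}`;
  // No shell is involved, so there is no shell process that could match.
  const found = spawnSync('pgrep', ['-f', pattern], { encoding: 'utf8' });
  if (found.error) throw found.error;
  const pids = parsePgrepResult(found.status, found.stdout);
  for (const pid of pids) {
    try {
      process.kill(pid, 'SIGKILL');
    } catch (err) {
      // ESRCH: the process exited between pgrep and kill; that is fine.
      if (err.code !== 'ESRCH') throw err;
    }
  }
  return pids;
}
```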

3. Recovery: Automatic Retry with Exponential Backoff

When a "Device or resource busy" error is detected:

  • The system automatically retries up to 5 times
  • Uses exponential backoff delays: 200ms, 400ms, 800ms, 1600ms, 3200ms
  • Users see status messages during retries: "Device busy, retrying... (1/5)"
  • After successful retry: "Input successfully connected to [device]"
  • If all retries fail: "Device is busy and could not be opened after 5 attempts. Please try again later."
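
The delay schedule above is consistent with doubling a 200ms base on each attempt. A minimal sketch of that calculation, with hypothetical constant and function names (the PR's actual code is not shown here):

```javascript
// Values taken from the PR description; the names are illustrative only.
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 200;
const CLEANUP_DELAY_MS = 500;

// Delay before retry number `attempt` (1-based): 200ms doubled each time.
function backoffDelayMs(attempt) {
  if (!Number.isInteger(attempt) || attempt < 1 || attempt > MAX_RETRIES) {
    throw new RangeError(`attempt must be an integer in 1..${MAX_RETRIES}`);
  }
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

// Full schedule and the cumulative wait if every retry is used.
const schedule = Array.from({ length: MAX_RETRIES }, (_, i) => backoffDelayMs(i + 1));
const totalBackoffMs = schedule.reduce((sum, ms) => sum + ms, 0);
const worstCaseMs = CLEANUP_DELAY_MS + totalBackoffMs;
```

The five delays sum to 6200ms; adding the 500ms cleanup delay gives the 6700ms worst case quoted elsewhere in this description.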

Implementation Details

New function killOrphanedArecord(devId):

  • Detects and terminates any orphaned arecord processes before starting a new one
  • Uses system commands to find and kill processes holding the device
  • Prevents orphaned processes from blocking the audio device

New helper function startArecordForDevice(devId, isRetry):

  • Centralizes arecord startup logic
  • Calls killOrphanedArecord() before spawning new process
  • Handles both initial starts and retries consistently
  • Manages state and emits appropriate UI messages

Enhanced cleanupCurrentInput():

  • Uses SIGTERM first for graceful termination
  • Falls back to SIGKILL after 100ms if process doesn't terminate
  • Ensures processes are actually killed, not just requested to terminate
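
One way the SIGTERM-then-SIGKILL escalation might look, as a sketch only. terminateWithFallback and KILL_TIMEOUT_MS are hypothetical names; child is assumed to be a Node.js ChildProcess (for example, the value returned by spawn), whose exitCode and signalCode stay null while it is still running.

```javascript
// 100ms comes from the PR description; the names are illustrative.
const KILL_TIMEOUT_MS = 100;

// Ask the child to exit with SIGTERM, then force it with SIGKILL if it is
// still alive once the timeout elapses.
function terminateWithFallback(child, timeoutMs = KILL_TIMEOUT_MS) {
  const timer = setTimeout(() => {
    // Both fields are null only while the process has not exited.
    if (child.exitCode === null && child.signalCode === null) {
      child.kill('SIGKILL');
    }
  }, timeoutMs);
  // Attach the listener before signalling so an early exit is never missed.
  child.once('exit', () => clearTimeout(timer));
  child.kill('SIGTERM');
}
```

Clearing the timer on exit avoids sending SIGKILL to a PID that may already have been reused by another process.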

Enhanced setupArecordHandlers(devId, isRetry):

  • Buffers stderr output to detect busy errors
  • Distinguishes between retryable busy errors and permanent failures
  • Implements exponential backoff retry logic
  • Only retries during manual user switches (not automatic restarts)
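
The stderr handling above might be sketched like this. All names are hypothetical, and treating only the "Device or resource busy" text as retryable is an assumption made for illustration. Buffering matters because stderr arrives in arbitrary chunks, so the message can be split across two data events.

```javascript
// Assumption: only this message marks a retryable failure.
const BUSY_PATTERN = /Device or resource busy/;

// Accumulate stderr chunks so a message split across chunks is still found.
function createStderrBuffer() {
  let text = '';
  return {
    push(chunk) { text += chunk.toString(); },
    isBusy() { return BUSY_PATTERN.test(text); },
    text() { return text; },
  };
}

// Retry only for busy errors, only on manual switches, and only while
// attempts remain. `attempt` is the number of retries already made.
function shouldRetry({ busy, isManualSwitch, attempt, maxRetries = 5 }) {
  return busy && isManualSwitch && attempt < maxRetries;
}
```

In a real handler, push would be wired to the child's stderr data event and shouldRetry consulted in the exit handler once the exit code is known.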

Modified switch_input handler:

  • Uses cleanup delay before starting new arecord for all input types (PCM, Bluetooth)
  • Resets retry counters on new manual switches
  • Consistent behavior across different input device types

User Experience

Before: User encounters error and must manually retry, possibly multiple times. Orphaned processes can persist and block the device indefinitely.

After: System automatically handles retries transparently and cleans up orphaned processes. Most cases succeed within 700ms (500ms cleanup delay + one 200ms retry). User only sees an error if the device is genuinely unavailable after 5 attempts (~6.7 seconds: 500ms cleanup delay + 6200ms of cumulative backoff).

Testing

Added 10 comprehensive unit tests covering:

  • Detection of "Device or resource busy" and "audio open error" patterns
  • Exponential backoff delay calculation
  • Maximum retry attempt enforcement
  • Cleanup delay validation
  • SIGTERM/SIGKILL process termination behavior
  • Timeout between signals
  • Orphaned process detection and cleanup
  • pgrep exit code handling

All 26 tests pass (16 existing + 10 new) with no regressions.

Edge Cases Handled

  • Rapid device switching: Old retries are cancelled cleanly when user switches again
  • Device never available: Bounded retry attempts with clear error message
  • Different device types: Works consistently for PCM, Bluetooth, and void inputs
  • Concurrent operations: Separate retry counters for manual vs automatic restarts
  • Orphaned arecord processes: Detected and terminated before starting new process
  • Processes that don't respond to SIGTERM: Force killed with SIGKILL after timeout
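
For the rapid-switching case, cancelling a pending retry could be handled by keeping the timer handle and clearing it whenever a new switch arrives. The class below is a hypothetical sketch of that idea, not the PR's implementation:

```javascript
// Tracks at most one pending retry timer plus the retry count so far.
class RetryScheduler {
  constructor() {
    this.timer = null;
    this.attempts = 0;
  }

  // Schedule `fn` after `delayMs`, replacing any retry already pending.
  schedule(fn, delayMs) {
    this.cancel();
    this.attempts += 1;
    this.timer = setTimeout(() => {
      this.timer = null;
      fn();
    }, delayMs);
  }

  // Drop the pending retry, if any, without touching the counter.
  cancel() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }

  // Called on a new manual switch: cancel and start counting from zero.
  reset() {
    this.cancel();
    this.attempts = 0;
  }

  get pending() {
    return this.timer !== null;
  }
}
```

Keeping one scheduler for manual switches and another for automatic restarts would give the separate counters mentioned in the list above.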

Performance Impact

  • Memory: Negligible (small stderr buffer per active arecord)
  • CPU: Negligible (event-driven, no polling; pgrep only runs before starting arecord)
  • User wait time:
    • Best case: 500ms (cleanup only)
    • Typical: 700ms (cleanup + 1 retry)
    • Worst case: 6700ms (all retries exhausted)

Fixes #10

Original prompt

This section details the original issue you should resolve

<issue_title>Server reports arecord is busy when set by iOS client</issue_title>
<issue_description>This should be a detected error and we should work around it if possible. Here are relevant logs:

Client sets input to USB.

Oct 21 23:32:11 PattyPi node[55559]: Receiving request: 192.168.4.50 SETUP
Oct 21 23:32:11 PattyPi node[55559]: Receiving request: 192.168.4.50 RECORD
Oct 21 23:32:11 PattyPi node[55559]: Receiving request: 192.168.4.50 SETVOLUME
Oct 21 23:32:11 PattyPi node[55559]: Receiving request: 192.168.4.50 SETPROGRESS
Oct 21 23:32:25 PattyPi node[55559]: Switching input to: plughw:0,0
Oct 21 23:32:25 PattyPi node[55559]: arecord stderr for plughw:0,0: arecord: main:850: audio open error: Device or resource busy
Oct 21 23:32:25 PattyPi node[55559]: arecord exited for plughw:0,0 - code: 1, signal: null, manual: true
Oct 21 23:32:25 PattyPi node[55559]: Receiving request: 192.168.4.50 OPTIONS

</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #10



…tic retry

Co-authored-by: benlachman <100571+benlachman@users.noreply.github.com>
Copilot AI changed the title on Oct 22, 2025
from: [WIP] Handle error when arecord is busy for iOS client
to: Fix "Device or resource busy" error when switching audio inputs with automatic retry
Copilot AI requested a review from benlachman October 22, 2025 03:50
@benlachman (Owner)

@copilot Can we also just terminate the orphaned arecord process? I've looked at ps on the server while this bug is happening and there is in fact an arecord process that is still running, even though babelpod thinks it has terminated it.

Co-authored-by: benlachman <100571+benlachman@users.noreply.github.com>
Copilot AI (Author) commented Oct 22, 2025

@copilot Can we also just terminate the orphaned arecord process? I've looked at ps on the server while this bug is happening and there is in fact an arecord process that is still running, even t...

Added orphaned process cleanup in commit 5053649. The solution includes:

  1. Enhanced cleanup: cleanupCurrentInput() now uses SIGTERM first, then SIGKILL after 100ms if the process doesn't terminate gracefully
  2. Orphaned process detection: New killOrphanedArecord() function that runs before starting a new arecord:
    • Uses pgrep -f "arecord.*${devId}" to find any existing processes for the device
    • Terminates them with pkill -9 (SIGKILL) to ensure they die
    • Handles the case where no processes are found gracefully

This ensures that any orphaned arecord processes holding the audio device are cleaned up before attempting to start a new one, preventing the "Device or resource busy" error even when processes haven't been properly terminated.

Copilot AI changed the title on Oct 22, 2025
from: Fix "Device or resource busy" error when switching audio inputs with automatic retry
to: Fix "Device or resource busy" error when switching audio inputs with automatic retry and orphaned process cleanup