Feature Request: Progress Updates and Enhanced Response Delivery for Long-Running Jobs #52

@actuallyrizzn

Problem Statement

Broca currently processes messages that can take 30+ minutes to complete, but users receive no feedback during this time. This leads to:

  1. User Confusion: Users don't know if their message is being processed or if the system is stuck
  2. Lost Responses: When timeouts occur, responses may be generated but never delivered to users
  3. Poor UX: No visibility into which message is being processed or how long it's taking

Proposed Solution

Implement a comprehensive progress update system that:

  1. Sends Initial Status: Notifies users when processing begins
  2. Periodic Updates: Sends progress updates every 2 minutes for long-running operations
  3. Context-Aware Messaging: Provides different messages based on elapsed time
  4. Guaranteed Delivery: Retries response delivery so that responses are still sent after a timeout or error
  5. Timeout Handling: Explicitly handles timeouts and notifies users appropriately

Implementation Details

Progress Update System

  • Update Interval: 2 minutes (120 seconds) - appropriate for 30+ minute operations
  • Initial Message: "🔄 Processing your message... This may take several minutes."
  • Periodic Updates:
    • Under 5 minutes: "⏳ Still processing... (Xm Ys)"
    • 5 to under 10 minutes: "⏳ Still processing... (X minutes elapsed). This is taking longer than usual."
    • 10 minutes or more: "⏳ Still processing... (X minutes elapsed). Complex operations can take 30+ minutes. Please be patient."
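
The tiers above could be selected with a small pure function. This is a minimal sketch, not Broca's actual code: the name `format_progress_message` is hypothetical, and it assumes that exactly 5 minutes falls into the middle tier and exactly 10 minutes into the last one (the latter matches the 10-minute line in the example under "Example User Experience").

```python
def format_progress_message(elapsed_seconds: float) -> str:
    """Return the status text for an elapsed time, following the proposed tiers.

    Hypothetical helper: the boundary handling (5 and 10 minutes belong to
    the later tier) is an assumption, not something the proposal specifies.
    """
    minutes, seconds = divmod(int(elapsed_seconds), 60)
    if minutes < 5:
        return f"⏳ Still processing... ({minutes}m {seconds}s)"
    if minutes < 10:
        return (
            f"⏳ Still processing... ({minutes} minutes elapsed). "
            "This is taking longer than usual."
        )
    return (
        f"⏳ Still processing... ({minutes} minutes elapsed). "
        "Complex operations can take 30+ minutes. Please be patient."
    )
```

Keeping the formatting separate from the sending logic means the wording and thresholds can be unit-tested without waiting on a real timer.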

Enhanced Response Delivery

  • Multiple Routing Attempts: Up to 3 attempts to deliver responses
  • Timeout Handling: Catches asyncio.TimeoutError and sends appropriate notifications
  • Error Recovery: Sends user-friendly error messages when processing fails
  • Status Tracking: New "timeout" status to distinguish timeouts from other failures
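
As a rough illustration of the "up to 3 attempts" behaviour, the sketch below retries a delivery coroutine and reports whether it succeeded. The names `route_with_retry`, `send`, and `base_delay` are hypothetical, and the linearly growing delay between attempts is an assumed policy; the proposal only fixes the attempt count.

```python
import asyncio
from typing import Awaitable, Callable

MAX_ROUTE_ATTEMPTS = 3  # from the proposal: up to 3 delivery attempts


async def route_with_retry(
    send: Callable[[str], Awaitable[None]],
    response: str,
    attempts: int = MAX_ROUTE_ATTEMPTS,
    base_delay: float = 1.0,
) -> bool:
    """Try to deliver `response`; return True on success, False if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            await send(response)
            return True
        except Exception:
            # Task cancellation (CancelledError) is not an Exception subclass
            # on Python 3.8+, so it still propagates to the caller.
            if attempt == attempts:
                return False
            # Assumed policy: wait a little longer after each failed attempt.
            await asyncio.sleep(base_delay * attempt)
    return False
```

Returning a boolean rather than raising lets the caller record a failed delivery in the message status instead of losing the response silently.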

Key Changes

File: runtime/core/queue.py

  1. Added PROGRESS_UPDATE_INTERVAL = 120 constant (2 minutes)
  2. Added _send_progress_update() method to send status messages to users
  3. Modified _process_with_core_block() to:
    • Accept message_id parameter for progress updates
    • Send initial "processing" status
    • Run periodic progress updates in background task
    • Handle timeouts explicitly by catching asyncio.TimeoutError
    • Send notifications on errors/timeouts
  4. Enhanced _route_response() with retry logic (up to 3 attempts)
  5. Modified main processing loop to:
    • Always attempt to send responses (even on timeout)
    • Send timeout notifications when appropriate
    • Retry response routing on failure
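
The flow described for `_process_with_core_block()` might look roughly like the following. This is a standalone sketch under stated assumptions, not the code in runtime/core/queue.py: `process_with_progress`, `work`, and `notify` are hypothetical stand-ins, the "completed" and "failed" status strings and the timeout/error message texts are invented for illustration, and only `PROGRESS_UPDATE_INTERVAL` and the "timeout" status come from the proposal.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Tuple

PROGRESS_UPDATE_INTERVAL = 120  # seconds, as named in the proposal


async def process_with_progress(
    work: Callable[[], Awaitable[str]],
    notify: Callable[[str], Awaitable[None]],
    timeout: float,
    interval: float = PROGRESS_UPDATE_INTERVAL,
) -> Tuple[str, Optional[str]]:
    """Run `work` while sending periodic status messages; return (status, response)."""
    started = time.monotonic()

    async def ticker() -> None:
        # Background loop: one status message per interval until cancelled.
        while True:
            await asyncio.sleep(interval)
            elapsed = int(time.monotonic() - started)
            await notify(f"⏳ Still processing... ({elapsed // 60}m {elapsed % 60}s)")

    await notify("🔄 Processing your message... This may take several minutes.")
    progress_task = asyncio.create_task(ticker())
    try:
        response = await asyncio.wait_for(work(), timeout=timeout)
        return "completed", response
    except asyncio.TimeoutError:
        # Placeholder wording; the proposal does not specify this text.
        await notify("Processing timed out before a response was ready.")
        return "timeout", None
    except Exception:
        # Placeholder wording; the proposal does not specify this text.
        await notify("Something went wrong while processing your message.")
        return "failed", None
    finally:
        # Always stop the ticker, and absorb its cancellation (or any error
        # it hit while notifying) so cleanup never masks the real outcome.
        progress_task.cancel()
        await asyncio.gather(progress_task, return_exceptions=True)
```

Cancelling the ticker in `finally` covers the success, timeout, and error paths alike, so no status message is sent after the final outcome has been reported.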

Benefits

  1. Improved User Experience: Users always know what's happening
  2. Reduced Support Burden: Fewer "is it working?" questions
  3. Better Error Handling: Users are notified of issues and timeouts
  4. Guaranteed Delivery: Responses are sent even when operations time out
  5. Scalability: System handles very long-running operations gracefully

Example User Experience

Before:

  • User sends message
  • 30 minutes of silence
  • User assumes system is broken

After:

  • User sends message
  • Immediate: "🔄 Processing your message... This may take several minutes."
  • 2 minutes: "⏳ Still processing... (2m 0s)"
  • 4 minutes: "⏳ Still processing... (4m 0s)"
  • 6 minutes: "⏳ Still processing... (6 minutes elapsed). This is taking longer than usual."
  • 10 minutes: "⏳ Still processing... (10 minutes elapsed). Complex operations can take 30+ minutes. Please be patient."
  • 30 minutes: Full response delivered (or timeout notification if applicable)

Technical Considerations

  • Progress updates run in background tasks that are properly cancelled when processing completes
  • Updates are sent through the existing plugin handler system (no new dependencies)
  • Timeout handling distinguishes between client-side timeouts and server-side timeouts
  • Retry logic prevents lost responses due to transient network issues

Testing Recommendations

  1. Test with long-running queries (30+ minutes)
  2. Verify progress updates appear every 2 minutes
  3. Test timeout scenarios
  4. Verify responses are sent even after timeouts
  5. Check that users see all status messages
  6. Test error recovery and retry logic

Related Issues

This addresses user feedback about:

  • Long wait times with no feedback
  • Lost responses on timeout
  • Unclear processing status

Labels: enhancement (New feature or request)
