Conversation
CI is green - we are using this branch for various projects.

@echarles can this be merged? how can we push it? thank you

I hope this PR can be reviewed soon.
fasta2a/applications.py
```python
# Serialize event to ensure proper camelCase conversion
event_dict = stream_event_ta.dump_python(event, mode='json', by_alias=True)

# Wrap in JSON-RPC response
jsonrpc_response = {'jsonrpc': '2.0', 'id': request_id, 'result': event_dict}

# Convert to JSON string
yield json.dumps(jsonrpc_response)
```
Do we need to do this serialization and then deserialization? Seems a bit weird. I'll check.
But for sure we are not going to use json.dumps. There's a function in pydantic_core for it.
Addressed in d16abca.

The 3-step serialize pattern in the SSE generator:

- stream_event_ta.dump_python → intermediate Python dict
- wrap in a plain dict {'jsonrpc': '2.0', ...}
- json.dumps(...) → JSON string

is replaced by a single dump_json call on the StreamMessageResponse TypeAdapter. This skips the intermediate dict materialization and uses pydantic_core's JSON serializer instead of json.dumps, with camelCase aliasing applied to the entire envelope plus the nested event in one pass.
fasta2a/applications.py
```python
provider: AgentProvider | None = None,
skills: list[Skill] | None = None,
docs_url: str | None = '/docs',
streaming: bool = False,
```
Do you want it to be False by default?
fasta2a/applications.py
```python
# Parse the streaming request
stream_request = stream_message_request_ta.validate_json(data)

# Create an async generator wrapper that formats events as JSON-RPC responses
```
There's no need for a comment explaining each line.
```diff
- # Create an async generator wrapper that formats events as JSON-RPC responses
```
fasta2a/applications.py
```python
if a2a_request['method'] == 'message/send':
    jsonrpc_response = await self.task_manager.send_message(a2a_request)
elif a2a_request['method'] == 'message/stream':
    # Parse the streaming request
```
```diff
- # Parse the streaming request
```
fasta2a/broker.py
```diff
- raise NotImplementedError('send_run_task is not implemented yet.')
+ ...
```
fasta2a/broker.py
```diff
- raise NotImplementedError('send_cancel_task is not implemented yet.')
+ ...
```
fasta2a/storage.py
```python
return self.contexts.get(context_id)


class StreamingStorageWrapper(Storage[ContextT]):
```
Why do we need this instead of using Storage?
tests/test_streaming.py
```python
class TestFastA2AStreaming:
    """Tests for the FastA2A message/stream SSE endpoint."""
```
```python
asyncio.create_task(self.broker.run_task(broker_params))

# Stream events from broker
async for event in self.broker.subscribe_to_stream(task['id']):
    yield event
```
🔴 Race condition: task execution starts before stream subscription is registered
In stream_message, asyncio.create_task(self.broker.run_task(broker_params)) at line 194 schedules the task for execution BEFORE the subscription is registered at line 197. Since subscribe_to_stream is an async generator, the subscriber registration (at fasta2a/broker.py:159-164) only happens when iteration begins — i.e., when __anext__ is called by the async for at line 197. By that time, the background task is already scheduled. The first checkpoint inside subscribe_to_stream (the anyio.Lock acquisition at fasta2a/broker.py:161, which always checkpoints even if uncontested) can allow the scheduled run_task to start executing. If the worker processes the task fast enough (e.g., with a near-zero delay or a remote broker), events emitted via send_stream_event will find no subscribers and be silently dropped, causing the client to hang forever waiting for events that were already lost.
Prompt for agents
In fasta2a/task_manager.py, the stream_message method (lines 161-198) must subscribe to the broker's event stream BEFORE starting task execution. However, since subscribe_to_stream is an async generator, simply reordering the lines won't work — the subscription registration only happens when iteration begins.
The fix requires restructuring so that subscription registration is decoupled from iteration. One approach:
1. In broker.py's subscribe_to_stream (or a new method), separate the registration step from the iteration step. For example, add a method like `register_subscriber(task_id)` that returns a (send_stream, receive_stream) pair and registers the subscriber, then have subscribe_to_stream accept the pre-registered streams.
2. Alternatively, refactor stream_message to manually register the subscriber before starting the task, then iterate the receive stream directly.
The key invariant: the subscriber MUST be registered in self._event_subscribers BEFORE self.broker.run_task(broker_params) is called.
```python
    broker_params['metadata'] = metadata

# Start task execution in background
asyncio.create_task(self.broker.run_task(broker_params))
```
🔴 Fire-and-forget asyncio.create_task loses exceptions and breaks anyio/trio compatibility
At line 194, asyncio.create_task(self.broker.run_task(broker_params)) has two problems: (1) The returned task reference is not stored, so if run_task raises an exception, it is silently lost (Python will only emit a warning about a never-retrieved exception on GC). The streaming client will hang forever waiting for events that will never arrive. (2) The codebase consistently uses anyio for async primitives (fasta2a/broker.py uses anyio.Lock, anyio.create_memory_object_stream; fasta2a/worker.py uses anyio.create_task_group). Using asyncio.create_task directly breaks compatibility with trio (which anyio supports) and violates the project's convention.
Prompt for agents
In fasta2a/task_manager.py line 194, replace the bare `asyncio.create_task(self.broker.run_task(broker_params))` with an anyio-based approach that properly handles errors. For example, use an anyio TaskGroup to manage the background task's lifecycle. The stream_message method needs restructuring: instead of fire-and-forget, use a task group that runs both the task execution and the event streaming concurrently, so that if run_task raises an exception, it properly propagates or is handled (e.g., by sending a 'failed' status event). Also remove the `import asyncio` at line 63 if no longer needed.
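As a small illustration of the exception-loss point (not the actual fasta2a code): a bare `asyncio.create_task` drops a failure silently, while an anyio task group propagates it to the caller.

```python
import anyio


async def run_task() -> None:
    # Stand-in for broker.run_task failing mid-execution.
    raise RuntimeError('task failed')


async def main() -> str:
    try:
        async with anyio.create_task_group() as tg:
            tg.start_soon(run_task)
    except BaseException:
        # With anyio 4 the error may arrive wrapped in an ExceptionGroup;
        # either way it reaches the caller instead of being lost.
        return 'error propagated'
    return 'no error'


result = anyio.run(main)
```

With fire-and-forget `asyncio.create_task`, the same failure would only surface as a "Task exception was never retrieved" warning at garbage collection, and the SSE client would hang.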
```python
async with self._subscriber_lock:
    subscribers = self._event_subscribers.get(task_id, [])
    if not subscribers:
        return

    # Send event to all subscribers, removing closed streams
    active_subscribers: list[MemoryObjectSendStream[StreamEvent]] = []
    for stream in subscribers:
        try:
            await stream.send(event)
            active_subscribers.append(stream)
        except (anyio.ClosedResourceError, anyio.BrokenResourceError):
            # Subscriber disconnected, remove from list
            pass

    # Update subscriber list with only active ones
    if active_subscribers:
        self._event_subscribers[task_id] = active_subscribers
    elif task_id in self._event_subscribers:
        # No active subscribers left, clean up
        del self._event_subscribers[task_id]
```
🚩 Potential lock contention in send_stream_event while holding lock during await
In fasta2a/broker.py:131-151, send_stream_event holds _subscriber_lock for the entire duration of sending events to ALL subscribers. Each await stream.send(event) at line 140 could block if a subscriber's buffer is full (max_buffer_size=100). While holding the lock, no new subscriptions or unsubscriptions can proceed. I analyzed whether this could deadlock with subscribe_to_stream's finally block (line 176), which also needs the lock. The scenario where subscriber A's buffer is full while subscriber B tries to clean up is NOT a deadlock because: (1) A's consumer doesn't need the lock to read from its receive_stream, so it can drain the buffer, and (2) if A's receive_stream is already closed, the send raises BrokenResourceError which is caught at line 142. However, this pattern could cause significant latency spikes under load with many slow subscribers, as a single slow subscriber blocks event delivery to all others for that task_id.
fasta2a/storage.py
```python
task = await self._storage.update_task(task_id, state, new_artifacts, new_messages)

# Determine if this is a final state
final = state in ('completed', 'failed', 'canceled')
```
🚩 TaskState completeness: 'rejected', 'auth-required', 'unknown', and 'input-required' are not treated as final
In fasta2a/worker.py:74, final is determined by state in ('completed', 'failed', 'canceled'). Looking at the TaskState type alias at fasta2a/schema.py:436-438, there are additional terminal-like states: 'rejected', 'auth-required', and 'unknown'. If a worker implementation ever sets the state to 'rejected', no final event would be sent to stream subscribers, causing them to hang. The current EchoWorker in tests only uses 'working', 'completed', and 'canceled', so this isn't triggered now, but could be a problem for future worker implementations.
Co-authored-by: Marcelo Trylesinski <marcelotryle@gmail.com>
fasta2a/storage.py
```python
task = await self._storage.update_task(task_id, state, new_artifacts, new_messages)

# Determine if this is a final state
final = state in ('completed', 'failed', 'canceled')
```
🔴 Missing 'rejected' terminal state causes streaming subscribers to hang forever
At fasta2a/storage.py:175, the final flag is computed as state in ('completed', 'failed', 'canceled'), but TaskState (defined at fasta2a/schema.py:436-438) also includes 'rejected' as a terminal state. If a task transitions to 'rejected', no final=True status event will ever be published — instead, a non-final status update (final=False) is sent. This means subscribers in InMemoryBroker.subscribe_to_stream (fasta2a/broker.py:170-173) will never see a final=True event and will block indefinitely, causing the SSE connection to hang.
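A sketch of the implied fix: compute `final` from an explicit terminal-state set rather than an inline tuple. Which of the remaining TaskState values count as terminal is a maintainer decision; the set below (adding 'rejected') is an assumption.

```python
# Assumed terminal set; the review flags 'rejected' as definitely terminal,
# while 'auth-required' / 'input-required' / 'unknown' need a maintainer call.
TERMINAL_STATES = frozenset({'completed', 'failed', 'canceled', 'rejected'})


def is_final(state: str) -> bool:
    """True if no further status events will follow for this task."""
    return state in TERMINAL_STATES
```

An explicit named set also gives future worker implementations one obvious place to keep the terminal semantics in sync with TaskState.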
```python
asyncio.create_task(self.broker.run_task(broker_params))

# Stream events from broker
async for event in self.broker.subscribe_to_stream(task['id']):
    yield event
```
🚩 Subscription registration vs task execution ordering relies on subtle asyncio scheduling guarantees
In fasta2a/task_manager.py:194-197, the code first calls asyncio.create_task(self.broker.run_task(broker_params)) (line 194) then subscribes via async for event in self.broker.subscribe_to_stream(task['id']) (line 197). This appears to be a race condition at first glance, but it actually works because asyncio.create_task only schedules the coroutine — it doesn't execute until the current coroutine yields. The subscribe_to_stream async generator executes synchronously through the subscriber registration code (fasta2a/broker.py:159-164) before hitting its first real await (the async for event in receive_stream at line 168). However, this ordering guarantee is fragile: if self._subscriber_lock at fasta2a/broker.py:161 is contended (held by a concurrent send_stream_event call), the subscription registration will yield to the event loop, potentially allowing the background task to start and emit events before the subscriber is registered. In the InMemoryBroker with a single task this is very unlikely, but worth noting for robustness.
```python
provider: AgentProvider | None = None,
skills: list[Skill] | None = None,
docs_url: str | None = '/docs',
streaming: bool = True,
```
🚩 Default streaming=True may break backward compatibility for non-streaming setups
The FastA2A.__init__ now defaults streaming=True (fasta2a/applications.py:49), which means the agent card will advertise streaming capability. If a user upgrades without wrapping their storage in StreamingStorageWrapper, the agent card will claim streaming support but message/stream requests will fail because no streaming events are published. This is a potential compatibility concern — previously streaming was hardcoded to False (fasta2a/applications.py:103). The default being True assumes all users are streaming-ready.
🚩 InMemoryStorage.load_task mutates the stored task's history in place
Pre-existing issue: InMemoryStorage.load_task at fasta2a/storage.py:85-86 does task['history'] = task['history'][-history_length:], which mutates the stored task dict in place, permanently truncating the history. This means subsequent calls to load_task without history_length will still see the truncated history. This is a pre-existing bug, not introduced by this PR, but relevant context since streaming relies on storage state.
```diff
@@ -56,6 +57,66 @@ async def _handle_task_operation(self, task_operation: TaskOperation) -> None:
     except Exception:
         await self.storage.update_task(task_operation['params']['id'], state='failed')
```
🔴 Exception handler in _handle_task_operation bypasses streaming event publishing
When a task raises an exception, _handle_task_operation (fasta2a/worker.py:57-58) catches it and calls self.storage.update_task(...) directly instead of self.update_task(...). The new self.update_task() method (line 60) is responsible for both updating storage AND publishing streaming events (including the crucial final=True status event). By calling self.storage.update_task() directly, no final=True TaskStatusUpdateEvent is ever published to the broker. Any SSE subscriber waiting on subscribe_to_stream will hang indefinitely, never receiving the termination signal.
```diff
- await self.storage.update_task(task_operation['params']['id'], state='failed')
+ await self.update_task(task_operation['params']['id'], state='failed')
```
```python
metadata = params.get('metadata')
if metadata is not None:
    broker_params['metadata'] = metadata
```
🚩 Inconsistent metadata forwarding between send_message and stream_message
The stream_message method (lines 189-191) forwards metadata from params to broker_params, but the pre-existing send_message method (fasta2a/task_manager.py:117-132) does not. This is an inconsistency between the two code paths. While not a bug per se (since send_message was pre-existing), it means the two methods handle the same MessageSendParams differently. This should be harmonized.
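One way to harmonize the two paths, sketched with hypothetical names: a shared helper that builds the broker params, so send_message and stream_message forward metadata identically.

```python
def build_broker_params(task_id: str, params: dict) -> dict:
    """Hypothetical helper; the real methods build TaskSendParams-style dicts."""
    broker_params = {'id': task_id}
    metadata = params.get('metadata')
    if metadata is not None:
        broker_params['metadata'] = metadata
    return broker_params


with_meta = build_broker_params('t1', {'metadata': {'trace': 'abc'}})
without_meta = build_broker_params('t2', {})
```

Both code paths calling one helper makes any future divergence in MessageSendParams handling a deliberate choice rather than an accident.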
@Kludex Thanks for the review, I have addressed the comments and CI is green.
Fixes #27
Supersedes #26
@echarles @physicsrob @samuelcolvin @Kludex