Major refactoring and enhancement of the core graph execution engine. This PR restructures the codebase into modular subpackages, improves type safety across the board, and introduces new capabilities for graph execution and data handling. #60
Conversation
- Introduced new graph data structures for traced computation, including GraphNode, CallNode, CondNode, WhileNode, and ScanNode.
- Added OutputKind, InputRef, NodeRef, and OutputRef classes for better output management.
- Implemented TracedValue to represent values during tracing, enforcing constraints on operations while a trace is active.
- Created control flow operators (cond, while_loop, scan) for JAX-style tracing in computation graphs.
- Removed the old node wrapper and sequential pipeline implementation, replacing them with a more streamlined tracing approach.
- Established a caching mechanism for compiled graphs to optimize performance.
- Added tracing context management to facilitate graph capture during function execution.
- Introduced decorators for JIT compilation and node marking, enhancing the usability of the tracing framework.
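The TracedValue idea described above can be sketched in a few lines: instead of computing eagerly, each operation on a traced value records a node in a graph that can be compiled and executed later. This is an illustrative miniature, not the actual thinkagain implementation; the `Graph`, `TracedValue`, and `trace` names here are simplified stand-ins.

```python
# Minimal sketch of JAX-style tracing: operations on a TracedValue are
# recorded as graph nodes rather than executed immediately.
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: list = field(default_factory=list)

    def add_node(self, op, lhs, rhs):
        # Append a node and return its id (its index in the node list).
        self.nodes.append((op, lhs, rhs))
        return len(self.nodes) - 1


@dataclass(frozen=True)
class TracedValue:
    graph: Graph
    node_id: int

    def __add__(self, other):
        return TracedValue(self.graph, self.graph.add_node("add", self.node_id, other))

    def __mul__(self, other):
        return TracedValue(self.graph, self.graph.add_node("mul", self.node_id, other))


def trace(fn):
    """Run fn on a symbolic input, capturing its operations as a Graph."""
    g = Graph()
    root = g.add_node("input", None, None)
    fn(TracedValue(g, root))
    return g


g = trace(lambda x: (x + 2) * 3)
# g.nodes now records the input node, the add, and the mul.
```

A real tracer additionally interns literals, handles multiple outputs, and caches the compiled graph so repeated calls skip tracing entirely.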
- Removed the `ReplicaSpec` class and associated runtime configuration logic from `replica.py`.
- Introduced a new `replication` module with `pool.py`, `replicate.py`, and `__init__.py` to handle replica management and distribution.
- Implemented `Replica`, `LocalReplica`, and `RemoteReplica` classes for local and remote execution.
- Added `ReplicaPool` for managing multiple replicas and load balancing.
- Created a `replicate` decorator for marking functions for distributed execution.
- Removed the `runtime.py` file and its backend management functionality.
- Deleted the `scheduler.py` and `spawner.py` files, simplifying the architecture.
- Added a new entry point in `serve.py` for serving distributed functions.
- Added CallExecutor, CondExecutor, WhileExecutor, ScanExecutor, and SwitchExecutor to handle various execution strategies in the computation graph.
- Refactored GraphNode to Node, integrating the NodeExecutor protocol for better extensibility.
- Implemented service execution capabilities with ServiceCallExecutor and ServiceExecutionProvider for distributed service calls.
- Enhanced tracing with ServiceTracingHook to record service method calls during graph execution.
- Introduced Replica infrastructure for stateful, replicated actors, including ReplicaConfig and ReplicaHandle.
- Added service pooling for load balancing in distributed environments.
- Updated the mesh context to support service execution and tracing hooks.
- Improved error handling and validation across various execution paths.
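The executor-per-node-kind pattern described above can be sketched with `typing.Protocol`: each control-flow construct gets its own executor class, and the graph runner dispatches on node kind. Names and node layout here are illustrative, not the library's actual API.

```python
# Hypothetical sketch of a NodeExecutor protocol with per-kind executors.
from typing import Any, Protocol


class NodeExecutor(Protocol):
    def execute(self, node: dict, values: dict[int, Any]) -> Any: ...


class CallExecutor:
    """Executes a plain function-call node."""

    def execute(self, node: dict, values: dict[int, Any]) -> Any:
        args = [values[i] for i in node["inputs"]]
        return node["fn"](*args)


class CondExecutor:
    """Executes a conditional node by picking one branch at runtime."""

    def execute(self, node: dict, values: dict[int, Any]) -> Any:
        branch = node["true_fn"] if values[node["pred"]] else node["false_fn"]
        return branch(*[values[i] for i in node["inputs"]])


# The runner dispatches on node kind; adding a new construct means
# registering a new executor, not editing the runner.
EXECUTORS: dict[str, NodeExecutor] = {"call": CallExecutor(), "cond": CondExecutor()}


def run_node(kind: str, node: dict, values: dict[int, Any]) -> Any:
    return EXECUTORS[kind].execute(node, values)
```

The Protocol keeps the runner decoupled from concrete executors, which is the extensibility benefit the commit message refers to.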
Major architectural reorganization to improve code maintainability and separation of concerns:

**Core restructuring:**
- Split monolithic tracing.py (720 LOC) into focused modules: context, tracer, utils, validation
- Organize execution-related modules under thinkagain/core/execution/
- Group graph operations under thinkagain/core/graph/
- Introduce literal_refs and literals modules for better graph literal handling

**Distributed execution improvements:**
- Refactor mesh and replication pool for cleaner architecture
- Update execution hooks and service runtime for consistency
- Remove deprecated replicate.py in favor of integrated replica system
- Streamline gRPC server/client interactions

**API and documentation updates:**
- Update README to reflect @replica decorator (replacing @replicate)
- Consolidate distributed examples into single comprehensive example
- Remove outdated REMOTE_SETUP.md and legacy example files
- Clean up test suite by removing deprecated test_agents.py

**Code cleanup:**
- Improve executor and executors modules with better abstractions
- Update profiling integration with new tracing infrastructure
- Enhance graph operations and control flow primitives
- Remove obsolete benchmark_overhead.py

This refactor maintains backward compatibility while establishing a cleaner foundation for future development.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit introduces a new @trace decorator and traceable container system to support custom types in traced graphs, along with comprehensive test improvements and infrastructure refinements.

- Add core/traceable.py with @trace decorator and registration API
- Implement decompose/compose protocol for custom traceable containers
- Add auto-detection support for dataclasses with zero boilerplate
- Enhance literal_refs and executor to handle traceable containers via map_traceable_refs
- Add validation to reject TracedValues in dict keys and sets
- Improve test infrastructure with _clear_compiled_cache fixture and _graph_for helper
- Refactor test assertions to use cleaner helper patterns
- Rename test_core_tracing.py to test_core.py for broader scope
- Export trace decorator from top-level thinkagain module
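A decompose/compose protocol like the one described above can be sketched as follows: a container is split into its leaf values plus a rebuild function, with a zero-boilerplate fallback for dataclasses. The registry and function names here are hypothetical, modeled on the commit description rather than the actual thinkagain code.

```python
# Sketch of a decompose/compose protocol for traceable containers,
# with auto-detection for dataclasses.
import dataclasses
from typing import Any, Callable

# Maps container type -> (decompose_fn, compose_fn) for custom types.
_REGISTRY: dict[type, tuple[Callable, Callable]] = {}


def register_traceable(cls: type, decompose_fn: Callable, compose_fn: Callable) -> None:
    _REGISTRY[cls] = (decompose_fn, compose_fn)


def decompose(obj: Any) -> tuple[list[Any], Callable[[list[Any]], Any]]:
    """Split a container into leaf values plus a function that rebuilds it."""
    cls = type(obj)
    if cls in _REGISTRY:
        dec, comp = _REGISTRY[cls]
        return dec(obj), comp
    if dataclasses.is_dataclass(obj):
        # Zero-boilerplate path: field order gives a stable decomposition.
        names = [f.name for f in dataclasses.fields(obj)]
        leaves = [getattr(obj, name) for name in names]
        return leaves, lambda vals: cls(**dict(zip(names, vals)))
    raise TypeError(f"{cls.__name__} is not traceable")


@dataclasses.dataclass
class Point:
    x: int
    y: int


leaves, rebuild = decompose(Point(1, 2))
```

This is the same flatten/unflatten shape JAX uses for pytrees: the tracer works on leaves, and the rebuild function restores the container around traced results.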
- Add Union types to control flow operators (cond, while_loop, scan, switch) to properly represent both sync/async callables and TracedValue returns
- Add type: ignore comments in examples and tests for known traced value type issues
- Improve type annotations in distributed components (mesh, pool, execution_hook)
- Replace mutable default arguments with field(default_factory=list) in dataclasses
- Update pool registry documentation to clarify key tuple structures
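The `field(default_factory=list)` change above matters because a plain mutable default would be shared across all instances; dataclasses reject it outright, and the factory gives each instance a fresh list.

```python
# Why field(default_factory=list) is used instead of a bare [] default.
from dataclasses import dataclass, field


@dataclass
class Pool:
    # `replicas: list = []` would raise ValueError at class-definition
    # time ("mutable default ... is not allowed"); in a plain class the
    # same pattern would silently share one list across instances.
    replicas: list = field(default_factory=list)  # fresh list per instance


a, b = Pool(), Pool()
a.replicas.append("r0")
# b.replicas is still empty: the two instances do not share state.
```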
- Updated tests in `test_distributed.py` to utilize the new `@bind_service` decorator for service handles in `@node` functions.
- Introduced `bind_service` decorator in `services.py` to attach service handle metadata for tracing and caching.
- Enhanced `replica.py` to enforce that replicas must be called from within `@node` functions, improving error handling during tracing.
- Added `Bundle` and `make_inputs` utilities in `inputs.py` for structured input handling in pipelines.
- Modified `node` decorator to support service bindings and ensure only async functions are accepted.
- Updated service execution protocols to streamline service call handling.
- Improved documentation and error messages related to service binding and execution patterns.
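A decorator that attaches service-handle metadata, as `@bind_service` is described above, can be sketched like this. The attribute name and keyword signature are illustrative assumptions, not the actual thinkagain API.

```python
# Sketch of a bind_service-style decorator: it leaves the function's
# behavior unchanged but attaches metadata that tracing and caching
# machinery can read without calling the function.
import functools
from typing import Any, Callable


def bind_service(**services: Any) -> Callable:
    def wrapper(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            return fn(*args, **kwargs)

        # Metadata lives on the wrapper; the tracer inspects this
        # attribute to know which service handles the node depends on.
        inner.__service_bindings__ = dict(services)
        return inner

    return wrapper


@bind_service(db="user_db")
def lookup(user_id: int) -> str:
    return f"user:{user_id}"
```

Keeping the bindings as plain function metadata is what lets the tracer include them in cache keys, so two nodes bound to different services never share a compiled graph.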
- Move apply_replica() and replica state handling to new replica_state.py module
- Remove TracingHook protocol and ServiceTracingHook plugin-based approach
- Simplify BoundReplicaMethod to delegate directly to service provider
- Remove global replica hook registry (register_replica_hook, unregister_replica_hook)
- Add get_service_bindings() and register_service_bindings() helpers for direct binding management
- Remove ServiceCallExecutor and integrate service calls directly with the execution context
- Update graph tracing to reject replica calls without proper @node/@bind_service decoration
- Update examples documentation to reference replica_state.py

This refactoring removes unnecessary indirection from the tracing hook architecture while maintaining the same service execution functionality through a more direct approach.
…fields

Adds a BundleOps.unpack() static method that unpacks multiple fields from a bundle or traced dataclass, returning them as a tuple. Includes validation for missing keys with helpful error messages.
- Merge stateful_replica_patterns into distributed_example
- Move test_bundle_unpack.py to tests/ directory
- Update distributed module structure and dependencies
- Remove redundant literals.py module
- Add TYPE_CHECKING blocks for better static type analysis in bundle_example.py
- Remove unused imports (e.g., Context import in agent.py)
- Add type: ignore comments for unavoidable type mismatches in tests and distributed code
- Refactor type annotations in agents, core tracing, and distributed modules
- Improve type validation and context handling in tracer and related modules
Summary of Changes

Hello @BDHU, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a significant architectural overhaul, transitioning the framework to a JAX-inspired graph compilation model. The primary goal is to provide a more robust, type-safe, and performant foundation for building distributed AI pipelines. By introducing explicit graph tracing and control flow primitives, the system gains greater optimization potential and clearer execution semantics, enabling more efficient and scalable pipeline development.

Highlights
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5d5bd6803
```python
@property
def capture_values(self) -> dict[int, Any]:
    return (
        self.parent_values if self.parent_values is not None else self.node_values
    )
```
Use immediate parent values for nested subgraphs
The capture source for subgraph inputs is chosen via capture_values, which returns parent_values whenever it is set. For a graph that is itself a subgraph (e.g., a while body), this means prepare_subgraph_args will append captures from the outer graph, not from the current subgraph’s node_values. If that subgraph executes another nested control-flow graph that captures a value computed inside the subgraph, Graph.append_captures will look up that node id in the wrong dict and raise a KeyError (or use a mismatched value if ids collide). This breaks nested control flow like while_loop containing a cond that closes over an intermediate node value.
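One possible shape of a fix, sketched as a standalone helper (hypothetical, not the actual patch): resolve a captured node id against the current frame's `node_values` first and only fall back to the parent's values, so a nested subgraph can close over its own intermediates without a `KeyError`.

```python
# Sketch: prefer the current subgraph's values over the parent's when
# resolving a capture, falling back outward only when the id is absent.
from typing import Any, Optional


def resolve_capture(
    node_id: int,
    node_values: dict[int, Any],
    parent_values: Optional[dict[int, Any]],
) -> Any:
    if node_id in node_values:
        # Value computed inside the current subgraph (e.g., a while body).
        return node_values[node_id]
    if parent_values is not None and node_id in parent_values:
        # Value captured from the enclosing graph.
        return parent_values[node_id]
    raise KeyError(f"capture for node {node_id} not found in any frame")
```

A per-id lookup like this avoids the all-or-nothing choice the `capture_values` property makes, though the real fix would also need to guard against the id-collision case the review mentions.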
```python
def _create_local() -> Replica:
    instance = handle.replica_class(
        *handle.init_args, **dict(handle.init_kwargs)
    )
    return LocalReplica(instance, None)
```
Honor replica setup for class handles
For class-based replica handles, _create_local instantiates the class and returns LocalReplica(instance, None), which skips the ReplicaConfig.setup hook entirely. As a result, @replica(setup=...) never runs for class handles, and any __call__ expecting the setup state (per the documented API) will fail at runtime because no state is injected. This also means the setup side effects (e.g., initializing shared resources) are silently skipped.
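The suggested fix can be sketched as follows, with `Handle`, `create_local`, and the `_setup_state` attribute as hypothetical stand-ins for the real `ReplicaConfig`/`LocalReplica` machinery: run the configured setup hook when instantiating a class-based replica, and inject its state instead of passing `None`.

```python
# Sketch: honor the setup hook for class-based replica handles instead
# of discarding it (names are illustrative, not the actual API).
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Handle:
    replica_class: type
    init_args: tuple = ()
    init_kwargs: tuple = ()
    setup: Optional[Callable[[], Any]] = None


def create_local(handle: Handle) -> Any:
    instance = handle.replica_class(*handle.init_args, **dict(handle.init_kwargs))
    # Previously the setup hook was skipped entirely for class handles;
    # run it here and attach its state so __call__ can rely on it.
    state = handle.setup() if handle.setup is not None else None
    instance._setup_state = state
    return instance


class Worker:
    pass


w = create_local(Handle(replica_class=Worker, setup=lambda: {"conn": "ready"}))
```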
Code Review
This is an impressive and extensive refactoring that significantly modernizes the library's core API to a JAX-style, graph-based execution model. The new architecture with clear separation between core and distributed components, along with the introduction of an extensible hook system, is a massive improvement. The code is well-structured and the new features like Bundle, switch, and Mesh are powerful additions.
My review focuses on a few high-impact areas:

- **Security:** The new gRPC backend introduces critical security vulnerabilities, including a remote code execution risk due to insecure deserialization with `pickle` and a lack of transport encryption. These should be addressed with high priority.
- **Testing:** The tests for the agent functionality have been removed, leaving a part of the library without test coverage.
- **Documentation:** A minor inconsistency was found in the README regarding an example filename.
Overall, this is a fantastic step forward for the library. Addressing the security and testing concerns will make it ready for wider adoption.
| """ | ||
| try: | ||
| # Deserialize arguments | ||
| args, kwargs = pickle.loads(request.args) |
Critical Security Vulnerability: Remote Code Execution via Insecure Deserialization
The server deserializes arguments using pickle.loads() directly from a network request. pickle is not secure and can be used to execute arbitrary code. An attacker who can send requests to this gRPC server can achieve remote code execution on the server machine.
It's highly recommended to switch to a safe serialization format like JSON. If using pickle is a requirement, you must ensure that the service is only exposed to a completely trusted network and add prominent security warnings to the documentation about this vulnerability.
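The safer alternative can be sketched with a restricted JSON codec for RPC arguments: JSON deserialization can only produce plain data (strings, numbers, lists, dicts), never arbitrary objects, so attacker-controlled bytes cannot trigger code execution the way a malicious pickle payload can. The function names below are illustrative.

```python
# Sketch: a JSON-based codec for RPC arguments as a safe replacement
# for pickle.loads on untrusted network input. The tradeoff is that
# only JSON-serializable argument types are supported.
import json
from typing import Any


def encode_args(args: list[Any], kwargs: dict[str, Any]) -> bytes:
    return json.dumps({"args": args, "kwargs": kwargs}).encode("utf-8")


def decode_args(payload: bytes) -> tuple[list[Any], dict[str, Any]]:
    # json.loads never instantiates arbitrary classes or runs
    # __reduce__ hooks, unlike pickle.loads.
    obj = json.loads(payload.decode("utf-8"))
    return obj["args"], obj["kwargs"]
```

If richer types must cross the wire, a schema-driven format (e.g., protobuf messages, which gRPC already provides) is the usual middle ground; unrestricted pickle should be reserved for fully trusted networks, with a prominent warning.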
```python
# Lazy initialization of gRPC client
from ..grpc.client import GrpcClient

self._client = GrpcClient(self.endpoint)
```
Security Vulnerability: Unencrypted Communication
The GrpcClient is initialized using grpc.aio.insecure_channel, which means all communication between distributed nodes is unencrypted. This is a security regression, as the previous implementation appeared to support SSH for transport security.
For any production or multi-tenant environment, it's critical to use secure channels. Please add support for grpc.aio.secure_channel with TLS and consider making it the default configuration.
- Remove @node decorator from unpack() and use asyncio.gather() to call get() for each key
- This allows tuple unpacking syntax to work during JIT tracing
- Update bundle examples to use @ta.jit with unpack()
- Add test case verifying unpack works with @ta.jit
- Update documentation to reflect that unpack now works everywhere
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
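The gather-based unpack described in the commit above can be sketched like this (illustrative, not the library's exact code): fetch each requested key concurrently and return the values as a tuple, validating missing keys up front so the error message lists everything that is absent.

```python
# Sketch: unpack multiple bundle fields concurrently via asyncio.gather,
# returning a tuple so callers can use tuple-unpacking syntax.
import asyncio
from typing import Any


async def unpack(bundle: dict[str, Any], *keys: str) -> tuple[Any, ...]:
    missing = [k for k in keys if k not in bundle]
    if missing:
        # Report all missing keys at once rather than failing on the first.
        raise KeyError(f"bundle is missing keys: {missing}")

    async def get(key: str) -> Any:
        return bundle[key]

    # gather preserves argument order, so the tuple matches `keys`.
    return tuple(await asyncio.gather(*(get(k) for k in keys)))


result = asyncio.run(unpack({"a": 1, "b": 2, "c": 3}, "a", "c"))
```

Returning a plain tuple (rather than a traced node) is what makes `x, y = unpack(...)` work during JIT tracing: tuple unpacking needs a concrete length at trace time.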
Key Changes

**New Features:**
- Added an `unpack()` method to bundle operations for extracting multiple fields
- Enabled `unpack()` to work with the `@ta.jit` decorator

**Architecture & Refactoring:**
- Reorganized core modules into `core/execution`, `core/graph`, and `core/tracing`

**Testing & Examples:**
- Updated tests (`test_core.py`, `test_bundle_unpack.py`)

**Files Changed:**
- `core/execution/`, `core/graph/`, `core/tracing/`

**Breaking Changes**
This refactoring includes structural changes to the core API. Existing code using the distributed backend may need updates due to the simplified execution framework.