Add python_script node code generation with flowfile API rewriting#318
Open
Edwardvaneechoud wants to merge 5 commits into feauture/kernel-implementation from
Implement AST-based rewriting of flowfile.* API calls in python_script
nodes to plain Python equivalents, enabling code export for flows that
use kernel-based Python execution.
New module python_script_rewriter.py handles:
- flowfile.read_input() → function parameter (input_df)
- flowfile.read_inputs() → function parameter (inputs dict)
- flowfile.publish_output(expr) → return statement
- flowfile.publish_artifact/read_artifact/delete_artifact → _artifacts dict
- flowfile.log() → print()
- Package dependency detection from kernel config
Integration with FlowGraphToPolarsConverter:
- New _handle_python_script handler method
- Artifact tracking across nodes for validation
- _artifacts = {} emitted in generated code when needed
- Graceful error handling for unsupported patterns
https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
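The mapping listed in this commit message could be implemented with an ast.NodeTransformer roughly as follows. This is only a minimal sketch, not the code in python_script_rewriter.py: the class name FlowfileCallRewriter and the helper rewrite() are hypothetical, only three of the listed calls are handled, and flowfile.log() is simply renamed to print() without any level formatting.

```python
import ast


class FlowfileCallRewriter(ast.NodeTransformer):
    """Hypothetical sketch; not the actual python_script_rewriter code."""

    @staticmethod
    def _flowfile_method(node):
        """Return the method name for a `flowfile.<name>(...)` call, else None."""
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "flowfile"
        ):
            return node.func.attr
        return None

    def visit_Expr(self, node):
        # A bare `flowfile.publish_output(expr)` statement becomes `return expr`.
        if self._flowfile_method(node.value) == "publish_output" and node.value.args:
            value = self.visit(node.value.args[0])
            return ast.copy_location(ast.Return(value=value), node)
        return self.generic_visit(node)

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        method = self._flowfile_method(node)
        if method == "read_input":
            # The call is replaced by the function parameter name.
            return ast.copy_location(ast.Name(id="input_df", ctx=ast.Load()), node)
        if method == "log":
            # Simplification: keep the arguments and only swap the callee.
            node.func = ast.Name(id="print", ctx=ast.Load())
        return node


def rewrite(source: str) -> str:
    """Parse user code, rewrite flowfile calls, and return the new source."""
    tree = FlowfileCallRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


user_code = (
    "df = flowfile.read_input()\n"
    'flowfile.log("rows loaded")\n'
    "flowfile.publish_output(df.head(5))\n"
)
rewritten = rewrite(user_code)
```

The rewritten text contains a bare return statement, so it is only valid once it has been wrapped in a function body.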
Artifacts are now keyed per kernel in _artifacts, matching the runtime where each kernel container has its own independent artifact store. Cross-kernel artifact access is validated and rejected at code gen time.
Also fixes read_inputs() to produce dict[str, list[pl.LazyFrame]] matching the runtime API where each input name maps to a list of LazyFrames (multiple connections can share a name). Input vars with suffixed names (main_0, main_1) are grouped under their base name.
https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
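A small sketch of the two behaviours described in this commit, under stated assumptions: the function names group_input_vars() and check_artifact_read() are hypothetical, the suffix convention is assumed to be an underscore followed by digits, and raising ValueError is a stand-in for however the generator actually reports a rejected read.

```python
import re
from collections import defaultdict


def group_input_vars(var_names):
    """Group variable names such as main_0, main_1 under their base name."""
    grouped = defaultdict(list)
    for name in var_names:
        base = re.sub(r"_\d+$", "", name)  # assumed suffix convention
        grouped[base].append(name)
    return dict(grouped)


def check_artifact_read(published, kernel_id, name):
    """Raise ValueError unless `name` was published on the same kernel.

    `published` maps a kernel id to the set of artifact names seen so far.
    """
    if name in published.get(kernel_id, set()):
        return
    owners = sorted(k for k, names in published.items() if name in names)
    if owners:
        raise ValueError(
            f"artifact {name!r} was published on kernel(s) {owners}, "
            f"not on {kernel_id!r}"
        )
    raise ValueError(f"artifact {name!r} has not been published")


groups = group_input_vars(["main_0", "main_1", "lookup"])
published = {"kernel_a": {"model"}}
check_artifact_read(published, "kernel_a", "model")  # same kernel: accepted
```

Note that a regex-based grouping like this would also merge an input that is genuinely named with a numeric suffix, so a real implementation likely needs to know which names were suffixed by the generator.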
…into claude/export-python-script-nodes-ZXprP
Unsupported calls (publish_global, display, etc.) and dynamic artifact names no longer block code generation. Instead, the generated function includes WARNING comments so users can see what won't work outside the kernel runtime.
Also fixes mixed read_input/read_inputs usage: when both are present, read_input() is rewritten to inputs["main"][0] so it stays valid in the multi-input function signature.
https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
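The decision logic in this commit can be illustrated with a short analysis pass. This is a sketch only: the helper names are hypothetical, the SUPPORTED set is taken from the calls listed in this PR, and the exact wording of the WARNING comment is an assumption.

```python
import ast

# Calls this PR describes as rewritable; anything else is treated as unsupported.
SUPPORTED = {
    "read_input", "read_inputs", "publish_output",
    "publish_artifact", "read_artifact", "delete_artifact", "log",
}


def flowfile_methods_used(source):
    """Collect the names of all flowfile.<name>(...) calls in the source."""
    used = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "flowfile"
        ):
            used.add(node.func.attr)
    return used


def read_input_replacement(used):
    """Pick the expression that replaces flowfile.read_input()."""
    if "read_inputs" in used:
        # Multi-input signature: only the `inputs` dict is available.
        return 'inputs["main"][0]'
    return "input_df"


def warning_comments(used):
    """Emit one comment per unsupported call (comment wording is assumed)."""
    return [
        f"# WARNING: flowfile.{name}() is not supported outside the kernel runtime"
        for name in sorted(used - SUPPORTED)
    ]


source = (
    "a = flowfile.read_input()\n"
    "b = flowfile.read_inputs()\n"
    "flowfile.display(a)\n"
)
used = flowfile_methods_used(source)
```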
Unsupported calls like flowfile.publish_global() are now replaced with inline comments showing the original call, so users can see what was skipped and why. Uses a marker-based approach to survive AST round-trips.
https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
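Python's ast module discards comments, so a comment cannot be inserted directly into the tree. One way a marker-based approach might work is sketched below: the unsupported statement is swapped for a string-literal statement carrying a marker, and the marker is turned into a comment after unparsing. The MARKER value, the class and function names, and the comment wording are all hypothetical; the PR's actual mechanism may differ.

```python
import ast

MARKER = "__FLOWFILE_UNSUPPORTED__"  # hypothetical marker prefix


class UnsupportedCallMarker(ast.NodeTransformer):
    """Swap unsupported flowfile statements for marker string statements."""

    def __init__(self, unsupported):
        self.unsupported = set(unsupported)

    def visit_Expr(self, node):
        call = node.value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "flowfile"
            and call.func.attr in self.unsupported
        ):
            original = ast.unparse(call)
            marker = ast.Expr(value=ast.Constant(value=MARKER + original))
            return ast.copy_location(marker, node)
        return node


def _marker_to_comment(line):
    """Turn an unparsed marker statement into a comment; pass other lines through."""
    stripped = line.lstrip()
    if MARKER not in stripped:
        return line
    try:
        value = ast.literal_eval(stripped)  # recovers the marker string safely
    except (ValueError, SyntaxError):
        return line
    if not (isinstance(value, str) and value.startswith(MARKER)):
        return line
    indent = line[: len(line) - len(stripped)]
    return f"{indent}# skipped (unsupported outside kernel runtime): {value[len(MARKER):]}"


def replace_unsupported(source, unsupported=("publish_global", "display")):
    tree = UnsupportedCallMarker(unsupported).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return "\n".join(_marker_to_comment(l) for l in ast.unparse(tree).splitlines())


result = replace_unsupported(
    'x = 1\nflowfile.publish_global("model", x)\ny = x + 1\n'
)
```

One limitation of this sketch: if the unsupported call is the only statement in a block (for example the body of an if), replacing it with a comment leaves the block empty, so a real implementation would need to emit pass as well.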
Summary
This PR adds support for generating executable Python code from python_script nodes by rewriting flowfile API calls into plain Python equivalents. This enables python_script nodes to be converted to standalone functions that can run outside Docker kernel containers.
Key Changes
- New module python_script_rewriter.py: Implements AST-based analysis and transformation of flowfile API calls
  - analyze_flowfile_usage(): Parses user code to detect flowfile API patterns, artifact dependencies, and unsupported calls
  - rewrite_flowfile_calls(): Transforms flowfile.* calls to plain Python (e.g., flowfile.read_artifact("x") → _artifacts["x"])
  - build_function_code(): Wraps rewritten code into a function definition with proper parameters and return statements
  - extract_imports(): Extracts non-flowfile imports from user code
  - get_required_packages(): Cross-references user imports with kernel package requirements
- Updated code_generator.py:
  - _handle_python_script() method to process python_script nodes
  - _published_artifacts and _has_python_script_nodes state tracking
  - _artifacts = {} store in generated code when needed
- Updated __init__.py: Exports new rewriter functions and classes for public API
Implementation Details
Flowfile API Mapping:
- flowfile.read_input() → function parameter input_df
- flowfile.publish_output(expr) → return statement
- flowfile.publish_artifact("name", obj) → _artifacts["name"] = obj
- flowfile.read_artifact("name") → _artifacts["name"]
- flowfile.log(msg, level) → print(f"[{level}] {msg}")
Validation: Detects and reports:
Code Generation: Produces properly typed function signatures with LazyFrame parameters and return types, maintaining compatibility with the polars-based pipeline architecture
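To make the code generation step concrete, here is a minimal sketch of wrapping an already rewritten body in a typed function. It is not the PR's build_function_code(): the helper name build_function_sketch(), its parameters, and the example function name are hypothetical. The parameter names and type annotations follow the PR description, and the generated text assumes the surrounding module has already run import polars as pl.

```python
import ast
import textwrap


def build_function_sketch(name, body, multi_input=False):
    """Wrap rewritten code in a function definition (hypothetical helper)."""
    if multi_input:
        params = "inputs: dict[str, list[pl.LazyFrame]]"
    else:
        params = "input_df: pl.LazyFrame"
    header = f"def {name}({params}) -> pl.LazyFrame:"
    # Indent the rewritten body so it becomes the function's suite.
    return header + "\n" + textwrap.indent(body.strip("\n"), "    ") + "\n"


rewritten_body = "df = input_df\nreturn df.head(5)"
generated = build_function_sketch("python_script_3", rewritten_body)
ast.parse(generated)  # raises SyntaxError if the generated code is malformed
```

Parsing the result with ast.parse() is a cheap sanity check that a generator can run before emitting the code, since it catches indentation and syntax problems without importing polars.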