
Add python_script node code generation with flowfile API rewriting #318

Open
Edwardvaneechoud wants to merge 5 commits into feauture/kernel-implementation from claude/export-python-script-nodes-ZXprP


@Edwardvaneechoud
Owner

Summary

This PR adds support for generating executable Python code from python_script nodes by rewriting flowfile API calls into plain Python equivalents. This enables python_script nodes to be converted to standalone functions that can run outside Docker kernel containers.

Key Changes

  • New module python_script_rewriter.py: Implements AST-based analysis and transformation of flowfile API calls

    • analyze_flowfile_usage(): Parses user code to detect flowfile API patterns, artifact dependencies, and unsupported calls
    • rewrite_flowfile_calls(): Transforms flowfile.* calls to plain Python (e.g., flowfile.read_artifact("x") → _artifacts["x"])
    • build_function_code(): Wraps rewritten code into a function definition with proper parameters and return statements
    • extract_imports(): Extracts non-flowfile imports from user code
    • get_required_packages(): Cross-references user imports with kernel package requirements
  • Updated code_generator.py:

    • Added _handle_python_script() method to process python_script nodes
    • Tracks published artifacts across nodes for validation of downstream dependencies
    • Validates artifact availability and detects unsupported API patterns
    • Emits kernel package requirements as comments
    • Added _published_artifacts and _has_python_script_nodes state tracking
    • Initializes shared _artifacts = {} store in generated code when needed
  • Updated __init__.py: Exports new rewriter functions and classes for public API
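
The analysis step can be pictured with a minimal sketch built on the standard `ast` module. This is not the code in python_script_rewriter.py: the names `UsageSketch` and `analyze_usage_sketch` and the exact set of supported calls are assumptions made for illustration.

```python
import ast
from dataclasses import dataclass, field

# Assumption: the calls this PR describes as rewritable.
SUPPORTED = {
    "read_input", "read_inputs", "publish_output",
    "publish_artifact", "read_artifact", "delete_artifact", "log",
}


@dataclass
class UsageSketch:
    """Hypothetical result container; the real module may expose different fields."""
    calls: set = field(default_factory=set)
    artifacts_read: set = field(default_factory=set)
    artifacts_published: set = field(default_factory=set)
    dynamic_artifact_names: int = 0
    unsupported: set = field(default_factory=set)


def analyze_usage_sketch(source: str) -> UsageSketch:
    usage = UsageSketch()
    for node in ast.walk(ast.parse(source)):
        # Only look at calls shaped like flowfile.<method>(...)
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "flowfile"):
            continue
        name = node.func.attr
        usage.calls.add(name)
        if name not in SUPPORTED:
            usage.unsupported.add(name)
        elif name in ("read_artifact", "publish_artifact"):
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                bucket = (usage.artifacts_read if name == "read_artifact"
                          else usage.artifacts_published)
                bucket.add(first.value)
            else:
                # Artifact name is not a string literal, so it cannot be tracked statically.
                usage.dynamic_artifact_names += 1
    return usage
```

Because the code is only parsed and never executed, names such as `flowfile` do not have to be importable when the analysis runs.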

Implementation Details

  • Flowfile API Mapping:

    • flowfile.read_input() → function parameter input_df
    • flowfile.publish_output(expr) → return statement
    • flowfile.publish_artifact("name", obj) → _artifacts["name"] = obj
    • flowfile.read_artifact("name") → _artifacts["name"]
    • flowfile.log(msg, level) → print(f"[{level}] {msg}")
  • Validation: Detects and reports:

    • Dynamic artifact names (non-string literals)
    • Unsupported API calls (display, publish_global, etc.)
    • Missing artifact dependencies from upstream nodes
    • Syntax errors in user code
  • Code Generation: Produces properly typed function signatures with LazyFrame parameters and return types, maintaining compatibility with the polars-based pipeline architecture
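
A rough sketch of the mapping above, written as an `ast.NodeTransformer` (Python 3.9+ for `ast.unparse`). The class and function names are hypothetical, flowfile.log is left out for brevity, arguments are assumed to be positional, and the generated signature omits the LazyFrame type annotations so the example runs without polars.

```python
import ast
import textwrap


def _flowfile_method(node):
    """Return the method name if node is a flowfile.<method>(...) call, else None."""
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "flowfile"):
        return node.func.attr
    return None


def _artifact_slot(key, ctx):
    # Builds the expression _artifacts[<key>] in load or store context.
    return ast.Subscript(value=ast.Name(id="_artifacts", ctx=ast.Load()),
                         slice=key, ctx=ctx)


class RewriteSketch(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        method = _flowfile_method(node)
        if method == "read_input":
            return ast.Name(id="input_df", ctx=ast.Load())
        if method == "read_artifact":
            return _artifact_slot(node.args[0], ast.Load())
        return node

    def visit_Expr(self, node):
        self.generic_visit(node)
        method = _flowfile_method(node.value)
        if method == "publish_output":
            return ast.Return(value=node.value.args[0])
        if method == "publish_artifact":
            name, obj = node.value.args
            return ast.Assign(targets=[_artifact_slot(name, ast.Store())], value=obj)
        return node


def rewrite_sketch(source, func_name="_node_fn"):
    tree = RewriteSketch().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line info before unparsing
    body = textwrap.indent(ast.unparse(tree), "    ")
    return f"def {func_name}(input_df):\n{body}\n"
```

The returned string defines a plain function that expects a module-level `_artifacts` dict to exist, which matches the shared store the generated code initializes when needed.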

Implement AST-based rewriting of flowfile.* API calls in python_script
nodes to plain Python equivalents, enabling code export for flows that
use kernel-based Python execution.

New module python_script_rewriter.py handles:
- flowfile.read_input() → function parameter (input_df)
- flowfile.read_inputs() → function parameter (inputs dict)
- flowfile.publish_output(expr) → return statement
- flowfile.publish_artifact/read_artifact/delete_artifact → _artifacts dict
- flowfile.log() → print()
- Package dependency detection from kernel config
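
As an illustration of the import handling and package detection, here is a minimal sketch. The function names are hypothetical, and it assumes a package's import name matches its name in the kernel config, which does not hold for every distribution (for example sklearn vs. scikit-learn).

```python
import ast


def extract_imports_sketch(source):
    """Return top-level import statements, skipping anything rooted at flowfile."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] == "flowfile":
                    continue
                suffix = f" as {alias.asname}" if alias.asname else ""
                lines.append(f"import {alias.name}{suffix}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] != "flowfile":
                lines.append(ast.unparse(node))
    return lines


def required_packages_sketch(import_lines, kernel_packages):
    """Intersect the root modules of the imports with the kernel's package list."""
    roots = set()
    for line in import_lines:
        for node in ast.parse(line).body:
            if isinstance(node, ast.Import):
                roots.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                roots.add(node.module.split(".")[0])
    return sorted(roots & set(kernel_packages))
```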

Integration with FlowGraphToPolarsConverter:
- New _handle_python_script handler method
- Artifact tracking across nodes for validation
- _artifacts = {} emitted in generated code when needed
- Graceful error handling for unsupported patterns

https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
Artifacts are now keyed per kernel in _artifacts, matching the runtime
where each kernel container has its own independent artifact store.
Cross-kernel artifact access is validated and rejected at code gen time.

Also fixes read_inputs() to produce dict[str, list[pl.LazyFrame]]
matching the runtime API where each input name maps to a list of
LazyFrames (multiple connections can share a name). Input vars with
suffixed names (main_0, main_1) are grouped under their base name.
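
The grouping of suffixed input variables could look roughly like this. The helper name is hypothetical, and the sketch assumes that any trailing `_<digits>` is a connection index rather than part of the input's real name.

```python
import re
from collections import defaultdict

# Assumption: a trailing "_<number>" marks one of several connections sharing a name.
_SUFFIX = re.compile(r"^(?P<base>.+)_\d+$")


def group_inputs_sketch(input_vars):
    """Group {"main_0": a, "main_1": b, "lookup": c} into {"main": [a, b], "lookup": [c]}."""
    grouped = defaultdict(list)
    for name, frame in input_vars.items():
        match = _SUFFIX.match(name)
        base = match.group("base") if match else name
        grouped[base].append(frame)
    return dict(grouped)
```

Every value is a list, even for a single connection, which matches the dict[str, list[pl.LazyFrame]] shape described above.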

https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
…into claude/export-python-script-nodes-ZXprP
Unsupported calls (publish_global, display, etc.) and dynamic artifact
names no longer block code generation. Instead, the generated function
includes WARNING comments so users can see what won't work outside the
kernel runtime.

Also fixes mixed read_input/read_inputs usage: when both are present,
read_input() is rewritten to inputs["main"][0] so it stays valid in the
multi-input function signature.
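
A small sketch of the mixed-usage rule: when read_inputs() appears anywhere in the script, read_input() is rewritten against the `inputs` dict instead of a separate `input_df` parameter. The class and function names are assumptions for illustration.

```python
import ast


def _flowfile_method(node):
    """Return the method name if node is a flowfile.<method>(...) call, else None."""
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "flowfile"):
        return node.func.attr
    return None


class MixedInputsSketch(ast.NodeTransformer):
    def __init__(self, multi):
        self.multi = multi  # True when the script also calls read_inputs()

    def visit_Call(self, node):
        self.generic_visit(node)
        method = _flowfile_method(node)
        if method == "read_inputs":
            return ast.Name(id="inputs", ctx=ast.Load())
        if method == "read_input":
            expr = 'inputs["main"][0]' if self.multi else "input_df"
            return ast.parse(expr, mode="eval").body
        return node


def rewrite_inputs_sketch(source):
    """Return (parameter name for the generated function, rewritten body)."""
    tree = ast.parse(source)
    multi = any(_flowfile_method(n) == "read_inputs" for n in ast.walk(tree))
    tree = MixedInputsSketch(multi).visit(tree)
    ast.fix_missing_locations(tree)
    return ("inputs" if multi else "input_df"), ast.unparse(tree)
```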

https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn
Unsupported calls like flowfile.publish_global() are now replaced with
inline comments showing the original call, so users can see what was
skipped and why. Uses a marker-based approach to survive AST round-trips.
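
Python's `ast` module discards comments, so a comment cannot be inserted directly into the tree. One way a marker-based approach can work is sketched below: the unsupported statement is swapped for a call to a placeholder name, and after unparsing, each placeholder line is turned into a comment. The marker name, the set of unsupported calls, and the comment wording are all assumptions, not the PR's actual choices.

```python
import ast
import re

UNSUPPORTED = {"publish_global", "display"}  # subset named in this PR
MARKER = "__flowfile_unsupported__"          # hypothetical placeholder name


class MarkUnsupportedSketch(ast.NodeTransformer):
    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and isinstance(call.func.value, ast.Name)
                and call.func.value.id == "flowfile"
                and call.func.attr in UNSUPPORTED):
            # Keep the original call as a string argument of the marker call.
            original = ast.unparse(call)
            marker_call = ast.Call(func=ast.Name(id=MARKER, ctx=ast.Load()),
                                   args=[ast.Constant(value=original)],
                                   keywords=[])
            return ast.copy_location(ast.Expr(value=marker_call), node)
        return self.generic_visit(node)


_MARKER_LINE = re.compile(rf"^(\s*){MARKER}\((.*)\)$")


def comment_out_unsupported(source):
    tree = MarkUnsupportedSketch().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    out = []
    for line in ast.unparse(tree).splitlines():
        match = _MARKER_LINE.match(line)
        if match:
            original = ast.literal_eval(match.group(2))
            out.append(f"{match.group(1)}# WARNING: skipped (kernel-only call): {original}")
        else:
            out.append(line)
    return "\n".join(out)
```

This sketch does not handle a block whose only statement is an unsupported call; replacing it with a comment would leave the block empty, so a real implementation would need to emit something like `pass` there.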

https://claude.ai/code/session_01Cn56TDT4iPpFpgFL8Fp1pn