Python Code Execution Agent

A LangGraph-based agent for executing Python code in Databricks environments, specializing in financial analysis, data science, and quantitative methods.

Overview

The Python Agent is designed to:

  • Generate and execute Python code based on natural language requests
  • Perform financial analysis and modeling
  • Execute machine learning workflows
  • Conduct risk analysis and statistical computations
  • Interface with Databricks Delta tables and vector search
  • Ensure secure code execution with validation

Architecture

The agent follows a structured workflow (sketched in code after the list):

  1. Planning - Analyzes the request and creates an execution plan
  2. Code Generation - Generates Python code based on the plan
  3. Validation - Validates code for security and correctness
  4. Execution - Executes the code in a controlled environment
  5. Analysis - Analyzes results and provides insights
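
The repository's actual graph wiring isn't reproduced here, but in LangGraph the five steps map naturally onto a state graph. A minimal sketch, with illustrative state fields and stub node functions rather than the project's real ones:

from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict, total=False):
    request: str    # natural language request from the user
    plan: str       # execution plan produced by the planning node
    code: str       # generated Python code
    is_valid: bool  # outcome of the validation node
    result: str     # output captured from execution
    analysis: str   # final insights returned to the user

def plan_step(state: AgentState) -> dict:
    return {"plan": f"Plan for: {state['request']}"}

def generate_step(state: AgentState) -> dict:
    return {"code": "result = 2 + 2"}

def validate_step(state: AgentState) -> dict:
    return {"is_valid": True}

def execute_step(state: AgentState) -> dict:
    return {"result": "4"}

def analyze_step(state: AgentState) -> dict:
    return {"analysis": "The computation returned 4."}

graph = StateGraph(AgentState)
for name, fn in [("planner", plan_step), ("code_generator", generate_step),
                 ("validator", validate_step), ("executor", execute_step),
                 ("analyzer", analyze_step)]:
    graph.add_node(name, fn)

graph.set_entry_point("planner")
graph.add_edge("planner", "code_generator")
graph.add_edge("code_generator", "validator")
# Failed validation loops back to code generation; valid code proceeds to execution.
graph.add_conditional_edges("validator",
                            lambda s: "executor" if s["is_valid"] else "code_generator")
graph.add_edge("executor", "analyzer")
graph.add_edge("analyzer", END)
app = graph.compile()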

Key Features

Security

  • Code Validation: All generated code is validated before execution
  • Package Restrictions: Only allowed packages can be imported
  • Forbidden Operations: Dangerous operations (exec, eval, file I/O) are blocked
  • Sandboxed Execution: Code runs in a controlled environment

Financial Capabilities

  • Portfolio analysis and optimization
  • Risk metrics calculation (VaR, CVaR, Sharpe ratio)
  • Financial modeling (NPV, IRR, CAGR)
  • Time series analysis
  • Monte Carlo simulations
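
As a rough illustration of how the risk metrics above are computed, here is a minimal sketch using pandas and numpy (the price series, trading-day count, and risk-free rate are placeholders, not project defaults):

import numpy as np
import pandas as pd

# Placeholder daily close prices for a single asset.
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.0, 103.2])
returns = prices.pct_change().dropna()

# Annualized Sharpe ratio, assuming 252 trading days and a 2% annual risk-free rate.
risk_free_daily = 0.02 / 252
sharpe = (returns.mean() - risk_free_daily) / returns.std() * np.sqrt(252)

# Historical 95% VaR (5th percentile of returns) and CVaR (mean loss beyond the VaR).
var_95 = np.percentile(returns, 5)
cvar_95 = returns[returns <= var_95].mean()

# Simple Monte Carlo simulation of one-year price paths from the empirical mean/std.
rng = np.random.default_rng(0)
simulated = rng.normal(returns.mean(), returns.std(), size=(10_000, 252))
terminal_prices = prices.iloc[-1] * np.prod(1 + simulated, axis=1)

print(f"Sharpe: {sharpe:.2f}  VaR(95%): {var_95:.4f}  CVaR(95%): {cvar_95:.4f}")
print(f"Median simulated price in 1 year: {np.median(terminal_prices):.2f}")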

Data Science Features

  • Machine learning model training and evaluation
  • Statistical analysis and hypothesis testing
  • Feature engineering for financial data
  • Model persistence with MLflow
  • Automated EDA (Exploratory Data Analysis)
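
The automated EDA and feature-engineering steps reduce to standard pandas operations; a minimal sketch on placeholder data:

import pandas as pd

df = pd.DataFrame({
    "close":  [100.0, 101.5, 99.8, 102.3, 103.1],
    "volume": [1_200, 980, 1_540, 1_100, 1_250],
})

# Automated EDA: summary statistics and missing-value counts.
print(df.describe())
print(df.isna().sum())

# Feature engineering for financial data: daily returns and rolling volatility.
df["return"] = df["close"].pct_change()
df["volatility_3d"] = df["return"].rolling(3).std()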

Integration

  • Databricks Delta table access
  • Vector search for document analysis
  • Spark and Pandas DataFrame support
  • MLflow integration for experiment tracking
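
On Databricks, Delta table access boils down to a Spark read against the configured catalog and database. A minimal sketch (the catalog, database, and table names come from the configuration examples below, while the column name and the Spark-to-pandas conversion are illustrative):

# `spark` is provided by the Databricks runtime in notebooks and jobs.
sdf = spark.read.table("finance.analytics.stock_prices")

# Keep heavy filtering in Spark, then move a bounded sample into pandas for analysis.
recent = sdf.filter(sdf.trade_date >= "2024-01-01")
pdf = recent.limit(100_000).toPandas()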

Configuration

Basic Configuration

from PYTHON import create_python_agent, DeploymentProfile

# Create a production agent
agent = create_python_agent(
    catalog="finance",
    database="analytics",
    table_id_list=["stock_prices", "portfolio_data"],
    profile=DeploymentProfile.PRODUCTION
)

Advanced Configuration

from PYTHON import PythonAgentConfig, PythonAgent

# Create custom configuration
config = PythonAgentConfig(
    profile=DeploymentProfile.DEVELOPMENT,
    table_id_list=["table1", "table2"],
    vector_index_list=["doc_index"]
)

# Customize execution settings
config.python_execution.max_execution_time = 600  # 10 minutes
config.python_execution.max_memory_mb = 4096  # 4GB

# Add allowed packages
config.add_allowed_package("scipy")
config.add_allowed_package("tensorflow")

# Create agent with custom config
agent = PythonAgent(config)

Usage Examples

Basic Calculation

from scalata_agent import ChatAgentMessage

messages = [
    ChatAgentMessage(
        role="user",
        content="Calculate the compound annual growth rate for an investment that grew from $10,000 to $25,000 over 5 years"
    )
]

response = agent.predict(messages)
print(response.content)
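
For a request like this, the generated code reduces to the standard CAGR formula; a sketch of what the agent might produce:

start_value, end_value, years = 10_000, 25_000, 5

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1
print(f"CAGR: {cagr:.2%}")  # roughly 20.11%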

Financial Analysis

messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Using the stock_prices table:
        1. Calculate daily returns
        2. Compute portfolio volatility
        3. Calculate Sharpe ratio with 2% risk-free rate
        4. Generate a risk-return scatter plot
        """
    )
]

response = agent.predict(messages)

Machine Learning

messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Build a credit risk model:
        1. Load credit_data table
        2. Perform feature engineering
        3. Split data 80/20
        4. Train XGBoost classifier
        5. Evaluate with ROC-AUC
        6. Save model with MLflow
        """
    )
]

response = agent.predict(messages)
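
The generated code for a workflow like this would likely resemble the following minimal sketch, which substitutes synthetic data for the credit_data table and uses scikit-learn, XGBoost, and MLflow:

import mlflow
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the credit_data table; the real agent would load it from Delta.
rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)

# 80/20 split, XGBoost classifier, ROC-AUC evaluation, MLflow persistence.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

with mlflow.start_run(run_name="credit_risk_model"):
    mlflow.log_metric("roc_auc", auc)
    mlflow.xgboost.log_model(model, "model")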

Streaming Responses

# Stream execution progress
for chunk in agent.predict_stream(messages):
    print(f"[{chunk.metadata['step']}] {chunk.content}")

Allowed Packages

The agent allows the following packages by default:

Data Manipulation

  • pandas, numpy, pyspark, databricks

Financial Libraries

  • quantlib, yfinance, pandas_ta, ta

Machine Learning

  • sklearn, xgboost, lightgbm, statsmodels, scipy, mlflow

Visualization

  • matplotlib, seaborn, plotly

Utilities

  • datetime, math, statistics, json, re

Databricks Specific

  • delta, databricks.sdk, databricks.vector_search

Security Considerations

Forbidden Operations

  • No file system operations (open, write)
  • No system calls or subprocess execution
  • No network requests except Databricks APIs
  • No exec() or eval() functions
  • No importing of non-allowed packages

Code Validation Process

  1. AST parsing to detect forbidden operations
  2. Import validation against allowed packages
  3. Security pattern matching
  4. LLM-based validation for complex cases
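
A stripped-down version of steps 1 and 2 can be done with Python's ast module. A minimal sketch (the allowed-package and forbidden-call sets are abbreviated; the real validator also applies pattern matching and LLM review):

import ast

ALLOWED_PACKAGES = {"pandas", "numpy", "sklearn", "matplotlib"}   # abbreviated
FORBIDDEN_CALLS = {"exec", "eval", "open", "__import__", "compile"}

def validate_code(source: str) -> list[str]:
    """Return a list of violations found in the submitted code."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Step 2: import validation against the allowed-package list.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_PACKAGES:
                violations.append(f"disallowed import: {name}")
        # Step 1: AST detection of forbidden built-in calls such as exec/eval/open.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}()")
    return violations

print(validate_code("import os\nexec('print(1)')"))
# ['disallowed import: os', 'forbidden call: exec()']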

Environment Variables

Required:

  • DATABRICKS_HOST: Databricks workspace URL
  • DATABRICKS_TOKEN: Authentication token

Optional:

  • DATABRICKS_CATALOG: Default catalog
  • DATABRICKS_DATABASE: Default database
  • LLM_TEMPERATURE: Model temperature (0.0-2.0)
  • PYTHON_MAX_EXECUTION_TIME: Max execution time in seconds
  • DEPLOYMENT_PROFILE: Profile (development/staging/production)
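
These are ordinary environment variables, so a quick sanity check before constructing the agent can be done with os.environ (the fallback values below are placeholders, not project defaults):

import os

host = os.environ["DATABRICKS_HOST"]    # required
token = os.environ["DATABRICKS_TOKEN"]  # required

catalog = os.environ.get("DATABRICKS_CATALOG", "finance")                # optional
max_exec_time = int(os.environ.get("PYTHON_MAX_EXECUTION_TIME", "300"))  # seconds
profile = os.environ.get("DEPLOYMENT_PROFILE", "development")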

Error Handling

The agent provides comprehensive error handling:

response = agent.predict(messages)

if response.metadata.get('execution_success'):
    print("Execution successful!")
    print(f"Result: {response.content}")
else:
    print("Execution failed!")
    print(f"Error: {response.metadata.get('error')}")

Performance Optimization

Circuit Breakers

  • Prevents cascading failures
  • Configurable thresholds and timeouts
  • Separate breakers for LLM and execution
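
The project's breaker implementation isn't shown here, but the pattern itself is small; a minimal sketch of a failure-count breaker with a reset timeout (thresholds and timeouts are placeholders):

import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_timeout` seconds."""

    def __init__(self, threshold: int = 5, reset_timeout: float = 60.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open; call rejected")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

# Separate breakers for LLM calls and code execution, as described above.
llm_breaker = CircuitBreaker(threshold=3, reset_timeout=30.0)
execution_breaker = CircuitBreaker(threshold=5, reset_timeout=120.0)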

Caching

  • Results caching for repeated queries
  • Configurable TTL
  • Memory-efficient storage
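
A results cache with a TTL can be as small as a dict of expiry timestamps; a minimal sketch (the key format and TTL are illustrative):

import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=600)
cache.set("sharpe:stock_prices", 1.42)
print(cache.get("sharpe:stock_prices"))  # 1.42 until the TTL expires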

Resource Limits

  • Execution time limits
  • Memory usage limits
  • Result size limits

Integration with MLflow

The agent automatically logs:

  • Generated code
  • Execution results
  • Performance metrics
  • Error information

# MLflow tracking is automatic
response = agent.predict(messages)
# Check MLflow UI for experiment tracking
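
What that automatic logging might look like under the hood, as a minimal sketch with the standard MLflow tracking API (the run name, artifact names, and keys are illustrative):

import mlflow

generated_code = "result = (25_000 / 10_000) ** (1 / 5) - 1"     # placeholder
execution_result = {"result": 0.2011, "execution_time_s": 1.8}   # placeholder

with mlflow.start_run(run_name="python_agent_request"):
    mlflow.log_text(generated_code, "generated_code.py")             # generated code
    mlflow.log_text(str(execution_result), "execution_result.txt")   # execution results
    mlflow.log_metric("execution_time_s", execution_result["execution_time_s"])
    mlflow.log_param("execution_success", True)                      # error / status information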

Testing

Run the test suite:

python -m pytest src/PYTHON/test_python_agent.py

Run the demo:

python src/PYTHON/demo_python_agent.py

Troubleshooting

Common Issues

  1. Import Error: Package not in allowed list
     • Solution: Add package using config.add_allowed_package()
  2. Execution Timeout: Code takes too long
     • Solution: Increase max_execution_time in config
  3. Memory Error: Result too large
     • Solution: Increase max_memory_mb or reduce data size
  4. Validation Failed: Security issue detected
     • Solution: Review code for forbidden operations

Debug Mode

Enable debug logging:

config = PythonAgentConfig(profile=DeploymentProfile.DEVELOPMENT)
config.verbose = True
agent = PythonAgent(config)

Best Practices

  1. Use Specific Requests: Be clear about what you want to calculate
  2. Specify Data Sources: Mention table names explicitly
  3. Set Reasonable Limits: Use LIMIT clauses for large datasets
  4. Handle Errors: Check execution_success in metadata
  5. Monitor Resources: Watch execution time and memory usage

Contributing

When extending the Python agent:

  1. Follow the established patterns from the SQL agent
  2. Add security validation for new features
  3. Include comprehensive error handling
  4. Add unit tests for new functionality
  5. Update documentation

License

This agent is part of the Scalata.ai platform and follows the same licensing terms.
