Python Code Execution Agent

A LangGraph-based agent for executing Python code in Databricks environments, specializing in financial analysis, data science, and quantitative methods.

Overview

The Python Agent is designed to:

  • Generate and execute Python code based on natural language requests
  • Perform financial analysis and modeling
  • Execute machine learning workflows
  • Conduct risk analysis and statistical computations
  • Interface with Databricks Delta tables and vector search
  • Ensure secure code execution with validation

Architecture

The agent follows a structured workflow (sketched in code after the list):

  1. Planning - Analyzes the request and creates an execution plan
  2. Code Generation - Generates Python code based on the plan
  3. Validation - Validates code for security and correctness
  4. Execution - Executes the code in a controlled environment
  5. Analysis - Analyzes results and provides insights
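
The repository's actual graph wiring isn't reproduced here, but in LangGraph the five steps map naturally onto a state graph. A minimal sketch, with illustrative state fields and stub node functions rather than the project's real ones:

from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict, total=False):
    request: str    # natural language request from the user
    plan: str       # execution plan produced by the planning node
    code: str       # generated Python code
    is_valid: bool  # outcome of the validation node
    result: str     # output captured from execution
    analysis: str   # final insights returned to the user

def plan_step(state: AgentState) -> dict:
    return {"plan": f"Plan for: {state['request']}"}

def generate_step(state: AgentState) -> dict:
    return {"code": "result = 2 + 2"}

def validate_step(state: AgentState) -> dict:
    return {"is_valid": True}

def execute_step(state: AgentState) -> dict:
    return {"result": "4"}

def analyze_step(state: AgentState) -> dict:
    return {"analysis": "The computation returned 4."}

graph = StateGraph(AgentState)
for name, fn in [("planner", plan_step), ("code_generator", generate_step),
                 ("validator", validate_step), ("executor", execute_step),
                 ("analyzer", analyze_step)]:
    graph.add_node(name, fn)

graph.set_entry_point("planner")
graph.add_edge("planner", "code_generator")
graph.add_edge("code_generator", "validator")
# Failed validation loops back to code generation; valid code proceeds to execution.
graph.add_conditional_edges("validator",
                            lambda s: "executor" if s["is_valid"] else "code_generator")
graph.add_edge("executor", "analyzer")
graph.add_edge("analyzer", END)
app = graph.compile()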

Key Features

Security

  • Code Validation: All generated code is validated before execution
  • Package Restrictions: Only allowed packages can be imported
  • Forbidden Operations: Dangerous operations (exec, eval, file I/O) are blocked
  • Sandboxed Execution: Code runs in a controlled environment

Financial Capabilities

  • Portfolio analysis and optimization
  • Risk metrics calculation (VaR, CVaR, Sharpe ratio)
  • Financial modeling (NPV, IRR, CAGR)
  • Time series analysis
  • Monte Carlo simulations
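
As a rough illustration of how the risk metrics above are computed, here is a minimal sketch using pandas and numpy (the price series, trading-day count, and risk-free rate are placeholders, not project defaults):

import numpy as np
import pandas as pd

# Placeholder daily close prices for a single asset.
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.0, 103.2])
returns = prices.pct_change().dropna()

# Annualized Sharpe ratio, assuming 252 trading days and a 2% annual risk-free rate.
risk_free_daily = 0.02 / 252
sharpe = (returns.mean() - risk_free_daily) / returns.std() * np.sqrt(252)

# Historical 95% VaR (5th percentile of returns) and CVaR (mean loss beyond the VaR).
var_95 = np.percentile(returns, 5)
cvar_95 = returns[returns <= var_95].mean()

# Simple Monte Carlo simulation of one-year price paths from the empirical mean/std.
rng = np.random.default_rng(0)
simulated = rng.normal(returns.mean(), returns.std(), size=(10_000, 252))
terminal_prices = prices.iloc[-1] * np.prod(1 + simulated, axis=1)

print(f"Sharpe: {sharpe:.2f}  VaR(95%): {var_95:.4f}  CVaR(95%): {cvar_95:.4f}")
print(f"Median simulated price in 1 year: {np.median(terminal_prices):.2f}")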

Data Science Features

  • Machine learning model training and evaluation
  • Statistical analysis and hypothesis testing
  • Feature engineering for financial data
  • Model persistence with MLflow
  • Automated EDA (Exploratory Data Analysis)
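
The automated EDA and feature-engineering steps reduce to standard pandas operations; a minimal sketch on placeholder data:

import pandas as pd

df = pd.DataFrame({
    "close":  [100.0, 101.5, 99.8, 102.3, 103.1],
    "volume": [1_200, 980, 1_540, 1_100, 1_250],
})

# Automated EDA: summary statistics and missing-value counts.
print(df.describe())
print(df.isna().sum())

# Feature engineering for financial data: daily returns and rolling volatility.
df["return"] = df["close"].pct_change()
df["volatility_3d"] = df["return"].rolling(3).std()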

Integration

  • Databricks Delta table access
  • Vector search for document analysis
  • Spark and Pandas DataFrame support
  • MLflow integration for experiment tracking
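
On Databricks, Delta table access boils down to a Spark read against the configured catalog and database. A minimal sketch (the catalog, database, and table names come from the configuration examples below, while the column name and the Spark-to-pandas conversion are illustrative):

# `spark` is provided by the Databricks runtime in notebooks and jobs.
sdf = spark.read.table("finance.analytics.stock_prices")

# Keep heavy filtering in Spark, then move a bounded sample into pandas for analysis.
recent = sdf.filter(sdf.trade_date >= "2024-01-01")
pdf = recent.limit(100_000).toPandas()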

Configuration

Basic Configuration

from PYTHON import create_python_agent, DeploymentProfile

# Create a production agent
agent = create_python_agent(
    catalog="finance",
    database="analytics",
    table_id_list=["stock_prices", "portfolio_data"],
    profile=DeploymentProfile.PRODUCTION
)

Advanced Configuration

from PYTHON import PythonAgentConfig, PythonAgent

# Create custom configuration
config = PythonAgentConfig(
    profile=DeploymentProfile.DEVELOPMENT,
    table_id_list=["table1", "table2"],
    vector_index_list=["doc_index"]
)

# Customize execution settings
config.python_execution.max_execution_time = 600  # 10 minutes
config.python_execution.max_memory_mb = 4096  # 4GB

# Add allowed packages
config.add_allowed_package("scipy")
config.add_allowed_package("tensorflow")

# Create agent with custom config
agent = PythonAgent(config)

Usage Examples

Basic Calculation

from scalata_agent import ChatAgentMessage

messages = [
    ChatAgentMessage(
        role="user",
        content="Calculate the compound annual growth rate for an investment that grew from $10,000 to $25,000 over 5 years"
    )
]

response = agent.predict(messages)
print(response.content)
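
For a request like this, the generated code reduces to the standard CAGR formula; a sketch of what the agent might produce:

start_value, end_value, years = 10_000, 25_000, 5

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1
print(f"CAGR: {cagr:.2%}")  # roughly 20.11%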

Financial Analysis

messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Using the stock_prices table:
        1. Calculate daily returns
        2. Compute portfolio volatility
        3. Calculate Sharpe ratio with 2% risk-free rate
        4. Generate a risk-return scatter plot
        """
    )
]

response = agent.predict(messages)

Machine Learning

messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Build a credit risk model:
        1. Load credit_data table
        2. Perform feature engineering
        3. Split data 80/20
        4. Train XGBoost classifier
        5. Evaluate with ROC-AUC
        6. Save model with MLflow
        """
    )
]

response = agent.predict(messages)
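
The generated code for a workflow like this would likely resemble the following minimal sketch, which substitutes synthetic data for the credit_data table and uses scikit-learn, XGBoost, and MLflow:

import mlflow
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the credit_data table; the real agent would load it from Delta.
rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)

# 80/20 split, XGBoost classifier, ROC-AUC evaluation, MLflow persistence.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

with mlflow.start_run(run_name="credit_risk_model"):
    mlflow.log_metric("roc_auc", auc)
    mlflow.xgboost.log_model(model, "model")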

Streaming Responses

# Stream execution progress
for chunk in agent.predict_stream(messages):
    print(f"[{chunk.metadata['step']}] {chunk.content}")

Allowed Packages

The agent allows the following packages by default:

Data Manipulation

  • pandas, numpy, pyspark, databricks

Financial Libraries

  • quantlib, yfinance, pandas_ta, ta

Machine Learning

  • sklearn, xgboost, lightgbm, statsmodels, scipy, mlflow

Visualization

  • matplotlib, seaborn, plotly

Utilities

  • datetime, math, statistics, json, re

Databricks Specific

  • delta, databricks.sdk, databricks.vector_search

Security Considerations

Forbidden Operations

  • No file system operations (open, write)
  • No system calls or subprocess execution
  • No network requests except Databricks APIs
  • No exec() or eval() functions
  • No importing of non-allowed packages

Code Validation Process

  1. AST parsing to detect forbidden operations
  2. Import validation against allowed packages
  3. Security pattern matching
  4. LLM-based validation for complex cases
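
A stripped-down version of steps 1 and 2 can be done with Python's ast module. A minimal sketch (the allowed-package and forbidden-call sets are abbreviated; the real validator also applies pattern matching and LLM review):

import ast

ALLOWED_PACKAGES = {"pandas", "numpy", "sklearn", "matplotlib"}   # abbreviated
FORBIDDEN_CALLS = {"exec", "eval", "open", "__import__", "compile"}

def validate_code(source: str) -> list[str]:
    """Return a list of violations found in the submitted code."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Step 2: import validation against the allowed-package list.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_PACKAGES:
                violations.append(f"disallowed import: {name}")
        # Step 1: AST detection of forbidden built-in calls such as exec/eval/open.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}()")
    return violations

print(validate_code("import os\nexec('print(1)')"))
# ['disallowed import: os', 'forbidden call: exec()']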

Environment Variables

Required:

  • DATABRICKS_HOST: Databricks workspace URL
  • DATABRICKS_TOKEN: Authentication token

Optional:

  • DATABRICKS_CATALOG: Default catalog
  • DATABRICKS_DATABASE: Default database
  • LLM_TEMPERATURE: Model temperature (0.0-2.0)
  • PYTHON_MAX_EXECUTION_TIME: Max execution time in seconds
  • DEPLOYMENT_PROFILE: Profile (development/staging/production)
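
These are ordinary environment variables, so a quick sanity check before constructing the agent can be done with os.environ (the fallback values below are placeholders, not project defaults):

import os

host = os.environ["DATABRICKS_HOST"]    # required
token = os.environ["DATABRICKS_TOKEN"]  # required

catalog = os.environ.get("DATABRICKS_CATALOG", "finance")                # optional
max_exec_time = int(os.environ.get("PYTHON_MAX_EXECUTION_TIME", "300"))  # seconds
profile = os.environ.get("DEPLOYMENT_PROFILE", "development")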

Error Handling

The agent provides comprehensive error handling:

response = agent.predict(messages)

if response.metadata.get('execution_success'):
    print("Execution successful!")
    print(f"Result: {response.content}")
else:
    print("Execution failed!")
    print(f"Error: {response.metadata.get('error')}")

Performance Optimization

Circuit Breakers

  • Prevents cascading failures
  • Configurable thresholds and timeouts
  • Separate breakers for LLM and execution
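
The project's breaker implementation isn't shown here, but the pattern itself is small; a minimal sketch of a failure-count breaker with a reset timeout (thresholds and timeouts are placeholders):

import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_timeout` seconds."""

    def __init__(self, threshold: int = 5, reset_timeout: float = 60.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open; call rejected")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

# Separate breakers for LLM calls and code execution, as described above.
llm_breaker = CircuitBreaker(threshold=3, reset_timeout=30.0)
execution_breaker = CircuitBreaker(threshold=5, reset_timeout=120.0)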

Caching

  • Results caching for repeated queries
  • Configurable TTL
  • Memory-efficient storage
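
A results cache with a TTL can be as small as a dict of expiry timestamps; a minimal sketch (the key format and TTL are illustrative):

import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=600)
cache.set("sharpe:stock_prices", 1.42)
print(cache.get("sharpe:stock_prices"))  # 1.42 until the TTL expires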

Resource Limits

  • Execution time limits
  • Memory usage limits
  • Result size limits

Integration with MLflow

The agent automatically logs:

  • Generated code
  • Execution results
  • Performance metrics
  • Error information

# MLflow tracking is automatic
response = agent.predict(messages)
# Check MLflow UI for experiment tracking
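
What that automatic logging might look like under the hood, as a minimal sketch with the standard MLflow tracking API (the run name, artifact names, and keys are illustrative):

import mlflow

generated_code = "result = (25_000 / 10_000) ** (1 / 5) - 1"     # placeholder
execution_result = {"result": 0.2011, "execution_time_s": 1.8}   # placeholder

with mlflow.start_run(run_name="python_agent_request"):
    mlflow.log_text(generated_code, "generated_code.py")             # generated code
    mlflow.log_text(str(execution_result), "execution_result.txt")   # execution results
    mlflow.log_metric("execution_time_s", execution_result["execution_time_s"])
    mlflow.log_param("execution_success", True)                      # error / status information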

Testing

Run the test suite:

python -m pytest src/PYTHON/test_python_agent.py

Run the demo:

python src/PYTHON/demo_python_agent.py

Troubleshooting

Common Issues

  1. Import Error: Package not in allowed list
     • Solution: Add package using config.add_allowed_package()
  2. Execution Timeout: Code takes too long
     • Solution: Increase max_execution_time in config
  3. Memory Error: Result too large
     • Solution: Increase max_memory_mb or reduce data size
  4. Validation Failed: Security issue detected
     • Solution: Review code for forbidden operations

Debug Mode

Enable debug logging:

config = PythonAgentConfig(profile=DeploymentProfile.DEVELOPMENT)
config.verbose = True
agent = PythonAgent(config)

Best Practices

  1. Use Specific Requests: Be clear about what you want to calculate
  2. Specify Data Sources: Mention table names explicitly
  3. Set Reasonable Limits: Use LIMIT clauses for large datasets
  4. Handle Errors: Check execution_success in metadata
  5. Monitor Resources: Watch execution time and memory usage

Contributing

When extending the Python agent:

  1. Follow the established patterns from the SQL agent
  2. Add security validation for new features
  3. Include comprehensive error handling
  4. Add unit tests for new functionality
  5. Update documentation

License

This agent is part of the Scalata.ai platform and follows the same licensing terms.
