A sophisticated LangGraph-based agent for Python code execution in Databricks environments, specialized for financial analysis, data science, and quantitative methods.
The Python Agent is designed to:
- Generate and execute Python code based on natural language requests
- Perform financial analysis and modeling
- Execute machine learning workflows
- Conduct risk analysis and statistical computations
- Interface with Databricks Delta tables and vector search
- Ensure secure code execution with validation
The agent follows a structured workflow:
1. Planning - analyzes the request and creates an execution plan
2. Code Generation - generates Python code based on the plan
3. Validation - validates the code for security and correctness
4. Execution - executes the code in a controlled environment
5. Analysis - analyzes the results and provides insights
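The staged workflow above can be sketched as a simple state-passing pipeline. This is an illustrative sketch only (the real agent wires these stages together with LangGraph, and the function names here are hypothetical); the bare `exec` stands in for the sandboxed executor:

```python
# Minimal sketch of a staged agent pipeline. Each stage reads and
# updates a shared state dict, mirroring Plan -> Generate -> Validate
# -> Execute -> Analyze. Names and logic are illustrative, not the
# actual PYTHON agent internals.

def plan(state):
    state["plan"] = f"steps for: {state['request']}"
    return state

def generate(state):
    # A real agent would call an LLM here; we hard-code a snippet.
    state["code"] = "result = 2 + 2"
    return state

def validate(state):
    # Stand-in for the AST/security validation stage.
    state["valid"] = "exec(" not in state["code"] and "eval(" not in state["code"]
    return state

def execute(state):
    if state["valid"]:
        scope = {}
        exec(state["code"], scope)  # stand-in for the sandboxed executor
        state["result"] = scope["result"]
    return state

def analyze(state):
    state["insight"] = f"Computed result: {state.get('result')}"
    return state

def run_workflow(request):
    state = {"request": request}
    for step in (plan, generate, validate, execute, analyze):
        state = step(state)
    return state

print(run_workflow("add two and two")["insight"])
```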
Security features:
- Code Validation: all generated code is validated before execution
- Package Restrictions: only allowed packages can be imported
- Forbidden Operations: dangerous operations (exec, eval, file I/O) are blocked
- Sandboxed Execution: code runs in a controlled environment
Financial Analysis
- Portfolio analysis and optimization
- Risk metrics calculation (VaR, CVaR, Sharpe ratio)
- Financial modeling (NPV, IRR, CAGR)
- Time series analysis
- Monte Carlo simulations
Data Science and Machine Learning
- Machine learning model training and evaluation
- Statistical analysis and hypothesis testing
- Feature engineering for financial data
- Model persistence with MLflow
- Automated EDA (Exploratory Data Analysis)
Databricks Integration
- Databricks Delta table access
- Vector search for document analysis
- Spark and Pandas DataFrame support
- MLflow integration for experiment tracking
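Metrics like the Sharpe ratio, historical VaR, and CAGR reduce to a few lines of numpy. A minimal sketch with synthetic return data (the figures are illustrative, not the agent's actual output):

```python
import numpy as np

# Toy daily returns for a single asset (synthetic, seeded for repeatability)
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.01, size=252)

# Annualized Sharpe ratio with a 2% risk-free rate
rf_daily = 0.02 / 252
sharpe = (returns.mean() - rf_daily) / returns.std(ddof=1) * np.sqrt(252)

# Historical 95% Value at Risk (loss threshold, reported as a positive number)
var_95 = -np.percentile(returns, 5)

# CAGR for an investment growing from 10,000 to 25,000 over 5 years
cagr = (25000 / 10000) ** (1 / 5) - 1

print(f"Sharpe: {sharpe:.2f}, 95% VaR: {var_95:.4f}, CAGR: {cagr:.2%}")
```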
Quick start:

```python
from PYTHON import create_python_agent, DeploymentProfile

# Create a production agent
agent = create_python_agent(
    catalog="finance",
    database="analytics",
    table_id_list=["stock_prices", "portfolio_data"],
    profile=DeploymentProfile.PRODUCTION
)
```

Custom configuration:

```python
from PYTHON import PythonAgentConfig, PythonAgent

# Create custom configuration
config = PythonAgentConfig(
    profile=DeploymentProfile.DEVELOPMENT,
    table_id_list=["table1", "table2"],
    vector_index_list=["doc_index"]
)

# Customize execution settings
config.python_execution.max_execution_time = 600  # 10 minutes
config.python_execution.max_memory_mb = 4096     # 4 GB

# Add allowed packages
config.add_allowed_package("scipy")
config.add_allowed_package("tensorflow")

# Create agent with custom config
agent = PythonAgent(config)
```

Basic usage:

```python
from scalata_agent import ChatAgentMessage

messages = [
    ChatAgentMessage(
        role="user",
        content="Calculate the compound annual growth rate for an investment that grew from $10,000 to $25,000 over 5 years"
    )
]

response = agent.predict(messages)
print(response.content)
```

Financial analysis example:

```python
messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Using the stock_prices table:
        1. Calculate daily returns
        2. Compute portfolio volatility
        3. Calculate Sharpe ratio with 2% risk-free rate
        4. Generate a risk-return scatter plot
        """
    )
]
response = agent.predict(messages)
```

Machine learning example:

```python
messages = [
    ChatAgentMessage(
        role="user",
        content="""
        Build a credit risk model:
        1. Load credit_data table
        2. Perform feature engineering
        3. Split data 80/20
        4. Train XGBoost classifier
        5. Evaluate with ROC-AUC
        6. Save model with MLflow
        """
    )
]
response = agent.predict(messages)
```

Streaming:

```python
# Stream execution progress
for chunk in agent.predict_stream(messages):
    print(f"[{chunk.metadata['step']}] {chunk.content}")
```

The agent allows the following packages by default:
Data Manipulation
- pandas, numpy, pyspark, databricks
Financial Libraries
- quantlib, yfinance, pandas_ta, ta
Machine Learning
- sklearn, xgboost, lightgbm, statsmodels, scipy, mlflow
Visualization
- matplotlib, seaborn, plotly
Utilities
- datetime, math, statistics, json, re
Databricks Specific
- delta, databricks.sdk, databricks.vector_search
Forbidden operations:
- No file system operations (open, write)
- No system calls or subprocess execution
- No network requests except Databricks APIs
- No exec() or eval() functions
- No importing of non-allowed packages
Validation is performed with:
- AST parsing to detect forbidden operations
- Import validation against allowed packages
- Security pattern matching
- LLM-based validation for complex cases
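The AST-based checks can be sketched with the standard-library `ast` module. This is a simplified illustration, not the agent's actual validator; the allow-list and forbidden-call sets below are hypothetical subsets:

```python
import ast

# Illustrative subsets; the real agent's lists are configurable
ALLOWED_PACKAGES = {"pandas", "numpy", "math", "statistics"}
FORBIDDEN_CALLS = {"exec", "eval", "open", "__import__"}

def validate_code(source: str) -> list:
    """Return a list of violations found in the code (empty if clean)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Check plain imports against the allow-list
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_PACKAGES:
                    violations.append(f"disallowed import: {alias.name}")
        # Check "from X import Y" imports
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] not in ALLOWED_PACKAGES:
                violations.append(f"disallowed import: {node.module}")
        # Check for forbidden built-in calls
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}()")
    return violations

print(validate_code("import os\neval('1+1')"))
# flags both the os import and the eval() call
```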
Required:
- `DATABRICKS_HOST`: Databricks workspace URL
- `DATABRICKS_TOKEN`: Authentication token

Optional:
- `DATABRICKS_CATALOG`: Default catalog
- `DATABRICKS_DATABASE`: Default database
- `LLM_TEMPERATURE`: Model temperature (0.0-2.0)
- `PYTHON_MAX_EXECUTION_TIME`: Max execution time in seconds
- `DEPLOYMENT_PROFILE`: Profile (development/staging/production)
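A typical shell setup might look like the following (all values are placeholders; substitute your own workspace URL and token):

```shell
# Required
export DATABRICKS_HOST="https://<workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"

# Optional
export DEPLOYMENT_PROFILE="development"
export PYTHON_MAX_EXECUTION_TIME="300"
```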
The agent provides comprehensive error handling:

```python
response = agent.predict(messages)

if response.metadata.get('execution_success'):
    print("Execution successful!")
    print(f"Result: {response.content}")
else:
    print("Execution failed!")
    print(f"Error: {response.metadata.get('error')}")
```

Circuit breakers:
- Prevent cascading failures
- Configurable thresholds and timeouts
- Separate breakers for LLM and execution
Caching:
- Results caching for repeated queries
- Configurable TTL
- Memory-efficient storage

Resource limits:
- Execution time limits
- Memory usage limits
- Result size limits
The agent automatically logs:
- Generated code
- Execution results
- Performance metrics
- Error information
```python
# MLflow tracking is automatic
response = agent.predict(messages)
# Check the MLflow UI for experiment details
```

Run the test suite:

```shell
python -m pytest src/PYTHON/test_python_agent.py
```

Run the demo:

```shell
python src/PYTHON/demo_python_agent.py
```

Common issues:

- Import Error: package not in the allowed list
  - Solution: add the package with `config.add_allowed_package()`
- Execution Timeout: code takes too long
  - Solution: increase `max_execution_time` in the config
- Memory Error: result too large
  - Solution: increase `max_memory_mb` or reduce the data size
- Validation Failed: security issue detected
  - Solution: review the code for forbidden operations
Enable debug logging:
```python
config = PythonAgentConfig(profile=DeploymentProfile.DEVELOPMENT)
config.verbose = True
agent = PythonAgent(config)
```

- Use Specific Requests: Be clear about what you want to calculate
- Specify Data Sources: Mention table names explicitly
- Set Reasonable Limits: Use LIMIT clauses for large datasets
- Handle Errors: Check execution_success in metadata
- Monitor Resources: Watch execution time and memory usage
When extending the Python agent:
- Follow the established patterns from SQL agent
- Add security validation for new features
- Include comprehensive error handling
- Add unit tests for new functionality
- Update documentation
This agent is part of the Scalata.ai platform and follows the same licensing terms.