Get up and running with autonomous workflow management in 10 minutes
By the end of this guide, you'll have:
- A working manager agent that orchestrates complex workflows
- An understanding of how AI agents coordinate and execute tasks
- A complete example running on your machine
Prerequisites:
- Python 3.12+
- uv package manager
- OpenAI API key (get one at platform.openai.com)
```bash
# Clone the repository
git clone https://github.com/your-org/manager-agent-gym
cd manager-agent-gym

# Install with uv (recommended)
uv pip install -e .

# Alternative: install with pip
pip install -e .
```

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env with your API keys. The file should contain:
# OPENAI_API_KEY=sk-your-key-here
# ANTHROPIC_API_KEY=sk-ant-your-key-here  # Optional
```

Note: The library uses `pydantic-settings`, which automatically picks up variables from the `.env` file, so there's no need to export environment variables manually.
```bash
# Run the hello world example
python examples/getting_started/hello_manager_agent.py

# Or run simulations via the CLI (recommended)
python -m examples.cli
```

You should see output like:

```text
Welcome to Manager Agent Gym!
Creating workflow...
Created workflow 'ICAAP Workflow' with 8 tasks
Setting up agent registry...
Registered 4 agents
Creating manager agent...
Manager agent created with quality-focused preferences
Setting up execution engine...
Execution engine ready
Starting workflow execution...
```
Your first manager agent just:
- Analyzed a complex workflow with 8 interconnected tasks
- Made strategic decisions about task assignment and timing
- Coordinated a team of AI and simulated human agents
- Balanced multiple objectives (quality, time, cost, oversight)
- Tracked progress through discrete timesteps
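The discrete-timestep loop behind this run can be pictured roughly as follows. This is a simplified, hypothetical sketch, not the library's `WorkflowExecutionEngine`: the `Task` class, `run_simulation`, and the "assign one random ready task per step" policy are made up for illustration. Each timestep the manager observes which tasks are ready, makes an assignment, workers make progress, and the run stops when every task is done or `max_timesteps` is reached. Seeding a dedicated random generator is what makes two runs with the same seed identical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    work: int                                  # timesteps of effort still required
    deps: list[str] = field(default_factory=list)
    assigned: bool = False

    @property
    def done(self) -> bool:
        return self.work <= 0


def run_simulation(tasks: dict[str, Task], max_timesteps: int, seed: int) -> tuple[int, list[str]]:
    """Return (timesteps used, ordered log of the manager's assignments)."""
    rng = random.Random(seed)  # dedicated seeded RNG keeps runs reproducible
    log: list[str] = []
    for step in range(1, max_timesteps + 1):
        # Observe: unassigned tasks whose dependencies are all complete.
        ready = [t for t in tasks.values()
                 if not t.assigned and all(tasks[d].done for d in t.deps)]
        # Decide: this toy manager assigns one ready task per timestep at random.
        if ready:
            choice = rng.choice(sorted(ready, key=lambda t: t.name))
            choice.assigned = True
            log.append(choice.name)
        # Execute: every assigned, unfinished task makes one unit of progress.
        for t in tasks.values():
            if t.assigned and not t.done:
                t.work -= 1
        if all(t.done for t in tasks.values()):
            return step, log
    return max_timesteps, log


def make_tasks() -> dict[str, Task]:
    return {
        "scope": Task("scope", work=1),
        "data": Task("data", work=2, deps=["scope"]),
        "model": Task("model", work=2, deps=["scope"]),
        "report": Task("report", work=1, deps=["data", "model"]),
    }


steps_a, log_a = run_simulation(make_tasks(), max_timesteps=20, seed=42)
steps_b, log_b = run_simulation(make_tasks(), max_timesteps=20, seed=42)
print(steps_a, log_a)
```

The real engine's manager is an LLM choosing among richer actions (assigning, refining, messaging, creating tasks), but the observe, decide, execute, check-termination cycle is the part this toy loop is meant to convey.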
Let's break down the key components:
```python
from manager_agent_gym import ChainOfThoughtManagerAgent, PreferenceWeights, Preference

# Define what the manager cares about (these example weights sum to 1.0)
preferences = PreferenceWeights(
    preferences=[
        Preference(name="quality", weight=0.4, description="High-quality deliverables"),
        Preference(name="time", weight=0.3, description="Reasonable timeline"),
        Preference(name="cost", weight=0.2, description="Cost-effective execution"),
        Preference(name="oversight", weight=0.1, description="Manageable oversight"),
    ]
)

# Create the AI manager
manager = ChainOfThoughtManagerAgent(
    preferences=preferences,
    model_name="gpt-4o",  # Choose your LLM
    manager_persona="Strategic Project Coordinator",
)
```

```python
import asyncio

from manager_agent_gym import WorkflowExecutionEngine, AgentRegistry

# `workflow`, `agent_registry` (an AgentRegistry), and `stakeholder` must be
# created beforehand; see examples/getting_started/hello_manager_agent.py
engine = WorkflowExecutionEngine(
    workflow=workflow,              # The work to be done
    agent_registry=agent_registry,  # Available workers
    manager_agent=manager,          # The AI manager
    stakeholder_agent=stakeholder,  # The stakeholder
    max_timesteps=20,               # Maximum simulation steps
    seed=42,                        # For reproducible results
)

# Run the simulation. run_full_execution() is a coroutine, so outside an
# async function drive it with asyncio.run() instead of a bare `await`.
results = asyncio.run(engine.run_full_execution())
```

```python
# Reuse the preferences defined above
prefs = preferences

# Strategic LLM-based manager (default)
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o")

# Random baseline for comparison
from manager_agent_gym.core.manager_agent import RandomManagerAgentV2
manager = RandomManagerAgentV2(preferences=prefs, seed=42)

# Simple one-shot delegation
from manager_agent_gym.core.manager_agent import OneShotDelegateManagerAgent
manager = OneShotDelegateManagerAgent(preferences=prefs)
```

```python
# OpenAI models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o")
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="o3")

# Anthropic models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="claude-3-5-sonnet")

# Google models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gemini-2.0-flash")
```

```python
# Quality-focused preferences
quality_focused = PreferenceWeights(preferences=[
    Preference(name="quality", weight=0.6, description="Exceptional deliverables"),
    Preference(name="time", weight=0.2, description="Reasonable timeline"),
    Preference(name="cost", weight=0.1, description="Cost consideration"),
    Preference(name="oversight", weight=0.1, description="Minimal oversight"),
])

# Speed-focused preferences
speed_focused = PreferenceWeights(preferences=[
    Preference(name="time", weight=0.5, description="Fast delivery"),
    Preference(name="quality", weight=0.3, description="Adequate quality"),
    Preference(name="cost", weight=0.1, description="Cost consideration"),
    Preference(name="oversight", weight=0.1, description="Minimal oversight"),
])

# Cost-focused preferences
cost_focused = PreferenceWeights(preferences=[
    Preference(name="cost", weight=0.5, description="Minimize expenses"),
    Preference(name="quality", weight=0.2, description="Acceptable quality"),
    Preference(name="time", weight=0.2, description="Reasonable timeline"),
    Preference(name="oversight", weight=0.1, description="Efficient oversight"),
])
```

```bash
# Interactive example selector with full scenario menu
python -m examples.cli
```

This opens an interactive menu where you can:
- Choose from 20+ realistic business scenarios
- Select different manager types and models
- Run parallel experiments
- Compare results across configurations
The CLI is the recommended way to run simulations as it provides the most comprehensive interface for experimentation.
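The parallel experiments mentioned above can be understood as a bounded pool of concurrent runs. The sketch below illustrates that idea with `asyncio.Semaphore`, assuming each scenario run is an async coroutine; the inner `run_scenario` is a hypothetical stand-in that only sleeps, not the CLI's actual implementation, and `parallel_jobs` plays the role of the CLI's `--parallel-jobs` option.

```python
import asyncio


async def run_all(scenarios: list[str], parallel_jobs: int) -> tuple[list[str], int]:
    """Run every scenario with at most `parallel_jobs` in flight; report peak concurrency."""
    sem = asyncio.Semaphore(parallel_jobs)
    active = peak = 0

    async def run_scenario(name: str) -> str:
        # Hypothetical stand-in for one simulation run.
        nonlocal active, peak
        async with sem:                 # waits here while `parallel_jobs` runs are active
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)   # pretend to call an LLM / step the engine
            active -= 1
            return f"{name}: done"

    # gather() returns results in the same order as the input scenarios
    results = await asyncio.gather(*(run_scenario(s) for s in scenarios))
    return list(results), peak


results, peak = asyncio.run(
    run_all(["icaap", "orsa", "marketing_campaign", "data_science_analytics"], parallel_jobs=2)
)
print(results, "peak concurrency:", peak)
```

A semaphore caps concurrency without forcing scenarios into fixed batches: as soon as one run finishes, the next waiting run starts.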
```bash
# Run a banking compliance workflow
python -m examples.cli --scenarios banking_license_application --manager-mode cot

# Run multiple scenarios in parallel
python -m examples.cli \
  --scenarios data_science_analytics marketing_campaign \
  --manager-mode cot \
  --model-name gpt-4o \
  --parallel-jobs 2

# Compare different manager types
python -m examples.cli \
  --scenarios icaap \
  --manager-mode cot random \
  --model-name gpt-4o
```

You can also run a scenario from Python:

```python
import asyncio

from examples.run_examples import run_demo

# Run a specific scenario (run_demo is a coroutine, so drive it with asyncio.run)
results = asyncio.run(
    run_demo(
        workflow_name="data_science_analytics",  # ML model development workflow
        max_timesteps=30,
        model_name="gpt-4o",
        manager_agent_mode="cot",
        seed=42,
    )
)

# Analyze results
print(f"Completion rate: {results.completion_rate:.1%}")
print(f"Total cost: ${results.total_cost:.2f}")
print(f"Manager actions taken: {len(results.manager_actions)}")
```

After running an example, you'll see:

```text
SUMMARY:
  • Total timesteps: 15
  • Tasks completed: 8/8
  • Completion rate: 100.0%
  • Final execution state: COMPLETED
MANAGER ACTIONS TAKEN:
  • assign_task: 5 times
  • refine_task: 2 times
  • send_message: 3 times
  • create_task: 1 times
```
Metrics to look for:
- Completion rate: Percentage of tasks successfully finished
- Timesteps: Discrete simulation steps taken
- Manager actions: Types and frequency of decisions made
- Cost tracking: Estimated and actual costs
- Quality scores: Evaluation against preferences
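If you log manager actions yourself, summary figures like the ones above reduce to simple counting. This sketch assumes a hypothetical log format (a flat list of action-name strings plus completed and total task counts); it reproduces the sample summary's numbers but is not how the library computes them.

```python
from collections import Counter


def summarize(actions: list[str], tasks_completed: int, total_tasks: int) -> str:
    """Build a plain-text summary from a list of action names and task counts."""
    rate = tasks_completed / total_tasks if total_tasks else 0.0
    lines = [
        f"Tasks completed: {tasks_completed}/{total_tasks}",
        f"Completion rate: {rate:.1%}",
        "Manager actions:",
    ]
    # most_common() orders action types from most to least frequent
    for action, count in Counter(actions).most_common():
        lines.append(f"  {action}: {count}")
    return "\n".join(lines)


actions = (["assign_task"] * 5 + ["refine_task"] * 2
           + ["send_message"] * 3 + ["create_task"])
print(summarize(actions, tasks_completed=8, total_tasks=8))
```

Guarding against `total_tasks == 0` keeps the summary well-defined for an empty workflow.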
Under the hood, a few ideas are at work:
- The manager agent observes the workflow state and makes strategic decisions without human intervention.
- It balances competing goals such as quality, speed, and cost according to your preferences.
- It adapts to changing conditions, task failures, and new requirements as the simulation unfolds.
- The simulation models human agent availability, AI agent capabilities, and real-world constraints.
- Evaluation tracks multiple metrics beyond task completion.
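One simple way to picture "balancing competing goals" is a weighted sum of per-objective scores. The sketch below is an assumption for illustration, not the library's evaluation method: it reuses the weights from this guide's quality-focused and speed-focused profiles, scores two hypothetical plans whose per-objective scores (in [0, 1]) are made up, and shows that the preferred plan flips with the profile.

```python
Scores = dict[str, float]

QUALITY_FOCUSED: Scores = {"quality": 0.6, "time": 0.2, "cost": 0.1, "oversight": 0.1}
SPEED_FOCUSED: Scores = {"time": 0.5, "quality": 0.3, "cost": 0.1, "oversight": 0.1}

# Hypothetical per-objective scores in [0, 1] (higher is better) for two plans.
PLANS: dict[str, Scores] = {
    "careful": {"quality": 0.9, "time": 0.4, "cost": 0.5, "oversight": 0.6},
    "fast": {"quality": 0.6, "time": 0.9, "cost": 0.6, "oversight": 0.5},
}


def weighted_score(weights: Scores, scores: Scores) -> float:
    """Weighted sum of objective scores, with weights normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w / total * scores[name] for name, w in weights.items())


def best_plan(weights: Scores) -> str:
    return max(PLANS, key=lambda plan: weighted_score(weights, PLANS[plan]))


for label, weights in [("quality-focused", QUALITY_FOCUSED), ("speed-focused", SPEED_FOCUSED)]:
    scored = {plan: round(weighted_score(weights, s), 2) for plan, s in PLANS.items()}
    print(label, scored, "->", best_plan(weights))
```

Worked by hand for the quality-focused profile: the careful plan scores 0.6·0.9 + 0.2·0.4 + 0.1·0.5 + 0.1·0.6 = 0.73 versus 0.65 for the fast plan, while the speed-focused profile gives 0.58 versus 0.74.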
Example scenarios by domain:
- Financial Services: `banking_license_application`, `icaap`, `orsa`
- Legal & Compliance: `legal_global_data_breach`, `legal_contract_negotiation`
- Technology: `genai_feature_launch`, `data_science_analytics`
- Marketing: `marketing_campaign`, `brand_crisis_management`
To tailor the gym to your domain:
- Create your own workflows by extending the `Workflow` class
- Define custom preferences for your specific domain
- Add specialized agents with domain-specific capabilities
- Implement custom evaluation metrics for your success criteria
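When designing your own workflow, it helps to think of it as tasks plus dependencies. The sketch below models that with the standard library's `graphlib.TopologicalSorter`, releasing tasks in "waves" as their prerequisites finish. The task names are hypothetical and this is not the library's `Workflow` class; check the API reference for how dependencies are actually declared there.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW: dict[str, set[str]] = {
    "gather_requirements": set(),
    "collect_data": {"gather_requirements"},
    "draft_policy": {"gather_requirements"},
    "risk_assessment": {"collect_data"},
    "legal_review": {"draft_policy"},
    "final_report": {"risk_assessment", "legal_review"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all prerequisites done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                          # raises graphlib.CycleError on cycles
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())    # tasks that could be assigned in parallel
        waves.append(ready)
        sorter.done(*ready)                   # mark the wave finished to unlock the next
    return waves


for i, wave in enumerate(execution_waves(WORKFLOW), start=1):
    print(f"wave {i}: {wave}")
```

Each wave is the set of tasks a manager could hand out at once; a real manager may still stagger them based on agent availability and the preference weights.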
To learn more:
- Read the full Library Documentation
- Explore the research paper (`paper.md`)
- Check out advanced examples in `examples/`
- Review the API reference in `docs/`
```python
# Use faster models for experimentation
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")

# Limit timesteps for faster iteration
engine = WorkflowExecutionEngine(..., max_timesteps=10)
```

```bash
# Run multiple scenarios in parallel
python -m examples.cli --parallel-jobs 4
```

```python
# Enable detailed logging
engine = WorkflowExecutionEngine(..., enable_timestep_logging=True)

# Save outputs for analysis (OutputConfig is a library class and must be imported too)
engine = WorkflowExecutionEngine(..., output_config=OutputConfig(base_dir="my_results/"))

# Use deterministic seeds
engine = WorkflowExecutionEngine(..., seed=42)
```

```python
# Use cost-effective models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")

# Monitor token usage in outputs:
# check the execution results for API cost tracking
```

API Key Not Found

```bash
# Check that your .env file exists and contains your API key
cat .env
# Should show: OPENAI_API_KEY=sk-your-key-here

# If it's missing, copy the example file and edit it
cp .env.example .env
# Edit .env with your actual API keys
```

Module Import Errors

```bash
# Ensure you're in the project directory
cd manager-agent-gym

# Reinstall with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
```

Slow Execution
- Use smaller models (`gpt-4o-mini`)
- Reduce `max_timesteps`
- Check your internet connection
Out of API Credits
- Check your OpenAI usage at platform.openai.com
- Consider using smaller models for testing
You now have a working autonomous manager agent system! The AI manager can:
- Break down complex goals into manageable tasks
- Coordinate teams of specialized agents
- Balance multiple competing objectives
- Adapt to changing conditions as the simulation unfolds
- Maintain governance and compliance
What's next? Try different scenarios, experiment with preferences, and see how the manager adapts to various challenges!
Happy orchestrating! 🎼