
Agent Integration Prompts #10

@sdley

Description


For src/agents/, extend the existing DQN/PPO agents with new algorithms (Objective 1).

  • Extend agents/base_agent.py with an A2C agent using PyTorch. Policy and value nets share the GAT encoder from encoders/. Actor-critic setup: parallel rollouts across multiple envs, synchronous updates with the one-step advantage A = r + gamma * V(s') - V(s). EVRP action masking: set logits[mask] = -inf before sampling. Train loop: sample actions, step the env, compute returns/advantages, apply the A2C (or PPO-clip) loss. Config from YAML: algo='a2c', lr=3e-4.

  • Implement a SAC agent in agents/sac_agent.py for continuous exploration over the discrete EVRP action space (Gumbel-Softmax relaxation of discrete actions?). Off-policy: actor, critic, and Q-nets sharing the encoder. Maximum-entropy objective: the actor minimizes E[alpha * log pi(a|s) - Q(s, a)]. Replay buffer stores EVRP episodes. Reference: the proposal's SAC section. YAML config: tau=0.005, alpha='auto'.

  • Refactor the existing DQN/PPO agents behind an AgentFactory: load config['agent'] from YAML and instantiate the agent with its encoder/reward_fn. Train script train.py: load the config, create env=EVREnv() and agent=factory(config), then loop over episodes/train steps.
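For the first bullet, a minimal sketch of the two core pieces, the EVRP action mask and the one-step advantage A = r + gamma * V(s') - V(s); `masked_logits` and `compute_advantages` are hypothetical helper names, not existing functions in src/agents/:

```python
import torch

def masked_logits(logits, infeasible_mask):
    # EVRP action mask: infeasible_mask is True where an action (e.g. an
    # unreachable customer) must not be sampled. Setting those logits to
    # -inf gives them exactly zero probability under softmax.
    return logits.masked_fill(infeasible_mask, float('-inf'))

def compute_advantages(rewards, values, next_values, dones, gamma=0.99):
    # One-step TD advantage per transition:
    #   A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    # Terminal states contribute no bootstrapped value.
    return [r + gamma * nv * (1.0 - d) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]
```

A synchronous A2C update would compute these advantages over the parallel rollouts, then weight the policy-gradient and value losses accordingly.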
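For the SAC bullet, a sketch of the Gumbel-Softmax action relaxation and the soft actor loss under the assumptions stated above; `sample_discrete_action` and `sac_actor_loss` are illustrative names only:

```python
import torch
import torch.nn.functional as F

def sample_discrete_action(logits, tau=1.0, hard=True):
    # Gumbel-Softmax relaxation: a differentiable sample over discrete EVRP
    # actions. hard=True returns a one-hot action in the forward pass while
    # keeping soft gradients (straight-through estimator).
    return F.gumbel_softmax(logits, tau=tau, hard=hard)

def sac_actor_loss(log_pi, q_values, alpha):
    # Maximum-entropy actor objective: minimize E[alpha * log pi(a|s) - Q(s, a)],
    # i.e. prefer high-Q actions while keeping the policy entropic.
    return (alpha * log_pi - q_values).mean()
```

With alpha='auto', alpha itself would be tuned by gradient descent toward a target entropy, alongside the soft target-network updates governed by tau=0.005.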
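For the refactor bullet, one way the factory could look: a registry keyed by the YAML algo name. The class and method names below (AgentFactory.register/create, A2CAgent) are assumptions for illustration, not the existing src/agents/ API:

```python
class AgentFactory:
    # Maps algo name from config['agent']['algo'] to an agent class.
    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(agent_cls):
            cls._registry[name] = agent_cls
            return agent_cls
        return decorator

    @classmethod
    def create(cls, config, encoder=None, reward_fn=None):
        algo = config['agent']['algo']
        if algo not in cls._registry:
            raise ValueError(f"unknown algo: {algo!r}")
        return cls._registry[algo](config=config, encoder=encoder,
                                   reward_fn=reward_fn)

@AgentFactory.register('a2c')
class A2CAgent:
    # Placeholder agent; the real one would build nets on the shared encoder.
    def __init__(self, config, encoder=None, reward_fn=None):
        self.lr = config['agent'].get('lr', 3e-4)
```

train.py would then parse the YAML into a dict and call AgentFactory.create(config, encoder, reward_fn), so adding SAC later is just another @AgentFactory.register('sac') class.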

Metadata

Labels: none
Status: In progress
Milestone: none