For src/agents/, extend existing DQN/PPO with new algos (Objective 1).
-
Extend agents/base_agent.py with an A2C agent in PyTorch. Policy and value nets share the GAT encoder from encoders/. Actor-critic: parallel rollouts across multiple envs, synchronous updates with advantage A = R + gamma * V(s') - V(s). EVRP action mask: set logits[mask] = -inf before sampling. Train loop: sample actions, step envs, compute returns/advantages, apply the A2C loss (or PPO-clip if configured). Config from YAML: algo='a2c', lr=3e-4.
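The masked-logits trick and the A2C loss above can be sketched as follows; function names and hyperparameter defaults here are illustrative assumptions, not the repo's actual API:

```python
# Sketch of a masked A2C update, assuming logits come from the shared
# GAT-encoder policy head. Not the repo's real code.
import torch
import torch.nn.functional as F

def masked_policy_logits(logits, mask):
    """mask: bool tensor, True = infeasible action (e.g. unreachable node)."""
    return logits.masked_fill(mask, float('-inf'))

def a2c_loss(logits, mask, actions, returns, values,
             value_coef=0.5, entropy_coef=0.01):
    """One synchronous A2C update over a batch of parallel-env transitions.
    `returns` holds R + gamma * V(s'), so advantage A = returns - values."""
    logits = masked_policy_logits(logits, mask)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = (returns - values).detach()  # no critic gradient via advantage
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = dist.entropy().mean()           # encourage exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

Masked entries get zero probability under the Categorical, so infeasible EVRP moves are never sampled and contribute no gradient.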
-
Implement a SAC agent in agents/sac_agent.py for better exploration in the discrete EVRP action space (possibly via a Gumbel-Softmax relaxation?). Off-policy: actor and Q-nets share the encoder. Maximum-entropy objective: actor minimizes E[alpha * log pi(a|s) - Q(s, a)]. Replay buffer stores EVRP transitions. Reference: the SAC section of the proposal. YAML config: tau=0.005, alpha='auto' (learned temperature).
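For a discrete action space, the actor loss can be evaluated exactly over all actions instead of by sampling, and Gumbel-Softmax gives a differentiable sample when one is needed. A minimal sketch, assuming these function names (they are not in the codebase yet):

```python
# Sketch of a discrete-SAC actor loss and a Gumbel-Softmax sampler.
# Names and the fixed alpha=0.2 default are illustrative assumptions.
import torch
import torch.nn.functional as F

def sac_discrete_actor_loss(logits, q_values, alpha=0.2):
    """Actor loss E_pi[alpha * log pi - Q], summed exactly over the
    discrete action set (no sampling needed for the expectation)."""
    log_pi = F.log_softmax(logits, dim=-1)
    pi = log_pi.exp()
    return (pi * (alpha * log_pi - q_values)).sum(dim=-1).mean()

def gumbel_softmax_action(logits, tau=1.0, hard=True):
    """Differentiable relaxed sample; hard=True gives a straight-through
    one-hot, so the env sees a discrete action but gradients still flow."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard)
```

With alpha='auto', alpha would additionally be learned by minimizing alpha * (log pi + target_entropy), as in standard SAC temperature tuning.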
-
Refactor the existing DQN/PPO agents behind an AgentFactory: read YAML config['agent'], instantiate the chosen agent with the encoder and reward_fn. Train script train.py: load config, env=EVREnv(), agent=factory(config), then loop over episodes/train steps.
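A registry-based factory keeps new algos (A2C, SAC) pluggable by config key alone. A sketch of the proposed shape; the registry decorator and the (cfg, encoder, reward_fn) constructor signature are assumptions about the refactor:

```python
# Sketch of the proposed AgentFactory; in train.py the dict would come
# from yaml.safe_load(open(path)) rather than a literal.
class AgentFactory:
    registry = {}

    @classmethod
    def register(cls, name):
        """Decorator: map an algo name from config['agent']['algo'] to a class."""
        def deco(agent_cls):
            cls.registry[name] = agent_cls
            return agent_cls
        return deco

    @classmethod
    def create(cls, config, encoder=None, reward_fn=None):
        agent_cfg = config['agent']
        return cls.registry[agent_cfg['algo']](agent_cfg, encoder=encoder,
                                               reward_fn=reward_fn)

@AgentFactory.register('a2c')
class A2CAgent:  # stand-in for the real agent class
    def __init__(self, cfg, encoder=None, reward_fn=None):
        self.lr = cfg.get('lr', 3e-4)

config = {'agent': {'algo': 'a2c', 'lr': 3e-4}}
agent = AgentFactory.create(config)
```

train.py then reduces to: load config, build env and encoder, `agent = AgentFactory.create(config, encoder, reward_fn)`, run the episode/train-step loop.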