Implemented a reinforcement learning agent to master continuous walking dynamics in OpenAI Gym’s BipedalWalker-v3 environment.
The goal is to explore continuous control and policy gradient optimization through TD3 (Twin Delayed Deep Deterministic Policy Gradient).
- Environment
- BipedalWalker-v3 from OpenAI Gym.
- Algorithm
- Actor-Critic network architecture with:
- Twin Q-Networks
- Delayed policy updates
- Target policy smoothing
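
The three TD3 components listed above can be illustrated for a single transition. This is a minimal sketch in plain Python, not the project's actual code: the function names (`td3_target`, `should_update_actor`), the callables standing in for the target networks, and all hyperparameter values (`gamma`, `policy_noise`, `noise_clip`, `policy_delay`) are assumptions chosen for illustration.

```python
import random


def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Bootstrapped critic target for one transition (hypothetical helper).

    actor_target(state) -> list of action components
    q*_target(state, action) -> scalar value estimate
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip the result to the valid action range.
    smoothed = []
    for a in actor_target(next_state):
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        smoothed.append(max(-max_action, min(max_action, a + noise)))

    # Twin Q-networks: use the smaller of the two target estimates
    # to reduce overestimation bias.
    q_min = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))

    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: update the actor (and target networks)
    only once every `policy_delay` critic updates (assumed value: 2)."""
    return step % policy_delay == 0


# Toy stand-ins for the target networks (assumptions, not trained models).
toy_actor = lambda state: [0.5, -0.5, 0.0, 2.0]
toy_q1 = lambda state, action: 10.0
toy_q2 = lambda state, action: 8.0

y = td3_target(1.0, [0.0] * 24, False, toy_actor, toy_q1, toy_q2)
```

With these toy critics the target uses the smaller estimate (8.0), so the larger one never enters the update. In a real training loop the same logic would run on batches inside a deep learning framework.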
- Training
- Replay buffer, target network updates, and Ornstein-Uhlenbeck exploration noise.
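
The training pieces named above might look roughly like the following. This is a hedged sketch using only the Python standard library: the class and function names (`ReplayBuffer`, `OUNoise`, `soft_update`) are hypothetical, the hyperparameters (`capacity`, `theta`, `sigma`, `tau`) are assumed values, and the use of Polyak (soft) averaging for the target network update is an assumption, since the source does not say which update rule was used.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        # The deque drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.size, self.mu, self.theta = size, mu, theta
        self.sigma, self.dt = sigma, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at episode start).
        self.state = [self.mu] * self.size

    def sample(self):
        # Euler-Maruyama step:
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

A training loop would add `OUNoise.sample()` to the actor's output before stepping the environment, store each transition with `ReplayBuffer.add`, and call `soft_update` on the target network parameters after each learning step.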
- Performance
- Achieved an average reward above 300 after 100 episodes.