both the single apple spawn and multi apple spawn models were trained on the following specs:
- 400x400 grid
- single-reward or multi-reward apple spawn at a time
*note: the model was tested on a 640x640 grid and generalized well. the larger grid was used for better demonstration.
model training visualizer (example from multi apple spawn training):
training_visualizer.mp4
the following video shows a demo of the snake_dqn.pth model (single apple spawn) in action:
snake_demo_model.mp4
the following video shows a demo of the snake_dqn_multi_path_reward.pth model (multi apple spawn) in action:
snake_demo_2.mp4
the multi apple spawn model struggles big time with accurate pathfinding (need to implement a better pathfinding algorithm). more training episodes might also help?
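one possible direction for the pathfinding issue above, as a rough sketch only: a breadth-first search (BFS) over grid cells that finds the shortest route from the snake's head to an apple while avoiding body cells. this is not what the trained models do. the function name `bfs_path`, the `(x, y)` cell coordinates, and the `blocked` set are all hypothetical, and the sketch assumes the pixel grid is divided into square cells (the cell size is not stated here).

```python
from collections import deque


def bfs_path(start, goal, blocked, cols, rows):
    """Return the shortest list of cells from start to goal, or None if unreachable.

    start, goal: (x, y) cell coordinates (hypothetical representation).
    blocked: set of (x, y) cells the snake may not enter (e.g. its body).
    cols, rows: grid size in cells, not pixels.
    """
    queue = deque([start])
    came_from = {start: None}  # maps each visited cell to the cell it was reached from

    while queue:
        cell = queue.popleft()
        if cell == goal:
            # walk back through came_from to rebuild the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]

        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_bounds = 0 <= nx < cols and 0 <= ny < rows
            if in_bounds and nxt not in blocked and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)

    return None  # the apple is walled off


# example: 3x3 grid, one body cell at (1, 0) between head and apple
path = bfs_path(start=(0, 0), goal=(2, 0), blocked={(1, 0)}, cols=3, rows=3)
print(path)
```

note that this treats the body as static, while the real tail moves every step, so a path that looks blocked now may open up later. the path length could be used as a reward-shaping signal or as a fallback when the model gets stuck, but both of those are untested ideas.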
(made as part of an internal internship presentation. thank you to all the advisers. ofc thank you perplexity pro + github copilot for debugging help besties.)