Hello, and thank you for sharing your work and providing the G2RL implementation in the POGEMA environment.
I tried increasing the number of training iterations (num_episodes) and the replay buffer size, but the results remain similar to those in the provided notebooks, and the success values collected via results['done'].append(scalars['done']) stay consistently low.
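For concreteness, this is roughly how I scaled things up (num_episodes comes from the notebook; the replay-buffer key name and the exact values below are just illustrative of what I tried, not the repo's actual config):

```python
# Illustrative sketch of the hyperparameters I varied.
# "num_episodes" is from the notebook; "replay_buffer_size" is my guess
# at the corresponding config key, and all values are examples.
baseline = {
    "num_episodes": 1_000,
    "replay_buffer_size": 50_000,
}

# I multiplied both settings by roughly these factors.
scaled_up = {
    "num_episodes": baseline["num_episodes"] * 10,
    "replay_buffer_size": baseline["replay_buffer_size"] * 4,
}

print(scaled_up)  # the configuration I retrained with
```

Even with these larger settings, the evaluation metrics were essentially unchanged from the notebook defaults.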

Are there specific parameters or settings that need to be adjusted to significantly improve the model's performance? Any guidance on where adjustments would be most effective would be greatly appreciated.