Hi,
I tried logging the policy and critic losses, as well as the reward, using TensorBoard. I ran training using the default settings with sz50.
I noticed that the critic losses keep increasing. Is this expected?

I wonder whether there is an issue in the code related to the critic losses. Could you please check and comment on this?
Thank you.