Hi there, @dchetelat!
First of all, thank you for the repo; it's one of the most concise ACER implementations I've seen.
While training, all the agent processes started failing (in different orders) at this point:
```
Episode #650, episode rewards 2.8400000035762787
Episode #644, episode rewards -1.1599999964237213
Episode #676, episode rewards 13.655000007711351
Episode #690, episode rewards -1.2799999937415123
Episode #692, episode rewards -1.0599999986588955
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Episode #691, episode rewards 2.005000022239983
```
Do I need to clip the value to something like -500, or should I clamp it to the most negative representable float? Is this expected behavior?
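For context, my guess is that the `-inf` comes from taking the log of an action probability that is exactly zero inside the KL term. A minimal sketch of the kind of workaround I have in mind (the `EPS` floor is my own choice, not a value from the repo):

```python
import torch

EPS = 1e-8  # hypothetical floor, not taken from the repo

probs = torch.tensor([0.0, 0.3, 0.7])

# log of an exactly-zero probability produces -inf, which would then
# propagate through the KL mean into torch.autograd.grad
raw_log = torch.log(probs)   # first entry is -inf

# clamping the probabilities away from zero keeps every log finite
safe_log = torch.log(probs.clamp(min=EPS))

print(raw_log)
print(safe_log)
```

Would clamping the probabilities like this (rather than clipping the KL value itself) be the right fix here, or does it distort the trust-region update?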
Regards,
Victor.