Hi there, @dchetelat!
First of all, thank you for the repo; it's one of the most concise ACER implementations I've seen.
While training, all the agent processes started failing (in different orders) at this point:
```
Episode #650, episode rewards 2.8400000035762787
Episode #644, episode rewards -1.1599999964237213
Episode #676, episode rewards 13.655000007711351
Episode #690, episode rewards -1.2799999937415123
Episode #692, episode rewards -1.0599999986588955
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Episode #691, episode rewards 2.005000022239983
```
Do I need to clip the value to something like -500, or should I clamp it to the most negative representable float? Is this expected behavior?
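For context, my guess is that the `-inf` comes from taking the log of an action probability that is exactly zero inside the KL term. A minimal sketch of the kind of workaround I have in mind (the `EPS` floor is my own choice, not a value from the repo):

```python
import torch

EPS = 1e-8  # hypothetical floor, not taken from the repo

probs = torch.tensor([0.0, 0.3, 0.7])

# log of an exactly-zero probability produces -inf, which would then
# propagate through the KL mean into torch.autograd.grad
raw_log = torch.log(probs)   # first entry is -inf

# clamping the probabilities away from zero keeps every log finite
safe_log = torch.log(probs.clamp(min=EPS))

print(raw_log)
print(safe_log)
```

Would clamping the probabilities like this (rather than clipping the KL value itself) be the right fix here, or does it distort the trust-region update?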
Regards,
Victor.