"-INF" overflow while computing Kullback Leibler gradients #2

@voaneves

Description

Hi there, @dchetelat !

First of all, thank you for the repo, it's one of the most concise ACER implementations.

While running training, all the agents started failing (in different orders) at this point:

Episode #650, episode rewards 2.8400000035762787
Episode #644, episode rewards -1.1599999964237213
Episode #676, episode rewards 13.655000007711351
Episode #690, episode rewards -1.2799999937415123
Episode #692, episode rewards -1.0599999986588955
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Episode #677
  File "<ipython-input-1-1e65d43062d2>", line 992, in run_agent
    local_agent.run()
  File "<ipython-input-1-1e65d43062d2>", line 854, in run
    self.learning_iteration(trajectory)
  File "<ipython-input-1-1e65d43062d2>", line 890, in learning_iteration
    Variable(average_action_probabilities.data))
  File "<ipython-input-1-1e65d43062d2>", line 964, in discrete_trust_region_update
    kullback_leibler_gradients = torch.autograd.grad(negative_kullback_leibler.mean(),
RuntimeError: value cannot be converted to type double without overflow: -inf
Episode #651
Episode #691, episode rewards 2.005000022239983

Should I clip the value to something like -500, or use the maximum float value? Is this expected?
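For what it's worth, one workaround I'm considering (just a sketch, not code from this repo; the function name `safe_negative_kl` and the `eps` value are my own) is to clamp the probabilities away from zero before taking the log, so the KL term itself can never produce -inf:

```python
import torch

def safe_negative_kl(average_probs, probs, eps=1e-8):
    # Clamp probabilities away from zero so log() never returns -inf.
    average_probs = average_probs.clamp(min=eps)
    probs = probs.clamp(min=eps)
    # Negative KL(average || current), summed over the action dimension.
    return -(average_probs * (average_probs.log() - probs.log())).sum(dim=-1)

# A distribution with an exact zero would otherwise give log(0) = -inf:
avg = torch.tensor([[0.5, 0.5, 0.0]])
cur = torch.tensor([[0.9, 0.1, 0.0]])
print(torch.isfinite(safe_negative_kl(avg, cur)).all())
```

With the clamp in place, `torch.autograd.grad` on the mean of this quantity should stay finite even when the policy assigns exactly zero probability to some action.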

Regards,
Victor.
