Open
Description
The error below can easily happen if you lose your network connection after submitting a job. A lost connection should be grounds for a retry on the next connection attempt, not a move into SystemError. RemoteJobFiles.update_job() should be checked for the same problem.
[2019-01-12 19:03:14.060] [CRITICAL] Traceback (most recent call last):
  File "cerise/../cerise/back_end/execution_manager.py", line 290, in _process_jobs
    self._job_runner.update_job(job_id)
  File "cerise/../cerise/back_end/job_runner.py", line 52, in update_job
    status = self._sched.get_status(job.remote_job_id)
  File "/usr/local/lib/python3.5/dist-packages/cerulean/slurm_scheduler.py", line 66, in get_status
    10, command, ['-j', job_id, '-h', '-o', '%T'], None, None)
  File "/usr/local/lib/python3.5/dist-packages/cerulean/ssh_terminal.py", line 137, in run
    raise ConnectionError(str(last_exception))
ConnectionError: Timeout opening channel.
[cerise.back_end.execution_manager]
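A minimal sketch of the behaviour proposed above, not the actual cerise code: a status update catches the connection error and leaves the job state untouched, so the next processing pass simply tries again. It assumes the exception in the traceback is Python's built-in `ConnectionError`. `FlakyScheduler`, `update_job_tolerant`, and the dict-based job record are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class FlakyScheduler:
    """Hypothetical stand-in for a scheduler whose connection drops
    for the first `failures` calls, then recovers."""

    def __init__(self, failures):
        self._failures = failures

    def get_status(self, remote_job_id):
        if self._failures > 0:
            self._failures -= 1
            raise ConnectionError('Timeout opening channel.')
        return 'RUNNING'


def update_job_tolerant(sched, job):
    """Try to refresh the remote status of `job`.

    Returns True if the status was refreshed. On a connection error,
    logs a warning, leaves `job` unchanged, and returns False so the
    caller can retry on its next pass instead of failing the job.
    """
    try:
        status = sched.get_status(job['remote_job_id'])
    except ConnectionError as e:
        logger.warning('Could not update job %s, will retry: %s',
                       job['remote_job_id'], e)
        return False
    job['remote_status'] = status
    return True


# Usage: two passes fail on the connection, the third succeeds.
sched = FlakyScheduler(failures=2)
job = {'remote_job_id': '1234', 'state': 'Running', 'remote_status': None}
results = [update_job_tolerant(sched, job) for _ in range(3)]
```

The point of the design is that a transient network failure is treated as "no new information" rather than as a job failure; only errors that say something about the job itself would move it into SystemError.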