Description
Some context first. I was seeing crashes and incorrect behavior while using lparallel in an application and started to investigate. That issue is discussed here and here. It is not the issue I am reporting; I mention it only for context.
While investigating the above issue, I ran the lparallel test suite in multiple CL implementations and got multiple errors as well as hangs in SBCL 2.5.8 and CCL 1.12, both on x86-64:
$ sbcl --noinform --no-userinit --load .local/share/common-lisp/quicklisp/setup.lisp --eval '(print (lisp-implementation-version))' --eval '(asdf:test-system "lparallel")' --quit
"2.5.8.debian"
[...testing...]
LPARALLEL/TEST::TRY-RECEIVE-TEST................................................
............
debugger invoked on a SIMPLE-ERROR in thread
#<THREAD tid=3471989 "main thread" RUNNING {1200BE0003}>:
The assertion (LPARALLEL/TEST::ALL-WORKERS-ALIVE-P) failed.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [CONTINUE ] Retry assertion.
1: [RETRY ] Retry
#<TEST-OP > on #<SYSTEM "lparallel/test">.
2: [ACCEPT ] Continue, treating
#<TEST-OP > on #<SYSTEM "lparallel/test">
as having been successful.
3: Retry ASDF operation.
4: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the
configuration.
5: Retry ASDF operation.
6: Retry ASDF operation after resetting the
configuration.
7: Ignore runtime option --eval "(asdf:test-system \"lparallel\")".
8: [ABORT ] Skip rest of --eval and --load options.
9: Skip to toplevel READ/EVAL/PRINT loop.
10: [EXIT ] Exit SBCL (calling #'EXIT, killing the process).
((LAMBDA NIL :IN LPARALLEL/TEST::SLEEPING-WORKER-REPLACEMENT-TEST))
source: (IS (ALL-WORKERS-ALIVE-P))
0]
$ sbcl --noinform --no-userinit --load .local/share/common-lisp/quicklisp/setup.lisp --eval '(print (lisp-implementation-version))' --eval '(asdf:test-system "lparallel")' --quit
"2.5.8.debian"
[...testing...]
LPARALLEL-TEST::SLET-UNBOUND-TEST.
LPARALLEL-TEST::PTREE-KILL-TEST.....
LPARALLEL-TEST::NO-KERNEL-RESTART-TEST...
[hangs]
$ sbcl --noinform --no-userinit --load .local/share/common-lisp/quicklisp/setup.lisp --eval '(print (lisp-implementation-version))' --eval '(asdf:test-system "lparallel")' --quit
"2.5.8.debian"
[...testing...]
LPARALLEL-TEST::PREMOVE-TEST.....................................
.................................................................
[...]
......................................
debugger invoked on a LPARALLEL.KERNEL:TASK-KILLED-ERROR in thread
#<THREAD tid=3483579 "main thread" RUNNING {1200BF0003}>:
The task was killed.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [RETRY ] Retry
#<TEST-OP > on #<SYSTEM "lparallel-test">.
1: [ACCEPT ] Continue, treating
#<TEST-OP > on #<SYSTEM "lparallel-test">
as having been successful.
2: Retry #<TEST-OP > on #<SYSTEM "lparallel">.
3: Continue, treating
#<TEST-OP > on #<SYSTEM "lparallel"> as
having been successful.
4: Retry ASDF operation.
5: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the
configuration.
6: Retry ASDF operation.
7: Retry ASDF operation after resetting the
configuration.
8: [CONTINUE ] Ignore runtime option --eval "(asdf:test-system \"lparallel\")".
9: [ABORT ] Skip rest of --eval and --load options.
10: Skip to toplevel READ/EVAL/PRINT loop.
11: [EXIT ] Exit SBCL (calling #'EXIT, killing the process).
((LAMBDA NIL :IN LPARALLEL-TEST::NON-ERROR-CONDITION-TEST))
source: (RECEIVE-RESULT CHANNEL)
0]
$ opt/ccl/lx86cl64 --no-init --load .local/share/common-lisp/quicklisp/setup.lisp --eval '(print (lisp-implementation-version))' --eval '(asdf:test-system "lparallel")'
"Version 1.12 (v1.12) LinuxX8664" LPARALLEL/TEST::PTREE-LONE-FN-TEST..........LPARALLEL/TEST::CUSTOM-KILL-TASK-TEST
[hangs]
$ opt/ccl/lx86cl64 --no-init --load .local/share/common-lisp/quicklisp/setup.lisp --eval '(print (lisp-implementation-version))' --eval '(asdf:test-system "lparallel")'
"Version 1.12 (v1.12) LinuxX8664"
[...testing...]
........................................................LPARALLEL-TEST::PTREE-REDEFINITION-TEST.....LPARALLEL-TEST::BASIC-THREADING-TEST..LPARALLEL-TEST::WORKER-SUICIDE-TEST
[hangs]
The behavior is very non-deterministic. From what I tried, it is rare to see two runs fail in the same way or at the same point in the test suite.
I experimented a bit and it seems that the most recent commit for which the test suite works in these implementations is 80fc295. The following commit, 6950400, is the first to exhibit the problems for me.
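Since the failures are non-deterministic, a single passing run says little about whether a given commit is good. In case it helps with reproduction, here is a minimal sketch of a helper that runs a command several times and tallies passes, failures, and hangs. The name `repeat_runs` is hypothetical and not part of lparallel; the sketch assumes `timeout` from GNU coreutils is available and treats any run that exceeds the time limit as a hang.

```shell
# Hypothetical helper (not part of lparallel): run a command repeatedly
# and tally how each run ends.
# Usage: repeat_runs COUNT TIMEOUT_SECONDS COMMAND [ARGS...]
repeat_runs() {
  count=$1
  limit=$2
  shift 2
  pass=0
  fail=0
  hang=0
  i=1
  while [ "$i" -le "$count" ]; do
    # GNU timeout exits with status 124 when the time limit is reached.
    # stdin is /dev/null so a debugger prompt cannot block waiting for input.
    timeout "$limit" "$@" </dev/null >/dev/null 2>&1
    status=$?
    case $status in
      0)   pass=$((pass + 1)) ;;
      124) hang=$((hang + 1)) ;;
      *)   fail=$((fail + 1)) ;;
    esac
    i=$((i + 1))
  done
  echo "pass=$pass fail=$fail hang=$hang"
}

# Example (assumption: 20 runs and a 600-second limit are arbitrary choices):
# repeat_runs 20 600 sbcl --noinform --non-interactive --no-userinit \
#   --load .local/share/common-lisp/quicklisp/setup.lisp \
#   --eval '(asdf:test-system "lparallel")'
```

With SBCL, `--non-interactive` disables the debugger, so an unhandled error should end the process with a non-zero status instead of waiting at the `0]` prompt. I have not checked how CCL reports such failures through its exit status, so the tally for CCL may need a different check.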