When batch_size is set to BATCH_SIZE_ALL on a stateful node and the pipeline runs with multiple threads, some data is dropped and the input buffers are not combined before being passed to the node. For example:
gen = Generate("gen", size=5)
double = Double("double")
printer = Printer("printer", batch_size=Node.BATCH_SIZE_ALL)
p = Pipeline(gen | double | printer, n_threads=4)
The expected result is that the output buffers from all double nodes are combined before being passed to the printer, so the printer prints [0, 2, 4, 6, 8] once (the order may differ). However, the actual result differs from run to run.
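To make the expected semantics concrete, here is a minimal sketch that uses only the Python standard library and none of the pipeline library's classes. The names `double`, `run_pipeline`, and `printer` are hypothetical stand-ins for the nodes above. The sketch assumes that BATCH_SIZE_ALL means "wait for every upstream worker, merge their outputs, then call the node exactly once".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def double(x):
    """Stand-in for the Double node: doubles a single item."""
    return 2 * x


def run_pipeline(size=5, n_threads=4):
    """Double `size` items on `n_threads` workers, then hand the merged
    results to a single stateful consumer in one call."""
    received = []  # every batch the consumer is called with

    def printer(batch):
        # Stand-in for the Printer node with batch_size=BATCH_SIZE_ALL.
        received.append(list(batch))
        print(batch)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(double, i) for i in range(size)]
        # Collect results as workers finish; arrival order is not guaranteed.
        combined = [f.result() for f in as_completed(futures)]

    # All workers are done, so the consumer sees one complete batch.
    printer(combined)
    return received


batches = run_pipeline()
```

In this sketch the consumer is called once, with all five doubled values and none missing. That is the behavior the report expects from the printer node.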