Use a thread-privatized array? Related work from Matt:
http://cacs.usc.edu/papers/Kunaseth-HybridMD-JSC13.pdf
http://cacs.usc.edu/papers/kunaseth-ScalableHybridMD-PDPTA20110.pdf
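
For reference, a minimal sketch of what a thread-privatized array could look like, assuming the shared array is a per-atom accumulator updated from a list of pairs. Each thread writes only to its own private copy, and the copies are summed afterwards, so no locks or atomics are needed during accumulation. The names `accumulate_forces`, `pairs`, `n_atoms`, and `n_threads` are hypothetical, and this is not taken from the linked papers.

```python
import threading


def accumulate_forces(pairs, n_atoms, n_threads=4):
    """Sum pair contributions into a per-atom array using thread-private copies.

    pairs: list of (i, j, f) tuples (hypothetical layout, for illustration only).
    """
    # One private array per thread, so no two threads ever write the same element.
    private = [[0.0] * n_atoms for _ in range(n_threads)]

    def worker(tid):
        local = private[tid]
        # Static round-robin split of the pair list across threads.
        for i, j, f in pairs[tid::n_threads]:
            local[i] += f  # contribution to atom i
            local[j] -= f  # equal and opposite contribution to atom j

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Reduction step: combine the private copies into a single result array.
    return [sum(column) for column in zip(*private)]
```

The trade-off is memory: the private copies cost `n_threads * n_atoms` elements, and the final reduction touches all of them, so this pays off mainly when write conflicts on the shared array would otherwise be frequent.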