Large queues (> 2 GB) are broken into smaller pieces (tagged with tags::piece) to work around MPI's message-size limit: element counts are passed to MPI as a signed int, so a single call can transfer at most 2^31 - 1 elements. detail::VectorWindow<char> is used to pass subsets of the MemoryBuffers to the non-blocking send routines (MPI_Isend / MPI_Issend) without making a copy. shared_ptrs keep the original buffer alive for as long as any in-flight message still refers to it.
The same mechanism could also be used in other operations, e.g. all-to-all exchanges, to avoid copying data between buffers.
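The splitting and lifetime scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: BufferWindow, split_into_pieces and kMaxMpiCount are hypothetical names standing in for detail::VectorWindow<char> and the surrounding code, whose real interfaces may differ. Each window is a non-owning view of one slice of the buffer that also holds a shared_ptr to the whole buffer, so the memory cannot be freed while any piece is still in flight. The MPI_Isend call is shown only as a comment so the sketch compiles without MPI.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for detail::VectorWindow<char>: a view of
// [offset, offset + size) in a buffer, plus a shared_ptr that keeps the
// whole buffer alive while this piece is in flight.
class BufferWindow {
public:
    BufferWindow(std::shared_ptr<const std::vector<char>> owner,
                 std::size_t offset, std::size_t size)
        : owner_(std::move(owner)), offset_(offset), size_(size) {}

    const char* data() const { return owner_->data() + offset_; }
    std::size_t size() const { return size_; }

private:
    std::shared_ptr<const std::vector<char>> owner_;  // shared ownership, no copy
    std::size_t offset_;
    std::size_t size_;
};

// Largest count a single MPI call accepts, since counts are passed as int.
constexpr std::size_t kMaxMpiCount = static_cast<std::size_t>(INT_MAX);

// Split a buffer into windows of at most max_piece bytes each, without
// copying any data. Every window shares ownership of the buffer.
std::vector<BufferWindow> split_into_pieces(
        const std::shared_ptr<const std::vector<char>>& buffer,
        std::size_t max_piece = kMaxMpiCount) {
    assert(max_piece > 0);
    std::vector<BufferWindow> pieces;
    for (std::size_t offset = 0; offset < buffer->size(); offset += max_piece) {
        std::size_t len = std::min(max_piece, buffer->size() - offset);
        pieces.emplace_back(buffer, offset, len);
        // In real code each piece would be posted here, e.g.
        // MPI_Isend(pieces.back().data(), static_cast<int>(len), MPI_CHAR,
        //           dest, piece_tag, comm, &request);
        // and the window kept until the request completes.
    }
    return pieces;
}
```

For example, a 10-byte buffer split with max_piece = 4 yields windows of 4, 4 and 2 bytes. Even if the caller drops its own shared_ptr right after posting the sends, the buffer stays allocated until the last window is destroyed, which is what makes it safe to release the windows only once the corresponding requests have completed.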