(Where TIP is "Tinsel Improvement Proposal", à la PIP.)
The idea of a tinsel variant where devices only have access to the slots
continues to intrigue, particularly if there is a big advantage in thread-count
(I believe 4x was mentioned...).
Currently we have 1024 bytes per thread, and that is enough for a very
carefully designed system. Assume we still have 64-byte messages, and
reserve 8 message slots for send/receive (512 bytes). We then get 512 bytes for
stack/working space. Assume:
- 256 bytes for "stack" (a misnomer at this scale)
- 128 bytes for local connectivity (topology info)
- 128 bytes for state.
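
The budget above can be checked with a few lines of arithmetic. This is only a sketch: the names are made up for illustration and are not Tinsel identifiers, and the figures are the ones assumed in the text. With 8 slots of 64 bytes taken out of 1024, 512 bytes remain, which the three items fill exactly.

```python
# Per-thread memory budget, using the figures assumed in the text.
# All names are illustrative, not Tinsel identifiers.
BYTES_PER_THREAD = 1024
MESSAGE_BYTES = 64
MESSAGE_SLOTS = 8

slot_bytes = MESSAGE_BYTES * MESSAGE_SLOTS      # reserved for send/receive
working_bytes = BYTES_PER_THREAD - slot_bytes   # left for stack/working space

# Proposed split of the working space, in bytes.
budget = {"stack": 256, "topology": 128, "state": 128}
unallocated = working_bytes - sum(budget.values())

print(slot_bytes, working_bytes, unallocated)
```

There is no slack in this split, so any growth in one item has to come out of another.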
A 4-vector is 16 bytes, so we can keep hold of 8 4-vectors in that space,
and send 2 4-vectors per message.
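
As a rough check of those numbers, here is a sketch that assumes a 4-vector is four 32-bit floats (the text only gives the 16-byte size, so the element type is my assumption). The names are illustrative. It also shows that two 4-vectors leave half of a 64-byte message unused; what the remaining bytes would hold is not specified here.

```python
import struct

# Assumed layout: four little-endian 32-bit floats per 4-vector.
VEC4_FORMAT = "<4f"
VEC4_BYTES = struct.calcsize(VEC4_FORMAT)

STATE_BYTES = 128       # state allocation from the budget
MESSAGE_BYTES = 64      # message size assumed in the text
VECS_PER_MESSAGE = 2    # 4-vectors sent per message

vecs_in_state = STATE_BYTES // VEC4_BYTES          # 4-vectors that fit in state
payload_bytes = VECS_PER_MESSAGE * VEC4_BYTES      # bytes used by the vectors
spare_bytes = MESSAGE_BYTES - payload_bytes        # bytes left in the message

print(VEC4_BYTES, vecs_in_state, payload_bytes, spare_bytes)
```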
So I think we could do a reasonably interesting 2x2x2 agglomerated finite-volume
scheme and soak up the extra cores quite well.
Not saying we must do it, but the idea of quadrupling the core count is very interesting,
even at the expense of DRAM.