`int` overflow for exstack above ~46K PEs

While running indexgather at scale on some larger Slingshot-11 machines, I hit verification failures that turned out to be caused by failed allocations during `exstack_init()`.

I believe this is from the `THREADS*THREADS` multiplication in the following allocation overflowing: https://github.com/jdevinney/bale/blob/1b8f673b56645b2bd74a8af6213462bbc9d559fe/src/bale_classic/exstack/exstack.upc#L86

For `2**16` PEs the product is `2**16 * 2**16 = 2**32`, which overflows a 32-bit `int` (I think it's UB, but in my runs it wrapped to 0 and attempted a zero-sized allocation). I'd expect any PE count above 46,340 (`floor(sqrt(2**31-1))`) to trigger this behavior.
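To make the arithmetic concrete, here is a minimal plain-C sketch (not the actual exstack code). The helper names `square_low32`, `square_64`, and `square_fits_int` are hypothetical and only illustrate the problem. It assumes a 32-bit `int`, which is what the platforms I ran on use. The wraparound is computed in unsigned arithmetic so the example itself avoids UB.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: low 32 bits of n*n, i.e. the value a wrapping
 * 32-bit multiply would produce. Done via a 64-bit product and an
 * unsigned truncation so the behavior is well defined. */
static uint32_t square_low32(uint32_t n) {
    return (uint32_t)((uint64_t)n * (uint64_t)n);
}

/* Hypothetical helper: the same product with the operands widened to
 * 64 bits before multiplying, which is the usual way to avoid the
 * overflow when computing an allocation size. */
static int64_t square_64(int n) {
    return (int64_t)n * (int64_t)n;
}

/* Hypothetical helper: returns 1 if n*n is representable as an int,
 * 0 otherwise. */
static int square_fits_int(int n) {
    return n >= 0 && square_64(n) <= (int64_t)INT_MAX;
}
```

With a 32-bit `int`, `square_low32(65536)` is 0 (matching the zero-sized allocation), `square_64(65536)` is 4294967296, and `square_fits_int` flips from true to false between 46340 and 46341. A fix along these lines would presumably cast one `THREADS` operand to a 64-bit type before the multiply, but I haven't tested that against the UPC allocation call.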

Looking through the code, the only other obvious overflow I saw was: https://github.com/jdevinney/bale/blob/1b8f673b56645b2bd74a8af6213462bbc9d559fe/src/bale_classic/exstack/exstack.upc#L112

---

I wanted to check if this issue was surprising and whether anybody else has run exstack at this scale. I think `2**16` PEs is 128G of aggregator buffers per node, so it may just be beyond the scale exstack was designed for.
