If a server gets OOM killed, there isn't currently a great way to debug it remotely.
While we can't do something like Java's HeapDumpOnOutOfMemoryError, we can probably enable jemalloc's prof.gdump to generate a heap profile every time memory usage reaches a new high-water mark. We won't see the exact cause of the OOM kill, but we should get close as long as the OOM wasn't caused by a single huge allocation. We could then add a diagnostic that returns the previous process's most recent dump.
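As a sketch, gdump can be enabled at startup through the MALLOC_CONF environment variable (this assumes jemalloc was built with profiling support, i.e. --enable-prof; the prefix path and the ./server binary below are placeholders, not anything from this note):

```shell
# Turn on jemalloc profiling and dump a profile at each new
# high-water mark of memory use. prof_prefix controls where the
# jeprof.*.heap files are written (placeholder path).
MALLOC_CONF="prof:true,prof_gdump:true,prof_prefix:/var/tmp/jeprof"
export MALLOC_CONF
echo "$MALLOC_CONF"
# Then start the server under this environment, e.g.:
# ./server
```

The same setting can also be toggled at runtime via the prof.gdump mallctl if we'd rather not pay the profiling cost until a process looks like it's heading toward an OOM.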
We'd need to do some work to manage the files it generates so they don't accumulate too much.
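That file management could be as simple as a periodic prune that keeps only the newest few dumps. A minimal sketch (the jeprof.*.heap glob assumes jemalloc's default dump naming; the directory and retention count are placeholders):

```shell
# Delete all but the newest $2 heap dumps under directory $1,
# so gdump output can't fill the disk over time.
prune_heap_dumps() {
  dir=$1
  keep=$2
  # List dumps newest-first, skip the first $keep, remove the rest.
  # xargs -r makes this a no-op when nothing matches.
  ls -1t "$dir"/jeprof.*.heap 2>/dev/null \
    | tail -n +"$((keep + 1))" \
    | xargs -r rm -f
}

# Example: prune_heap_dumps /var/tmp 10
```

Running this from a cron job or right after startup (before re-enabling gdump) would also leave the previous process's last dump available for the diagnostic described above.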