b200 local NVMe caching to speed up server start time

currently the b200 dgcx gcp cluster stores checkpoints on the cluster-level shared lustre storage instead of the node-local `/raid/` NVMe, which leads to 1-2 hour load times for kimi k2.5

switching to `/raid/` is expected to give 6-7x higher job-completion throughput per hour on b200
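
a minimal sketch of what the caching step could look like, assuming checkpoints are plain directories on lustre and each node has a writable `/raid/` mount. the function name `ensure_local_checkpoint`, the cache directory `/raid/ckpt_cache`, and the `.cache_complete` marker file are all hypothetical names picked for illustration, not existing code in this repo.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical marker file written once a copy has fully finished.
DONE_MARKER = ".cache_complete"


def ensure_local_checkpoint(shared_dir, cache_root="/raid/ckpt_cache"):
    """Return a node-local copy of shared_dir, copying it on first use.

    cache_root is an assumed location under /raid/; adjust to the real mount.
    """
    shared_dir = Path(shared_dir)
    cache_root = Path(cache_root)
    local_dir = cache_root / shared_dir.name

    # Fast path: an earlier job on this node already completed the copy.
    if (local_dir / DONE_MARKER).exists():
        return local_dir

    cache_root.mkdir(parents=True, exist_ok=True)
    # Copy into a temp dir on the same filesystem, then rename into place,
    # so readers never see a half-copied checkpoint at local_dir.
    tmp_dir = Path(tempfile.mkdtemp(prefix=shared_dir.name + ".", dir=cache_root))
    try:
        shutil.copytree(shared_dir, tmp_dir, dirs_exist_ok=True)
        (tmp_dir / DONE_MARKER).touch()
        try:
            os.rename(tmp_dir, local_dir)
        except OSError:
            # Rename fails if local_dir already exists and is non-empty.
            # That is fine if another process finished the copy first.
            if not (local_dir / DONE_MARKER).exists():
                raise
    finally:
        # No-op after a successful rename; cleans up if we lost the race.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return local_dir
```

the server would then be pointed at the returned path instead of the lustre path, so only the first job on a node pays the lustre read and later jobs load from NVMe. this sketch does no eviction or free-space check, which a real version would likely need given the size of the checkpoints.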