CS Undergrad
currently messing with rl training on a100s: benchmarking efficiency, vram optimizations, and ways to speed up training. also writing a history of speculative decoding.
Check out the organization we're building: HyperKuvid-Labs → https://github.com/HyperKuvid-Labs
have a sweet spot for a100s: they match my vram needs at low cost for experiments. also renting gpus from primeintellect.ai for bigger rl runs.
stuff i'm using:
python, pytorch, cuda, trl, unsloth, a100 gpus...
benchmarking and tweaking for faster rl loops.
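most of the benchmarking boils down to a simple timing harness like this. a minimal sketch, where `step_fn` is a hypothetical stand-in for a real rl training step (the real thing would also track vram via `torch.cuda.max_memory_allocated`):

```python
import time

def benchmark(step_fn, n_iters=10, warmup=2):
    """Time step_fn, returning mean seconds per iteration."""
    # warmup iterations so timings exclude one-off setup costs
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_iters

# stand-in workload; swap in an actual training step
mean = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean * 1e3:.3f} ms/step")
```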



