Add FSDP support for faster training of large models #59

@tomtseng

Description

My understanding is that the code handles multi-GPU training by relying on AutoModelForCausalLM.from_pretrained(..., device_map="auto"), which I believe does naive model parallelism: the model's layers are split across GPUs and computation runs on one GPU at a time, so we get extra memory but no parallel compute. For larger models and sweeps we may get significant compute savings by switching to FSDP, which shards parameters across GPUs and runs computation on all of them concurrently. (We should benchmark this hypothesis with full-parameter fine-tuning before refactoring all the training attacks.)

We can use accelerate for FSDP. This will require some refactoring. For example, instead of run_in_isolation for training, we could launch an accelerate subprocess. Alternatively, we could split training and evaluation into separate runs, where only training uses accelerate; evaluation often uses vLLM, which manages its own multi-GPU setup, so I think it doesn't need to run under accelerate.
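A rough sketch of what the accelerate-based launch could look like. This is a minimal illustration, not a tested setup: the config keys follow accelerate's FSDP config format as I understand it, and the file names (fsdp_config.yaml, train.py) and num_processes value are placeholders for whatever our refactored training entry point ends up being.

```yaml
# fsdp_config.yaml -- hypothetical accelerate config for FSDP training.
# Generated configs from `accelerate config` would look similar.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_processes: 4  # placeholder: one process per GPU
fsdp_config:
  # Shard parameters, gradients, and optimizer state across GPUs
  fsdp_sharding_strategy: FULL_SHARD
  # Wrap each transformer block as its own FSDP unit
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_state_dict_type: SHARDED_STATE_DICT
```

The training subprocess would then be spawned as something like `accelerate launch --config_file fsdp_config.yaml train.py ...` in place of the current run_in_isolation call, with the training script calling accelerator.prepare(...) on the model, optimizer, and dataloaders.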
