
First-batch inference significantly slower due to initialization overhead (~6x slowdown) #18

@v-shaoningli


When running AlphaFast inference for 64 identical proteins (~450 tokens each) across 8 A800 GPUs (8 jobs per GPU), I observe a consistent pattern where the first batch of jobs takes ~6x longer than steady-state:

| Batch | Inference time | Relative to steady state |
| --- | --- | --- |
| 1st wave (8 jobs) | ~203–208 s | ~5.8x |
| 2nd wave (8 jobs) | ~128–134 s | ~3.6x |
| Remaining 48 jobs | ~35–36 s | 1.0x (baseline) |

All 64 inputs are the same protein sequence (~450 tokens), so the difference isn't data-dependent. The slowdown is strictly correlated with job launch order, not GPU identity — every GPU's first job is slow.

This pattern is consistent with:

  • CUDA kernel JIT compilation / caching on first run
  • Model weight initialization or compilation (e.g., torch.compile warmup)
  • Lazy module initialization triggered on first forward pass
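If the cost is torch.compile / Triton JIT compilation, persisting the compile caches across jobs might shrink it. This is only a guess that AlphaFast uses torch.compile under the hood, and the paths below are placeholders:

```shell
# Point the inductor/Triton JIT caches at a shared directory so later
# jobs reuse compiled kernels (paths are examples, not AlphaFast defaults).
export TORCHINDUCTOR_CACHE_DIR=/shared/torchinductor_cache
export TRITON_CACHE_DIR=/shared/triton_cache
```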

Questions:

  1. Is there a recommended warm-up step or pre-compilation flag to avoid this cold-start penalty?
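Absent a built-in flag, one generic workaround would be a per-process warm-up pass whose result is discarded before any timed work begins. This is a sketch with a placeholder `infer` callable, since I don't know AlphaFast's actual entry point:

```python
import time

def run_with_warmup(infer, warmup_input, inputs):
    """Run one throwaway inference to absorb one-time costs
    (kernel JIT, torch.compile tracing, lazy module init),
    then time the real jobs.  `infer` is a placeholder for
    whatever callable wraps the model's forward pass."""
    t0 = time.perf_counter()
    infer(warmup_input)  # result intentionally discarded
    warmup_seconds = time.perf_counter() - t0

    timings = []
    for x in inputs:
        t0 = time.perf_counter()
        infer(x)
        timings.append(time.perf_counter() - t0)
    return warmup_seconds, timings
```

Reporting `warmup_seconds` separately would also confirm whether the ~170 s gap is entirely one-time initialization.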

Environment:

  • GPUs: 8× A800
  • Protein length: ~450 tokens
  • Batch: 64 identical sequences

Details

Per-job timings recorded in inference_timing.jsonl (one JSON record per job):

{"name": "TEST_0", "inference_seconds": 202.777, "status": "success"}
{"name": "TEST_60", "inference_seconds": 202.819, "status": "success"}
{"name": "TEST_16", "inference_seconds": 203.013, "status": "success"}
{"name": "TEST_23", "inference_seconds": 203.193, "status": "success"}
{"name": "TEST_45", "inference_seconds": 204.145, "status": "success"}
{"name": "TEST_38", "inference_seconds": 207.325, "status": "success"}
{"name": "TEST_30", "inference_seconds": 207.503, "status": "success"}
{"name": "TEST_52", "inference_seconds": 208.323, "status": "success"}
{"name": "TEST_10", "inference_seconds": 127.831, "status": "success"}
{"name": "TEST_61", "inference_seconds": 128.208, "status": "success"}
{"name": "TEST_24", "inference_seconds": 128.659, "status": "success"}
{"name": "TEST_46", "inference_seconds": 129.326, "status": "success"}
{"name": "TEST_17", "inference_seconds": 134.414, "status": "success"}
{"name": "TEST_39", "inference_seconds": 131.029, "status": "success"}
{"name": "TEST_53", "inference_seconds": 131.546, "status": "success"}
{"name": "TEST_31", "inference_seconds": 133.127, "status": "success"}
{"name": "TEST_11", "inference_seconds": 35.991, "status": "success"}
{"name": "TEST_62", "inference_seconds": 35.671, "status": "success"}
{"name": "TEST_25", "inference_seconds": 35.692, "status": "success"}
{"name": "TEST_47", "inference_seconds": 35.766, "status": "success"}
{"name": "TEST_18", "inference_seconds": 35.647, "status": "success"}
{"name": "TEST_40", "inference_seconds": 35.446, "status": "success"}
{"name": "TEST_54", "inference_seconds": 35.336, "status": "success"}
{"name": "TEST_32", "inference_seconds": 35.688, "status": "success"}
{"name": "TEST_12", "inference_seconds": 35.476, "status": "success"}
{"name": "TEST_63", "inference_seconds": 35.805, "status": "success"}
{"name": "TEST_26", "inference_seconds": 36.004, "status": "success"}
{"name": "TEST_48", "inference_seconds": 35.582, "status": "success"}
{"name": "TEST_19", "inference_seconds": 35.776, "status": "success"}
{"name": "TEST_41", "inference_seconds": 35.204, "status": "success"}
{"name": "TEST_55", "inference_seconds": 35.839, "status": "success"}
{"name": "TEST_33", "inference_seconds": 35.711, "status": "success"}
{"name": "TEST_13", "inference_seconds": 35.447, "status": "success"}
{"name": "TEST_6", "inference_seconds": 35.555, "status": "success"}
{"name": "TEST_27", "inference_seconds": 36.377, "status": "success"}
{"name": "TEST_49", "inference_seconds": 35.813, "status": "success"}
{"name": "TEST_20", "inference_seconds": 36.003, "status": "success"}
{"name": "TEST_42", "inference_seconds": 35.897, "status": "success"}
{"name": "TEST_56", "inference_seconds": 35.436, "status": "success"}
{"name": "TEST_34", "inference_seconds": 35.526, "status": "success"}
{"name": "TEST_14", "inference_seconds": 35.443, "status": "success"}
{"name": "TEST_7", "inference_seconds": 35.334, "status": "success"}
{"name": "TEST_28", "inference_seconds": 35.61, "status": "success"}
{"name": "TEST_50", "inference_seconds": 35.972, "status": "success"}
{"name": "TEST_43", "inference_seconds": 35.364, "status": "success"}
{"name": "TEST_21", "inference_seconds": 36.21, "status": "success"}
{"name": "TEST_57", "inference_seconds": 35.628, "status": "success"}
{"name": "TEST_35", "inference_seconds": 35.69, "status": "success"}
{"name": "TEST_15", "inference_seconds": 35.604, "status": "success"}
{"name": "TEST_8", "inference_seconds": 35.592, "status": "success"}
{"name": "TEST_29", "inference_seconds": 35.692, "status": "success"}
{"name": "TEST_51", "inference_seconds": 36.135, "status": "success"}
{"name": "TEST_44", "inference_seconds": 35.455, "status": "success"}
{"name": "TEST_22", "inference_seconds": 35.851, "status": "success"}
{"name": "TEST_58", "inference_seconds": 35.377, "status": "success"}
{"name": "TEST_36", "inference_seconds": 35.837, "status": "success"}
{"name": "TEST_1", "inference_seconds": 35.344, "status": "success"}
{"name": "TEST_9", "inference_seconds": 35.371, "status": "success"}
{"name": "TEST_3", "inference_seconds": 35.845, "status": "success"}
{"name": "TEST_5", "inference_seconds": 35.987, "status": "success"}
{"name": "TEST_4", "inference_seconds": 35.401, "status": "success"}
{"name": "TEST_2", "inference_seconds": 35.755, "status": "success"}
{"name": "TEST_59", "inference_seconds": 35.712, "status": "success"}
{"name": "TEST_37", "inference_seconds": 35.723, "status": "success"}
