
Fix potential issues when the 'enforce_eager' mode is set to 'False' #72 #73

Open

LeeWant wants to merge 2 commits into Wenyueh:main from LeeWant:enforce_eager_fix
Conversation


@LeeWant LeeWant commented Mar 31, 2026

The main issues are the following:

  1. slot_mapping was originally built at block granularity, while input_ids operates at token granularity. As a result, when a block is not fully occupied, the block-level mapping skips the partially filled block and continues from the next new block. Mapping per token fixes this:
  ```python
  # Old: by block
  # for i, block_id in enumerate(seq.block_table[seq.num_cached_blocks:]):
  #     if seq.num_cached_blocks + i != seq.num_blocks - 1:
  #         slot_mappings.extend(list(range(block_id * self.block_size, (block_id + 1) * self.block_size)))
  #     else:
  #         slot_mappings.extend(list(range(block_id * self.block_size, block_id * self.block_size + seq.last_block_num_tokens)))

  # New: by token
  for pos in range(num_cached_tokens, len(token_ids)):
      block_idx = pos // self.block_size
      block_offset = pos % self.block_size
      block_id = seq.block_table[block_idx]
      slot_mappings.append(block_id * self.block_size + block_offset)
  ```
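A self-contained sketch of the per-token mapping (the function name, block table, and token counts below are made-up example values for illustration, not code from the repo), showing that a partially filled last block now receives correct slots:

```python
def token_slot_mapping(block_table, block_size, num_cached_tokens, num_tokens):
    """Map each uncached token position to its physical slot in the KV cache."""
    slot_mappings = []
    for pos in range(num_cached_tokens, num_tokens):
        block_idx = pos // block_size      # logical block holding this token
        block_offset = pos % block_size    # offset inside that block
        block_id = block_table[block_idx]  # physical block backing it
        slot_mappings.append(block_id * block_size + block_offset)
    return slot_mappings

# Example: block_size=4, 6 tokens, none cached, physical blocks 7 and 2.
# Tokens 4 and 5 occupy only half of the last block.
slots = token_slot_mapping([7, 2], block_size=4, num_cached_tokens=0, num_tokens=6)
print(slots)  # [28, 29, 30, 31, 8, 9]
```

With the old block-level loop, the half-full last block's slots would not line up one-to-one with the token positions being written.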
  2. The static shapes used for CUDA graph capture have two issues: the key max_num_seqs does not match the name used in the config (max_num_sequences), and the model's forward returns hidden states (before the LM head and softmax), not vocabulary logits, so the output buffer must be sized hidden_size rather than vocab_size:
  ```python
  # max_bs = self.config['max_num_seqs']
  max_bs = self.config['max_num_sequences']

  # outputs = torch.zeros(max_bs, self.config['vocab_size'], device=f'cuda:{self.rank}')
  outputs = torch.zeros(max_bs, self.config['hidden_size'], device=f'cuda:{self.rank}')
  ```
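A small sketch of the buffer-shape point (the config values here are hypothetical, and NumPy stands in for the CUDA tensors): the static output buffer must match the width of what the model actually returns, i.e. hidden states rather than logits:

```python
import numpy as np

# Hypothetical config values, for illustration only.
config = {'max_num_sequences': 8, 'hidden_size': 64, 'vocab_size': 32000}

max_bs = config['max_num_sequences']                 # correct config key
outputs = np.zeros((max_bs, config['hidden_size']))  # hidden states, not logits

# Fake pre-LM-head hidden states for a batch of 4 fit the buffer;
# a (max_bs, vocab_size) buffer would have the wrong trailing dimension.
hidden = np.ones((4, config['hidden_size']))
outputs[:4] = hidden
print(outputs.shape)  # (8, 64)
```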
  3. The CUDA graph's memory pool should only be saved after graph capture has completed:
  ```python
  with torch.cuda.graph(graph, graph_pool):
      outputs[:batch_size] = self.model(input_ids[:batch_size])
      # Wrong place: graph capture is still in progress inside this block
      # if graph_pool is None:
      #     graph_pool = graph.pool()

  # After capture has completed, save the graph's memory pool
  if graph_pool is None:
      graph_pool = graph.pool()
  ```
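The ordering matters because a graph's memory pool is only meaningful once capture has ended. A pure-Python sketch of the corrected control flow (FakeGraph is a hypothetical stand-in for torch.cuda.CUDAGraph, written to raise if its pool is requested mid-capture):

```python
class FakeGraph:
    """Stand-in for torch.cuda.CUDAGraph: pool() is only valid after capture ends."""
    def __init__(self):
        self.capturing = False

    def __enter__(self):
        self.capturing = True  # capture begins
        return self

    def __exit__(self, *exc):
        self.capturing = False  # capture ends when the with-block exits

    def pool(self):
        if self.capturing:
            raise RuntimeError("pool() requested while capture is in progress")
        return object()

graph_pool = None
graphs = {}
for batch_size in (1, 2, 4):
    graph = FakeGraph()
    with graph:
        pass  # the captured forward pass would run here
    # Correct placement: read the pool only after the with-block has exited.
    if graph_pool is None:
        graph_pool = graph.pool()
    graphs[batch_size] = graph
```

Reading graph.pool() inside the with-block, as the original code did, would hit the capture-in-progress case on the very first graph.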

@LeeWant LeeWant linked an issue Apr 2, 2026 that may be closed by this pull request


Successfully merging this pull request may close these issues.

[Bug] enforce_eager mode cannot be set correctly
