
Fix potential issues when the 'enforce_eager' mode is set to 'False' #72 #73

Open

LeeWant wants to merge 2 commits into Wenyueh:main from LeeWant:enforce_eager_fix
Conversation


@LeeWant LeeWant commented Mar 31, 2026

The main issues are the following:

  1. slot_mapping was originally built at block granularity, while input_ids operates at token granularity. As a result, when a block is not fully occupied, the block-level mapping skips the partially filled block and continues from the next new block. Mapping per token fixes this:
  ```python
  # Old: by block
  # for i, block_id in enumerate(seq.block_table[seq.num_cached_blocks:]):
  #     if seq.num_cached_blocks + i != seq.num_blocks - 1:
  #         slot_mappings.extend(list(range(block_id * self.block_size, (block_id + 1) * self.block_size)))
  #     else:
  #         slot_mappings.extend(list(range(block_id * self.block_size, block_id * self.block_size + seq.last_block_num_tokens)))

  # New: by token
  for pos in range(num_cached_tokens, len(token_ids)):
      block_idx = pos // self.block_size
      block_offset = pos % self.block_size
      block_id = seq.block_table[block_idx]
      slot_mappings.append(block_id * self.block_size + block_offset)
  ```
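A self-contained sketch of the per-token mapping (the function name, block table, and token counts below are made-up example values for illustration, not code from the repo), showing that a partially filled last block now receives correct slots:

```python
def token_slot_mapping(block_table, block_size, num_cached_tokens, num_tokens):
    """Map each uncached token position to its physical slot in the KV cache."""
    slot_mappings = []
    for pos in range(num_cached_tokens, num_tokens):
        block_idx = pos // block_size      # logical block holding this token
        block_offset = pos % block_size    # offset inside that block
        block_id = block_table[block_idx]  # physical block backing it
        slot_mappings.append(block_id * block_size + block_offset)
    return slot_mappings

# Example: block_size=4, 6 tokens, none cached, physical blocks 7 and 2.
# Tokens 4 and 5 occupy only half of the last block.
slots = token_slot_mapping([7, 2], block_size=4, num_cached_tokens=0, num_tokens=6)
print(slots)  # [28, 29, 30, 31, 8, 9]
```

With the old block-level loop, the half-full last block's slots would not line up one-to-one with the token positions being written.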
  2. The static shapes used for CUDA graph capture have two issues: the key max_num_seqs does not match the name used in the config (max_num_sequences), and the model's forward returns hidden states (before the LM head and softmax), not vocabulary logits, so the output buffer must be sized hidden_size rather than vocab_size:
  ```python
  # max_bs = self.config['max_num_seqs']
  max_bs = self.config['max_num_sequences']

  # outputs = torch.zeros(max_bs, self.config['vocab_size'], device=f'cuda:{self.rank}')
  outputs = torch.zeros(max_bs, self.config['hidden_size'], device=f'cuda:{self.rank}')
  ```
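A small sketch of the buffer-shape point (the config values here are hypothetical, and NumPy stands in for the CUDA tensors): the static output buffer must match the width of what the model actually returns, i.e. hidden states rather than logits:

```python
import numpy as np

# Hypothetical config values, for illustration only.
config = {'max_num_sequences': 8, 'hidden_size': 64, 'vocab_size': 32000}

max_bs = config['max_num_sequences']                 # correct config key
outputs = np.zeros((max_bs, config['hidden_size']))  # hidden states, not logits

# Fake pre-LM-head hidden states for a batch of 4 fit the buffer;
# a (max_bs, vocab_size) buffer would have the wrong trailing dimension.
hidden = np.ones((4, config['hidden_size']))
outputs[:4] = hidden
print(outputs.shape)  # (8, 64)
```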
  3. The CUDA graph's memory pool should only be saved after graph capture has completed:
  ```python
  with torch.cuda.graph(graph, graph_pool):
      outputs[:batch_size] = self.model(input_ids[:batch_size])
      # Wrong place: graph capture is still in progress inside this block
      # if graph_pool is None:
      #     graph_pool = graph.pool()

  # After capture has completed, save the graph's memory pool
  if graph_pool is None:
      graph_pool = graph.pool()
  ```
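The ordering matters because a graph's memory pool is only meaningful once capture has ended. A pure-Python sketch of the corrected control flow (FakeGraph is a hypothetical stand-in for torch.cuda.CUDAGraph, written to raise if its pool is requested mid-capture):

```python
class FakeGraph:
    """Stand-in for torch.cuda.CUDAGraph: pool() is only valid after capture ends."""
    def __init__(self):
        self.capturing = False

    def __enter__(self):
        self.capturing = True  # capture begins
        return self

    def __exit__(self, *exc):
        self.capturing = False  # capture ends when the with-block exits

    def pool(self):
        if self.capturing:
            raise RuntimeError("pool() requested while capture is in progress")
        return object()

graph_pool = None
graphs = {}
for batch_size in (1, 2, 4):
    graph = FakeGraph()
    with graph:
        pass  # the captured forward pass would run here
    # Correct placement: read the pool only after the with-block has exited.
    if graph_pool is None:
        graph_pool = graph.pool()
    graphs[batch_size] = graph
```

Reading graph.pool() inside the with-block, as the original code did, would hit the capture-in-progress case on the very first graph.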

@LeeWant LeeWant linked an issue Apr 2, 2026 that may be closed by this pull request


Successfully merging this pull request may close these issues.

[Bug] enforce_eager mode cannot be set correctly
