Copilot AI commented Oct 30, 2025

Replace the list comprehension of individual decode() calls with a single batch_decode() call, for potential performance gains when processing batches.

Changes

  • WordStoppingCriteria.__call__(): Use self.tokenizer.batch_decode(input_ids.tolist()) instead of [self.tokenizer.decode(ids) for ids in input_ids]
  • Test coverage: Add test_word_stopping_criteria_utf8_tokenizer() to verify batch_decode() and decode() produce identical results with UTF8Tokenizer
# Before
texts = [self.tokenizer.decode(ids) for ids in input_ids]

# After  
texts = self.tokenizer.batch_decode(input_ids.tolist())

No behavioral changes: batch_decode() is part of the PreTrainedTokenizer interface and produces the same output as decoding each sequence individually.
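The equivalence claimed above can be illustrated with a minimal, dependency-free sketch. `ToyByteTokenizer` and `batch_matches_individual` are hypothetical names invented for this illustration; the toy class only mimics the `decode()` / `batch_decode()` shape and is not the real UTF8Tokenizer, and plain nested lists stand in for `input_ids.tolist()`.

```python
# Hypothetical stand-in for a byte-level tokenizer; NOT the real UTF8Tokenizer.
class ToyByteTokenizer:
    def decode(self, token_ids):
        # Treat each token id as one UTF-8 byte; invalid bytes become U+FFFD.
        return bytes(token_ids).decode("utf-8", errors="replace")

    def batch_decode(self, sequences):
        # Decode every row of the batch through a single call.
        return [self.decode(ids) for ids in sequences]


def batch_matches_individual(tokenizer, batch):
    """Return True if batch_decode() agrees with per-row decode()."""
    individual = [tokenizer.decode(ids) for ids in batch]
    batched = tokenizer.batch_decode(batch)
    return individual == batched


tokenizer = ToyByteTokenizer()
# Nested lists of ints play the role of input_ids.tolist().
batch = [
    list("hello world".encode("utf-8")),
    list("héllo ".encode("utf-8")),
]
texts = tokenizer.batch_decode(batch)
assert batch_matches_individual(tokenizer, batch)
```

A check of this shape, run against the real tokenizer, is roughly what the added test is described as verifying. Whether batch_decode() is actually faster depends on how a given tokenizer implements it.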

Original prompt

This section details the original issue you should resolve

<issue_title>WordStoppingCriteria efficiency</issue_title>
<issue_description>From #2

WordStoppingCriteria can be made more efficient by using
self.tokenizer.batch_decode(input_ids.tolist()) instead of [self.tokenizer.decode(ids) for ids in input_ids]

We need to test that this works with utf8-tokenizer, for example, which is already a dependency. UTF8Tokenizer should behave the same with both options, and we should use batch_decode because it could be faster</issue_description>

<agent_instructions>implement the change, and test it</agent_instructions>

Comments on the Issue (you are @copilot in this section)


… test

Co-authored-by: AmitMY <5757359+AmitMY@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve efficiency of WordStoppingCriteria using batch_decode" to "Use batch_decode in WordStoppingCriteria for efficiency" Oct 30, 2025
Copilot AI requested a review from AmitMY October 30, 2025 09:44
@AmitMY AmitMY marked this pull request as ready for review October 30, 2025 09:51
@AmitMY AmitMY merged commit e9a3902 into main Oct 30, 2025
4 checks passed
@AmitMY AmitMY deleted the copilot/optimize-word-stopping-criteria branch October 30, 2025 09:51