feat: memory-efficient token counting functionality #13
bluescreen10 merged 2 commits into tiktoken-go:main
Conversation
Hi, thanks for this pull request. I'll look into this.
This uses iterators for the common logic, so the Count and Encode functions can do different things with the same token stream.
Hey @amalucelli, can you take a look? I've modified the code slightly to put the logic shared between encode and count into a single method.
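For readers following along, here is a minimal, self-contained sketch of the shape this refactor takes, assuming Go 1.23 range-over-func iterators; the codec type, the vocabulary map, and the whitespace split are toy stand-ins for the real BPE logic, not the actual code in this PR:

```go
package main

import (
	"fmt"
	"iter"
	"strings"
)

// codec is a toy stand-in for the real Codec implementation.
type codec struct {
	vocab map[string]uint
}

// tokenize yields (id, token) pairs lazily so Encode and Count can share
// the scanning logic without duplicating it.
func (c *codec) tokenize(input string) iter.Seq2[uint, string] {
	return func(yield func(uint, string) bool) {
		for _, word := range strings.Fields(input) { // stand-in for the real split
			if !yield(c.vocab[word], word) {
				return
			}
		}
	}
}

// Encode materializes the token ids and strings from the iterator.
func (c *codec) Encode(input string) ([]uint, []string) {
	var ids []uint
	var tokens []string
	for id, tok := range c.tokenize(input) {
		ids = append(ids, id)
		tokens = append(tokens, tok)
	}
	return ids, tokens
}

// Count walks the same iterator but only increments a counter.
func (c *codec) Count(input string) int {
	n := 0
	for range c.tokenize(input) {
		n++
	}
	return n
}

func main() {
	c := &codec{vocab: map[string]uint{"hello": 1, "world": 2}}
	ids, tokens := c.Encode("hello world")
	fmt.Println(ids, tokens, c.Count("hello world")) // [1 2] [hello world] 2
}
```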
amalucelli
left a comment
LGTM, thank you!
@bluescreen10 I was running this change in production and noticed the memory usage spike.

After reviewing the changes one more time, I think it's likely due to this, since the returned values are discarded but still allocated:

```go
for _, _ = range c.tokenize(input) {
	count++
}
```

In my initial suggestion there was a separate, isolated method for counting: cd27b29#diff-8b594c387517838ab44fd9d9f7a5f8c1e3efa32486a1ba4b923858e4e6538955R22-R45

Would you reconsider an approach that isolates the counting logic? (A sketch of what that could look like follows.)
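As a point of comparison, here is a guess at the shape of an isolated counting path; this is not the actual code behind the cd27b29 link, and it extends the toy codec from the sketch above, with a manual whitespace scan standing in for the real BPE split:

```go
// countTokens is a hypothetical dedicated counter: it scans the input
// directly instead of going through tokenize, so no per-token (id, token)
// values are constructed just to be thrown away.
func (c *codec) countTokens(input string) int {
	n := 0
	inWord := false
	for _, r := range input { // stand-in for the real BPE split
		if r == ' ' || r == '\t' || r == '\n' {
			inWord = false
		} else if !inWord {
			inWord = true
			n++
		}
	}
	return n
}
```

The trade-off under discussion is visible in the two sketches: the iterator version shares logic but still constructs each (id, token) pair only to discard it, while the isolated version avoids that work at the cost of duplicating the scan loop.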
Thanks, I will look into this. What I didn't like about the old approach was that we had code duplicated between counting and the normal tokenizing process. I'll try to see if we can avoid some of that penalty. I'm curious about your use case; can you tell me more?
Thanks for that! I can't get into details, but we have token budget constraints for the content we send to the LLM, and we rely on this library for that, so in our case we only ever need to know the total token count for a string.
Lol, I didn't mean to get you into trouble. I was thinking that maybe it's an embedded use case in which memory/CPU is critical.
That's all good. The thing is that we see frequent memory spikes that often trace back to this. If you want, I can try to review it again and propose something, but I'm not that familiar with this code base, so I'm not sure how much optimization you want here.


This PR adds a new Count() method to the Codec interface. Unlike calling len() on the result of Encode(), it counts tokens without allocating memory for the actual token IDs and strings. This is particularly useful when you only need to know the token count of a text, such as when checking whether the text fits within a model's context window.
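For illustration, a usage sketch against github.com/tiktoken-go/tokenizer; the exact signature of the new method is an assumption here (Count(string) (int, error), mirroring the library's Encode shape of ([]uint, []string, error)):

```go
package main

import (
	"fmt"
	"log"

	"github.com/tiktoken-go/tokenizer"
)

func main() {
	codec, err := tokenizer.Get(tokenizer.Cl100kBase)
	if err != nil {
		log.Fatal(err)
	}

	text := "The quick brown fox jumps over the lazy dog"

	// Without Count: the id and token slices are allocated just to be counted.
	ids, _, err := codec.Encode(text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("len(Encode):", len(ids))

	// With Count: same number, without materializing ids or token strings.
	// (Assumed signature: Count(string) (int, error).)
	n, err := codec.Count(text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Count:", n)
}
```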