Open
Labels: enhancement (New feature or request)
Description
Had a great tip on Discord about tokenizers. The Hugging Face quicktour says (https://huggingface.co/docs/tokenizers/python/latest/quicktour.html#using-a-pretrained-tokenizer):
"You can load any tokenizer from the Hugging Face Hub as long as a tokenizer.json file is available in the repository."
And sure enough, this seems to work:
>>> import tokenizers
>>> from tokenizers import Tokenizer
>>> tokenizer = Tokenizer.from_pretrained("TheBloke/Llama-2-70B-fp16")
Downloaded 1.76MiB in 0s
>>> tokenizer.encode("hello world")
Encoding(num_tokens=3, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
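To get at the actual token count and IDs rather than just the Encoding repr, something like the following might work. This is a rough sketch, not a proposed implementation: `load_tokenizer` and `summarize_encoding` are hypothetical helper names made up for illustration. Only `Tokenizer.from_pretrained`, `encode`, and the `ids` and `tokens` attributes come from the tokenizers library.

```python
from typing import Any, Dict


def load_tokenizer(repo_id: str) -> Any:
    """Load a tokenizer from the Hugging Face Hub (needs network access)."""
    # Imported lazily so summarize_encoding can be used without the package.
    from tokenizers import Tokenizer

    return Tokenizer.from_pretrained(repo_id)


def summarize_encoding(tokenizer: Any, text: str) -> Dict[str, Any]:
    """Hypothetical helper: encode text and return the fields worth inspecting."""
    encoding = tokenizer.encode(text)
    return {
        "num_tokens": len(encoding.ids),   # one ID per token
        "ids": list(encoding.ids),         # integer token IDs
        "tokens": list(encoding.tokens),   # string form of each token
    }


# Usage (downloads tokenizer.json on the first call):
#   tok = load_tokenizer("TheBloke/Llama-2-70B-fp16")
#   print(summarize_encoding(tok, "hello world")["num_tokens"])
```

For the session above this should report the same count as the `num_tokens=3` in the Encoding repr, assuming the repository's tokenizer.json has not changed.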