Tokenizer for language models.

Tokenize text for Llama, Gemini, GPT-4, DeepSeek, Mistral and many other models: in the browser, on the client, and on any platform.
```js
import { Kitoken } from "kitoken/node"
import fs from "node:fs"

const model = fs.readFileSync("models/llama4.model")
const encoder = new Kitoken(model)

const tokens = encoder.encode("hello world!", true)
const string = new TextDecoder().decode(encoder.decode(tokens))
```

Kitoken is a fast and versatile tokenizer for language models. It is compatible with SentencePiece, HuggingFace Tokenizers, OpenAI Tiktoken and Mistral Tekken, and supports BPE, Unigram and WordPiece tokenization.
- **Fast and efficient tokenization**
  Faster than most other tokenizers in both common and uncommon scenarios; see the benchmarks for comparisons with different datasets.
- **Runs in all environments**
  Native in Rust, with bindings for Web, Node and Python; see kitoken.dev for a web demo.
- **Supports input and output processing**
  Including Unicode-aware normalization, pre-tokenization and post-processing options.
- **Compact data encoding**
  Definitions are stored in an efficient binary format, without a merge list.
See the main README for more information.
The JavaScript package provides multiple exports:
| Export | Description |
|---|---|
| `kitoken` | The default export, importing the WebAssembly file directly. Usable with Webpack and other bundlers. |
| `kitoken/node` | Uses Node.js functions to read the WebAssembly file from the file system. Provides support for additional split strategies and regex optimizations. |
| `kitoken/web` | Can be used in web browsers without a bundler; uses `new URL(..., import.meta.url)` to load the WebAssembly file. |
| `kitoken/minimal` | Smallest file size. Similar to the default export, but only supports initialization from `.kit` definitions. |
| `kitoken/full` | Largest file size. Similar to the default export, but provides support for additional split strategies and regex optimizations. |
See also the Node test and the Web example.