Suzume

Japanese Tokenizer That Actually Works in the Browser

No more 50MB dictionary files. Suzume provides lightweight Japanese tokenization in under 300KB gzipped, running entirely in the browser with no server required.

Suzume is a feature-driven tokenizer designed for real-world Japanese text on the web. It aims to pair a lightweight footprint with accuracy that holds up in practice.

📖 Documentation · 🎮 Live Demo

Why Suzume?

Feature	Traditional Analyzers	Suzume
Bundle Size	20–50MB+ (dictionary)	< 300KB gzipped
Browser Support	Limited or none	Full support
Server Required	Usually yes	No
Unknown Words	May struggle	Robust by design
POS Tagging	✓	✓
Lemmatization	✓	✓

Designed for frontend and edge environments where large dictionaries and server-side processing are not viable.

Key Features

🚫 No Dictionary Hell: no 50MB+ dictionary files to download or manage
🖥️ True Client-Side: runs 100% in the browser, with no API calls and no CORS headaches
🔮 Robust to Unknown Words: stable tokenization of brand names, slang, and technical terms
⚡ Production Ready: C++ compiled to WASM, with TypeScript support, for browser, edge, and serverless environments

When to Use Suzume

Suzume is ideal for:

Frontend applications that need client-side Japanese processing
Edge/serverless environments with size constraints
User-generated content where unknown words are common

For deep linguistic research or corpus analysis where dictionary coverage is critical, traditional server-side analyzers may be more appropriate.

Installation

npm install @libraz/suzume

Or use yarn/pnpm/bun:

yarn add @libraz/suzume
pnpm add @libraz/suzume
bun add @libraz/suzume

Quick Start

JavaScript / TypeScript

import { Suzume } from '@libraz/suzume'

const suzume = await Suzume.create()

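// Print each token's surface form and its part of speech in Japanese
const tokens = suzume.analyze('すもももももももものうち')
for (const t of tokens) {
  console.log(`${t.surface} [${t.posJa}]`)
}

// Tag extraction
const tags = suzume.generateTags('東京スカイツリーに行きました')
console.log(tags) // ['東京', 'スカイツリー']

// Release the instance when you are done with it
suzume.destroy()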
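The Quick Start calls `suzume.destroy()` once at the end. In longer-lived code it can help to guarantee that call even when processing throws. The sketch below is a generic helper, not part of the @libraz/suzume API: `withResource` and `Destroyable` are hypothetical names, and the helper assumes only that the instance exposes the `destroy()` method shown above.

```typescript
// Hypothetical interface: anything with a destroy() method,
// such as the Suzume instance from the Quick Start.
interface Destroyable {
  destroy(): void;
}

// Creates a resource, passes it to `fn`, and always calls destroy()
// afterwards, even if `fn` throws or its promise rejects.
async function withResource<R extends Destroyable, T>(
  create: () => Promise<R>,
  fn: (resource: R) => T | Promise<T>,
): Promise<T> {
  const resource = await create();
  try {
    return await fn(resource);
  } finally {
    resource.destroy();
  }
}
```

With Suzume this might look like `const tags = await withResource(() => Suzume.create(), (s) => s.generateTags('東京スカイツリーに行きました'))`.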

Browser (CDN)

<script type="module">
  import { Suzume } from 'https://esm.sh/@libraz/suzume'

  const suzume = await Suzume.create()
  console.log(suzume.analyze('こんにちは'))
</script>

C++

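#include <iostream>

#include "suzume.h"

int main() {
    suzume::Suzume tokenizer;
    auto tokens = tokenizer.analyze("東京に行きました");

    // Print each token's surface form and lemma, tab-separated
    for (const auto& t : tokens) {
        std::cout << t.surface << "\t" << t.lemma << std::endl;
    }
    return 0;
}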

Build from source (requires C++17, CMake 3.15+):

make          # Build
make test     # Run tests

Documentation

Full documentation is available at suzume.libraz.net:

Getting Started — Installation and basic usage
API Reference — Complete API documentation
User Dictionary — Adding custom words
How It Works — Technical deep-dive

Use Cases

Search indexing — Tokenize text for full-text search
Tag extraction — Generate keywords for classification
Browser apps — Client-side Japanese processing without a server
User-generated content — Stable tokenization for noisy input
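For the search-indexing use case, the tokens returned by `analyze()` can feed a simple inverted index. The sketch below is illustrative only: `TokenLike`, `buildIndex`, and `search` are hypothetical names, the token type assumes nothing beyond the `surface` field shown in the Quick Start, and a real index would usually also normalize terms (for example by indexing lemmas, which the C++ example exposes as `t.lemma`) and rank its results.

```typescript
// Hypothetical token shape: only the `surface` field from the Quick Start is assumed.
interface TokenLike {
  surface: string;
}

// Maps each surface form to the set of document IDs that contain it.
type InvertedIndex = Map<string, Set<number>>;

// Builds an inverted index from per-document token lists (document ID = array position).
function buildIndex(docs: TokenLike[][]): InvertedIndex {
  const index: InvertedIndex = new Map();
  docs.forEach((tokens, docId) => {
    for (const { surface } of tokens) {
      let postings = index.get(surface);
      if (postings === undefined) {
        postings = new Set<number>();
        index.set(surface, postings);
      }
      postings.add(docId);
    }
  });
  return index;
}

// AND query: returns the sorted IDs of documents containing every term.
function search(index: InvertedIndex, terms: string[]): number[] {
  let result: Set<number> | null = null;
  for (const term of terms) {
    const postings = index.get(term) ?? new Set<number>();
    result =
      result === null
        ? new Set(postings)
        : new Set(Array.from(result).filter((id) => postings.has(id)));
  }
  return result === null ? [] : Array.from(result).sort((a, b) => a - b);
}
```

In practice `docs` would be something like `texts.map((text) => suzume.analyze(text))`, and query strings would be tokenized the same way before lookup so that both sides split words identically.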

License

Apache License 2.0

Contributing

Contributions welcome! Please submit issues and pull requests on GitHub.

Author

libraz <libraz@libraz.net>
