@aid-on/fuzztok

Fast and lightweight fuzzy token estimation library with CJK support

Features

🚀 High Performance: Optimized for speed and low memory usage
🌏 CJK Support: Advanced support for Chinese, Japanese, and Korean text
🔧 Flexible Architecture: Dependency injection pattern for model configurations
📊 Detailed Analysis: Character type breakdown and composition analysis
⚡ Batch Processing: Support for batch estimation and streaming text
💰 Cost Calculation: Built-in token-to-cost conversion utilities
🐛 Debug Tools: Visualization tools for estimation breakdown

Installation

npm install @aid-on/fuzztok

Quick Start

import { createSimpleFuzzyEstimator } from '@aid-on/fuzztok';

// Configure models
const modelConfigs = {
  'gpt-3.5-turbo': {
    charsPerToken: 4,
    overhead: 10,
    cjkTokensPerChar: 1.2,
    mixedTextMultiplier: 1.05,
    numberTokensPerChar: 3.5,
    symbolTokensPerChar: 2.5,
    whitespaceHandling: 'compress'
  }
};

// Create estimator
const estimator = createSimpleFuzzyEstimator(modelConfigs, 'gpt-3.5-turbo');

// Simple estimation
const tokens = estimator.estimate('Hello, world! こんにちは！');
console.log(`Estimated tokens: ${tokens}`);

// Detailed estimation
const detailed = estimator.estimateDetailed('Hello, world! こんにちは！');
console.log(detailed);

API Reference

Core Classes

`FuzzyTokenEstimator`

Main estimation engine with dependency injection for model configurations.

constructor(
  modelProvider: ModelConfigProvider,
  options?: {
    fallbackConfig?: FuzzyModelConfig;
    defaultModel?: string;
  }
)

Methods:

estimate(text: string, modelName?: string): number - Simple token count
estimateDetailed(text: string, modelName?: string): EstimationResult - Detailed analysis
estimatePayload(payload: TextPayload): number - Estimate from text payload
estimateBatch(texts: string[], modelName?: string): EstimationResult[] - Batch processing

`CharacterClassifier`

Utility for character type detection and text analysis.

// Static methods
CharacterClassifier.isCJKCharacter(char: string): boolean
CharacterClassifier.getCharacterType(char: string): CharacterType
CharacterClassifier.analyzeTextComposition(text: string): TextComposition
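
To make the idea of character type detection concrete, here is a minimal, self-contained sketch of how text could be bucketed by character kind. It does not use the library: the names `CharKind`, `isCjk`, `classify`, and `analyzeComposition` are hypothetical, and the range table is a small illustrative subset, so the library's own `CharacterType` and `TextComposition` may well look different.

```typescript
// Hypothetical character categories; the library's own CharacterType may differ.
type CharKind = "cjk" | "latin" | "number" | "whitespace" | "symbol";

// A few well-known Unicode blocks (inclusive code point ranges).
// Illustrative only, not an exhaustive list of CJK blocks.
const CJK_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x3040, 0x309f], // Hiragana
  [0x30a0, 0x30ff], // Katakana
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7af], // Hangul Syllables
  [0x20000, 0x2a6df], // CJK Unified Ideographs Extension B
];

function isCjk(char: string): boolean {
  const cp = char.codePointAt(0);
  if (cp === undefined) return false;
  return CJK_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

function classify(char: string): CharKind {
  if (isCjk(char)) return "cjk";
  if (/^\s$/.test(char)) return "whitespace";
  if (/^[0-9]$/.test(char)) return "number";
  if (/^[A-Za-z]$/.test(char)) return "latin";
  return "symbol"; // everything else, including punctuation
}

function analyzeComposition(text: string): Record<CharKind, number> {
  const counts: Record<CharKind, number> = {
    cjk: 0,
    latin: 0,
    number: 0,
    whitespace: 0,
    symbol: 0,
  };
  // for...of walks code points, so characters outside the BMP stay intact.
  for (const char of text) {
    counts[classify(char)] += 1;
  }
  return counts;
}
```

Iterating by code point rather than by UTF-16 unit matters here, because characters in the CJK extension blocks are encoded as surrogate pairs.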

Configuration

`FuzzyModelConfig`

interface FuzzyModelConfig extends BaseTokenConfig {
  cjkTokensPerChar: number;           // Tokens per CJK character
  mixedTextMultiplier: number;        // Mixed text adjustment factor
  numberTokensPerChar?: number;       // Number tokenization rate
  symbolTokensPerChar?: number;       // Symbol tokenization rate
  whitespaceHandling?: 'ignore' | 'count' | 'compress';
}
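
As a rough illustration of how these fields might interact, the sketch below combines them into a single number. The formula is an assumption made for this example, not the library's actual algorithm: it treats `charsPerToken` as a divisor for non-CJK characters, `cjkTokensPerChar` as a multiplier for CJK characters, applies `mixedTextMultiplier` only when both kinds are present, and adds `overhead` at the end. `SketchConfig` and `sketchEstimate` are hypothetical names.

```typescript
// Field names mirror FuzzyModelConfig; the formula below is an assumption.
interface SketchConfig {
  charsPerToken: number;
  overhead: number;
  cjkTokensPerChar: number;
  mixedTextMultiplier: number;
}

function sketchEstimate(
  cjkChars: number,
  otherChars: number,
  config: SketchConfig
): number {
  const cjkTokens = cjkChars * config.cjkTokensPerChar;
  const otherTokens = otherChars / config.charsPerToken;
  // Apply the mixed-text adjustment only when both kinds of text are present.
  const isMixed = cjkChars > 0 && otherChars > 0;
  const multiplier = isMixed ? config.mixedTextMultiplier : 1;
  return Math.ceil((cjkTokens + otherTokens) * multiplier + config.overhead);
}
```

With the Quick Start values (`charsPerToken: 4`, `overhead: 10`, `cjkTokensPerChar: 1.2`, `mixedTextMultiplier: 1.05`), 40 non-CJK characters come to 40 / 4 + 10 = 20 under this assumed formula.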

Factory Functions

// Using ModelConfigProvider
createFuzzyEstimator(
  modelProvider: ModelConfigProvider,
  options?: ConfigOptions
): FuzzyTokenEstimator

// Using simple config object
createSimpleFuzzyEstimator(
  modelConfigs: Record<string, FuzzyModelConfig>,
  defaultModel?: string
): FuzzyTokenEstimator
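
The two factories differ only in how configurations are supplied. One way to picture the relationship is an adapter that turns a plain record into a provider object. The sketch below is self-contained and hypothetical: `SketchProvider` is inferred from the Custom Model Provider example in this README, and whether `createSimpleFuzzyEstimator` works this way internally is an assumption.

```typescript
interface SketchModelConfig {
  charsPerToken: number;
  overhead: number;
  cjkTokensPerChar: number;
  mixedTextMultiplier: number;
}

// Shape inferred from the Custom Model Provider example; the real
// ModelConfigProvider interface may differ.
interface SketchProvider {
  getConfig(modelName: string): SketchModelConfig | undefined;
  getSupportedModels(): string[];
}

function recordToProvider(
  configs: Record<string, SketchModelConfig>
): SketchProvider {
  return {
    // Own-property check so inherited keys such as "toString" are not
    // mistaken for model names.
    getConfig: (modelName) =>
      Object.prototype.hasOwnProperty.call(configs, modelName)
        ? configs[modelName]
        : undefined,
    getSupportedModels: () => Object.keys(configs),
  };
}
```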

Advanced Usage

Custom Model Provider

import { FuzzyTokenEstimator } from '@aid-on/fuzztok';

class CustomModelProvider {
  getConfig(modelName) {
    // Fetch from database, API, etc.
    return {
      charsPerToken: 4,
      overhead: 10,
      cjkTokensPerChar: 1.2,
      mixedTextMultiplier: 1.05
    };
  }
  
  getSupportedModels() {
    return ['custom-model-1', 'custom-model-2'];
  }
}

const estimator = new FuzzyTokenEstimator(new CustomModelProvider());
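
If `getConfig` reads from a database or API, repeating that lookup on every estimate can be wasteful. A possible approach is to wrap the provider in a small cache. The sketch below is self-contained and hypothetical: `SketchProvider` and `CachingProvider` are names invented for this example, a synchronous `getConfig` is assumed, and nothing here is part of the library.

```typescript
interface SketchConfig {
  charsPerToken: number;
  overhead: number;
  cjkTokensPerChar: number;
  mixedTextMultiplier: number;
}

// Assumed provider shape, based on the example above.
interface SketchProvider {
  getConfig(modelName: string): SketchConfig;
  getSupportedModels(): string[];
}

class CachingProvider implements SketchProvider {
  private readonly inner: SketchProvider;
  private readonly cache = new Map<string, SketchConfig>();

  constructor(inner: SketchProvider) {
    this.inner = inner;
  }

  getConfig(modelName: string): SketchConfig {
    let config = this.cache.get(modelName);
    if (config === undefined) {
      // First request for this model: delegate once and remember the result.
      config = this.inner.getConfig(modelName);
      this.cache.set(modelName, config);
    }
    return config;
  }

  getSupportedModels(): string[] {
    return this.inner.getSupportedModels();
  }
}
```

This cache never expires entries, which is fine for static configurations but would need an invalidation strategy if configurations can change at runtime.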

Cost Calculation

import { TokenCostCalculator } from '@aid-on/fuzztok';

class MyCostProvider {
  getCost(model) {
    return { input: 0.0015, output: 0.002 }; // per 1K tokens
  }
}

const calculator = new TokenCostCalculator(new MyCostProvider());
const cost = calculator.calculate('gpt-3.5-turbo', 1000, 500);
console.log(cost.formattedTotal); // formatted total string (0.0025 USD at the per-1K rates above)
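
The arithmetic behind a per-1K-token price can be checked by hand. The following self-contained sketch assumes the rates are USD per 1,000 tokens, as the comment in the example states. `Rates` and `costUsd` are hypothetical names, and the result shape is not the library's.

```typescript
// Rates are assumed to be USD per 1,000 tokens.
interface Rates {
  input: number;
  output: number;
}

function costUsd(rates: Rates, inputTokens: number, outputTokens: number) {
  const input = (inputTokens / 1000) * rates.input;
  const output = (outputTokens / 1000) * rates.output;
  return { input, output, total: input + output };
}

// 1000 input tokens at 0.0015 per 1K plus 500 output tokens at 0.002 per 1K.
const exampleCost = costUsd({ input: 0.0015, output: 0.002 }, 1000, 500);
console.log(exampleCost.total.toFixed(4)); // prints "0.0025"
```

Here the input side contributes 0.0015 and the output side 0.0010, for a total of 0.0025 USD.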

Streaming Support

async function* textStream() {
  yield "Hello ";
  yield "world ";
  yield "こんにちは！";
}

for await (const result of estimator.estimateStream(textStream())) {
  console.log(`Chunk: ${result.chunk}, Tokens: ${result.tokens}, Total: ${result.total}`);
}
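
The `chunk`, `tokens`, and `total` fields suggest a running total that grows as chunks arrive. A minimal, self-contained sketch of that pattern follows. It does not use the library: `roughTokens` is a stand-in estimator (about four characters per token), and `RunningEstimate` and `estimateChunks` are hypothetical names.

```typescript
interface StreamResult {
  chunk: string;
  tokens: number;
  total: number;
}

// Stand-in estimator, purely illustrative: roughly 4 characters per token.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

// Keeps the cumulative token count across chunks.
class RunningEstimate {
  private total = 0;

  push(chunk: string): StreamResult {
    const tokens = roughTokens(chunk);
    this.total += tokens;
    return { chunk, tokens, total: this.total };
  }
}

// Wraps any async source of text chunks and yields one result per chunk.
async function* estimateChunks(
  source: AsyncIterable<string>
): AsyncGenerator<StreamResult> {
  const running = new RunningEstimate();
  for await (const chunk of source) {
    yield running.push(chunk);
  }
}
```

Because each chunk is rounded up on its own in this sketch, the running total can end up slightly higher than an estimate computed over the concatenated text.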

CJK Support

This library provides comprehensive support for CJK text:

Chinese: Simplified and Traditional Chinese characters
Japanese: Hiragana, Katakana, and Kanji
Korean: Hangul syllables and compatibility characters
Extended Unicode: CJK Extension A-G, compatibility forms, and more
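
For readers who want to see which script a character belongs to, the sketch below uses the Unicode script property escapes built into JavaScript regular expressions (ES2018 and later). It is independent of the library, and `Script`, `scriptOf`, and `scriptsIn` are hypothetical names. Note that the Han script is shared by Chinese characters, Japanese Kanji, and Korean Hanja, so script detection alone cannot tell those languages apart.

```typescript
type Script = "han" | "hiragana" | "katakana" | "hangul" | "other";

// Each pattern matches exactly one code point of the given Unicode script.
const SCRIPT_PATTERNS: ReadonlyArray<
  readonly [Exclude<Script, "other">, RegExp]
> = [
  ["han", /^\p{Script=Han}$/u],
  ["hiragana", /^\p{Script=Hiragana}$/u],
  ["katakana", /^\p{Script=Katakana}$/u],
  ["hangul", /^\p{Script=Hangul}$/u],
];

function scriptOf(char: string): Script {
  for (const [name, pattern] of SCRIPT_PATTERNS) {
    if (pattern.test(char)) return name;
  }
  return "other";
}

// Collects the set of scripts that occur anywhere in the text.
function scriptsIn(text: string): Set<Script> {
  const found = new Set<Script>();
  for (const char of text) {
    found.add(scriptOf(char));
  }
  return found;
}
```

Some characters commonly seen in Japanese text, such as certain punctuation marks, carry the `Common` script rather than one of the four above and would fall into `other` in this sketch.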

License

MIT

Contributing

Issues and pull requests are welcome on GitHub.
