Skip to content

KindleTools/kindle-tools-ts

Repository files navigation

Kindle Tools TS Logo

kindle-tools-ts

npm version CI TypeScript Node.js Downloads install size license Biome GitHub Pages

A robust TypeScript library to parse and process Amazon Kindle My Clippings.txt files with smart merging, deduplication, and multiple export formats.

👉 Live Demo / Docs: https://kindletools.github.io/kindle-tools-ts/

v1.0 - Feature Complete. This library is stable and production-ready. Only accepting bug fixes.


TL;DR (Quick Install)

npm install kindle-tools-ts
import { parseFile } from 'kindle-tools-ts/node';
import { JsonExporter } from 'kindle-tools-ts';

const result = await parseFile('./My Clippings.txt');
console.log(`Found ${result.stats.total} clippings`);

const exporter = new JsonExporter();
const json = await exporter.export(result.clippings, { pretty: true });

📑 Table of Contents


💡 Why This Library?

Problem Solution
Duplicate entries when you re-highlight or edit highlights Smart deduplication with deterministic IDs
Overlapping highlights when extending a selection in Kindle Automatic merging keeps the longest version
Notes disconnected from their highlights Intelligent linking based on location proximity
Messy titles with _EBOK, (Spanish Edition), .mobi noise Advanced cleaning preserves original in titleRaw
Multi-language files (device language changes) Automatic detection for 11 languages
PDF artifacts like pro-\nblem word breaks De-hyphenation and text cleaning
Export lock-in to a single format 6 formats: JSON, CSV, Markdown, Obsidian, Joplin, HTML

Perfect for:

  • 📚 Obsidian/Joplin users — Import highlights directly into your vault/notebooks
  • 🤖 Automation enthusiasts — Integrate into your scripts and workflows
  • 💻 Developers — Full TypeScript types, tree-shakeable, dual ESM/CJS build
  • 🌐 Multilingual readers — Works with Kindle devices in any supported language

✨ Features

Core Capabilities

Feature Description
🌍 Multi-language support EN, ES, PT, DE, FR, IT, ZH, JA, KO, NL, RU
🔍 Auto language detection Automatically detects your file's language
🧠 Smart merging Merges overlapping highlights when you extend a selection
🔄 Deduplication Removes exact duplicates with deterministic IDs
🔗 Note linking Links notes to their associated highlights
🏷️ Tag extraction Extracts tags from notes (comma/semicolon separated)

Quality & Security

Feature Description
🧹 Text cleaning De-hyphenation, space normalization, edition markers removal
⚠️ Quality flags Detects suspicious highlights (accidental, incomplete)
📊 Fuzzy duplicates Jaccard similarity to find near-duplicates
🛡️ CSV protection Formula injection protection (OWASP compliant)

Export Formats

Format Description
JSON Standard JSON with metadata
CSV Excel-compatible with BOM
Markdown Blockquotes with Handlebars templates
Obsidian YAML frontmatter, folder structures
Joplin JEX Importable archive with notebooks
HTML Standalone page with dark mode & search

Technical

Feature Description
📘 TypeScript-first Full type definitions with strict mode
🪶 Lightweight 7 runtime dependencies
📥 Multiple inputs Parse TXT, re-import JSON/CSV
🌐 Isomorphic Works in Node.js and browsers
🔒 Non-destructive Always preserves original data (*Raw fields)

📦 Installation

npm install kindle-tools-ts
# or
pnpm add kindle-tools-ts
# or
yarn add kindle-tools-ts

Requirements

  • Node.js: 18.18.0+ (use .nvmrc for consistency)
  • Platforms: Windows, macOS, Linux
  • TypeScript: 5.0+ (optional, JS works too)

🚀 Quick Start

Basic Usage (Node.js)

import { JsonExporter } from 'kindle-tools-ts';
import { parseFile } from 'kindle-tools-ts/node';

// Parse your clippings file
const result = await parseFile('./My Clippings.txt');

console.log(`Found ${result.stats.total} clippings from ${result.stats.totalBooks} books`);
console.log(`  - Highlights: ${result.stats.totalHighlights}`);
console.log(`  - Notes: ${result.stats.totalNotes}`);
console.log(`  - Bookmarks: ${result.stats.totalBookmarks}`);

// Export to JSON
const exporter = new JsonExporter();
const jsonOutput = await exporter.export(result.clippings, { pretty: true });
console.log(jsonOutput.output);

Parse from String (Browser-safe)

import { parseString } from 'kindle-tools-ts';

const content = `
The Art of War (Sun Tzu)
- Your Highlight on page 42 | Location 100-105 | Added on Friday, January 1, 2024 10:30:45 AM

All warfare is based on deception.

==========
`;

const result = parseString(content);
console.log(result.clippings[0]);
// {
//   id: 'a6439ae5832a',
//   title: 'The Art of War',
//   author: 'Sun Tzu',
//   content: 'All warfare is based on deception.',
//   type: 'highlight',
//   page: 42,
//   location: { raw: '100-105', start: 100, end: 105 },
//   ...
// }

Export to Different Formats

import { MarkdownExporter, ObsidianExporter, JoplinExporter } from 'kindle-tools-ts';
import { parseFile } from 'kindle-tools-ts/node';

const result = await parseFile('./My Clippings.txt');

// Markdown
const mdExporter = new MarkdownExporter();
const markdown = await mdExporter.export(result.clippings);

// Obsidian (one file per book)
const obsidianExporter = new ObsidianExporter();
const obsidian = await obsidianExporter.export(result.clippings, {
  outputPath: './vault/books/',
  folderStructure: 'by-author'
});

// Joplin JEX archive
const joplinExporter = new JoplinExporter();
const jex = await joplinExporter.export(result.clippings, {
  outputPath: './clippings.jex'
});

📚 API Reference

Core Functions

parseFile(filePath, options?)

Parse a Kindle clippings file from disk. Node.js only.

import { parseFile } from 'kindle-tools-ts/node';

const result = await parseFile('./My Clippings.txt', {
  language: 'auto',           // 'auto' | 'en' | 'es' | 'pt' | ...
  removeDuplicates: true,     // Remove exact duplicates
  mergeOverlapping: true,     // Merge extended highlights
  mergeNotes: true,           // Link notes to highlights
  extractTags: false,         // Extract tags from notes
  tagCase: 'lowercase',       // 'original' | 'uppercase' | 'lowercase'
});

parseString(content, options?)

Parse clippings from a string. Works in browser.

import { parseString } from 'kindle-tools-ts';

const result = parseString(fileContent, { language: 'en' });

ParseResult

interface ParseResult {
  clippings: Clipping[];      // Parsed clippings
  stats: ClippingsStats;      // Statistics
  warnings: ParseWarning[];   // Any parsing warnings
  meta: {
    fileSize: number;         // Original file size
    parseTime: number;        // Parse time in ms
    detectedLanguage: string; // Detected language code
    totalBlocks: number;      // Total blocks found
    parsedBlocks: number;     // Successfully parsed
  };
}

Clipping

interface Clipping {
  id: string;                    // Deterministic hash (12 chars)
  title: string;                 // Clean book title
  author: string;                // Extracted author
  content: string;               // Highlight/note content
  type: 'highlight' | 'note' | 'bookmark' | 'clip' | 'article';
  page: number | null;
  location: { raw: string; start: number; end: number | null };
  date: Date | null;
  dateRaw: string;
  isLimitReached: boolean;       // DRM limit detected
  isEmpty: boolean;
  language: string;
  source: 'kindle' | 'sideload';
  wordCount: number;
  charCount: number;
  note?: string;                 // Linked note content
  tags?: string[];               // Extracted tags
  linkedNoteId?: string;
  linkedHighlightId?: string;
}

Exporters

All exporters implement the same interface:

interface Exporter {
  name: string;
  extension: string;
  export(clippings: Clipping[], options?: ExporterOptions): Promise<ExportResult>;
}

ExporterOptions

interface ExporterOptions {
  outputPath?: string;           // Output file/directory
  groupByBook?: boolean;         // Group by book
  includeStats?: boolean;        // Include statistics
  pretty?: boolean;              // Pretty print (JSON)
  includeRaw?: boolean;          // Include *Raw fields
  folderStructure?: 'flat' | 'by-book' | 'by-author' | 'by-author-book';
  authorCase?: 'original' | 'uppercase' | 'lowercase';
  includeClippingTags?: boolean; // Include extracted tags
  title?: string;                // Custom export title
  creator?: string;              // Custom author
}

Available Exporters

Exporter Format Output
JsonExporter JSON Single file, optional grouping
CsvExporter CSV Excel-compatible with BOM
MarkdownExporter Markdown Single file with blockquotes
ObsidianExporter Obsidian MD Multiple files with YAML frontmatter
JoplinExporter JEX Importable Joplin archive
HtmlExporter HTML Standalone page with dark mode

Importers

Re-process previously exported files:

import { JsonImporter, CsvImporter } from 'kindle-tools-ts';

// Import from JSON
const jsonImporter = new JsonImporter();
const result = await jsonImporter.import(jsonContent);

// Import from CSV (tolerates fuzzy headers)
const csvImporter = new CsvImporter();
const csvResult = await csvImporter.import(csvContent);

Custom Logger

Inject your own logger (Pino, Winston, etc.):

import { setLogger, resetLogger, nullLogger } from 'kindle-tools-ts';

// Custom logger
setLogger({
  error: (entry) => myLogger.error(entry),
  warn: (entry) => myLogger.warn(entry),
  info: (msg, ctx) => myLogger.info(ctx, msg),
  debug: (msg, ctx) => myLogger.debug(ctx, msg),
});

// Silence all logs
setLogger(nullLogger);

// Reset to console
resetLogger();

Error Handling

import { parseFile, isImportError, isValidationError } from 'kindle-tools-ts';

try {
  const result = await parseFile('file.txt');
  
  if (result.isErr()) {
    console.error(`[${result.error.code}] ${result.error.message}`);
  }
} catch (error) {
  if (error instanceof AppException) {
    if (isImportError(error.appError)) {
      console.error('Import error:', error.appError.detailedMessage);
    }
  }
}

Error Codes

Domain Code Description
Import IMPORT_PARSE_ERROR Failed to parse content
Import IMPORT_EMPTY_FILE Empty file
Import IMPORT_INVALID_FORMAT Schema validation failed
Export EXPORT_UNKNOWN_FORMAT Unsupported format
Export EXPORT_WRITE_FAILED Failed to write output
Export EXPORT_TEMPLATE_ERROR Template compilation failed

Advanced: processClippings()

For advanced use cases, you can call the processor directly:

import { processClippings, parseString } from 'kindle-tools-ts';

// Parse without processing
const raw = parseString(content, { 
  removeDuplicates: false,
  mergeOverlapping: false,
  mergeNotes: false 
});

// Process manually with custom options
const processed = processClippings(raw.clippings, {
  removeDuplicates: true,
  mergeOverlapping: true,
  extractTags: true,
  tagCase: 'lowercase',
});

console.log(`Duplicates removed: ${processed.duplicatesRemoved}`);
console.log(`Highlights merged: ${processed.mergedHighlights}`);
console.log(`Notes linked: ${processed.linkedNotes}`);

ProcessResult

interface ProcessResult {
  clippings: Clipping[];           // Processed clippings
  duplicatesRemoved: number;       // Exact duplicates removed
  mergedHighlights: number;        // Overlapping merges
  linkedNotes: number;             // Notes linked to highlights
  emptyRemoved: number;            // Empty clippings removed
  suspiciousFlagged: number;       // Flagged as suspicious
  fuzzyDuplicatesFlagged: number;  // Flagged as fuzzy duplicates
  tagsExtracted: number;           // Clippings with tags extracted
  notesConsumed: number;           // Notes removed by mergedOutput
}

ExportResult

Each exporter returns a different output type:

// String-based exporters (JSON, CSV, Markdown, HTML)
interface ExportResultString {
  output: string;                  // The exported content
  stats?: ClippingsStats;          // Optional statistics
}

// File-based exporters (Obsidian)
interface ExportResultFiles {
  files: ExportedFile[];           // Array of files
  stats?: ClippingsStats;
}
// ExportedFile: { path: string; content: string }

// Binary exporters (Joplin JEX)
interface ExportResultBinary {
  output: Uint8Array;              // TAR archive bytes
  stats?: ClippingsStats;
}

Usage example:

// JSON returns string
const json = await new JsonExporter().export(clippings);
console.log(json.output); // string

// Obsidian returns files array
const obsidian = await new ObsidianExporter().export(clippings);
for (const file of obsidian.files) {
  await fs.writeFile(file.path, file.content);
}

// Joplin returns Uint8Array
const jex = await new JoplinExporter().export(clippings);
await fs.writeFile('export.jex', jex.output);

Utils Namespace

Utility functions are grouped under the Utils namespace:

import { Utils } from 'kindle-tools-ts';

// Text utilities
Utils.normalizeText("  multiple   spaces  "); // "multiple spaces"
Utils.normalizeUnicode("café");               // NFC normalized

// Similarity
Utils.jaccardSimilarity("hello world", "world hello"); // 1.0

// Page utilities  
Utils.formatPage(42);                         // "[0042]"
Utils.estimatePageFromLocation(160);          // 10 (160 / 16)
Utils.getPageInfo(clipping);                  // { page: 42, estimated: false }

// Date utilities
Utils.formatDateHuman(new Date());            // "January 18, 2024"

// Geo utilities
Utils.formatLocation({ lat: 40.7, lon: -74 }); // "40.7, -74"

Available modules:

Module Functions
Utils.normalizeText() Collapse whitespace, trim
Utils.normalizeUnicode() NFC normalization
Utils.jaccardSimilarity() Word overlap (0-1)
Utils.formatPage() Zero-padded [0042]
Utils.estimatePageFromLocation() Location ÷ 16
Utils.getPageInfo() Page with estimation flag
Utils.formatDateHuman() Human-readable date
Utils.formatLocation() Geo coordinates

📤 Export Formats

JSON

{
  "clippings": [
    {
      "id": "a6439ae5832a",
      "title": "The Art of War",
      "author": "Sun Tzu",
      "content": "All warfare is based on deception.",
      "type": "highlight",
      "page": 42
    }
  ],
  "meta": { "total": 1 }
}

CSV

Excel-compatible with BOM for UTF-8:

id,title,author,type,page,location,date,content,wordCount
a6439ae5832a,The Art of War,Sun Tzu,highlight,42,100-105,2024-01-01,"All warfare...",6

Markdown

# Kindle Highlights

## The Art of War
*Sun Tzu*

> All warfare is based on deception.
> — Page 42, Location 100-105

Template Customization

const exporter = new MarkdownExporter();

// Use presets
exporter.export(clippings, { templatePreset: 'obsidian' });
// Presets: 'default', 'minimal', 'obsidian', 'notion', 'academic', 'compact', 'verbose'

// Custom templates
exporter.export(clippings, {
  customTemplates: {
    clipping: `> {{content}}\n> *({{page}})*\n`,
    book: `# {{title}}\n\n{{#each clippings}}{{> clipping}}{{/each}}`,
  }
});

Template Variables:

Variable Description
content Highlight text
title Book title
author Book author
page Page number
location Location string
date Formatted date
tags Extracted tags array
tagsHashtags #tag1 #tag2
note Linked note content
hasNote Boolean
hasTags Boolean

Obsidian

One file per book with YAML frontmatter:

---
title: "The Art of War"
author: "Sun Tzu"
source: kindle
total_highlights: 25
tags:
  - strategy
---

# The Art of War

**Author:** [[Sun Tzu]]

## 📝 Highlights

> [!quote] Page 42, Location 100-105
> All warfare is based on deception.

Folder structures:

  • flat — All files in root
  • by-bookbooks/Title/Title.md
  • by-authorbooks/Author/Title.md
  • by-author-bookbooks/Author/Title/Title.md

Joplin JEX

Importable archive with notebooks, notes, and tags. Uses deterministic IDs for idempotent imports.

Hierarchy:

  • flatKindle Highlights > Book > Notes
  • by-authorKindle Highlights > AUTHOR > Book > Notes

HTML

Standalone page with:

  • Responsive design
  • Dark mode toggle
  • Search/filter
  • Print-friendly styles
  • XSS protection

🌍 Supported Languages

Code Language "Added on" Pattern
en English "Added on Friday, January 1, 2024"
es Spanish "Añadido el viernes, 1 de enero de 2024"
pt Portuguese "Adicionado em sexta-feira, 1 de janeiro de 2024"
de German "Hinzugefügt am Freitag, 1. Januar 2024"
fr French "Ajouté le vendredi 1 janvier 2024"
it Italian "Aggiunto il venerdì 1 gennaio 2024"
zh Chinese "添加于 2024年1月1日星期五"
ja Japanese "追加日 2024年1月1日金曜日"
ko Korean "추가됨 2024년 1월 1일 금요일"
nl Dutch "Toegevoegd op vrijdag 1 januari 2024"
ru Russian "Добавлено пятница, 1 января 2024 г."

Language is auto-detected by analyzing the file content.


🌐 Browser Compatibility

The library is isomorphic — works in both Node.js and browsers:

Environment Parsing Exporting File System
Node.js parseFile() + parseString() ✅ All formats ✅ Native fs
Browser parseString() only ✅ All formats ❌ Use File API

Browser example:

import { parseString, JsonExporter } from 'kindle-tools-ts';

// From <input type="file">
const file = inputElement.files[0];
const content = await file.text();

const result = parseString(content);
const exporter = new JsonExporter();
const json = await exporter.export(result.clippings);

⚙️ Configuration

ParseOptions

interface ParseOptions {
  // Language
  language?: 'auto' | 'en' | 'es' | 'pt' | 'de' | 'fr' | 'it' | 'zh' | 'ja' | 'ko' | 'nl' | 'ru';

  // Processing
  removeDuplicates?: boolean;      // Default: true
  mergeOverlapping?: boolean;      // Default: true
  mergeNotes?: boolean;            // Default: true
  extractTags?: boolean;           // Default: false
  tagCase?: 'original' | 'uppercase' | 'lowercase';  // Default: 'uppercase'
  highlightsOnly?: boolean;        // Default: false

  // Normalization
  normalizeUnicode?: boolean;      // Default: true
  cleanContent?: boolean;          // Default: true
  cleanTitles?: boolean;           // Default: true

  // Filtering
  excludeTypes?: ClippingType[];   // Exclude specific types
  excludeBooks?: string[];         // Exclude books by title
  onlyBooks?: string[];            // Only these books
  minContentLength?: number;       // Minimum content length
}

🧠 Technical Details

Smart Merging

Overlapping highlights are merged using:

  • Location check: Overlap or within 5 characters
  • Content check: Substring match OR >50% word overlap (Jaccard)
  • Result: Longest content, combined range, latest date

Note Linking

Two-phase algorithm:

  1. Range match: Note location within highlight's range
  2. Proximity fallback: Within 10 locations

Quality Flags

Flags added via isSuspiciousHighlight:

  • too_short: < 5 characters
  • fragment: Starts lowercase
  • incomplete: No ending punctuation

Deduplication

Uses deterministic SHA-256 hash of:

  • Normalized title
  • Location
  • Type
  • First 50 chars of content

Same input = same ID = idempotent imports.

Security

  • CSV injection: Fields starting with =+−@ are prefixed with '
  • XSS protection: HTML export escapes all content
  • Schema validation: All inputs validated with Zod

❓ FAQ

Q: Does it modify my original file? A: No. It only reads the file and produces new output.

Q: Can I use it in React/Next.js/Vue? A: Yes! Use parseString for client-side operations.

Q: Why are there *Raw fields? A: They preserve original data for debugging and re-processing.

Q: How do I silence logs? A: setLogger(nullLogger) from the main export.

Q: Can I re-import exported JSON/CSV? A: Yes! Use JsonImporter or CsvImporter.


🛠️ Development

# Clone
git clone https://github.com/KindleTools/kindle-tools-ts.git
cd kindle-tools-ts

# Install
pnpm install

# Test
pnpm test

# Build
pnpm build

# Visual workbench
pnpm run gui   # Opens http://localhost:5173

🤝 Contributing

PRs are welcome! Please:

  1. Run pnpm run check before submitting
  2. Add tests for new features
  3. Follow Conventional Commits

See CONTRIBUTING.md for details.


📄 License

MIT © Andrés M. Jiménez


📖 More Resources

About

A robust TypeScript library to parse and process Amazon Kindle 'My Clippings.txt' files with smart merging & deduplication. Export to Obsidian, Joplin, JSON, CSV, HTML & more.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors