A robust TypeScript library to parse and process Amazon Kindle My Clippings.txt files with smart merging, deduplication, and multiple export formats.
👉 Live Demo / Docs: https://kindletools.github.io/kindle-tools-ts/
v1.0 - Feature Complete. This library is stable and production-ready. Only accepting bug fixes.
npm install kindle-tools-ts
import { parseFile } from 'kindle-tools-ts/node';
import { JsonExporter } from 'kindle-tools-ts';
const result = await parseFile('./My Clippings.txt');
console.log(`Found ${result.stats.total} clippings`);
const exporter = new JsonExporter();
const json = await exporter.export(result.clippings, { pretty: true });
- Why This Library?
- Features
- Installation
- Quick Start
- API Reference
- Export Formats
- Supported Languages
- Browser Compatibility
- Configuration
- Technical Details
- FAQ
- Development
- Contributing
- License
| Problem | Solution |
|---|---|
| Duplicate entries when you re-highlight or edit highlights | Smart deduplication with deterministic IDs |
| Overlapping highlights when extending a selection in Kindle | Automatic merging keeps the longest version |
| Notes disconnected from their highlights | Intelligent linking based on location proximity |
| Messy titles with _EBOK, (Spanish Edition), .mobi noise | Advanced cleaning preserves original in titleRaw |
| Multi-language files (device language changes) | Automatic detection for 11 languages |
| PDF artifacts like pro-\nblem word breaks | De-hyphenation and text cleaning |
| Export lock-in to a single format | 6 formats: JSON, CSV, Markdown, Obsidian, Joplin, HTML |
Perfect for:
- 📚 Obsidian/Joplin users — Import highlights directly into your vault/notebooks
- 🤖 Automation enthusiasts — Integrate into your scripts and workflows
- 💻 Developers — Full TypeScript types, tree-shakeable, dual ESM/CJS build
- 🌐 Multilingual readers — Works with Kindle devices in any supported language
| Feature | Description |
|---|---|
| 🌍 Multi-language support | EN, ES, PT, DE, FR, IT, ZH, JA, KO, NL, RU |
| 🔍 Auto language detection | Automatically detects your file's language |
| 🧠 Smart merging | Merges overlapping highlights when you extend a selection |
| 🔄 Deduplication | Removes exact duplicates with deterministic IDs |
| 🔗 Note linking | Links notes to their associated highlights |
| 🏷️ Tag extraction | Extracts tags from notes (comma/semicolon separated) |
| Feature | Description |
|---|---|
| 🧹 Text cleaning | De-hyphenation, space normalization, edition markers removal |
| Suspicious highlight flags | Detects suspicious highlights (accidental, incomplete) |
| 📊 Fuzzy duplicates | Jaccard similarity to find near-duplicates |
| 🛡️ CSV protection | Formula injection protection (OWASP compliant) |
| Format | Description |
|---|---|
| JSON | Standard JSON with metadata |
| CSV | Excel-compatible with BOM |
| Markdown | Blockquotes with Handlebars templates |
| Obsidian | YAML frontmatter, folder structures |
| Joplin JEX | Importable archive with notebooks |
| HTML | Standalone page with dark mode & search |
| Feature | Description |
|---|---|
| 📘 TypeScript-first | Full type definitions with strict mode |
| 🪶 Lightweight | 7 runtime dependencies |
| 📥 Multiple inputs | Parse TXT, re-import JSON/CSV |
| 🌐 Isomorphic | Works in Node.js and browsers |
| 🔒 Non-destructive | Always preserves original data (*Raw fields) |
npm install kindle-tools-ts
# or
pnpm add kindle-tools-ts
# or
yarn add kindle-tools-ts
- Node.js: 18.18.0+ (use .nvmrc for consistency)
- Platforms: Windows, macOS, Linux
- TypeScript: 5.0+ (optional, JS works too)
import { JsonExporter } from 'kindle-tools-ts';
import { parseFile } from 'kindle-tools-ts/node';
// Parse your clippings file
const result = await parseFile('./My Clippings.txt');
console.log(`Found ${result.stats.total} clippings from ${result.stats.totalBooks} books`);
console.log(` - Highlights: ${result.stats.totalHighlights}`);
console.log(` - Notes: ${result.stats.totalNotes}`);
console.log(` - Bookmarks: ${result.stats.totalBookmarks}`);
// Export to JSON
const exporter = new JsonExporter();
const jsonOutput = await exporter.export(result.clippings, { pretty: true });
console.log(jsonOutput.output);
import { parseString } from 'kindle-tools-ts';
const content = `
The Art of War (Sun Tzu)
- Your Highlight on page 42 | Location 100-105 | Added on Friday, January 1, 2024 10:30:45 AM
All warfare is based on deception.
==========
`;
const result = parseString(content);
console.log(result.clippings[0]);
// {
// id: 'a6439ae5832a',
// title: 'The Art of War',
// author: 'Sun Tzu',
// content: 'All warfare is based on deception.',
// type: 'highlight',
// page: 42,
// location: { raw: '100-105', start: 100, end: 105 },
// ...
// }
import { MarkdownExporter, ObsidianExporter, JoplinExporter } from 'kindle-tools-ts';
import { parseFile } from 'kindle-tools-ts/node';
const result = await parseFile('./My Clippings.txt');
// Markdown
const mdExporter = new MarkdownExporter();
const markdown = await mdExporter.export(result.clippings);
// Obsidian (one file per book)
const obsidianExporter = new ObsidianExporter();
const obsidian = await obsidianExporter.export(result.clippings, {
outputPath: './vault/books/',
folderStructure: 'by-author'
});
// Joplin JEX archive
const joplinExporter = new JoplinExporter();
const jex = await joplinExporter.export(result.clippings, {
outputPath: './clippings.jex'
});
Parse a Kindle clippings file from disk. Node.js only.
import { parseFile } from 'kindle-tools-ts/node';
const result = await parseFile('./My Clippings.txt', {
language: 'auto', // 'auto' | 'en' | 'es' | 'pt' | ...
removeDuplicates: true, // Remove exact duplicates
mergeOverlapping: true, // Merge extended highlights
mergeNotes: true, // Link notes to highlights
extractTags: false, // Extract tags from notes
tagCase: 'lowercase', // 'original' | 'uppercase' | 'lowercase'
});
Parse clippings from a string. Works in the browser.
import { parseString } from 'kindle-tools-ts';
const result = parseString(fileContent, { language: 'en' });
interface ParseResult {
clippings: Clipping[]; // Parsed clippings
stats: ClippingsStats; // Statistics
warnings: ParseWarning[]; // Any parsing warnings
meta: {
fileSize: number; // Original file size
parseTime: number; // Parse time in ms
detectedLanguage: string; // Detected language code
totalBlocks: number; // Total blocks found
parsedBlocks: number; // Successfully parsed
};
}
interface Clipping {
id: string; // Deterministic hash (12 chars)
title: string; // Clean book title
author: string; // Extracted author
content: string; // Highlight/note content
type: 'highlight' | 'note' | 'bookmark' | 'clip' | 'article';
page: number | null;
location: { raw: string; start: number; end: number | null };
date: Date | null;
dateRaw: string;
isLimitReached: boolean; // DRM limit detected
isEmpty: boolean;
language: string;
source: 'kindle' | 'sideload';
wordCount: number;
charCount: number;
note?: string; // Linked note content
tags?: string[]; // Extracted tags
linkedNoteId?: string;
linkedHighlightId?: string;
}
All exporters implement the same interface:
interface Exporter {
name: string;
extension: string;
export(clippings: Clipping[], options?: ExporterOptions): Promise<ExportResult>;
}
interface ExporterOptions {
outputPath?: string; // Output file/directory
groupByBook?: boolean; // Group by book
includeStats?: boolean; // Include statistics
pretty?: boolean; // Pretty print (JSON)
includeRaw?: boolean; // Include *Raw fields
folderStructure?: 'flat' | 'by-book' | 'by-author' | 'by-author-book';
authorCase?: 'original' | 'uppercase' | 'lowercase';
includeClippingTags?: boolean; // Include extracted tags
title?: string; // Custom export title
creator?: string; // Custom author
}
| Exporter | Format | Output |
|---|---|---|
| JsonExporter | JSON | Single file, optional grouping |
| CsvExporter | CSV | Excel-compatible with BOM |
| MarkdownExporter | Markdown | Single file with blockquotes |
| ObsidianExporter | Obsidian MD | Multiple files with YAML frontmatter |
| JoplinExporter | JEX | Importable Joplin archive |
| HtmlExporter | HTML | Standalone page with dark mode |
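Because every exporter shares this shape, a custom format can presumably follow the same pattern. The sketch below is not part of the library: `PlainTextExporter` is a hypothetical name, and the `Clipping`, `ExporterOptions`, and `ExportResult` types are trimmed local stand-ins for the ones documented above.

```typescript
// Trimmed local stand-ins for the library's types (assumption: the real ones have more fields).
interface Clipping {
  title: string;
  author: string;
  content: string;
  page: number | null;
}
interface ExporterOptions {
  title?: string;
}
interface ExportResult {
  output: string;
}
interface Exporter {
  name: string;
  extension: string;
  export(clippings: Clipping[], options?: ExporterOptions): Promise<ExportResult>;
}

// Hypothetical exporter: one line per clipping, preceded by an optional title line.
class PlainTextExporter implements Exporter {
  name = 'plain-text';
  extension = '.txt';

  async export(clippings: Clipping[], options: ExporterOptions = {}): Promise<ExportResult> {
    const lines = clippings.map((c) => {
      const page = c.page === null ? '' : ` (p. ${c.page})`;
      return `${c.title} / ${c.author}${page}: ${c.content}`;
    });
    if (options.title) lines.unshift(options.title);
    return { output: lines.join('\n') };
  }
}
```

Keeping the return value an object with an `output` field mirrors the string-based exporters, so calling code can treat the custom format the same way.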
Re-process previously exported files:
import { JsonImporter, CsvImporter } from 'kindle-tools-ts';
// Import from JSON
const jsonImporter = new JsonImporter();
const result = await jsonImporter.import(jsonContent);
// Import from CSV (tolerates fuzzy headers)
const csvImporter = new CsvImporter();
const csvResult = await csvImporter.import(csvContent);
Inject your own logger (Pino, Winston, etc.):
import { setLogger, resetLogger, nullLogger } from 'kindle-tools-ts';
// Custom logger
setLogger({
error: (entry) => myLogger.error(entry),
warn: (entry) => myLogger.warn(entry),
info: (msg, ctx) => myLogger.info(ctx, msg),
debug: (msg, ctx) => myLogger.debug(ctx, msg),
});
// Silence all logs
setLogger(nullLogger);
// Reset to console
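resetLogger();
// AppException is used below; importing it from the main entry point is an assumption.
import { AppException, isImportError, isValidationError } from 'kindle-tools-ts';
import { parseFile } from 'kindle-tools-ts/node';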
try {
const result = await parseFile('file.txt');
if (result.isErr()) {
console.error(`[${result.error.code}] ${result.error.message}`);
}
} catch (error) {
if (error instanceof AppException) {
if (isImportError(error.appError)) {
console.error('Import error:', error.appError.detailedMessage);
}
}
}
| Domain | Code | Description |
|---|---|---|
| Import | IMPORT_PARSE_ERROR | Failed to parse content |
| Import | IMPORT_EMPTY_FILE | Empty file |
| Import | IMPORT_INVALID_FORMAT | Schema validation failed |
| Export | EXPORT_UNKNOWN_FORMAT | Unsupported format |
| Export | EXPORT_WRITE_FAILED | Failed to write output |
| Export | EXPORT_TEMPLATE_ERROR | Template compilation failed |
For advanced use cases, you can call the processor directly:
import { processClippings, parseString } from 'kindle-tools-ts';
// Parse without processing
const raw = parseString(content, {
removeDuplicates: false,
mergeOverlapping: false,
mergeNotes: false
});
// Process manually with custom options
const processed = processClippings(raw.clippings, {
removeDuplicates: true,
mergeOverlapping: true,
extractTags: true,
tagCase: 'lowercase',
});
console.log(`Duplicates removed: ${processed.duplicatesRemoved}`);
console.log(`Highlights merged: ${processed.mergedHighlights}`);
console.log(`Notes linked: ${processed.linkedNotes}`);
interface ProcessResult {
clippings: Clipping[]; // Processed clippings
duplicatesRemoved: number; // Exact duplicates removed
mergedHighlights: number; // Overlapping merges
linkedNotes: number; // Notes linked to highlights
emptyRemoved: number; // Empty clippings removed
suspiciousFlagged: number; // Flagged as suspicious
fuzzyDuplicatesFlagged: number; // Flagged as fuzzy duplicates
tagsExtracted: number; // Clippings with tags extracted
notesConsumed: number; // Notes removed by mergedOutput
}
Each exporter returns a different output type:
// String-based exporters (JSON, CSV, Markdown, HTML)
interface ExportResultString {
output: string; // The exported content
stats?: ClippingsStats; // Optional statistics
}
// File-based exporters (Obsidian)
interface ExportResultFiles {
files: ExportedFile[]; // Array of files
stats?: ClippingsStats;
}
// ExportedFile: { path: string; content: string }
// Binary exporters (Joplin JEX)
interface ExportResultBinary {
output: Uint8Array; // TAR archive bytes
stats?: ClippingsStats;
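}
Usage example:
import * as fs from 'node:fs/promises';
import { JsonExporter, ObsidianExporter, JoplinExporter } from 'kindle-tools-ts';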
// JSON returns string
const json = await new JsonExporter().export(clippings);
console.log(json.output); // string
// Obsidian returns files array
const obsidian = await new ObsidianExporter().export(clippings);
for (const file of obsidian.files) {
await fs.writeFile(file.path, file.content);
}
// Joplin returns Uint8Array
const jex = await new JoplinExporter().export(clippings);
await fs.writeFile('export.jex', jex.output);
Utility functions are grouped under the Utils namespace:
import { Utils } from 'kindle-tools-ts';
// Text utilities
Utils.normalizeText(" multiple spaces "); // "multiple spaces"
Utils.normalizeUnicode("café"); // NFC normalized
// Similarity
Utils.jaccardSimilarity("hello world", "world hello"); // 1.0
// Page utilities
Utils.formatPage(42); // "[0042]"
Utils.estimatePageFromLocation(160); // 10 (160 / 16)
Utils.getPageInfo(clipping); // { page: 42, estimated: false }
// Date utilities
Utils.formatDateHuman(new Date()); // "January 18, 2024"
// Geo utilities
Utils.formatLocation({ lat: 40.7, lon: -74 }); // "40.7, -74"
Available functions:
| Function | Description |
|---|---|
| Utils.normalizeText() | Collapse whitespace, trim |
| Utils.normalizeUnicode() | NFC normalization |
| Utils.jaccardSimilarity() | Word overlap (0-1) |
| Utils.formatPage() | Zero-padded [0042] |
| Utils.estimatePageFromLocation() | Location ÷ 16 |
| Utils.getPageInfo() | Page with estimation flag |
| Utils.formatDateHuman() | Human-readable date |
| Utils.formatLocation() | Geo coordinates |
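To make the text, similarity, and page helpers concrete, here is a standalone sketch of what they compute. It does not call the library: the function names are hypothetical, and the rounding in the page estimate (rounding up) is an assumption, so results may differ from the real `Utils` functions in edge cases.

```typescript
// Standalone approximations; the library's own Utils functions may differ in detail.

// Collapse runs of whitespace to one space and trim both ends.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// Split a string into a set of lowercase words.
function wordSet(text: string): Set<string> {
  return new Set(collapseWhitespace(text).toLowerCase().split(' ').filter(Boolean));
}

// Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|, in the range 0..1.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let shared = 0;
  setA.forEach((w) => {
    if (setB.has(w)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union; // two empty inputs are treated as dissimilar
}

// Estimate a page from a Kindle location, assuming 16 locations per page, rounded up.
function estimatePage(location: number): number {
  return Math.ceil(location / 16);
}

// Zero-pad a page number to four digits and wrap it in brackets.
function formatPageLabel(page: number): string {
  return `[${String(page).padStart(4, '0')}]`;
}
```

Because the similarity works on word sets, word order and repeated words do not affect the score.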
{
"clippings": [
{
"id": "a6439ae5832a",
"title": "The Art of War",
"author": "Sun Tzu",
"content": "All warfare is based on deception.",
"type": "highlight",
"page": 42
}
],
"meta": { "total": 1 }
}
Excel-compatible with BOM for UTF-8:
id,title,author,type,page,location,date,content,wordCount
a6439ae5832a,The Art of War,Sun Tzu,highlight,42,100-105,2024-01-01,"All warfare...",6
# Kindle Highlights
## The Art of War
*Sun Tzu*
> All warfare is based on deception.
> — Page 42, Location 100-105
const exporter = new MarkdownExporter();
// Use presets
exporter.export(clippings, { templatePreset: 'obsidian' });
// Presets: 'default', 'minimal', 'obsidian', 'notion', 'academic', 'compact', 'verbose'
// Custom templates
exporter.export(clippings, {
customTemplates: {
clipping: `> {{content}}\n> *({{page}})*\n`,
book: `# {{title}}\n\n{{#each clippings}}{{> clipping}}{{/each}}`,
}
});
Template Variables:
| Variable | Description |
|---|---|
| content | Highlight text |
| title | Book title |
| author | Book author |
| page | Page number |
| location | Location string |
| date | Formatted date |
| tags | Extracted tags array |
| tagsHashtags | #tag1 #tag2 |
| note | Linked note content |
| hasNote | Boolean |
| hasTags | Boolean |
One file per book with YAML frontmatter:
---
title: "The Art of War"
author: "Sun Tzu"
source: kindle
total_highlights: 25
tags:
- strategy
---
# The Art of War
**Author:** [[Sun Tzu]]
## 📝 Highlights
> [!quote] Page 42, Location 100-105
> All warfare is based on deception.
Folder structures:
- flat: All files in root
- by-book: books/Title/Title.md
- by-author: books/Author/Title.md
- by-author-book: books/Author/Title/Title.md
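The four layouts above map to output paths roughly as in this sketch. It is an illustration only: `notePath` and `safeName` are hypothetical helpers, and the file-name sanitizing rule is an assumption rather than the library's actual behavior.

```typescript
type FolderStructure = 'flat' | 'by-book' | 'by-author' | 'by-author-book';

// Replace characters that are invalid in file names on common platforms (assumed rule).
function safeName(name: string): string {
  return name.replace(/[\\/:*?"<>|]/g, '_').trim();
}

// Build the relative note path for one book under the given structure.
function notePath(structure: FolderStructure, author: string, title: string): string {
  const a = safeName(author);
  const t = safeName(title);
  switch (structure) {
    case 'flat':
      return `${t}.md`;
    case 'by-book':
      return `books/${t}/${t}.md`;
    case 'by-author':
      return `books/${a}/${t}.md`;
    case 'by-author-book':
      return `books/${a}/${t}/${t}.md`;
  }
}
```

For example, `notePath('by-author', 'Sun Tzu', 'The Art of War')` yields `books/Sun Tzu/The Art of War.md`.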
Importable archive with notebooks, notes, and tags. Uses deterministic IDs for idempotent imports.
Hierarchy:
- flat: Kindle Highlights > Book > Notes
- by-author: Kindle Highlights > AUTHOR > Book > Notes
Standalone page with:
- Responsive design
- Dark mode toggle
- Search/filter
- Print-friendly styles
- XSS protection
| Code | Language | "Added on" Pattern |
|---|---|---|
| en | English | "Added on Friday, January 1, 2024" |
| es | Spanish | "Añadido el viernes, 1 de enero de 2024" |
| pt | Portuguese | "Adicionado em sexta-feira, 1 de janeiro de 2024" |
| de | German | "Hinzugefügt am Freitag, 1. Januar 2024" |
| fr | French | "Ajouté le vendredi 1 janvier 2024" |
| it | Italian | "Aggiunto il venerdì 1 gennaio 2024" |
| zh | Chinese | "添加于 2024年1月1日星期五" |
| ja | Japanese | "追加日 2024年1月1日金曜日" |
| ko | Korean | "추가됨 2024년 1월 1일 금요일" |
| nl | Dutch | "Toegevoegd op vrijdag 1 januari 2024" |
| ru | Russian | "Добавлено пятница, 1 января 2024 г." |
Language is auto-detected by analyzing the file content.
The library is isomorphic — works in both Node.js and browsers:
| Environment | Parsing | Exporting | File System |
|---|---|---|---|
| Node.js | ✅ parseFile() + parseString() | ✅ All formats | ✅ Native fs |
| Browser | ✅ parseString() only | ✅ All formats | ❌ Use File API |
Browser example:
import { parseString, JsonExporter } from 'kindle-tools-ts';
// From <input type="file">
const file = inputElement.files[0];
const content = await file.text();
const result = parseString(content);
const exporter = new JsonExporter();
const json = await exporter.export(result.clippings);
interface ParseOptions {
// Language
language?: 'auto' | 'en' | 'es' | 'pt' | 'de' | 'fr' | 'it' | 'zh' | 'ja' | 'ko' | 'nl' | 'ru';
// Processing
removeDuplicates?: boolean; // Default: true
mergeOverlapping?: boolean; // Default: true
mergeNotes?: boolean; // Default: true
extractTags?: boolean; // Default: false
tagCase?: 'original' | 'uppercase' | 'lowercase'; // Default: 'uppercase'
highlightsOnly?: boolean; // Default: false
// Normalization
normalizeUnicode?: boolean; // Default: true
cleanContent?: boolean; // Default: true
cleanTitles?: boolean; // Default: true
// Filtering
excludeTypes?: ClippingType[]; // Exclude specific types
excludeBooks?: string[]; // Exclude books by title
onlyBooks?: string[]; // Only these books
minContentLength?: number; // Minimum content length
}
Overlapping highlights are merged using:
- Location check: Overlap or within 5 characters
- Content check: Substring match OR >50% word overlap (Jaccard)
- Result: Longest content, combined range, latest date
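The three rules above can be sketched as a pairwise merge function. This is a minimal illustration under stated assumptions, not the library's code: the `Highlight` shape, the function names, and the exact way the 5-unit tolerance is applied are all hypothetical.

```typescript
interface Highlight {
  content: string;
  start: number;
  end: number;
  date: Date;
}

// Jaccard similarity over lowercase word sets.
function wordOverlap(a: string, b: string): number {
  const wa = Array.from(new Set(a.toLowerCase().split(/\s+/).filter(Boolean)));
  const wb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const shared = wa.filter((w) => wb.has(w)).length;
  const union = wa.length + wb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Location check: ranges overlap, or the gap between them is at most `tolerance`.
function locationsClose(a: Highlight, b: Highlight, tolerance = 5): boolean {
  return a.start <= b.end + tolerance && b.start <= a.end + tolerance;
}

// Content check: one text contains the other, or word overlap exceeds 50%.
function contentsRelated(a: Highlight, b: Highlight): boolean {
  return (
    a.content.includes(b.content) ||
    b.content.includes(a.content) ||
    wordOverlap(a.content, b.content) > 0.5
  );
}

// Returns the merged highlight, or null when the pair should stay separate.
function mergeIfOverlapping(a: Highlight, b: Highlight): Highlight | null {
  if (!locationsClose(a, b) || !contentsRelated(a, b)) return null;
  return {
    content: a.content.length >= b.content.length ? a.content : b.content, // longest content
    start: Math.min(a.start, b.start), // combined range
    end: Math.max(a.end, b.end),
    date: a.date.getTime() >= b.date.getTime() ? a.date : b.date, // latest date
  };
}
```

Requiring both checks to pass keeps identical phrases in distant parts of a book from being merged by accident.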
Two-phase algorithm:
- Range match: Note location within highlight's range
- Proximity fallback: Within 10 locations
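A minimal sketch of the two phases follows. The `Located` shape and `findHighlightForNote` are hypothetical, and picking the nearest highlight in the fallback phase is an assumption; the library may break ties differently.

```typescript
interface Located {
  id: string;
  start: number;
  end: number | null;
}

// Returns the id of the highlight a note most plausibly belongs to, or null.
function findHighlightForNote(note: Located, highlights: Located[], maxDistance = 10): string | null {
  // Phase 1: the note's location falls inside a highlight's range.
  for (const h of highlights) {
    const end = h.end ?? h.start;
    if (note.start >= h.start && note.start <= end) return h.id;
  }
  // Phase 2: fall back to the nearest highlight within `maxDistance` locations.
  let best: { id: string; distance: number } | null = null;
  for (const h of highlights) {
    const end = h.end ?? h.start;
    const distance = note.start < h.start ? h.start - note.start : note.start - end;
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { id: h.id, distance };
    }
  }
  return best ? best.id : null;
}
```

Running the range match first means a note placed inside a highlight is never claimed by a merely nearby one.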
Flags added via isSuspiciousHighlight:
- too_short: < 5 characters
- fragment: Starts lowercase
- incomplete: No ending punctuation
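The three flags can be approximated with simple string checks. This sketch is not the library's `isSuspiciousHighlight`: the function name `suspicionFlags` is hypothetical, and the set of characters treated as ending punctuation is an assumption.

```typescript
type SuspicionFlag = 'too_short' | 'fragment' | 'incomplete';

// Return every flag that applies to the given highlight text.
function suspicionFlags(content: string): SuspicionFlag[] {
  const text = content.trim();
  const flags: SuspicionFlag[] = [];

  // too_short: fewer than 5 characters.
  if (text.length < 5) flags.push('too_short');

  // fragment: the first character is a cased letter in its lowercase form.
  const first = text.charAt(0);
  if (first !== '' && first === first.toLowerCase() && first !== first.toUpperCase()) {
    flags.push('fragment');
  }

  // incomplete: no closing punctuation (which characters count is an assumption).
  if (!/[.!?…"'”’)\]]$/.test(text)) flags.push('incomplete');

  return flags;
}
```

A highlight can carry several flags at once, so the function returns a list rather than a single value.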
Uses deterministic SHA-256 hash of:
- Normalized title
- Location
- Type
- First 50 chars of content
Same input = same ID = idempotent imports.
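The idea can be sketched with Node's built-in `crypto` module. The field separator, the title normalization, and truncating the hex digest to 12 characters are assumptions made for illustration, so IDs produced here will not necessarily match the library's.

```typescript
import { createHash } from 'node:crypto';

// Assumed normalization: NFC, lowercase, collapsed whitespace.
function normalizeTitle(title: string): string {
  return title.normalize('NFC').toLowerCase().replace(/\s+/g, ' ').trim();
}

// Hash the identifying fields and keep the first 12 hex characters.
function clippingId(title: string, location: string, type: string, content: string): string {
  const key = [normalizeTitle(title), location, type, content.slice(0, 50)].join('|');
  return createHash('sha256').update(key, 'utf8').digest('hex').slice(0, 12);
}
```

Since only the first 50 characters of content feed the hash, extending a highlight at its end keeps the same ID, which is what makes repeated imports idempotent.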
- CSV injection: Fields starting with =, +, -, or @ are prefixed with '
- XSS protection: HTML export escapes all content
- Schema validation: All inputs validated with Zod
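The first two protections amount to small string transforms, sketched below. Both function names are hypothetical and the code is independent of the library; the trigger characters follow the list above, and the real implementation may cover more cases.

```typescript
// Prefix fields that a spreadsheet could interpret as a formula with a single quote.
function guardCsvField(value: string): string {
  return /^[=+\-@]/.test(value) ? `'${value}` : value;
}

// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

A highlight such as `=SUM(A1:A2)` would be written to CSV as `'=SUM(A1:A2)`, which spreadsheet applications display as text instead of evaluating.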
Q: Does it modify my original file?
A: No. It only reads the file and produces new output.
Q: Can I use it in React/Next.js/Vue?
A: Yes! Use parseString for client-side operations.
Q: Why are there *Raw fields?
A: They preserve original data for debugging and re-processing.
Q: How do I silence logs?
A: setLogger(nullLogger) from the main export.
Q: Can I re-import exported JSON/CSV?
A: Yes! Use JsonImporter or CsvImporter.
# Clone
git clone https://github.com/KindleTools/kindle-tools-ts.git
cd kindle-tools-ts
# Install
pnpm install
# Test
pnpm test
# Build
pnpm build
# Visual workbench
pnpm run gui # Opens http://localhost:5173
PRs are welcome! Please:
- Run pnpm run check before submitting
- Add tests for new features
- Follow Conventional Commits
See CONTRIBUTING.md for details.
MIT © Andrés M. Jiménez
- ARCHITECTURE.md — Technical deep-dive for contributors
- ROADMAP.md — Project status and future plans
- CHANGELOG.md — Version history