📧 Email Archive Parser


The most comprehensive TypeScript library for parsing email archives and extracting valuable insights.

🔍 Intelligent Detection • 📧 Multi-Format Support • ⚡ Memory Efficient • 🌐 Cross-Platform

Installation • Quick Start • Use Cases • API Reference


✨ What This Library Can Do

Email Archive Parser is a modern TypeScript library that goes beyond simple email parsing. It reads your email archives and analyzes their contents, providing the capabilities listed below.

📧 Email Archive Parsing

  • OLM Files - Outlook for Mac archives (.olm) with contacts & calendar events
  • MBOX Files - Gmail Takeout, Thunderbird, Apple Mail (.mbox)
  • Unlimited File Sizes - Stream processing handles multi-GB files (tested with 2.4GB+)
  • Gmail Labels - Automatic label extraction (Inbox, Starred, Categories, etc.)
  • Contact Extraction - Automatically builds contact list from email senders
  • MIME Support - Parse multipart emails, attachments, HTML content
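As a rough illustration of the contact extraction described above, the sketch below builds a de-duplicated contact list from raw `From` header values. The `Contact` type and `buildContacts` function are hypothetical names introduced for this example; they are not the library's API, and the library's actual extraction logic may differ.

```typescript
// Hypothetical type and helper for illustration; not the library's API.
interface Contact {
  name: string;
  address: string;
  messageCount: number;
}

function buildContacts(fromHeaders: string[]): Contact[] {
  const byAddress = new Map<string, Contact>();
  for (const header of fromHeaders) {
    // Match `Display Name <user@host>`; otherwise treat the header as a bare address.
    const match = header.match(/^\s*"?([^"<]*?)"?\s*<([^<>\s]+@[^<>\s]+)>\s*$/);
    const address = (match?.[2] ?? header).trim().toLowerCase();
    if (!address.includes('@')) continue; // skip headers we cannot interpret
    const name = (match?.[1] ?? '').trim();
    const existing = byAddress.get(address);
    if (existing) {
      existing.messageCount += 1;
      if (!existing.name && name) existing.name = name;
    } else {
      byAddress.set(address, { name, address, messageCount: 1 });
    }
  }
  return Array.from(byAddress.values());
}
```

Keying the map on the lower-cased address means the same sender written with different display names or casing collapses into one contact.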

🧠 Intelligent Detection Engines

  • 🔍 Account Detection - 100+ services (Netflix, GitHub, Amazon, etc.)
  • 🛒 Purchase Detection - Orders, receipts, invoices with multi-currency support
  • 🔄 Subscription Detection - Recurring services, billing cycles, renewal dates
  • 📰 Newsletter Detection - Newsletters, promotional emails, frequency analysis
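To give a feel for what a detection engine of this kind might check, here is a minimal newsletter heuristic. It is a sketch under stated assumptions: the `HeaderView` shape and `looksLikeNewsletter` function are hypothetical, and the rules shown are common bulk-mail signals rather than the library's actual detection logic.

```typescript
// Hypothetical shape and rule set; the library's real detectors may differ.
interface HeaderView {
  subject: string;
  headers: Record<string, string>; // assumed to use lower-cased header names
}

function looksLikeNewsletter(email: HeaderView): boolean {
  // List-Unsubscribe (RFC 2369) and List-Id (RFC 2919) are strong bulk-mail signals.
  if ('list-unsubscribe' in email.headers || 'list-id' in email.headers) {
    return true;
  }
  // "Precedence: bulk" or "list" is an older, weaker convention.
  const precedence = (email.headers['precedence'] ?? '').toLowerCase();
  if (precedence === 'bulk' || precedence === 'list') {
    return true;
  }
  // Fall back to a keyword check on the subject line.
  return /\b(newsletter|weekly digest|unsubscribe)\b/i.test(email.subject);
}
```

Header-based signals are checked first because they are harder to trigger by accident than subject keywords.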

📊 Data Extraction & Analysis

  • Smart Categorization - Automatically classify emails by type
  • Financial Tracking - Sum purchases, identify spending patterns
  • Service Inventory - Complete list of accounts and subscriptions
  • Email Statistics - Read/unread status, folder distribution, sender analysis

Performance & Reliability

  • Memory Efficient - Stream processing for large files
  • Cross-Platform - Node.js and browser environments
  • TypeScript First - Full type safety and IntelliSense
  • Minimal Dependencies - Only jszip for archive extraction
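The idea behind stream processing can be sketched as follows: feed the archive in chunks and emit each message as soon as its end is seen, so memory use is bounded by the largest single message rather than the whole file. `MboxChunkSplitter` is a hypothetical class written for this example, not the library's parser, and it simplifies the MBOX format by treating every line that starts with `From ` as a message separator.

```typescript
// Hypothetical class for illustration; not the library's parser.
class MboxChunkSplitter {
  private pending = ''; // text of the message currently being assembled

  // Feed one chunk; returns every message completed by this chunk.
  push(chunk: string): string[] {
    this.pending += chunk;
    const done: string[] = [];
    // Search from index 1 so a leading newline is never emitted as a message.
    let boundary = this.pending.indexOf('\nFrom ', 1);
    while (boundary !== -1) {
      done.push(this.pending.slice(0, boundary + 1));
      this.pending = this.pending.slice(boundary + 1);
      boundary = this.pending.indexOf('\nFrom ', 1);
    }
    return done;
  }

  // Call once at end of input to release the final message.
  flush(): string[] {
    const last = this.pending;
    this.pending = '';
    return last.length > 0 ? [last] : [];
  }
}
```

Because unfinished text is carried over in `pending`, a separator that is split across two chunks is still found once the second chunk arrives.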

🔒 Privacy First

  • Local Processing - All analysis happens on your device
  • No Data Transmission - Emails never leave your computer
  • Open Source - Transparent, auditable code

📦 Installation

npm install @technical-1/email-archive-parser
yarn add @technical-1/email-archive-parser
pnpm add @technical-1/email-archive-parser

📁 Examples

The /examples directory contains ready-to-use code samples:

| Example | Description |
| --- | --- |
| react-demo/ | Complete React app - Lift and shift into your project! |
| quick-start-react.tsx | Simple React component for quick integration |
| basic-usage.ts | General usage patterns for both formats |
| olm-usage.ts | Outlook-specific features |
| mbox-usage.ts | Gmail-specific features |
| with-detectors.ts | Detection examples |

React Demo (Recommended)

A complete React application with IndexedDB storage that handles files of any size:

cd examples/react-demo
npm install
npm run dev

Features:

  • 📧 Parse OLM and MBOX files of any size
  • 💾 IndexedDB storage (no memory limits)
  • 🔍 Search and pagination
  • 📬 Email detail view
  • 👥 Contacts list
  • 📅 Calendar events
  • 🗑️ Clear data button
  • 🎨 Tailwind CSS styling

Copy the src/ folder into your React project to use it!

🚀 Quick Start

⚡ Simplest Possible Integration (Copy & Paste)

React / Next.js / Vite:

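import type { ChangeEvent } from 'react';
import { parseArchive } from '@technical-1/email-archive-parser';

export function ArchiveUpload() {
  // Parse the selected archive in the browser and log the result.
  const handleUpload = async (e: ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (!file) return; // the user cancelled the file dialog
    const result = await parseArchive(file);
    console.log(result.emails); // Your emails!
  };

  return <input type="file" accept=".olm,.mbox" onChange={handleUpload} />;
}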

Vanilla JavaScript:

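<input type="file" id="upload" accept=".olm,.mbox">
<script type="module">
  // A bare package specifier like this one only resolves in the browser
  // through a bundler or an import map.
  import { parseArchive } from '@technical-1/email-archive-parser';

  document.getElementById('upload').onchange = async (e) => {
    const [file] = e.target.files;
    if (!file) return; // no file selected
    const result = await parseArchive(file);
    console.log(result.emails); // Your emails!
  };
</script>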

Node.js (for any file size):

import { MBOXParser, OLMParser } from '@technical-1/email-archive-parser';

// Parse a 5GB MBOX file with streaming - no memory issues!
const parser = new MBOXParser();
const result = await parser.parseFile('/path/to/huge-archive.mbox');
console.log(result.emails);

🌐 Building a Web App? Use the React Demo!

For production web applications, check out our complete React implementation in examples/react-demo/. It includes:

  • IndexedDB storage - Handles files of any size without memory issues
  • Streaming parsing - Saves to database during parsing, not after
  • Ready-to-use components - EmailList, EmailDetail, ContactList, CalendarList
  • Custom React hook - useEmailDB for all database operations
  • Tailwind CSS styling - Modern, responsive UI

# Try it out
cd examples/react-demo
npm install
npm run dev

Lift and shift the src/ folder into your own React/Next.js/Vite project!


📖 API Reference

For detailed API documentation, advanced examples, and use cases, see API.md.


📊 Performance & Benchmarks

File Size Support

| File Size | Memory Usage | Processing Time | Method |
| --- | --- | --- | --- |
| < 20MB | Normal | < 5 seconds | Standard parsing |
| 20MB - 500MB | Moderate | 10-60 seconds | Standard parsing |
| 500MB - 2GB | Low | 1-5 minutes | Streaming parsing |
| > 2GB | Very Low | 5+ minutes | Streaming parsing |

Detection Accuracy

| Detector | Precision | Recall | Sample Size |
| --- | --- | --- | --- |
| Accounts | 92% | 88% | 1,000+ emails |
| Purchases | 94% | 91% | 500+ transactions |
| Subscriptions | 89% | 95% | 200+ services |
| Newsletters | 96% | 87% | 800+ emails |
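For readers unfamiliar with the two metrics: precision is the share of detections that were correct, and recall is the share of real items that were detected. The sketch below shows the standard definitions; the `DetectionCounts` type and `precisionRecall` function are hypothetical names for this example, and the README does not state how its figures were computed.

```typescript
// Standard definitions; names are hypothetical and not part of the library.
interface DetectionCounts {
  truePositives: number;  // detected and correct
  falsePositives: number; // detected but wrong
  falseNegatives: number; // real items the detector missed
}

function precisionRecall(c: DetectionCounts): { precision: number; recall: number } {
  const predicted = c.truePositives + c.falsePositives;
  const actual = c.truePositives + c.falseNegatives;
  return {
    precision: predicted === 0 ? 0 : c.truePositives / predicted,
    recall: actual === 0 ? 0 : c.truePositives / actual,
  };
}
```

With made-up counts of 90 true positives, 10 false positives, and 30 false negatives, this gives a precision of 90 / 100 and a recall of 90 / 120.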

Supported Email Formats

| Format | Extensions | Source | Features |
| --- | --- | --- | --- |
| OLM | .olm | Outlook for Mac | Full support: emails, contacts, calendar |
| MBOX | .mbox | Gmail Takeout | Full support + Gmail labels |
| MBOX | .mbox | Thunderbird | Full support + folder structure |
| MBOX | .mbox | Apple Mail | Full support |
| MBOX | .mbx | Various clients | Basic support |
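Gmail Takeout exports typically record a message's labels in an `X-Gmail-Labels` header as a comma-separated list. The sketch below shows one way such a header could be read from a raw header block. `extractGmailLabels` is a hypothetical helper for illustration; the library's own label extraction may work differently.

```typescript
// Hypothetical helper; the library's own label extraction may work differently.
function extractGmailLabels(rawHeaders: string): string[] {
  // Unfold continuation lines (a line break followed by whitespace, per RFC 5322).
  const unfolded = rawHeaders.replace(/\r?\n[ \t]+/g, ' ');
  const match = unfolded.match(/^X-Gmail-Labels:[ \t]*(.*)$/im);
  if (!match) return [];
  return (match[1] ?? '')
    .split(',')
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
}
```

Unfolding first matters because a long label list may be wrapped across several header lines.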

Email Content Support

  • Plain Text emails
  • HTML emails with content extraction
  • MIME Multipart (text + HTML + attachments)
  • Quoted-Printable encoding
  • Base64 encoding
  • UTF-8 and international character sets
  • File Attachments (metadata extraction)
  • Email Threads (conversation grouping)
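As an example of what handling one of these encodings involves, here is a minimal Quoted-Printable decoder. It is a sketch, not the library's implementation: `decodeQuotedPrintable` is a hypothetical name, and the function assumes the decoded bytes are UTF-8, whereas a real parser would use the charset declared in the part's Content-Type header.

```typescript
// Hypothetical helper; assumes the decoded bytes are UTF-8 text.
function decodeQuotedPrintable(input: string): string {
  // A trailing "=" marks a soft line break and is removed with the newline.
  const joined = input.replace(/=\r?\n/g, '');
  const bytes: number[] = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=C3" becomes the byte 0xC3
      i += 2;
    } else {
      // Quoted-Printable bodies are 7-bit ASCII, so one char maps to one byte.
      bytes.push(joined.charCodeAt(i) & 0xff);
    }
  }
  return new TextDecoder('utf-8').decode(new Uint8Array(bytes));
}
```

Collecting bytes before decoding lets multi-byte characters such as `=C3=A9` be reassembled correctly instead of being decoded one escape at a time.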

🧪 Development

# Install dependencies
npm install

# Build
npm run build

# Watch mode
npm run dev

# Run tests
npm test

🔐 Privacy

This library processes all data locally. No email content is ever sent to external servers.


📄 License

MIT License - see LICENSE for details.


🙏 Acknowledgments


Made by Jacob Kanfer