Chat Export Structurer

"The next era of AI won't be won by who prompts best. It will be won by who holds context best."

If you're building with AI, you know the problem: every conversation starts from zero, your best insights get buried in exported JSON, and there's no memory layer connecting it all.

This tool is the foundation. A clean, local-first parser that turns messy AI exports into structured SQLite archives—ready for search, RAG, agents, or whatever you're building next.

Part of MyChatArchive — a full platform for AI memory and context. Launching Q1 2025.


What It Does

Convert messy AI chat exports into clean, queryable SQLite archives. Own your data. Build your memory layer.

Supports ChatGPT, Claude (Anthropic), and Grok exports. No API keys, no cloud services, just local SQLite.

Why Use This

  • Own your data — Everything stays local. No cloud, no API keys, no tracking
  • Search everything — Built-in full-text search (FTS5) across all your conversations
  • Never lose context — SHA1-based deduplication keeps your archive clean across imports
  • Multi-platform — Works with ChatGPT, Claude, and Grok exports (more coming)
  • Production-ready — Streaming parser handles multi-GB files without loading them into memory
  • Safe testing — Preview mode lets you inspect data before writing to database

Installation

git clone https://github.com/1ch1n/chat-export-structurer.git
cd chat-export-structurer
pip install -r requirements.txt

Requirements:

  • Python 3.8+
  • ijson for streaming JSON
  • tqdm for progress indicators (optional)
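
Curious what "streaming" means in practice? A minimal sketch of the technique (not the tool's actual code), assuming a ChatGPT-style export whose top level is a JSON array:

import ijson

def iter_conversations(path):
    # The "item" prefix walks each element of the top-level array,
    # so the whole file never has to fit in memory.
    with open(path, "rb") as f:
        for conversation in ijson.items(f, "item"):
            yield conversation

for convo in iter_conversations("conversations.json"):
    print(convo.get("title"))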

Quick Start

Get started in under 2 minutes:

1. Export Your Data

ChatGPT:
Settings → Data controls → Export data → Download conversations.json

Anthropic Claude:
Settings → Export data

Grok (X.AI):
Settings → Export conversations

2. Test Import

Try the included sample database:

sqlite3 examples/sample_archive.sqlite "SELECT title, role, text FROM messages LIMIT 5;"

Or test parsing your own export:

python src/ingest.py \
  --in path/to/export.json \
  --format chatgpt \
  --test

3. Import to SQLite

python src/ingest.py \
  --in path/to/export.json \
  --db my_archive.sqlite \
  --format chatgpt

4. Query Your Data

sqlite3 my_archive.sqlite

-- Search messages
SELECT role, text, ts FROM messages 
WHERE text LIKE '%python%' 
LIMIT 10;

-- Full-text search
SELECT m.text, m.ts 
FROM messages_fts 
JOIN messages_fts_docids d ON messages_fts.rowid = d.rowid
JOIN messages m ON m.message_id = d.message_id
WHERE messages_fts MATCH 'machine learning';

-- Count conversations
SELECT COUNT(DISTINCT canonical_thread_id) FROM messages;
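
The archive is a plain SQLite file, so you can also query it from Python with the standard-library sqlite3 module. A minimal sketch:

import sqlite3

conn = sqlite3.connect("my_archive.sqlite")
rows = conn.execute(
    "SELECT role, text, ts FROM messages WHERE text LIKE ? LIMIT 10",
    ("%python%",),
)
for role, text, ts in rows:
    # Print a short preview of each matching message
    print(f"[{ts}] {role}: {text[:80]}")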

Usage

python src/ingest.py --in INPUT --format FORMAT [--db DATABASE] [OPTIONS]

Arguments

Argument      Required     Description
--in          Yes          Path to export JSON file
--format      Yes          Export format: chatgpt, anthropic, or grok
--db          Conditional  SQLite database path (required unless --test)
--test        No           Preview mode; no database writes
--account     No           Account identifier (default: main)
--source-id   No           Batch ID (default: src_0001)

Supported Formats

ChatGPT

python src/ingest.py \
  --in conversations.json \
  --db archive.sqlite \
  --format chatgpt

Anthropic Claude

python src/ingest.py \
  --in claude_export.json \
  --db archive.sqlite \
  --format anthropic

Grok

python src/ingest.py \
  --in grok_export.json \
  --db archive.sqlite \
  --format grok

Combine Multiple Platforms

# Import from different platforms into one database
python src/ingest.py --in chatgpt.json --db unified.sqlite --format chatgpt
python src/ingest.py --in claude.json --db unified.sqlite --format anthropic
python src/ingest.py --in grok.json --db unified.sqlite --format grok

# Duplicates are automatically skipped
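
One plausible mechanism behind the skipping (the real logic lives in src/ingest.py and may hash different fields): derive each message_id from a SHA1 hash of stable fields, so re-importing the same export regenerates the same ID and the duplicate insert is ignored.

import hashlib

def message_id(platform: str, thread_id: str, ts: str, role: str, text: str) -> str:
    # Hash stable fields so the same message always maps to the same ID.
    key = "\x1f".join([platform, thread_id, ts, role, text])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

a = message_id("chatgpt", "t1", "2024-01-01T00:00:00Z", "user", "hello")
b = message_id("chatgpt", "t1", "2024-01-01T00:00:00Z", "user", "hello")
assert a == b  # identical content, identical ID: the re-import is a no-op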

Database Schema

Entity Relationship

erDiagram
    messages ||--o{ messages_fts_docids : "indexed_by"
    messages_fts_docids ||--|| messages_fts : "maps_to"
    
    messages {
        TEXT message_id PK
        TEXT canonical_thread_id
        TEXT platform
        TEXT account_id
        TEXT ts
        TEXT role
        TEXT text
        TEXT title
        TEXT source_id
    }
    
    messages_fts {
        INTEGER rowid PK
        TEXT text
    }
    
    messages_fts_docids {
        INTEGER rowid PK
        TEXT message_id FK
    }

messages table

CREATE TABLE messages (
  message_id TEXT PRIMARY KEY,
  canonical_thread_id TEXT NOT NULL,
  platform TEXT NOT NULL,
  account_id TEXT NOT NULL,
  ts TEXT NOT NULL,
  role TEXT NOT NULL,
  text TEXT NOT NULL,
  title TEXT,
  source_id TEXT NOT NULL
);
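
On large archives, the time-range and per-thread queries shown under Example Queries benefit from indexes. These are a suggested addition, not DDL the tool ships:

import sqlite3

# Optional indexes for common query patterns (a suggestion, not part of the schema above).
conn = sqlite3.connect("my_archive.sqlite")
conn.executescript("""
CREATE INDEX IF NOT EXISTS idx_messages_ts ON messages (ts);
CREATE INDEX IF NOT EXISTS idx_messages_thread ON messages (canonical_thread_id);
""")
conn.commit()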

Full-text search

Uses SQLite FTS5 for fast text queries:

  • messages_fts - Virtual FTS table (indexed text content)
  • messages_fts_docids - Maps FTS rowids to message IDs for joins
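
The actual DDL lives in src/ingest.py; as a rough illustration of the documented two-table layout (assuming a plain FTS5 table), creating it and indexing one message might look like:

import sqlite3

conn = sqlite3.connect("archive.sqlite")
# Illustrative DDL only; mirrors the layout described above.
conn.executescript("""
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(text);
CREATE TABLE IF NOT EXISTS messages_fts_docids (
  rowid INTEGER PRIMARY KEY,
  message_id TEXT NOT NULL
);
""")
# Add text to the FTS index, then map the new FTS rowid back to its message_id.
cur = conn.execute("INSERT INTO messages_fts (text) VALUES (?)", ("hello world",))
conn.execute(
    "INSERT INTO messages_fts_docids (rowid, message_id) VALUES (?, ?)",
    (cur.lastrowid, "msg_0001"),
)
conn.commit()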

Example Queries

Find questions about a topic

SELECT text, ts FROM messages 
WHERE role = 'user' 
AND text LIKE '%kubernetes%'
ORDER BY ts DESC;

Most active conversations

SELECT title, COUNT(*) as message_count
FROM messages
GROUP BY canonical_thread_id
ORDER BY message_count DESC
LIMIT 10;

Export to CSV

sqlite3 -header -csv archive.sqlite \
  "SELECT * FROM messages WHERE ts >= '2024-01-01'" \
  > 2024_messages.csv

Example Data

The examples/ directory includes:

  • Sample export files from each platform (JSON format)
  • sample_archive.sqlite - Pre-built database with 12 messages from all three platforms

Try querying the sample database:

# View all conversations
sqlite3 examples/sample_archive.sqlite "SELECT DISTINCT title, platform FROM messages;"

# Search for specific terms
sqlite3 examples/sample_archive.sqlite "SELECT role, text FROM messages WHERE text LIKE '%learning%';"

What's Next

This parser is the foundation. The full MyChatArchive platform (launching Q1 2025) will add:

  • Web UI for browsing and filtering your archive
  • Vector search for semantic queries across conversations
  • AI synthesis to surface insights and patterns
  • Enhanced exports to Markdown, CSV, and agent-ready formats

This open-source tool will always remain free and stay at the core of the stack.

Want early access? Star the repo and watch for updates, or check MyChatArchive.com.


Roadmap

Export Structurer (this tool):

  • ChatGPT, Claude, and Grok parsers
  • Additional platforms (Gemini, Perplexity, Copilot, etc.)
  • Advanced deduplication and merge strategies
  • CLI improvements (progress bars, better error handling)

Contributing

This tool uses a modular parser architecture. Adding support for a new platform is straightforward.

Add a Parser

Create src/parsers/your_platform.py:

from typing import Iterator, Dict

def parse(input_path: str) -> Iterator[Dict]:
    """
    Yield normalized messages with:
    - thread_id: str
    - thread_title: str
    - role: str ("user", "assistant", or "system")
    - content: str
    - created_at: float (Unix timestamp)
    """
    # Your parsing logic
    pass
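
To make the contract concrete, here is a sketch for an imagined flat export format; real platform exports differ, so treat this as illustrative only:

import json
from typing import Iterator, Dict

def parse(input_path: str) -> Iterator[Dict]:
    # Imagined input shape:
    # [{"id": ..., "title": ..., "messages": [{"sender": ..., "text": ..., "time": ...}]}]
    with open(input_path, "r", encoding="utf-8") as f:
        threads = json.load(f)
    for thread in threads:
        for msg in thread.get("messages", []):
            yield {
                "thread_id": str(thread["id"]),
                "thread_title": thread.get("title", "Untitled"),
                "role": "user" if msg["sender"] == "human" else "assistant",
                "content": msg["text"],
                "created_at": float(msg["time"]),
            }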

Register in src/ingest.py:

from parsers import chatgpt, anthropic, grok, your_platform

PARSERS = {
    "chatgpt": chatgpt,
    "anthropic": anthropic,
    "grok": grok,
    "your_platform": your_platform
}

Test it:

python src/ingest.py --in export.json --format your_platform --test

Pull Requests

  1. Test with real exports
  2. Add example file to examples/
  3. Update README
  4. No external API dependencies
  5. Follow existing code style

License

MIT License - free for everyone, including commercial use.

See LICENSE for full terms.

Want the full platform? MyChatArchive.com (launching Q1 2025) will add:

  • Web UI with zero setup
  • Vector search and AI synthesis
  • Team collaboration features
  • Cloud sync (optional - local-first stays free)

Built by Channing Chasko · MyChatArchive.com (Q1 2025)

Released under the MIT License.