A privacy-first workflow for processing documents, videos, podcasts, and RSS feeds with local AI. Designed for a Mac with Apple Silicon; no cloud storage for your research data.
Every source — paper, podcast, video, RSS article — passes through three explicit phases:
| Phase | Goal | How |
|---|---|---|
| 1 — Cast wide | Capture from three sources into Zotero `_inbox` | Feedreader — `feedreader-score.py` runs daily, scores RSS/YouTube/podcast items by semantic similarity to your library, and produces a filtered HTML reader and Atom feed at http://localhost:8765/filtered.html; interesting items go to `_inbox` via browser extension or iOS app · Share sheet — content you've already consumed in apps (browser, YouTube, podcasts) goes directly to `_inbox` via the iOS share sheet · Other — documents, emails, and notes added manually |
| 2 — Filter | You decide what enters the vault | `index-score.py` ranks `_inbox` items by semantic similarity to your existing library; Qwen3.5:9b (local) generates a 2–3 sentence summary per item; you give a Go or No-go |
| 3 — Process | Full processing of approved items | Qwen3.5:9b (local) writes a structured literature note to the Obsidian vault including key findings, methodology notes, relevant quotes, and flashcards for spaced repetition |
The explicit filter step between capture and processing keeps both your feed reader and your vault clean: only sources you have consciously approved end up in the vault, and your feed reader only shows items that are likely relevant.
| Tool | Role | Local / Cloud |
|---|---|---|
| Zotero | Reference manager and central inbox | Local |
| Zotero MCP | Connects Claude Code to your Zotero library via local API | Local |
| Obsidian | Markdown-based note-taking and knowledge base | Local |
| Ollama | Local language model for offline tasks | Local |
| yt-dlp | Download YouTube transcripts and podcast audio | Local |
| youtube-transcript-api | Fast transcript fetching for feedreader YouTube scoring (no video download) | Local |
| whisper.cpp | Local speech-to-text transcription for podcasts | Local |
| NetNewsWire | RSS reader subscribed to the feedreader filtered feed | Local |
| Claude Code | AI assistant that orchestrates the workflow; generative work runs locally via Qwen3.5:9b (Ollama) | Local (default) / Cloud API with --hd |
In standard mode, only orchestration instructions are sent to the Anthropic API; all generative work is handled locally by Qwen3.5:9b. Only when `--hd` is explicitly requested do the prompt and source content go to the Anthropic API (Claude Sonnet 4.6). Reference data, notes, and transcriptions always stay local.
```
ResearchVault/
├── literature/              # One note per approved source
├── syntheses/               # Thematic syntheses across multiple sources
├── projects/                # Project-specific documentation
├── daily/                   # Daily notes and log
├── inbox/                   # Raw input awaiting processing
├── CLAUDE.md                # Workflow instructions for Claude Code
└── .claude/
    ├── index-score.py       # Relevance scoring for _inbox items (phase 2)
    ├── fetch-fulltext.py    # Fetch Zotero attachment text to a local file (no content returned)
    ├── ollama-generate.py   # Call Ollama REST API and write output to file
    ├── zotero-inbox.py      # List all items in Zotero _inbox (human-readable or JSON)
    ├── process_item.py      # Privacy-preserving subagent: item key + metadata → literature note
    ├── summarize_item.py    # Privacy-preserving subagent: item key + metadata → compact summary for Go/No-go
    ├── zotero_utils.py      # Shared Zotero SQLite helpers (make_sqlite_copy, get_library_keys_with_weights)
    ├── feedreader-score.py  # RSS feed scoring and filtered feed generation (feedreader)
    ├── feedreader_core.py   # Shared scoring functions (cosine similarity, profile, source type detection)
    ├── feedreader-server.py # Local HTTP server (port 8765) + POST /skip
    ├── feedreader-learn.py  # Learning loop: processes skip queue + threshold calibration
    ├── feedreader-list.txt  # List of RSS feed URLs (web, YouTube, podcast)
    ├── score_log.jsonl      # Running log of scored feed items (incl. source_type, skipped flag)
    ├── skip_queue.jsonl     # Queue of explicitly rejected items (👎); processed daily
    ├── transcript_cache/    # Transcript & show-notes cache (YouTube: {video_id}.json; podcast: podcast_{episode_id}.json)
    └── skills/
        └── SKILL.md         # Workflow skill (loaded each session)
```
The feedreader runs automatically — feedreader-score.py is triggered daily at 06:00 by a launchd agent, scores all feeds in feedreader-list.txt, and updates the filtered feed at http://localhost:8765/filtered.html. No manual action required.
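A launchd agent that triggers `feedreader-score.py` daily at 06:00 could look roughly like this sketch (the label, interpreter path, and vault location are assumptions; adapt them to your machine):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>nl.researchvault.feedreader-score</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/you/Documents/ResearchVault/.claude/feedreader-score.py</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>6</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
</dict>
</plist>
```

Save it under `~/Library/LaunchAgents/` and load it with `launchctl`.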
Your daily session:
- Browse the filtered feed at http://localhost:8765/filtered.html (or in NetNewsWire via http://localhost:8765/filtered.xml). Items are sorted by relevance score; interesting ones go to Zotero `_inbox` via the browser extension or iOS app.
- Open Terminal, navigate to your vault, and start Claude Code:

  ```
  cd ~/Documents/ResearchVault
  claude
  ```

- Activate the research workflow with `/research`, or just type: `start research workflow`
- Optionally, run `index-score.py` first to prioritize your review:

  ```
  ~/.local/share/uv/tools/zotero-mcp-server/bin/python3 .claude/index-score.py
  ```

  This ranks all `_inbox` items by semantic similarity to your existing library (using the ChromaDB embeddings from zotero-mcp), so you know which items to focus on.
- Claude Code retrieves all items from your Zotero `_inbox` and presents each one with a short summary and relevance assessment — the summary is generated locally by Qwen3.5:9b. You respond Go or No-go per item.
- For each Go: Claude Code moves the item to the correct Zotero collection and writes a structured literature note in `literature/`.
- For each No-go: Claude Code removes the item from `_inbox` (after your confirmation).
- At the end of the session, Claude Code shows a summary: X approved, Y removed.

The Zotero semantic search database is updated automatically every day at 05:45 by the `nl.researchvault.zotero-update` launchd agent — no manual action needed. If you process items later in the day and want the database to reflect them immediately, run:

```
zotero-mcp update-db --fulltext  # recommended (includes full text, 5–20 min on Apple Silicon)
```

Or use the alias: `update-zotero`. Check database status with `zotero-mcp db-status`.
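The local summary and note generation in this session go through Ollama's REST API on `localhost:11434`. A minimal sketch of such a call, in the spirit of `ollama-generate.py` (the function names and output path are illustrative; the endpoint and payload shape follow Ollama's documented `/api/generate` interface):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Non-streaming generation request for the Ollama REST API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate_to_file(model: str, prompt: str, out_path: str) -> str:
    """Call the local Ollama server and write the model's reply to a file."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        text = json.loads(resp.read())["response"]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return out_path

# Usage (requires a running Ollama server with the model pulled):
# generate_to_file("qwen3.5:9b", "Summarize in 2-3 sentences: ...", "summary.md")
```

Writing the reply to a file rather than returning it is what keeps the content out of the orchestrator's context.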
Full step-by-step instructions covering all tools, configuration, and the first test run are published interactively at pjastam.github.io/ResearchVault. A single-file download is also available: installation-guide-v1.12.md.
To configure Claude Code's permission settings for this vault, run the setup script from your vault directory:
```
./setup.sh
```

The script auto-detects your home path and asks for your Zotero library ID (found via `zotero-mcp setup-info`).
- Your Zotero library and Obsidian vault stay entirely on your own machine
- The Zotero local API is only accessible via `localhost`
- Transcription (whisper.cpp) and local model inference (Ollama) run fully offline
- In standard mode, only orchestration instructions reach the Anthropic API; source content stays local
- With `--hd`, the prompt and source content are sent to the Anthropic API (Claude Sonnet 4.6)
- For a fully local orchestration alternative, see Step 15: Future perspective — local orchestrator
- Does content go to the cloud?
In the default mode: no. Claude Code orchestrates the workflow, but all content-heavy work is delegated to `process_item.py` — a local subagent that receives only a Zotero item key and metadata (title, authors, year, tags). The subagent fetches the full text locally, generates the literature note via Qwen3.5:9b (Ollama), and writes the `.md` file to `literature/`. Claude Code receives only a JSON status object: `{"status": "ok", "path": "literature/..."}`. No source content ever reaches Anthropic's servers. Only when you explicitly add `--hd` does source content go to the Anthropic API — and Claude Code asks for confirmation first.
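The subagent contract described here can be sketched as follows. The content-fetching and generation steps are stubbed out, since the point is the interface: a key plus metadata in, a minimal status object out (names are illustrative, not the repo's actual code):

```python
import json

def process_item(item_key: str, metadata: dict) -> dict:
    """Privacy-preserving subagent contract (sketch).

    Receives only a Zotero item key plus lightweight metadata.
    All content-heavy work happens locally; the orchestrator
    only ever sees the status object returned at the end.
    """
    # 1. Fetch the full text locally (e.g. via the Zotero local API) - stubbed here.
    # 2. Generate the literature note with the local model (Ollama) - stubbed here.
    note_path = f"literature/{item_key}.md"
    # 3. Write the note into the vault (omitted in this sketch).
    # Only this minimal, JSON-serializable status ever leaves the subagent:
    return {"status": "ok", "path": note_path}

# The orchestrator sees only this serialized status, never the source text:
status = json.dumps(process_item("ABCD1234", {"title": "An example paper"}))
```

Keeping the return value to a status-and-path pair is the whole privacy mechanism: even a cloud-backed orchestrator never holds the document content.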
- Do you need a paid Claude subscription?
Partially yes — Claude Code needs an Anthropic account (paid subscription or API credits) for its orchestration role. But the AI that actually reads and processes your research is Ollama + Qwen3.5:9b, which is completely free and open source. So the heavy lifting costs nothing.
- "No data leaks" — is that accurate?
Substantially, yes: in default mode, no vault content, paper, transcript, or note leaves your local machine. The privacy claim holds in that sense.
MIT — feel free to adapt this workflow for your own research setup.