inactinique edited this page Jan 28, 2026 · 7 revisions

ClioDeck Features

Version: 1.0.0-rc.1

This document provides an overview of ClioDeck's main features.


Bibliography Management

Zotero Integration

ClioDeck integrates with Zotero for bibliography management:

  • Import from Zotero: Connect to your Zotero library and import collections with a single click
  • Bidirectional Sync: Detect changes (additions, modifications, deletions) between local and Zotero bibliographies
  • Conflict Resolution: Three strategies: Remote Wins, Local Wins, or Manual selection
  • PDF Download: Automatically download PDFs from Zotero attachments
  • Collection Selection: Choose which Zotero collection to sync
  • Group Support: Access Zotero group libraries
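
The three conflict resolution strategies listed above can be sketched as a small function. This is only an illustration: the `Citation` and `Conflict` types and the `resolveConflict` name are hypothetical and are not taken from ClioDeck's code.

```typescript
// Hypothetical types: ClioDeck's real data model is not shown in this document.
type Strategy = "remoteWins" | "localWins" | "manual";

interface Citation {
  key: string;
  title: string;
}

interface Conflict {
  local: Citation;  // version stored in the local bibliography
  remote: Citation; // version stored in Zotero
}

// Returns the version to keep, or null when the user has to choose manually.
function resolveConflict(conflict: Conflict, strategy: Strategy): Citation | null {
  switch (strategy) {
    case "remoteWins":
      return conflict.remote;
    case "localWins":
      return conflict.local;
    case "manual":
      return null; // defer to a manual selection step
  }
}
```

Returning `null` for the manual case keeps the decision out of the sync logic, so a UI layer can present both versions side by side.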

BibTeX Support

  • Import: Load existing .bib files
  • Export: Export bibliography to BibTeX format with all metadata preserved
  • Round-trip: Full preservation of custom fields, tags, and notes during import/export

PDF Management

  • Automatic Indexing: Index PDFs for semantic search using RAG
  • Batch Download: Download all missing PDFs from Zotero
  • Orphan Detection: Find and clean up PDFs not linked to any citation
  • Re-indexing: Detect modified PDFs and offer to re-index them
  • Archive Option: Safely move orphan PDFs to an archive folder instead of deleting them
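
Orphan detection amounts to a set difference between the PDFs on disk and the PDFs referenced by citations. A minimal sketch, assuming a hypothetical `CitationRecord` shape with an optional `pdfPath` field (ClioDeck's actual schema is not described here):

```typescript
// Hypothetical record shape, for illustration only.
interface CitationRecord {
  key: string;
  pdfPath?: string; // absent when the citation has no attached PDF
}

// Returns the PDF paths that no citation refers to.
function findOrphanPdfs(pdfPaths: string[], citations: CitationRecord[]): string[] {
  const linked = new Set<string>();
  for (const citation of citations) {
    if (citation.pdfPath !== undefined) {
      linked.add(citation.pdfPath);
    }
  }
  return pdfPaths.filter((path) => !linked.has(path));
}
```

The returned list could then be passed either to a delete step or to an archive step, matching the two clean-up options above.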

Tags and Metadata

  • Custom Tags: Organize citations with user-defined tags
  • Tag Filtering: Filter bibliography by one or more tags
  • Custom Fields: Store additional metadata not covered by BibTeX
  • Notes: Add personal notes to citations
  • Date Tracking: Automatic timestamps for added/modified citations

Statistics Dashboard

Interactive statistics in four tabs:

  • Overview: Total counts, year range, PDF coverage, publication types
  • Authors: Top 15 authors, collaboration metrics, publication years
  • Publications: Top journals, yearly distribution histogram
  • Timeline: Cumulative and annual publication trends

Primary Sources (Tropy Integration)

ClioDeck integrates with Tropy for managing primary sources:

  • Import Tropy Projects: Read .tropy packages and .tpy databases
  • Metadata Sync: Import title, date, creator, archive, collection, tags
  • Transcription Support: Import transcriptions from Tropy notes, OCR (Tesseract), or Transkribus
  • Unified RAG: Search both secondary sources (PDFs) and primary sources (Tropy) together
  • Auto-sync: Detect changes in Tropy files and offer to re-synchronize
  • OCR Pipeline: Built-in Tesseract.js for images without transcription

AI-Powered Research Assistant

RAG (Retrieval-Augmented Generation)

  • Semantic Search: Query your indexed PDFs and primary sources using natural language
  • Context Retrieval: Automatically retrieves relevant passages from your corpus
  • Source Citations: Every answer includes references to source documents
  • Multi-source Search: Combines results from PDFs and Tropy sources
  • Configurable Parameters: Adjust topK, similarity threshold, chunking strategy
  • Query Embedding Cache: LRU cache for query embeddings (500 entries, 60-minute TTL)
  • Context Compression: Automatic compression when context exceeds LLM limits
  • RAG Explanation: Transparency about retrieval process (chunks used, compression ratio, source types)
  • Stream Cancellation: Cancel ongoing generation at any time
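
To illustrate the query embedding cache, here is a minimal LRU cache with a TTL. The 500-entry and 60-minute defaults come from the list above; the `LruTtlCache` class and its methods are a hypothetical sketch, not ClioDeck's implementation.

```typescript
// Minimal LRU + TTL cache. A Map keeps insertion order, so the first key
// is always the least recently used one.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number = 500, ttlMs: number = 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // `now` is injectable so the expiry logic can be tested without waiting.
  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: drop it
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

With such a cache, a repeated query string can reuse its embedding vector instead of calling the embedding model again.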

LLM Integration

  • Ollama Support: Use local LLMs via Ollama (default: gemma2:2b)
  • Embedded LLM: Download and run models directly (Qwen2.5-0.5B, Qwen2.5-1.5B) for offline use
  • Auto Fallback: Automatically switches between Ollama and the embedded model
  • Claude/OpenAI: Connect to cloud LLM providers (optional)
  • System Prompts: Customizable system prompts in French, English, and German

Hybrid Search

  • HNSW Index: Fast approximate nearest neighbor search (~15ms for 50k chunks)
  • BM25 Search: Keyword-based search for proper nouns and acronyms
  • RRF Fusion: Reciprocal Rank Fusion combining both approaches (60% dense / 40% sparse)
  • Multilingual Query Expansion: Automatic FR↔EN translation for academic terms (e.g., "primary sources" ↔ "sources primaires")
  • Exact Match Boosting: Priority for exact keyword matches
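
The RRF fusion step can be sketched as a weighted Reciprocal Rank Fusion over two ranked lists of chunk IDs. The 60/40 weights come from the list above. The constant `k = 60` is the value conventionally used with RRF, and the way the weights enter the formula is an assumption; ClioDeck's exact formula is not given in this document.

```typescript
// Weighted Reciprocal Rank Fusion: each list contributes weight / (k + rank)
// to a document's score, where rank starts at 1.
// `rrfFuse` and its default parameters are illustrative assumptions.
function rrfFuse(
  denseRanking: string[],
  sparseRanking: string[],
  denseWeight: number = 0.6,
  sparseWeight: number = 0.4,
  k: number = 60,
): string[] {
  const scores = new Map<string, number>();

  const accumulate = (ranking: string[], weight: number): void => {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };

  accumulate(denseRanking, denseWeight);   // HNSW (dense) results
  accumulate(sparseRanking, sparseWeight); // BM25 (sparse) results

  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A chunk that appears in both rankings accumulates two contributions, which is why hybrid search tends to favor passages that match both semantically and by keyword.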

Topic Modeling (Optional)

  • BERTopic Integration: Identify main themes in your corpus
  • Topic Timeline: Visualize theme evolution over time
  • Python Environment: Isolated Python venv for dependencies
  • Optional Feature: Install only if needed

Corpus Analysis

Knowledge Graph

  • Document Network: Visualize relationships between documents
  • Citation Links: Track internal citations within your corpus
  • Similarity Edges: Connect semantically similar documents
  • Community Detection: Identify document clusters (Louvain algorithm)
  • Interactive Exploration: ForceAtlas2 layout with zoom and pan
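
As an illustration of similarity edges, the sketch below links two documents when the cosine similarity of their embedding vectors reaches a threshold. The `DocNode` shape, the function names, and the idea of a single fixed threshold are assumptions; the document does not state how ClioDeck decides which documents to connect.

```typescript
// Hypothetical node shape: one embedding vector per document.
interface DocNode {
  id: string;
  embedding: number[];
}

interface Edge {
  source: string;
  target: string;
  weight: number; // cosine similarity of the two documents
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // no direction to compare
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compares every pair of documents once and keeps pairs above the threshold.
function similarityEdges(docs: DocNode[], threshold: number): Edge[] {
  const edges: Edge[] = [];
  for (let i = 0; i < docs.length; i++) {
    for (let j = i + 1; j < docs.length; j++) {
      const weight = cosineSimilarity(docs[i].embedding, docs[j].embedding);
      if (weight >= threshold) {
        edges.push({ source: docs[i].id, target: docs[j].id, weight });
      }
    }
  }
  return edges;
}
```

The resulting edge list is the kind of input that community detection and a force-directed layout operate on.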

Textometrics

  • Word Frequencies: Most common words (stopwords removed)
  • N-grams: Bigram and trigram analysis
  • Lexical Richness: Type-Token Ratio and vocabulary metrics
  • TF-IDF: Identification of characteristic words
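
Some of these measures are simple enough to show directly. The sketch below computes word frequencies without stopwords, n-grams, and the Type-Token Ratio (distinct words divided by total words). The tokenizer is a deliberately naive assumption that only handles basic Latin and accented Latin-1 letters, and all function names are hypothetical.

```typescript
// Naive tokenizer: lowercases and keeps runs of Latin letters (including
// common accented letters). A real pipeline would be more careful.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-zà-öø-ÿ]+/g) ?? [];
}

// Counts each word, skipping the given stopwords.
function wordFrequencies(tokens: string[], stopwords: Set<string>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    if (stopwords.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// Sliding window of n consecutive tokens (n = 2 for bigrams, 3 for trigrams).
function ngrams(tokens: string[], n: number): string[] {
  const result: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n).join(" "));
  }
  return result;
}

// Type-Token Ratio: number of distinct words divided by total words.
function typeTokenRatio(tokens: string[]): number {
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}
```

For the sentence "The archive and the source", the tokenizer yields five tokens and four distinct words, so the Type-Token Ratio is 4 / 5 = 0.8.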

Similarity Finder

  • Document Comparison: Compare your text with indexed sources
  • Segment Analysis: Analyze by section, paragraph, or sentence
  • Recommendations: Get relevant source suggestions for each segment
  • Smart Cache: Hash-based caching for performance
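
Hash-based caching can be sketched as follows: each text segment is hashed, and the hash is used as the key under which its recommendations are stored. The hash function here is a simple FNV-1a style hash over UTF-16 code units, chosen only to keep the example dependency-free; which hash ClioDeck uses is not stated, and a production cache would more likely use a cryptographic hash such as SHA-256 to make collisions negligible. `SegmentCache` is a hypothetical name.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units (illustrative choice only).
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Caches the result of an expensive computation per segment content.
class SegmentCache<R> {
  private results = new Map<string, R>();

  getOrCompute(segment: string, compute: (segment: string) => R): R {
    const key = hashText(segment);
    if (this.results.has(key)) {
      return this.results.get(key) as R; // unchanged segment: reuse the result
    }
    const result = compute(segment);
    this.results.set(key, result);
    return result;
  }
}
```

Because the key depends only on the segment's content, editing one paragraph invalidates that paragraph's entry while the rest of the document keeps its cached results.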

Document Editing

Milkdown Editor

  • WYSIWYG Markdown: Visual editing with full markdown support (Milkdown)
  • Citation Autocomplete: Type @ to insert citations from bibliography
  • Footnotes: Visual styling for footnotes in both dark and light themes
  • Live Preview: See formatted output as you type
  • Auto-save: Periodic saving with draft recovery
  • Keyboard Shortcuts: Standard formatting shortcuts (Ctrl+B, Ctrl+I, etc.)

Export Options

  • PDF Export: Generate professional PDFs via Pandoc/LaTeX
  • Word Export: Export to .docx format with template support

Research Journal

  • Session Tracking: Track research sessions and activities
  • Chat History: Review past conversations with the AI assistant
  • Timeline View: Visualize activity over time
  • Context Recovery: Resume previous conversations
  • Date/Time Display: Sessions show both date and time
  • Filter Empty Sessions: Hide sessions without activity

Project Management

  • Project Types: Article, Book, or Presentation projects
  • Recent Projects: Quick access to recently opened projects
  • Project Settings: CSL styles, export options
  • Database Actions: Purge, rebuild, and optimize project database
  • Per-project Configuration: Independent settings for each project

User Interface

Themes

  • Dark/Light Mode: Toggle between themes
  • Auto Theme: Automatic switching based on time of day
  • Consistent Styling: All components adapt to selected theme

Internationalization

  • Languages: French, English, German
  • Auto-detection: Detects system language on first launch
  • Menu Translations: Complete localization including menus

Technical Features

Vector Database

  • HNSW Index: Fast approximate nearest neighbor search (hnswlib-node)
  • BM25 Index: Sparse search for keywords (natural.js)
  • SQLite Storage: Persistent storage for chunks and metadata
  • Separate Stores: Independent databases for PDFs and primary sources

Chunking Strategies

| Strategy | Chunk Size | Overlap | Use Case |
|---|---|---|---|
| cpuOptimized | 300 words | 50 | Modest machines (8GB RAM) |
| standard | 500 words | 75 | Balanced performance |
| large | 800 words | 100 | Maximum precision (16GB+ RAM) |
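
A sliding-window chunker makes the relationship between chunk size and overlap concrete. The sizes below are copied from the table above; the sketch assumes the overlap is also counted in words, which the table does not state, and the `chunkWords` function is hypothetical.

```typescript
interface ChunkingStrategy {
  chunkSize: number; // words per chunk
  overlap: number;   // words shared with the previous chunk (unit assumed)
}

type StrategyName = "cpuOptimized" | "standard" | "large";

const STRATEGIES: Record<StrategyName, ChunkingStrategy> = {
  cpuOptimized: { chunkSize: 300, overlap: 50 },
  standard: { chunkSize: 500, overlap: 75 },
  large: { chunkSize: 800, overlap: 100 },
};

// Splits text into word windows; each window starts (chunkSize - overlap)
// words after the previous one.
function chunkWords(text: string, strategy: ChunkingStrategy): string[] {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = strategy.chunkSize - strategy.overlap;
  if (step <= 0) throw new Error("overlap must be smaller than chunkSize");

  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + strategy.chunkSize).join(" "));
    if (start + strategy.chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

With the standard strategy, a 700-word text yields two chunks: words 0 to 499, then words 425 to 699, so the last 75 words of the first chunk are repeated at the start of the second.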

RAG Configuration

| Parameter | Default | Description |
|---|---|---|
| topK | 10 | Number of chunks to retrieve |
| similarityThreshold | 0.12 | Minimum score (RRF-optimized) |
| useHybridSearch | true | Combine HNSW + BM25 |
| enableQualityFiltering | true | Filter low-quality chunks |
| enableDeduplication | true | Remove duplicate chunks |
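
The defaults above can be written as a typed configuration object. The parameter names and values come from the table; the `selectChunks` function is a hypothetical illustration of how `similarityThreshold`, `enableDeduplication`, and `topK` might be applied to scored candidates. Quality filtering is left out because its criteria are not described here.

```typescript
interface RagConfig {
  topK: number;
  similarityThreshold: number;
  useHybridSearch: boolean;
  enableQualityFiltering: boolean;
  enableDeduplication: boolean;
}

// Defaults as listed in the RAG Configuration table.
const DEFAULT_RAG_CONFIG: RagConfig = {
  topK: 10,
  similarityThreshold: 0.12,
  useHybridSearch: true,
  enableQualityFiltering: true,
  enableDeduplication: true,
};

// Hypothetical candidate shape returned by a search step.
interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Drops low-scoring chunks, sorts by score, optionally removes chunks with
// identical text, and keeps at most topK results.
function selectChunks(
  candidates: ScoredChunk[],
  config: RagConfig = DEFAULT_RAG_CONFIG,
): ScoredChunk[] {
  const seenTexts = new Set<string>();
  return candidates
    .filter((chunk) => chunk.score >= config.similarityThreshold)
    .sort((a, b) => b.score - a.score)
    .filter((chunk) => {
      if (!config.enableDeduplication) return true;
      if (seenTexts.has(chunk.text)) return false;
      seenTexts.add(chunk.text);
      return true;
    })
    .slice(0, config.topK);
}
```

Overriding a single value is then a matter of spreading the defaults, for example `{ ...DEFAULT_RAG_CONFIG, topK: 5 }`.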

Configuration

  • Settings Panel: Centralized configuration for all features
  • Per-project Settings: Some settings (like database actions) are project-specific
  • Persistent Storage: Settings saved via Electron Store

For detailed technical documentation on specific features, see the individual feature documentation files.
