# ClioDeck Features

**Version**: 1.0.0-rc.1

This document provides an overview of ClioDeck's main features.

---

## Bibliography Management

### Zotero Integration

ClioDeck integrates seamlessly with Zotero for bibliography management:

- **Import from Zotero**: Connect to your Zotero library and import collections with a single click
- **Bidirectional Sync**: Detect changes (additions, modifications, deletions) between local and Zotero bibliographies
- **Conflict Resolution**: Three strategies - Remote Wins, Local Wins, or Manual selection
- **PDF Download**: Automatically download PDFs from Zotero attachments
- **Collection Selection**: Choose which Zotero collection to sync
- **Group Support**: Access Zotero group libraries

### BibTeX Support

- **Import**: Load existing `.bib` files
- **Export**: Export bibliography to BibTeX format with all metadata preserved
- **Round-trip**: Full preservation of custom fields, tags, and notes during import/export

### PDF Management

- **Automatic Indexing**: Index PDFs for semantic search using RAG
- **Batch Download**: Download all missing PDFs from Zotero
- **Orphan Detection**: Find and clean up PDFs not linked to any citation
- **Re-indexation**: Detect modified PDFs and propose re-indexing
- **Archive Option**: Safely move orphan PDFs to archive folder instead of deleting

### Tags and Metadata

- **Custom Tags**: Organize citations with user-defined tags
- **Tag Filtering**: Filter bibliography by one or more tags
- **Custom Fields**: Store additional metadata not covered by BibTeX
- **Notes**: Add personal notes to citations
- **Date Tracking**: Automatic timestamps for added/modified citations

### Statistics Dashboard

Interactive statistics with 4 tabs:

- **Overview**: Total counts, year range, PDF coverage, publication types
- **Authors**: Top 15 authors, collaboration metrics, publication years
- **Publications**: Top journals, yearly distribution histogram
- **Timeline**: Cumulative and annual publication trends

---

## Primary Sources (Tropy Integration)

ClioDeck integrates with [Tropy](https://tropy.org/) for managing primary sources:

- **Import Tropy Projects**: Read `.tropy` packages and `.tpy` databases
- **Metadata Sync**: Import title, date, creator, archive, collection, tags
- **Transcription Support**: Import transcriptions from Tropy notes, OCR (Tesseract), or Transkribus
- **Unified RAG**: Search both secondary sources (PDFs) and primary sources (Tropy) together
- **Auto-sync**: Detect changes in Tropy files and propose re-synchronization
- **OCR Pipeline**: Built-in Tesseract.js for images without transcription

---

## AI-Powered Research Assistant

### RAG (Retrieval-Augmented Generation)

- **Semantic Search**: Query your indexed PDFs and primary sources using natural language
- **Context Retrieval**: Automatically retrieves relevant passages from your corpus
- **Source Citations**: Every answer includes references to source documents
- **Multi-source Search**: Combines results from PDFs and Tropy sources
- **Configurable Parameters**: Adjust topK, similarity threshold, chunking strategy
- **Query Embedding Cache**: Optimized performance with LRU cache (500 entries, 60min TTL)
- **Context Compression**: Automatic compression when context exceeds LLM limits
- **RAG Explanation**: Transparency about retrieval process (chunks used, compression ratio, source types)
- **Stream Cancellation**: Cancel ongoing generation at any time

### LLM Integration

- **Ollama Support**: Use local LLMs via Ollama (default: gemma2:2b)
- **Embedded LLM**: Download and run models directly (Qwen2.5-0.5B, 1.5B) for offline use
- **Auto Fallback**: Automatically switches between Ollama and embedded model
- **Claude/OpenAI**: Connect to cloud LLM providers (optional)
- **System Prompts**: Customizable system prompts in French, English, and German

### Hybrid Search

- **HNSW Index**: Fast approximate nearest neighbor search (~15ms for 50k chunks)
- **BM25 Search**: Keyword-based search for proper nouns and acronyms
- **RRF Fusion**: Reciprocal Rank Fusion combining both approaches (60% dense / 40% sparse)
- **Multilingual Query Expansion**: Automatic FR↔EN translation for academic terms (e.g., "primary sources" ↔ "sources primaires")
- **Exact Match Boosting**: Priority for exact keyword matches

### Topic Modeling (Optional)

- **BERTopic Integration**: Identify main themes in your corpus
- **Topic Timeline**: Visualize theme evolution over time
- **Python Environment**: Isolated Python venv for dependencies
- **Optional Feature**: Install only if needed

---

## Corpus Analysis

### Knowledge Graph

- **Document Network**: Visualize relationships between documents
- **Citation Links**: Track internal citations within your corpus
- **Similarity Edges**: Connect semantically similar documents
- **Community Detection**: Identify document clusters (Louvain algorithm)
- **Interactive Exploration**: ForceAtlas2 layout with zoom and pan

### Textometrics

- **Word Frequencies**: Most common words (stopwords removed)
- **N-grams**: Bigrams and trigrams analysis
- **Lexical Richness**: Type-Token Ratio and vocabulary metrics
- **TF-IDF**: Characteristic words identification

### Similarity Finder

- **Document Comparison**: Compare your text with indexed sources
- **Segment Analysis**: Analyze by section, paragraph, or sentence
- **Recommendations**: Get relevant source suggestions for each segment
- **Smart Cache**: Hash-based caching for performance

---

## Document Editing

### Milkdown Editor

- **WYSIWYG Markdown**: Visual editing with full markdown support (Milkdown)
- **Citation Autocomplete**: Type `@` to insert citations from bibliography
- **Footnotes**: Visual styling for footnotes in both dark and light themes
- **Live Preview**: See formatted output as you type
- **Auto-save**: Periodic saving with draft recovery
- **Keyboard Shortcuts**: Standard formatting shortcuts (Ctrl+B, Ctrl+I, etc.)

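To illustrate the `@` citation autocomplete listed above, here is a minimal sketch of how such a lookup could work. It is not ClioDeck's implementation: the `Citation` shape, the function names `activeCitationQuery` and `suggestCitations`, and the sample entries are all hypothetical, and the matching rule (key prefix or author substring) is an assumption.

```typescript
// Hypothetical shape of a bibliography entry; not ClioDeck's actual data model.
interface Citation {
  key: string; // BibTeX-style citation key
  author: string;
  title: string;
}

// Returns the partial key typed after the last "@" before the cursor,
// or null when the cursor is not inside an "@..." token.
// The "@" must follow the start of the text, whitespace, or "[",
// so e-mail addresses such as "a@b" do not trigger a lookup.
function activeCitationQuery(text: string, cursor: number): string | null {
  const before = text.slice(0, cursor);
  const match = /(?:^|\s|\[)@([\w:-]*)$/.exec(before);
  return match ? match[1] : null;
}

// Assumed matching rule: key prefix or author substring, case-insensitive.
function suggestCitations(query: string, library: Citation[], limit = 5): Citation[] {
  const q = query.toLowerCase();
  return library
    .filter((c) => c.key.toLowerCase().startsWith(q) || c.author.toLowerCase().includes(q))
    .slice(0, limit);
}

// Placeholder entries for demonstration only.
const library: Citation[] = [
  { key: "doe2020", author: "Doe, Jane", title: "Sample article" },
  { key: "smith2018", author: "Smith, Alan", title: "Sample book" },
  { key: "dupont1999", author: "Dupont, Marie", title: "Sample chapter" },
];

const query = activeCitationQuery("See @do", 7);
if (query !== null) {
  console.log(suggestCitations(query, library).map((c) => c.key));
}
```

Splitting the token detection from the lookup keeps the editor-side logic (where is the cursor?) independent of the bibliography-side logic (which entries match?).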
### Export Options

- **PDF Export**: Generate professional PDFs via Pandoc/LaTeX
- **Word Export**: Export to .docx format with template support

---

## Research Journal

- **Session Tracking**: Track research sessions and activities
- **Chat History**: Review past conversations with the AI assistant
- **Timeline View**: Visualize activity over time
- **Context Recovery**: Resume previous conversations
- **Date/Time Display**: Sessions show both date and time
- **Filter Empty Sessions**: Hide sessions without activity

---

## Project Management

- **Project Types**: Article, Book, or Presentation projects
- **Recent Projects**: Quick access to recently opened projects
- **Project Settings**: CSL styles, export options
- **Database Actions**: Purge, rebuild, and optimize project database
- **Per-project Configuration**: Independent settings for each project

---

## User Interface

### Themes

- **Dark/Light Mode**: Toggle between themes
- **Auto Theme**: Automatic switching based on time of day
- **Consistent Styling**: All components adapt to selected theme

### Internationalization

- **Languages**: French, English, German
- **Auto-detection**: Detects system language on first launch
- **Menu Translations**: Complete localization including menus

---

## Technical Features

### Vector Database

- **HNSW Index**: Fast approximate nearest neighbor search (hnswlib-node)
- **BM25 Index**: Sparse search for keywords (natural.js)
- **SQLite Storage**: Persistent storage for chunks and metadata
- **Separate Stores**: Independent databases for PDFs and primary sources

### Chunking Strategies

| Strategy | Chunk Size | Overlap | Use Case |
|----------|-----------|---------|----------|
| **cpuOptimized** | 300 words | 50 | Modest machines (8GB RAM) |
| **standard** | 500 words | 75 | Balanced performance |
| **large** | 800 words | 100 | Maximum precision (16GB+ RAM) |

### RAG Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| **topK** | 10 | Number of chunks to retrieve |
| **similarityThreshold** | 0.12 | Minimum score (RRF-optimized) |
| **useHybridSearch** | true | Combine HNSW + BM25 |
| **enableQualityFiltering** | true | Filter low-quality chunks |
| **enableDeduplication** | true | Remove duplicate chunks |

### Configuration

- **Settings Panel**: Centralized configuration for all features
- **Per-project Settings**: Some settings (like database actions) are project-specific
- **Persistent Storage**: Settings saved via Electron Store

---

For detailed technical documentation on specific features, see the individual feature documentation files.
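
As a supplement to the Chunking Strategies table above, the following sketch shows one way a sliding window with overlap could split a text using those sizes. It is an illustration under stated assumptions, not ClioDeck's chunker: the names `ChunkingStrategy`, `STRATEGIES`, and `chunkWords` are hypothetical, the overlap is assumed to be counted in words (the table gives no unit), and splitting on whitespace is a simplification.

```typescript
// Hypothetical names for illustration; values are taken from the table above.
interface ChunkingStrategy {
  chunkSize: number; // words per chunk
  overlap: number; // words shared with the previous chunk (unit assumed)
}

type StrategyName = "cpuOptimized" | "standard" | "large";

const STRATEGIES: Record<StrategyName, ChunkingStrategy> = {
  cpuOptimized: { chunkSize: 300, overlap: 50 },
  standard: { chunkSize: 500, overlap: 75 },
  large: { chunkSize: 800, overlap: 100 },
};

// Splits text into word windows of `chunkSize`, each starting
// `chunkSize - overlap` words after the previous one.
function chunkWords(text: string, strategy: ChunkingStrategy): string[] {
  const { chunkSize, overlap } = strategy;
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

// 700 synthetic words chunked with the cpuOptimized values.
const sample = Array.from({ length: 700 }, (_, i) => `w${i}`).join(" ");
const chunks = chunkWords(sample, STRATEGIES.cpuOptimized);
console.log(chunks.length); // 3
```

With 300-word chunks and a 50-word overlap, each window starts 250 words after the previous one, so the 700-word sample yields windows starting at words 0, 250, and 500. Smaller chunks produce more, shorter windows, which is consistent with the table's suggestion of `cpuOptimized` for modest machines.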