# ClioDeck Features

**Version**: 1.0.0-rc.1

This document provides an overview of ClioDeck's main features.

---

## Bibliography Management

### Zotero Integration

ClioDeck integrates with Zotero for bibliography management:

- **Import from Zotero**: Connect to your Zotero library and import collections with a single click
- **Bidirectional Sync**: Detect changes (additions, modifications, deletions) between local and Zotero bibliographies
- **Conflict Resolution**: Choose among three strategies (Remote Wins, Local Wins, or Manual selection)
- **PDF Download**: Automatically download PDFs from Zotero attachments
- **Collection Selection**: Choose which Zotero collection to sync
- **Group Support**: Access Zotero group libraries

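The change detection and conflict strategies above can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not ClioDeck's sync code. The `Citation` shape, the strategy identifiers, and the `diffLibraries`/`resolve` helpers are hypothetical, and a citation counts as modified when its title or modification timestamp differs. Because the sketch compares only the two current copies, it cannot tell a local addition from a remote deletion. A real bidirectional sync would also need a snapshot of the last synchronized state.

```typescript
// Hypothetical citation record; real records carry many more fields.
interface Citation {
  id: string;           // stable key shared by both sides (assumption)
  title: string;
  dateModified: string; // ISO timestamp of the last edit
}

// Hypothetical identifiers for the three documented strategies.
type Strategy = "remote-wins" | "local-wins" | "manual";

interface SyncDiff {
  added: Citation[];                                  // in Zotero only
  deleted: Citation[];                                // in the local copy only
  modified: { local: Citation; remote: Citation }[];  // in both, but different
}

// Two-way comparison of the local bibliography against the Zotero copy, keyed by id.
function diffLibraries(local: Citation[], remote: Citation[]): SyncDiff {
  const localById = new Map<string, Citation>(local.map((c): [string, Citation] => [c.id, c]));
  const remoteIds = new Set(remote.map((c) => c.id));
  const diff: SyncDiff = { added: [], deleted: [], modified: [] };

  for (const r of remote) {
    const l = localById.get(r.id);
    if (!l) {
      diff.added.push(r);
    } else if (l.title !== r.title || l.dateModified !== r.dateModified) {
      diff.modified.push({ local: l, remote: r });
    }
  }
  for (const l of local) {
    if (!remoteIds.has(l.id)) diff.deleted.push(l);
  }
  return diff;
}

// Resolve one conflicting pair according to the chosen strategy.
function resolve(
  pair: { local: Citation; remote: Citation },
  strategy: Strategy,
  choose?: (local: Citation, remote: Citation) => Citation,
): Citation {
  if (strategy === "remote-wins") return pair.remote;
  if (strategy === "local-wins") return pair.local;
  // "manual": the caller supplies the user's decision.
  if (!choose) throw new Error("Manual strategy requires a choose callback");
  return choose(pair.local, pair.remote);
}
```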
### BibTeX Support

- **Import**: Load existing `.bib` files
- **Export**: Export bibliography to BibTeX format with all metadata preserved
- **Round-trip**: Full preservation of custom fields, tags, and notes during import/export

### PDF Management

- **Automatic Indexing**: Index PDFs for semantic search and RAG retrieval
- **Batch Download**: Download all missing PDFs from Zotero
- **Orphan Detection**: Find and clean up PDFs not linked to any citation
- **Re-indexing**: Detect modified PDFs and offer to re-index them
- **Archive Option**: Move orphan PDFs to an archive folder instead of deleting them

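Orphan detection and modification detection reduce to a set difference and a fingerprint comparison. The sketch below assumes SHA-256 content hashes recorded at indexing time (ClioDeck may instead compare timestamps or file sizes); `findOrphans`, `contentHash`, and `modifiedSinceIndexing` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

// PDFs present on disk that no citation links to.
function findOrphans(pdfsOnDisk: string[], linkedPdfs: Iterable<string>): string[] {
  const linked = new Set(linkedPdfs);
  return pdfsOnDisk.filter((path) => !linked.has(path));
}

// Fingerprint a file's bytes (assumed approach); a changed digest means the PDF changed.
function contentHash(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Paths whose current digest differs from the digest stored when they were indexed.
function modifiedSinceIndexing(
  current: Map<string, string>,
  indexed: Map<string, string>,
): string[] {
  const changed: string[] = [];
  for (const [path, hash] of current) {
    const previous = indexed.get(path);
    if (previous !== undefined && previous !== hash) changed.push(path);
  }
  return changed;
}
```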
### Tags and Metadata

- **Custom Tags**: Organize citations with user-defined tags
- **Tag Filtering**: Filter bibliography by one or more tags
- **Custom Fields**: Store additional metadata not covered by BibTeX
- **Notes**: Add personal notes to citations
- **Date Tracking**: Automatic timestamps for added/modified citations

### Statistics Dashboard

Interactive statistics in four tabs:

- **Overview**: Total counts, year range, PDF coverage, publication types
- **Authors**: Top 15 authors, collaboration metrics, publication years
- **Publications**: Top journals, yearly distribution histogram
- **Timeline**: Cumulative and annual publication trends

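As an illustration of the Authors and Timeline tabs, this hypothetical sketch counts the most frequent authors and derives annual and cumulative publication counts from citation years. The function names are illustrative; skipping undated citations and breaking ties alphabetically are assumptions, not documented behavior.

```typescript
// Most frequent authors across all citations (default: top 15, as in the Authors tab).
function topAuthors(authorLists: string[][], n = 15): [string, number][] {
  const counts = new Map<string, number>();
  for (const authors of authorLists) {
    for (const a of authors) counts.set(a, (counts.get(a) ?? 0) + 1);
  }
  // Sort by count descending, then alphabetically to make ties deterministic.
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, n);
}

// Annual counts plus a running cumulative total, ordered by year.
function publicationTimeline(
  years: (number | undefined)[],
): { year: number; annual: number; cumulative: number }[] {
  const counts = new Map<number, number>();
  for (const y of years) {
    if (y === undefined) continue; // undated citations are skipped (assumption)
    counts.set(y, (counts.get(y) ?? 0) + 1);
  }
  let running = 0;
  return [...counts.keys()]
    .sort((a, b) => a - b)
    .map((year) => {
      const annual = counts.get(year)!;
      running += annual;
      return { year, annual, cumulative: running };
    });
}
```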
---

## Primary Sources (Tropy Integration)

ClioDeck integrates with [Tropy](https://tropy.org/) for managing primary sources:

- **Import Tropy Projects**: Read `.tropy` packages and `.tpy` databases
- **Metadata Sync**: Import title, date, creator, archive, collection, tags
- **Transcription Support**: Import transcriptions from Tropy notes, OCR (Tesseract), or Transkribus
- **Unified RAG**: Search both secondary sources (PDFs) and primary sources (Tropy) together
- **Auto-sync**: Detect changes in Tropy files and propose re-synchronization
- **OCR Pipeline**: Built-in Tesseract.js for images without transcription

---

## AI-Powered Research Assistant

### RAG (Retrieval-Augmented Generation)

- **Semantic Search**: Query your indexed PDFs and primary sources using natural language
- **Context Retrieval**: Automatically retrieves relevant passages from your corpus
- **Source Citations**: Every answer includes references to source documents
- **Multi-source Search**: Combines results from PDFs and Tropy sources
- **Configurable Parameters**: Adjust `topK`, the similarity threshold, and the chunking strategy
- **Query Embedding Cache**: Query embeddings are kept in an LRU cache (500 entries, 60-minute TTL) so repeated queries are not re-embedded
- **Context Compression**: Automatic compression when context exceeds LLM limits
- **RAG Explanation**: Shows how each answer was retrieved (chunks used, compression ratio, source types)
- **Stream Cancellation**: Cancel ongoing generation at any time

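A query embedding cache with the limits listed above (500 entries, 60-minute TTL) could look like the following minimal sketch. `EmbeddingCache` and its injectable clock are hypothetical names. The sketch relies on JavaScript `Map` preserving insertion order, so the first key is always the least recently used entry.

```typescript
// Hypothetical LRU cache with a time-to-live, keyed by the query string.
class EmbeddingCache {
  private entries = new Map<string, { vector: number[]; storedAt: number }>();

  constructor(
    private readonly maxEntries = 500,
    private readonly ttlMs = 60 * 60 * 1000,            // 60 minutes
    private readonly now: () => number = () => Date.now(), // injectable clock for testing
  ) {}

  get(query: string): number[] | undefined {
    const entry = this.entries.get(query);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(query); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so this key moves to the most-recently-used end.
    this.entries.delete(query);
    this.entries.set(query, entry);
    return entry.vector;
  }

  set(query: string, vector: number[]): void {
    this.entries.delete(query);
    this.entries.set(query, { vector, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```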
### LLM Integration

- **Ollama Support**: Use local LLMs via Ollama (default: gemma2:2b)
- **Embedded LLM**: Download and run models directly (Qwen2.5-0.5B or Qwen2.5-1.5B) for offline use
- **Auto Fallback**: Automatically switches between Ollama and the embedded model
- **Claude/OpenAI**: Connect to cloud LLM providers (optional)
- **System Prompts**: Customizable system prompts in French, English, and German

### Hybrid Search

- **HNSW Index**: Fast approximate nearest neighbor search (~15ms for 50k chunks)
- **BM25 Search**: Keyword-based search for proper nouns and acronyms
- **Rank Fusion**: Reciprocal Rank Fusion (RRF) merges both result lists, weighted 60% dense (HNSW) / 40% sparse (BM25)
- **Multilingual Query Expansion**: Automatic FR↔EN translation for academic terms (e.g., "primary sources" ↔ "sources primaires")
- **Exact Match Boosting**: Ranks exact keyword matches higher

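Weighted Reciprocal Rank Fusion scores each chunk as the sum, over the two result lists, of `weight / (k + rank)`. The sketch below uses the 60/40 dense/sparse weights from the list above. The constant `k = 60` is the value conventionally used with RRF and is an assumption here; ClioDeck may use a different constant or normalize the fused scores. The function name `weightedRrf` is hypothetical.

```typescript
// Weighted Reciprocal Rank Fusion over two ranked lists of chunk IDs (best first).
function weightedRrf(
  denseIds: string[],   // e.g. HNSW results
  sparseIds: string[],  // e.g. BM25 results
  denseWeight = 0.6,
  sparseWeight = 0.4,
  k = 60,               // conventional RRF constant (assumption)
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const accumulate = (ids: string[], weight: number) => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };
  accumulate(denseIds, denseWeight);
  accumulate(sparseIds, sparseWeight);
  // Chunks found by both retrievers accumulate two contributions and rise to the top.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, it avoids calibrating cosine similarities against BM25 scores, which live on unrelated scales.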
### Topic Modeling (Optional)

- **BERTopic Integration**: Identify main themes in your corpus
- **Topic Timeline**: Visualize theme evolution over time
- **Python Environment**: Dependencies are installed in an isolated Python virtual environment (venv)
- **Optional Feature**: Install only if needed

---

## Corpus Analysis

### Knowledge Graph

- **Document Network**: Visualize relationships between documents
- **Citation Links**: Track internal citations within your corpus
- **Similarity Edges**: Connect semantically similar documents
- **Community Detection**: Identify document clusters (Louvain algorithm)
- **Interactive Exploration**: ForceAtlas2 layout with zoom and pan

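Similarity edges can be derived by comparing document embeddings pairwise with cosine similarity. In this hypothetical sketch each document has a single embedding vector (for example, an average of its chunk embeddings), and the 0.8 threshold is an arbitrary illustrative value. The resulting weighted edge list is the kind of input that Louvain community detection and a ForceAtlas2 layout operate on.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Undirected edges between every pair of documents at or above the threshold.
function similarityEdges(
  docs: { id: string; embedding: number[] }[],
  threshold = 0.8, // illustrative value, not a ClioDeck default
): { source: string; target: string; weight: number }[] {
  const edges: { source: string; target: string; weight: number }[] = [];
  for (let i = 0; i < docs.length; i++) {
    for (let j = i + 1; j < docs.length; j++) {
      const weight = cosine(docs[i].embedding, docs[j].embedding);
      if (weight >= threshold) edges.push({ source: docs[i].id, target: docs[j].id, weight });
    }
  }
  return edges;
}
```

The pairwise loop is quadratic in the number of documents; for large corpora an approximate nearest-neighbor index such as HNSW can supply candidate pairs instead.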
### Textometrics

- **Word Frequencies**: Most common words (stopwords removed)
- **N-grams**: Bigram and trigram analysis
- **Lexical Richness**: Type-Token Ratio and vocabulary metrics
- **TF-IDF**: Identify characteristic words

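The frequency, n-gram, and lexical richness measures above can be sketched in a few lines. The tokenizer and the tiny stopword list are simplifying assumptions (ClioDeck's actual stopword lists are not described here), and TF-IDF is omitted for brevity. The Type-Token Ratio is the number of distinct word forms divided by the total number of words.

```typescript
// Illustrative stopword list only; real lists are long and language-specific.
const STOPWORDS = new Set(["the", "of", "and", "a", "in", "le", "la", "de", "et"]);

// Lowercase and keep runs of Unicode letters, so accented words survive.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/\p{L}+/gu) ?? [];
}

// Word counts with stopwords removed.
function wordFrequencies(tokens: string[]): Map<string, number> {
  const freq = new Map<string, number>();
  for (const t of tokens) {
    if (!STOPWORDS.has(t)) freq.set(t, (freq.get(t) ?? 0) + 1);
  }
  return freq;
}

// Contiguous n-word sequences (n = 2 for bigrams, 3 for trigrams).
function ngrams(tokens: string[], n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) out.push(tokens.slice(i, i + n).join(" "));
  return out;
}

// Type-Token Ratio: distinct word forms / total words.
function typeTokenRatio(tokens: string[]): number {
  return tokens.length === 0 ? 0 : new Set(tokens).size / tokens.length;
}
```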
### Similarity Finder

- **Document Comparison**: Compare your text with indexed sources
- **Segment Analysis**: Analyze by section, paragraph, or sentence
- **Recommendations**: Get relevant source suggestions for each segment
- **Smart Cache**: Results are cached by content hash, so unchanged text is not re-analyzed

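Segment-level comparison with hash-based caching might look like the following hypothetical sketch. `segment` uses naive blank-line and punctuation rules (splitting by section is omitted), the `search` callback stands in for the real retrieval call, and results are cached under a SHA-256 digest of the segment text so re-running the analysis on an unchanged segment skips the search.

```typescript
import { createHash } from "node:crypto";

type Granularity = "paragraph" | "sentence";

// Split a draft into segments using simplified rules (assumption).
function segment(text: string, by: Granularity): string[] {
  const parts =
    by === "paragraph"
      ? text.split(/\n\s*\n/)         // a blank line separates paragraphs
      : text.split(/(?<=[.!?])\s+/);  // naive sentence boundary after . ! or ?
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Cache of recommendations keyed by the SHA-256 digest of a segment's text.
const resultCache = new Map<string, string[]>();

function recommendationsFor(seg: string, search: (query: string) => string[]): string[] {
  const key = createHash("sha256").update(seg).digest("hex");
  const cached = resultCache.get(key);
  if (cached) return cached; // unchanged segment: reuse the earlier result
  const result = search(seg);
  resultCache.set(key, result);
  return result;
}
```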
---

## Document Editing

### Milkdown Editor

- **WYSIWYG Markdown**: Visual editing with full Markdown support
- **Citation Autocomplete**: Type `@` to insert citations from bibliography
- **Footnotes**: Visual styling for footnotes in both dark and light themes
- **Live Preview**: See formatted output as you type
- **Auto-save**: Periodic saving with draft recovery
- **Keyboard Shortcuts**: Standard formatting shortcuts (Ctrl+B, Ctrl+I, etc.)

### Export Options

- **PDF Export**: Generate PDFs via Pandoc and LaTeX
- **Word Export**: Export to .docx format with template support

---

## Research Journal

- **Session Tracking**: Track research sessions and activities
- **Chat History**: Review past conversations with the AI assistant
- **Timeline View**: Visualize activity over time
- **Context Recovery**: Resume previous conversations
- **Date/Time Display**: Sessions show both date and time
- **Filter Empty Sessions**: Hide sessions without activity

---

## Project Management

- **Project Types**: Article, Book, or Presentation projects
- **Recent Projects**: Quick access to recently opened projects
- **Project Settings**: CSL styles, export options
- **Database Actions**: Purge, rebuild, and optimize the project database
- **Per-project Configuration**: Independent settings for each project

---

## User Interface

### Themes

- **Dark/Light Mode**: Toggle between themes
- **Auto Theme**: Automatic switching based on time of day
- **Consistent Styling**: All components adapt to selected theme

### Internationalization

- **Languages**: French, English, German
- **Auto-detection**: Detects system language on first launch
- **Menu Translations**: Complete localization including menus

---

## Technical Features

### Vector Database

- **HNSW Index**: Fast approximate nearest neighbor search (hnswlib-node)
- **BM25 Index**: Sparse search for keywords (natural.js)
- **SQLite Storage**: Persistent storage for chunks and metadata
- **Separate Stores**: Independent databases for PDFs and primary sources

### Chunking Strategies

| Strategy | Chunk Size (words) | Overlap (words) | Use Case |
|----------|--------------------|-----------------|----------|
| **cpuOptimized** | 300 | 50 | Modest machines (8 GB RAM) |
| **standard** | 500 | 75 | Balanced performance |
| **large** | 800 | 100 | Maximum precision (16 GB+ RAM) |

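The strategies in the table can be read as parameters of a sliding-window chunker: each chunk starts `size - overlap` words after the previous one, so consecutive chunks share `overlap` words. This sketch assumes the overlap is counted in words and splits on whitespace; `STRATEGIES` and `chunkWords` are hypothetical names, not ClioDeck's chunker.

```typescript
// Hypothetical mirror of the chunking table (sizes and overlaps in words).
const STRATEGIES = {
  cpuOptimized: { size: 300, overlap: 50 },
  standard: { size: 500, overlap: 75 },
  large: { size: 800, overlap: 100 },
} as const;

type StrategyName = keyof typeof STRATEGIES;

// Sliding-window chunker: windows of `size` words, advancing by `size - overlap`.
function chunkWords(text: string, strategy: StrategyName = "standard"): string[] {
  const { size, overlap } = STRATEGIES[strategy];
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}
```

With `standard`, a 1,000-word document yields three chunks starting at words 0, 425, and 850; the last one is shorter (150 words).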
### RAG Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| **topK** | 10 | Number of chunks to retrieve |
| **similarityThreshold** | 0.12 | Minimum relevance score (tuned for RRF-fused scores) |
| **useHybridSearch** | true | Combine HNSW + BM25 |
| **enableQualityFiltering** | true | Filter low-quality chunks |
| **enableDeduplication** | true | Remove duplicate chunks |

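To show how these parameters interact, here is a hypothetical typed version of the table with a post-retrieval selection step. Only the threshold, deduplication (here: exact text matches after trimming and lowercasing), and `topK` are modeled; quality filtering and the hybrid search itself are out of scope, and `RagConfig`/`selectChunks` are illustrative names.

```typescript
// Hypothetical typing of the RAG parameters, using the documented defaults.
interface RagConfig {
  topK: number;
  similarityThreshold: number;
  useHybridSearch: boolean;
  enableQualityFiltering: boolean;
  enableDeduplication: boolean;
}

const DEFAULT_RAG_CONFIG: RagConfig = {
  topK: 10,
  similarityThreshold: 0.12,
  useHybridSearch: true,
  enableQualityFiltering: true,
  enableDeduplication: true,
};

interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Apply threshold, naive deduplication, and topK to retrieved candidates.
function selectChunks(candidates: ScoredChunk[], overrides: Partial<RagConfig> = {}): ScoredChunk[] {
  const config = { ...DEFAULT_RAG_CONFIG, ...overrides };
  const seen = new Set<string>();
  const kept: ScoredChunk[] = [];
  for (const c of [...candidates].sort((a, b) => b.score - a.score)) {
    if (kept.length >= config.topK) break;
    if (c.score < config.similarityThreshold) continue;
    const key = c.text.trim().toLowerCase();
    if (config.enableDeduplication && seen.has(key)) continue; // exact-text duplicates only
    seen.add(key);
    kept.push(c);
  }
  return kept;
}
```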
### Configuration

- **Settings Panel**: Centralized configuration for all features
- **Per-project Settings**: Some settings (like database actions) are project-specific
- **Persistent Storage**: Settings saved via Electron Store

---

For detailed technical documentation on specific features, see the individual feature documentation files.