Redundant Parsing When Re-loading the Same Document

Documents are fully re-parsed every time they are opened, even when the same file has already been loaded in the current session. This results in unnecessary work for PDF/DOCX/PPTX parsing and slows down the overall processing pipeline.

**Details**
- The app does not cache parsed document state (tokens, metadata, chunking config, etc.).
- Loading a file triggers a full parse regardless of file identity or modification time.
- Re-loading large files significantly increases processing time and blocks the UI during parsing.

**Expected Behavior**
If a document with the same absolute path (or hash) has already been parsed, the existing parsed model should be reused unless the underlying file has changed.
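As a rough illustration of the reuse condition described above, the check could look something like the sketch below. The function names `file_checksum` and `can_reuse` are hypothetical and not part of the app; SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large documents are not loaded whole.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def can_reuse(path: str, cached_path: str, cached_checksum: str) -> bool:
    """True if `path` resolves to the cached file and its bytes are unchanged."""
    same_file = Path(path).resolve() == Path(cached_path).resolve()
    return same_file and file_checksum(path) == cached_checksum
```

Hashing still reads the whole file, so it is cheaper than a full parse but not free; an mtime comparison is faster at the cost of being less strict.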

**Proposed Direction**
- Introduce a document cache keyed by path or checksum.
- Store parsed content + metadata + chunking settings in memory for the session.
- Add basic change detection (mtime or hash comparison) on each load, and invalidate a cache entry only when the underlying file has changed.
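
A minimal sketch of the proposed direction, assuming an in-memory, session-scoped cache keyed by resolved path with `(mtime, size)` change detection. `DocumentCache`, `CacheEntry`, and the `parse` callable are hypothetical names for illustration; the app's real parser and parsed model would take their place.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class CacheEntry:
    signature: Tuple[int, int]  # (mtime_ns, size) recorded at parse time
    parsed: Any                 # parsed content + metadata + chunking settings


class DocumentCache:
    """Session-scoped cache of parsed documents (hypothetical sketch)."""

    def __init__(self, parse: Callable[[str], Any]) -> None:
        self._parse = parse  # stand-in for the app's real parser
        self._entries: Dict[str, CacheEntry] = {}

    @staticmethod
    def _signature(path: str) -> Tuple[int, int]:
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def load(self, path: str) -> Any:
        key = os.path.realpath(path)  # normalize to one key per file
        sig = self._signature(key)
        entry = self._entries.get(key)
        if entry is not None and entry.signature == sig:
            return entry.parsed       # unchanged file: reuse parsed model
        parsed = self._parse(key)     # new or changed file: parse again
        self._entries[key] = CacheEntry(sig, parsed)
        return parsed
```

If chunking settings can change between loads, they would likely need to be part of the cache key as well, since the same file parsed with different settings yields a different model.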

Redundant Parsing When Re-loading the Same Document #1

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Redundant Parsing When Re-loading the Same Document #1

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions