Open
Description
Documents are fully re-parsed every time they are opened, even when the same file has already been loaded in the current session. This results in unnecessary work for PDF/DOCX/PPTX parsing and slows down the overall processing pipeline.
Details
- The app does not cache parsed document state (tokens, metadata, chunking config, etc.).
- Loading a file triggers a full parse regardless of file identity or modification time.
- Re-loading large files significantly increases processing time and blocks the UI during parsing.
Expected Behavior
If a document with the same absolute path (or hash) has already been parsed, the existing parsed model should be reused unless the underlying file has changed.
Proposed Direction
- Introduce a document cache keyed by path or checksum.
- Store parsed content + metadata + chunking settings in memory for the session.
- Add basic change detection (mtime or hash comparison) to decide when a cached entry is stale and must be re-parsed.
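A rough sketch of what the proposed direction could look like, not the app's actual code. The names `DocumentCache`, `CacheEntry`, and the `parse` callable are hypothetical stand-ins for whatever parser entry point the app already has. It assumes an in-memory, per-session cache keyed by resolved absolute path, with mtime plus file size as the change check; a checksum could be swapped in where mtime is unreliable.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class CacheEntry:
    """Hypothetical cache record: file fingerprint plus the parsed model."""
    mtime_ns: int
    size: int
    parsed: Any  # parsed content + metadata + chunking settings


class DocumentCache:
    """In-memory, session-scoped cache keyed by resolved absolute path."""

    def __init__(self, parse: Callable[[str], Any]) -> None:
        # `parse` is an assumed stand-in for the app's existing parser.
        self._parse = parse
        self._entries: Dict[str, CacheEntry] = {}

    def load(self, path: str) -> Any:
        key = os.path.realpath(path)  # normalize so aliases share one entry
        stat = os.stat(key)
        entry = self._entries.get(key)

        # Reuse the parsed model only if the file looks unchanged.
        if (
            entry is not None
            and entry.mtime_ns == stat.st_mtime_ns
            and entry.size == stat.st_size
        ):
            return entry.parsed

        # Cache miss or stale entry: parse once and remember the fingerprint.
        parsed = self._parse(key)
        self._entries[key] = CacheEntry(stat.st_mtime_ns, stat.st_size, parsed)
        return parsed
```

With this shape, calling `cache.load(path)` twice on an unchanged file would invoke the parser once. If chunking settings can change independently of the file, they would probably need to be part of the cache key as well. Running the parse off the UI thread is a separate concern that this sketch does not address.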