fix: Folder Sync Improvements - Source Path Tracking & LightRAG Fallback #138

ahmedjawedaj · 2026-01-15T20:38:14Z

Summary

This PR fixes folder sync functionality with two key improvements:

1. Fixed New File Detection (Source Path Tracking)

Problem: Files always appeared as "new" after sync because:

- detect_folder_changes checks source paths (in the linked folder)
- synced_files stored destination paths (in the KB raw directory)
- These never matched, so files were perpetually marked as new

Solution: The sync now stores source paths in synced_files, matching what detect_folder_changes expects.
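To illustrate the mismatch, here is a minimal sketch rather than the actual code in knowledge.py. The helper `find_new_files`, the directory names, and the file names are all hypothetical; only the idea of comparing folder paths against the recorded synced_files comes from this PR.

```python
from pathlib import PurePosixPath


def find_new_files(folder_files, synced_files):
    """Hypothetical stand-in for the check done by detect_folder_changes:
    a file counts as new if its path is not recorded in synced_files."""
    synced = set(synced_files)
    return [p for p in folder_files if p not in synced]


# Hypothetical locations, for illustration only.
source_dir = PurePosixPath("/linked/folder")
raw_dir = PurePosixPath("/kb/raw")
folder_files = [str(source_dir / "a.pdf"), str(source_dir / "b.md")]

# Before the fix: destination paths were recorded, so nothing ever matched.
synced_as_destinations = [str(raw_dir / PurePosixPath(p).name) for p in folder_files]
new_before_fix = find_new_files(folder_files, synced_as_destinations)

# After the fix: source paths are recorded, so synced files are recognized.
synced_as_sources = list(folder_files)
new_after_fix = find_new_files(folder_files, synced_as_sources)
```

With destination paths recorded, every file in the linked folder is reported as new on every run; with source paths recorded, the list of new files is empty after a sync.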

2. LightRAG Fallback for RAGAnything

Problem: When the RAGAnything module was unavailable, sync failed with an ImportError.

Solution: Sync now falls back gracefully to the LightRAG pipeline:

- Extracts text from PDF (PyMuPDF), DOCX (python-docx), TXT, and MD files
- Indexes content via LightRAG for knowledge graph building
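The fallback pattern described above could look roughly like the sketch below. It is not the code from add_documents.py: the names `extract_text` and `fallback_index` are hypothetical, the `raganything` import path is an assumption, and the indexing step is passed in as a plain callable instead of a real LightRAG instance.

```python
from pathlib import Path

# Optional dependency: record availability instead of failing at import time.
# The module name "raganything" is an assumption made for this sketch.
try:
    import raganything  # noqa: F401
    HAS_RAGANYTHING = True
except ImportError:
    HAS_RAGANYTHING = False


def extract_text(path: Path) -> str:
    """Extract plain text from a TXT, MD, PDF, or DOCX file."""
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        import fitz  # PyMuPDF, imported lazily so text files work without it
        doc = fitz.open(str(path))
        try:
            return "\n".join(page.get_text() for page in doc)
        finally:
            doc.close()
    if suffix == ".docx":
        import docx  # python-docx, imported lazily
        return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")


def fallback_index(paths, index_text):
    """Extract text from each file and pass it to index_text.

    index_text is a hypothetical callable standing in for the LightRAG
    indexing step. Returns the list of files that were indexed.
    """
    indexed = []
    for path in map(Path, paths):
        text = extract_text(path)
        if text.strip():  # skip files with no extractable text
            index_text(text)
            indexed.append(str(path))
    return indexed
```

A caller would check `HAS_RAGANYTHING` first and use `fallback_index` only when it is false. Importing PyMuPDF and python-docx lazily keeps TXT and MD processing usable even when those packages are missing.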

Files Changed

- src/api/routers/knowledge.py: Added folder_id param and source path tracking
- src/knowledge/add_documents.py: Added LightRAG fallback methods

Testing

- New file detection works correctly after sync
- Progress bar shows proper status
- Files no longer appear as perpetually "new"
- LightRAG fallback processes documents when RAGAnything is unavailable

- Add get_kb_content() to list documents/images
- Add /content API endpoint
- Fix folder_id parameter in upload task
- All tested and working on macOS

## Changes since last push:

### 1. Fixed folder sync state tracking (knowledge.py)

- Added 'folder_id' parameter to run_upload_processing_task
- Implemented source path tracking: synced_files now stores original source paths from linked folders instead of destination paths
- This fixes the bug where files always appeared as 'new' after sync because detect_folder_changes checks source paths but synced_files contained destination paths

### 2. Added LightRAG fallback (add_documents.py)

- When the RAGAnything module is unavailable, sync gracefully falls back to the LightRAG pipeline instead of raising ImportError
- New _process_with_lightrag_fallback method handles text extraction and indexing for PDF, DOCX, TXT, and MD files
- New _extract_text_content helper extracts text from various document formats using PyMuPDF and python-docx

### 3. Improved error handling

- Added null checks for processed_files in progress messages
- Better logging for sync state updates
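As a rough illustration of the null check on processed_files, a progress message can be built so that a missing list is treated as zero files. The function name and message format below are hypothetical and are not taken from the PR.

```python
def progress_message(processed_files, total_files):
    """Build a progress string, treating None or an empty list as zero files."""
    done = len(processed_files) if processed_files else 0
    return f"Processed {done}/{total_files} files"


msg_none = progress_message(None, 3)
msg_some = progress_message(["a.pdf", "b.md"], 3)
```

Without the guard, calling len() on None raises a TypeError in the middle of a progress update.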

ahmedjawedaj added 3 commits January 14, 2026 15:03

fix: add missing backend methods for folder sync

cd794f7

- Add get_kb_content() to list documents/images
- Add /content API endpoint
- Fix folder_id parameter in upload task
- All tested and working on macOS

Merge origin/dev into feature/folder-sync-refactored

12dd1a3
