Fix .org files misdetected as Lotus Organizer binary format #139
Discovered while working on the emacs-mcp project: Apache Tika incorrectly detects Emacs Org-mode files (.org) as application/vnd.lotus-organizer (Lotus Organizer format), causing read_file to reject them as unsupported binary files.

This fix adds extension-based text file detection that runs before MIME type checking:

- text-file-extensions: known text extensions (.org, .md, .rst, etc.)
- text-file-names: dotfiles/special files (Makefile, .gitignore, etc.)
- text-extension?: checks both extension and filename
- text-file?: now checks extension first, then falls back to MIME

Fixes the error: "File read not supported for `/path/file.org` with mime-type `application/vnd.lotus-organizer`"

Co-Authored-By: Pedro Gomes Branquinho <pedrogbranquinho@gmail.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
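For readers without the diff at hand, the extension-first check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function and set names come from the PR text, but their bodies, the namespace name, and the `detect-mime` parameter (a stand-in for the Tika-based detector) are assumptions, and the sets contain only the entries named above.

```clojure
(ns example.text-detection ; hypothetical namespace, not the project's
  (:require [clojure.string :as str])
  (:import (java.io File)))

;; Only the entries named in the PR text; the real sets are presumably larger.
(def text-file-extensions #{"org" "md" "rst"})
(def text-file-names #{"Makefile" ".gitignore"})

(defn get-filename
  "Final path component, e.g. \"/path/file.org\" -> \"file.org\"."
  [path]
  (.getName (File. ^String path)))

(defn get-file-extension
  "Lower-cased extension without the dot, or nil when there is none.
  A leading dot (as in \".gitignore\") is not treated as an extension."
  [path]
  (let [filename (get-filename path)
        dot      (str/last-index-of filename ".")]
    (when (and dot (pos? dot))
      (str/lower-case (subs filename (inc dot))))))

(defn text-extension?
  "True when the filename or its extension is in one of the known-text sets."
  [path]
  (boolean
   (or (contains? text-file-names (get-filename path))
       (contains? text-file-extensions (get-file-extension path)))))

(defn text-file?
  "Checks the extension first; only when that fails does it consult
  detect-mime (assumed: a function from path to a MIME type string)."
  [path detect-mime]
  (or (text-extension? path)
      (str/starts-with? (str (detect-mime path)) "text/")))
```

With a stub detector such as `(constantly "application/vnd.lotus-organizer")`, `(text-file? "/path/file.org" stub)` returns true in this sketch, because the extension check short-circuits before the MIME type is consulted.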
📝 Walkthrough

This PR enhances text file detection by introducing extension- and filename-based recognition that precedes MIME-type checking. New utility functions extract file metadata from paths, and curated collections of known text extensions and filenames enable robust file classification independent of system MIME heuristics.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes

Pre-merge checks: ✅ Passed checks (3 passed)
Before/After Examples

Before (error)

After (success)

Root cause

Apache Tika's MIME detection incorrectly identifies Emacs Org-mode files (

Solution

Added extension-based text file detection that runs before MIME type checking, so known text extensions like
Summary
- Fix .org files misdetected as application/vnd.lotus-organizer (Lotus Organizer format)

Changes

- text-file-extensions set for known text file extensions that Tika may misdetect
- text-file-names set for dotfiles/special files without traditional extensions (Makefile, .gitignore, etc.)
- text-file? now checks the extension first before falling back to MIME type detection

Test plan

- get-filename, get-file-extension, text-extension?
- .org file detection

Co-Authored-By: Pedro Gomes Branquinho <pedrogbranquinho@gmail.com>
🤖 Generated with Claude Code