Open
Labels
enhancement (New feature or request)
Description
Overview
Implement parser for Rich Text Format (RTF) documents.
Parent Epic
Part of #91 - Document & Office Format Awareness
Description
Parse RTF control words and text to extract document content and metadata.
Implementation Details
- RTF is text-based but highly structured: nested groups ({ ... }) containing control words and plain text
- Parse control words (\keyword)
- Extract plain text content
- Handle character encoding (\'hh hex escapes)
- Parse document info group (\info)
- Skip binary embedded objects (\binN, followed by N bytes of raw data)
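
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `extract_text`, `TOKEN`, and `SKIP_DESTINATIONS` are hypothetical names, the list of skipped destinations is an assumption and not exhaustive, the code page defaults to cp1252 unless \ansicpg says otherwise, and \uN Unicode escapes are not handled.

```python
import codecs
import re

# Destinations whose text is not body text (assumed list, not exhaustive).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "object"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional parameter, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # \'hh hex escape
    r"|\\([^a-zA-Z])"            # control symbol such as \* or \{
    r"|([{}])"                   # group open / close
    r"|([^\\{}]+)"               # run of plain text
)

def extract_text(raw: bytes) -> str:
    # latin-1 maps every byte to one character, so string offsets equal byte offsets.
    data = raw.decode("latin-1")
    codepage = "cp1252"
    out = []
    pending = bytearray()  # bytes from consecutive \'hh escapes, decoded together
    stack = []             # skip flag of each enclosing group
    skipping = False
    pos = 0

    def flush():
        if pending:
            out.append(pending.decode(codepage, errors="replace"))
            pending.clear()

    while pos < len(data):
        m = TOKEN.match(data, pos)
        if m is None:  # lone backslash at end of input
            break
        pos = m.end()
        word, param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            flush()
            stack.append(skipping)
        elif brace == "}":
            flush()
            skipping = stack.pop() if stack else False
        elif word is not None:
            if word == "bin":
                pos += max(0, int(param or 0))  # jump over the raw bytes unparsed
            elif word in SKIP_DESTINATIONS:
                skipping = True
            elif word == "ansicpg" and param:
                try:
                    codecs.lookup("cp" + param)
                    codepage = "cp" + param
                except LookupError:
                    pass  # unknown code page: keep the current one
            elif word in ("par", "line") and not skipping:
                flush()
                out.append("\n")
            elif word == "tab" and not skipping:
                flush()
                out.append("\t")
        elif hexbyte is not None:
            if not skipping:
                pending.append(int(hexbyte, 16))
        elif symbol is not None:
            if symbol == "*":
                skipping = True  # \* marks an ignorable destination
            elif symbol in "\\{}" and not skipping:
                flush()
                out.append(symbol)
        elif text is not None and not skipping:
            flush()
            # Bare CR/LF in RTF source is formatting only, not content.
            out.append(text.replace("\r", "").replace("\n", ""))
    flush()
    return "".join(out)
```

For example, `extract_text(rb"{\rtf1\ansi Hello\par World}")` returns `"Hello\nWorld"`. The input is taken as bytes so that the N bytes after \binN can be skipped by offset without ever being tokenized.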
String Sources
- Document info (title, author, subject)
- Plain text content
- Font names (\fonttbl)
- Style names (\stylesheet)
- Hyperlinks (HYPERLINK fields in \fldinst)
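
As a rough sketch of how the document info strings might be pulled out, the snippet below locates the \info group by brace matching and reads the title, author, and subject subgroups. `find_group` and `extract_info` are hypothetical helpers; the sketch assumes the field values contain no nested groups and no \bin data, and it leaves any \'hh escapes in the values undecoded.

```python
import re
from typing import Optional

INFO_FIELDS = ("title", "author", "subject")

def find_group(rtf: str, keyword: str) -> Optional[str]:
    """Return the contents of the first group opening with \\keyword, or None."""
    m = re.search(r"\{\\" + re.escape(keyword) + r"(?![a-zA-Z])", rtf)
    if m is None:
        return None
    start = m.start()
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the backslash and the next character (covers \{, \}, \\)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return rtf[start + 1 : i]
        i += 1
    return None  # unbalanced braces

def extract_info(rtf: str) -> dict:
    body = find_group(rtf, "info")
    if body is None:
        return {}
    fields = {}
    for name in INFO_FIELDS:
        value = find_group(body, name)
        if value is not None:
            rest = value[len(name) + 1 :]  # drop the leading \keyword
            if rest.startswith(" "):
                rest = rest[1:]            # drop the single delimiter space
            fields[name] = rest
    return fields

doc = r"{\rtf1\ansi{\info{\title Demo Doc}{\author Ann}{\creatim\yr2024}}Body}"
print(extract_info(doc))
```

The same `find_group` helper could in principle be pointed at \fonttbl or \stylesheet, though the entries in those tables have their own internal structure that this sketch does not parse.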
Acceptance Criteria
- Parse RTF structure
- Extract document info
- Extract plain text
- Handle various encodings (\'hh hex escapes under the declared code page)
- Skip binary embedded data
- Tests with RTF files
Related
Project: #76