High prio:
- address key value regions (consolidate FormItem / KeyValueItem (KEY_VALUE_REGION)) (feat: introduce field data model incl. Doclang serialization #519)
- align w.r.t. tokenizer
- fix table cell content deserialization (fix(DocLang): fix table cell <content> deserialization #512)
- fix document re-indexing (fix: fix document re-indexing #510)
- fix DocTags picture meta deserialization (fix(DocTags): fix deserialization to populate picture meta fields #505)
- review checkboxes (fix(Doclang): fix checkbox serialization #503)
- review chart serialization (compared to table serialization) (test(Doclang): add chart serialization test #498)
- align handling of default resolution (location) (fix(IDocTags): fix default location resolution handling #492)
- rich tables / nested tables (feat(IDocTags): add rich table support #491)
- leading / trailing whitespaces (feat(IDocTags): add content wrapping for handling whitespace #489)
- finalize outmost element name (chore: rename IDocTags to Doclang #494)
Medium:
- content layer (e.g. furniture) not captured in Doclang (feat(Doclang): add content layer support #568)
- map HANDWRITTEN label to new Doclang element (similar to bold, italics) (feat: Add HANDWRITTEN_TEXT label support #561)
- include allowed layer values to standard
- govern picture classes (currently str)
- alternative export mode where KVs are shown as just text
- inline code issue: conflict between <inline class="code"> and <code class="Python"> (chore(Doclang): remove inline element #517)
- support extended bounding polygons (beyond non-rotated rectangles) (see feat: extend bounding box #348)
  - (on draft level, could just be by generalizing from 4 = 2×2 location tokens (interpreted as rectangle) to N×2 tokens for any arbitrary polygon with N vertices)
- use defusedxml? (fix: switch xml parsing #509)
- align image ref mode on Doclang serialization (fix(Doclang): align image mode, defaulting to placeholder #506)
- image URI serialization (fix(Doclang): fix image URI serialization #504)
- track CVAT test output (fix: populate picture meta, track Doclang output docling-cvat-tools#8)
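
A rough sketch of the bounding polygon generalization mentioned in the list above: a rectangle given by two corners yields 2×2 values, and a polygon with N vertices yields N×2 values from the same code path. The function names, the `resolution` default of 512, and the flat `x0, y0, x1, y1, ...` ordering are assumptions for illustration only, not the actual Doclang serializer or draft.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quantize(coord: float, extent: float, resolution: int) -> int:
    """Map a page coordinate onto an integer grid, clamped to [0, resolution - 1]."""
    scaled = int(coord / extent * resolution)
    return max(0, min(resolution - 1, scaled))


def to_location_values(
    vertices: Sequence[Point],
    page_width: float,
    page_height: float,
    resolution: int = 512,  # hypothetical default, not taken from the draft
) -> List[int]:
    """Flatten N vertices into N*2 quantized values: x0, y0, x1, y1, ..."""
    if len(vertices) < 2:
        raise ValueError("need at least two vertices")
    values: List[int] = []
    for x, y in vertices:
        values.append(quantize(x, page_width, resolution))
        values.append(quantize(y, page_height, resolution))
    return values


# Two corners (top-left, bottom-right) give the current 4-value rectangle case.
rect = to_location_values([(0, 0), (100, 50)], page_width=200, page_height=100)

# Three vertices give 6 values; the reader would interpret them as a polygon.
triangle = to_location_values(
    [(0, 0), (100, 0), (50, 50)], page_width=200, page_height=100
)
```

With this shape a reader could tell the two cases apart by count alone (4 values means rectangle, any other even count means polygon), though whether the draft should rely on that is an open question.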
Low:
- add radio button support (update: not for now)
- XML Schema / DTD?
- page breaks
- line breaks
- check formatting: hyperlink
- list languages in draft (as subset of Linguist vX.Y.Z)
- extend list to support e.g. Markdown?
- define clearly how non-programming log/console-like content is to be captured (e.g. code with unset language or Shell?)
- cross-provenance content
- cross-page content
- sync elements between IDocTags & draft (e.g. content)
- converge between <content xml:space="preserve"> and {<content> + other way of signalizing whitespace preservation e.g. via ATTLIST, XML schema, convention etc.}
- clarify dropping of include_formatting serializer switch
- update draft: e.g. content element, base64 element, ...
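
For the whitespace preservation item above, a minimal sketch of how a consumer could honor an explicit xml:space="preserve" on a content element, using only the Python standard library. The `read_content` helper and the strip-by-default behavior are assumptions for illustration; the draft may settle on a different convention (ATTLIST default, schema, or always preserving).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def read_content(elem: ET.Element) -> str:
    """Return element text, keeping whitespace only when xml:space="preserve".

    Hypothetical helper: stripping by default is an assumed convention,
    not confirmed Doclang behavior.
    """
    text = elem.text or ""
    if elem.get(XML_SPACE) == "preserve":
        return text
    return text.strip()


preserved = read_content(
    ET.fromstring('<content xml:space="preserve">  padded  </content>')
)
stripped = read_content(ET.fromstring("<content>  padded  </content>"))
```

The explicit attribute keeps the signal visible in each serialized document, at the cost of repeating it on every element that needs it.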