Releases: docling-project/docling-eval
v1.0.1
v1.0.0
Feature
- Parallelize the evaluation of tables and cache the loading of external predictions (#190) (9d04a56)
- Regression tests for CVAT to Docling conversion (#193) (8a10188)
- CVAT box rotation support, structural cleanup (#191) (db068e9)
- Improvements in user experience: Performance, error handling, logging (#189) (a850784)
- Visualizer tool and command for datasets (#186) (373f959)
- Extend the evaluators to support external predictions stored in files (#185) (53dbd95)
- Convert Docling JSON inputs to image streams in FileDatasetBuilder (#184) (15888fd)
- Allow subset to split routing in CVAT to HF exporter (#182) (ebb8800)
- Ingest CVAT assets and filter submissions (#180) (b55b2ea)
- Runtime optimizations for MultiLabelConfusionMatrix (#175) (5084a4d)
- Add more fine-grained control in the DoclingEvalCOCOExporter (#149) (8f33420)
- Remove legacy CvatDatasetBuilder code, use modernized code (#174) (693c224)
- Introduce the PixelLayoutEvaluator to produce confusion matrices for the multi-label layout analysis (#173) (a79bac5)
- Review-bundle builder, fixes for GraphCell with merged elements and more (#172) (21341ce)
Fix
- Correct import path for TableStructureModel (#199) (a7e74a3)
- Fix the reporting of doc_id, true_md, pred_md in markdown_text_evaluator.py (#196) (3ce7591)
- PixelLayoutEvaluator: Set all-pixels background in case of a missing prediction and evaluate (#183) (4314091)
- Fix empty prediction handling in markdown evaluator (#177) (9b6df83)
- Consistency and perf improvements (#171) (8fb3a16)
Breaking
- CvatDatasetBuilder now requires modern CVAT folder structure and uses convert_cvat_folder_to_docling() internally. (693c224)
v0.10.0
Feature
- Extend the CLI for create-eval to receive the vlm-options and max_new_tokens parameters when the provider is GraniteDocling (#164) (8be2e83)
- Harmonizing pic classes for cvat to docling conversion (#167) (740157d)
- Add more specific validation for reading-order, enhance validation report (5e5f2db)
- Integrate textline_cells based OCR evaluation (#156) (3a9543c)
Fix
- Validation fixes for list item impurity check (#169) (74e7b3e)
- Don't report content-layer group violation multiple times (cb71009)
- Handle merged elements regarding inclusion, don't flag single element pages (c10fdfd)
- Missing transform to storage_scale for some items and table cells (1eb6b4e)
- More CVAT validation and docling conversion fixes (#163) (6f59c7a)
- Better control over scaling in CVAT transform, fixes for OCR (#162) (ef17b5a)
- Fixes for CVAT validation, OCR in CVAT pipeline, logging, and more (#161) (80e449d)
Performance
v0.9.0
v0.8.1
v0.8.0
What's Changed
- feat: Extend the Consolidator to export Latex files alongside the excel report by @nikos-livathinos in #143
- feat: Extend the DoclingEvalCOCOExporter to export a parquet dataset in COCO format by @nikos-livathinos in #145
- feat: Several fixes and campaign tools extensions by @cau-git in #150
- feat: Add Table structure evaluations for TEDS by @praveenmidde in #94
Full Changelog: v0.7.0...v0.8.0
v0.7.0
v0.6.0
Feature
- Layout evaluation fixes, mode control and cleanup (#133) (629a451)
- Introduce utility to export layout predictions from HF parquet files into pycocotools format. (#125) (54f7c81)
- Add specific language support for XFUND dataset builder (#122) (4ca6a0e)
- Tooling for CVAT validation, to DoclingDocument transformation, new Evaluators (#119) (2ee1104)
Fix
- Move ibm-cos to hyperscaler (#135) (9aff6c1)
- Update hyperscalers to support multiple image file types (#118) (a34f264)
- Misc fixes (#131) (518e1ba)
- CVAT to DoclingDoc: Ensure that nested list handling works across page boundaries (#129) (1b58377)
- Important fixes for parquet serialization / deserialization, optimizations (#128) (53c22ef)
- Fixes for the dataset visualizers (#127) (a127ea9)
Performance
v0.5.0
v0.4.0
Feature
- Extend the FileProvider and the CLI to accept parameters that control the source of the prediction images (#111) (42e1615)
- Improvements for the MultiEvaluator (#95) (04fe2d9)
- Add extra args for docling-provider and default annotations for CVAT (#98) (7903b6a)
- Introduce SegmentedPage for OCR (#91) (be0ff6a)
- Update CVAT for multi-page annotation, utility to create sliced PDFs (#90) (28d166d)
- Add area level f1 (#86) (54d013b)
Fix
- Small fixes (#108) (0628fa6)
- Layout text not correctly populated in AWS prediction provider, add tests (#100) (6441688)
- Dataset feature spec fixes, cvat improvements (#97) (b79dd19)
- Update boto3 AWS client to accept service credentials (#88) (4e01d0b)
- Handle unsupported END2END evaluation and fix variable name in OCR (#87) (75311da)
- Propagate cvat parameters (#82) (1e2040a)