diff --git a/README.md b/README.md
index 43b7a4e..caebd3d 100644
--- a/README.md
+++ b/README.md
@@ -95,7 +95,7 @@ Evaluation datasets should contain the following fields:
 
 - **trafilatura**: trafilatura extractor
 - **resiliparse**: resiliparse extractor
-- **llm-webkit**: llm-webkit extractor
+- **mineru-html**: mineru-html extractor
 - **magic-html**: magic-html extractor
 - **Custom extractors**: Implement by inheriting from `BaseExtractor`
 
@@ -103,7 +103,7 @@ Evaluation datasets should contain the following fields:
 
 | extractor | extractor_version | dataset | total_samples | overall (macro avg) | code_edit | formula_edit | table_TEDS | table_edit | text_edit |
 |-----------|-------------------|---------|---------------|---------------------|-----------|--------------|------------|-----------|-----------|
-| llm-webkit | 4.1.1 | WebMainBench1.0 | 545 | 0.8256 | 0.9093 | 0.9399 | 0.7388 | 0.678 | 0.8621 |
+| mineru-html | 4.1.1 | WebMainBench1.0 | 545 | 0.8256 | 0.9093 | 0.9399 | 0.7388 | 0.678 | 0.8621 |
 | magic-html | 0.1.5 | WebMainBench1.0 | 545 | 0.5141 | 0.4117 | 0.7204 | 0.3984 | 0.2611 | 0.7791 |
 | trafilatura_md | 2.0.0 | WebMainBench1.0 | 545 | 0.3858 | 0.1305 | 0.6242 | 0.3203 | 0.1653 | 0.6887 |
 | trafilatura_txt | 2.0.0 | WebMainBench1.0 | 545 | 0.2657 | 0 | 0.6162 | 0 | 0 | 0.7126 |
diff --git a/README_zh.md b/README_zh.md
index e2d4563..3e7c298 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -93,7 +93,7 @@ print(f"Overall Score: {result.overall_metrics['overall']:.4f}")
 
 - **trafilatura**: trafilatura抽取器
 - **resiliparse**: resiliparse抽取器
-- **llm-webkit**: llm-webkit 抽取器
+- **mineru-html**: mineru-html 抽取器
 - **magic-html**: magic-html 抽取器
 - **自定义抽取器**: 通过继承 `BaseExtractor` 实现
 
@@ -101,7 +101,7 @@ print(f"Overall Score: {result.overall_metrics['overall']:.4f}")
 
 | extractor | extractor_version | dataset | total_samples | overall(macro avg) | code_edit | formula_edit | table_TEDS | table_edit | text_edit |
 |-----------|-------------------|---------|---------------|---------------------|-----------|--------------|------------|-----------|-----------|
-| llm-webkit | 4.1.1 | WebMainBench1.0 | 545 | 0.8256 | 0.9093 | 0.9399 | 0.7388 | 0.678 | 0.8621 |
+| mineru-html | 4.1.1 | WebMainBench1.0 | 545 | 0.8256 | 0.9093 | 0.9399 | 0.7388 | 0.678 | 0.8621 |
 | magic-html | 0.1.5 | WebMainBench1.0 | 545 | 0.5141 | 0.4117 | 0.7204 | 0.3984 | 0.2611 | 0.7791 |
 | trafilatura_md | 2.0.0 | WebMainBench1.0 | 545 | 0.3858 | 0.1305 | 0.6242 | 0.3203 | 0.1653 | 0.6887 |
 | trafilatura_txt | 2.0.0 | WebMainBench1.0 | 545 | 0.2657 | 0 | 0.6162 | 0 | 0 | 0.7126 |
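
Review note on the results table touched by this change: the `overall (macro avg)` column appears to be the unweighted mean of the five per-metric columns. That reading is inferred from the numbers in the table, not confirmed by the benchmark's source, so the sketch below is only a sanity check under that assumption. `macro_average` and the `rows` dictionary are hypothetical names used for illustration and are not part of WebMainBench.

```python
from statistics import mean

# Per-metric scores copied from the table rows in the diff.
# Column order: code_edit, formula_edit, table_TEDS, table_edit, text_edit.
rows = {
    "mineru-html": {
        "code_edit": 0.9093, "formula_edit": 0.9399, "table_TEDS": 0.7388,
        "table_edit": 0.678, "text_edit": 0.8621,
    },
    "magic-html": {
        "code_edit": 0.4117, "formula_edit": 0.7204, "table_TEDS": 0.3984,
        "table_edit": 0.2611, "text_edit": 0.7791,
    },
    "trafilatura_md": {
        "code_edit": 0.1305, "formula_edit": 0.6242, "table_TEDS": 0.3203,
        "table_edit": 0.1653, "text_edit": 0.6887,
    },
}


def macro_average(scores):
    """Unweighted mean over the metric columns (hypothetical helper)."""
    return mean(scores.values())


if __name__ == "__main__":
    for name, scores in rows.items():
        # Compare each printed value with the "overall" column in the table.
        print(f"{name}: {macro_average(scores):.4f}")
```

Because the rename from `llm-webkit` to `mineru-html` leaves every score unchanged, the same check applies to the row before and after this change.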