Fix DTD validation: Add required tbody elements to tables with thead #39

Merged
findbhavin merged 4 commits into main from copilot/fix-table-formatting-dtd-error on Jan 22, 2026

Copilot AI commented Jan 22, 2026

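Tables with <thead> were failing DTD validation because the required <tbody> element was missing. The validator reported: expecting ((col* | colgroup*), ((thead?, tfoot?, tbody+) | tr+)), got (colgroup thead). In other words, once a table has a <thead>, the content model requires at least one <tbody> after it.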
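The content model in the error message can be checked mechanically. The following is a minimal sketch, not code from this PR: it transcribes the quoted model into a regular expression over a table's direct child tags. The names TABLE_MODEL and table_matches_model are hypothetical, and the sketch assumes un-namespaced element names parsed with the standard library's ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Regex transcription of the content model quoted in the error message:
#   ((col* | colgroup*), ((thead?, tfoot?, tbody+) | tr+))
# Each child tag name is followed by one space before matching, so that
# "col " can never be confused with the start of "colgroup ".
TABLE_MODEL = re.compile(
    r"(?:(?:col )*|(?:colgroup )*)"
    r"(?:(?:thead )?(?:tfoot )?(?:tbody )+|(?:tr )+)"
)


def table_matches_model(table: ET.Element) -> bool:
    """Return True if the table's direct children satisfy the content model.

    Hypothetical helper: assumes tag names carry no namespace prefix.
    """
    sequence = "".join(f"{child.tag} " for child in table)
    return TABLE_MODEL.fullmatch(sequence) is not None


failing = ET.fromstring(
    "<table><colgroup><col/><col/></colgroup>"
    "<thead><tr><th>Header 1</th><th>Header 2</th></tr></thead></table>"
)
# The child sequence (colgroup thead) is exactly the case from the error.
print(table_matches_model(failing))
```

This only mirrors the one rule quoted above; a real DTD validator checks far more than the order of a table's children.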

Root Cause

MasterPipeline._post_process_xml() deliberately skipped adding <tbody> to tables whose <thead> has multiple rows, in order to avoid empty rows in the HTML output. That left those tables in violation of the JATS DTD.

Changes

  • MasterPipeline.py (lines 980-1011)

    • Removed conditional that skipped tbody addition for multi-row thead tables
    • Always add <tbody> when <thead> exists but no <tbody> present
    • Calculate column count from colgroup or thead and apply colspan to empty tbody cell
  • tests/test_table_and_article_type_fixes.py

    • Updated test_tables_with_thead_have_tbody to verify DTD compliance
    • Revised test_html_tables_no_empty_rows_at_end to accept minimal tbody as DTD-required
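The MasterPipeline.py change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: count_columns and ensure_tbody are hypothetical names, the sketch uses the standard library's ElementTree rather than whatever XML library the pipeline uses, and it assumes un-namespaced tags.

```python
import xml.etree.ElementTree as ET


def count_columns(table: ET.Element) -> int:
    """Count columns from <col> elements if present, else from the first header row."""
    cols = table.findall("./colgroup/col") + table.findall("./col")
    if cols:
        # A <col span="n"> stands for n columns.
        return sum(int(col.get("span", "1")) for col in cols)
    first_row = table.find("./thead/tr")
    if first_row is None:
        return 1
    # Header cells may themselves span several columns.
    return sum(int(cell.get("colspan", "1")) for cell in first_row) or 1


def ensure_tbody(table: ET.Element) -> bool:
    """Append a minimal <tbody> when <thead> exists without one.

    Returns True if the table was modified. Hypothetical helper.
    """
    if table.find("thead") is None or table.find("tbody") is not None:
        return False
    tbody = ET.Element("tbody")
    row = ET.SubElement(tbody, "tr")
    cell = ET.SubElement(row, "td")
    # One empty cell spanning every column keeps the placeholder row minimal.
    cell.set("colspan", str(count_columns(table)))
    # Nothing may follow tbody in the content model, so appending is safe.
    table.append(tbody)
    return True


table = ET.fromstring(
    '<table><colgroup><col width="50%"/><col width="50%"/></colgroup>'
    "<thead><tr><th>Header 1</th><th>Header 2</th></tr></thead></table>"
)
ensure_tbody(table)
print([child.tag for child in table])
```

The function is idempotent: a second call on the same table finds the <tbody> it added and leaves the table unchanged, which matters if post-processing can run more than once.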

Example

Before:

<table>
  <colgroup><col width="50%"/><col width="50%"/></colgroup>
  <thead><tr><th>Header 1</th><th>Header 2</th></tr></thead>
</table>

After:

<table>
  <colgroup><col width="50%"/><col width="50%"/></colgroup>
  <thead><tr><th>Header 1</th><th>Header 2</th></tr></thead>
  <tbody><tr><td colspan="2"></td></tr></tbody>
</table>

Result

All 5 tables in generated XML now conform to DTD structure requirements with appropriate colspan attributes (2, 13, 12, 11, 3 columns respectively).

Warning

Firewall rules blocked me from connecting to the following address:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

Original prompt

The style checker recently reported a recurring DTD validation error due to incorrect table formatting in XML/HTML files. The error appears at multiple line numbers and is as follows:

error: Element table content does not follow the DTD, expecting ((col* | colgroup*) , ((thead? , tfoot? , tbody+) | tr+)), got (colgroup thead)

This indicates that the <table> elements contain <colgroup> and <thead> but are missing the required <tbody> element or direct <tr> elements after the header row, as specified by the DTD. To fix this, each <table> that currently has <colgroup> followed by <thead> must be updated to also include <tbody>, ensuring it follows the structure required by the DTD:

.........

Please review all affected files across the main branch where this style checker error is reported and update them accordingly, by adding the necessary wrappers or rearranging content to satisfy the expected DTD structure.

This pull request was created from Copilot chat.


Copilot AI and others added 3 commits January 22, 2026 05:55
Co-authored-by: findbhavin <19400906+findbhavin@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix DTD validation error for table formatting in XML/HTML files" to "Fix DTD validation: Add required tbody elements to tables with thead" on Jan 22, 2026
Copilot AI requested a review from findbhavin January 22, 2026 06:03
@findbhavin findbhavin marked this pull request as ready for review January 22, 2026 06:04
@findbhavin findbhavin merged commit 4613078 into main Jan 22, 2026
3 of 7 checks passed