
fix bugs: stabilize htmlContent by normalizing tableRows whitespace/boundary events #1017

Open
chihyu0917 wants to merge 1 commit into apache:master from chihyu0917:fix-testHtmlContent


@chihyu0917

Motivation

MarkdownParserTest#testHtmlContent is brittle around HTML tables.
Inside <table><tbody>…</tbody></table>, the emitted event stream varies between runs in its whitespace-only "text" events and its "unknown" events (tbody boundary markers). Because the test's assertions are order-sensitive, this variation makes them fail (e.g., the assertion expects text but sees tableRow / tableHeaderCell_). The behavior depends on the environment/renderer and shows up under different JDKs/runners and with NonDex.

Design / Implementation

  • Replace the large positional assertSinkEquals(...) block for this test with a compact normalization loop:
    • Iterate over the emitted events.
    • Track when we are between tableRows and tableRows_.
    • While inside that region, skip the intermittent "text" and "unknown" events.
    • Compare the remaining sequence with a concise exp list of expected events.

Scope is limited to htmlContent: no production code is touched, and no new imports or dependencies are added.
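A minimal sketch of such a loop, written against plain event names rather than Doxia's sink-event classes (how the names are collected from the test sink is assumed and left out). The class and method names are hypothetical. One deliberate assumption beyond the description: "text" events inside tableCell/tableHeaderCell are kept, so the check still covers cell content while whitespace between rows is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: drops noise events between tableRows and tableRows_. */
public final class TableRowsNormalizer {

    private TableRowsNormalizer() {}

    public static List<String> normalize(List<String> events) {
        List<String> kept = new ArrayList<>();
        boolean inTableRows = false;
        boolean inCell = false; // assumption: cell text is real content, not noise
        for (String name : events) {
            switch (name) {
                case "tableRows": inTableRows = true; break;
                case "tableRows_": inTableRows = false; break;
                case "tableCell": case "tableHeaderCell": inCell = true; break;
                case "tableCell_": case "tableHeaderCell_": inCell = false; break;
                default: break;
            }
            // Whitespace text and tbody "unknown" markers between rows are skipped.
            boolean noise = inTableRows && !inCell
                    && ("text".equals(name) || "unknown".equals(name));
            if (!noise) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Table portion of the "expected" sequence from the failure report.
        List<String> expected = Arrays.asList("table", "tableRows", "text", "unknown", "tableRow",
                "tableHeaderCell", "text", "tableHeaderCell_", "tableRow_", "text", "tableRow",
                "tableCell", "text", "tableCell_", "tableRow_", "text", "unknown", "tableRows_", "table_");
        // Table portion of the "but was" sequence from the same report.
        List<String> actual = Arrays.asList("table", "tableRows", "unknown", "tableRow",
                "tableHeaderCell", "text", "tableHeaderCell_", "tableRow_", "tableRow",
                "tableCell", "text", "tableCell_", "tableRow_", "unknown", "tableRows_", "table_");
        System.out.println(normalize(expected).equals(normalize(actual))); // prints true
    }
}
```

The full sequences in the failure output below also differ by one "text" event after table_, which lies outside the tableRows region and is therefore not handled by a loop scoped this way.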

Reproduce the error

  • Error message: org.apache.maven.doxia.module.markdown.MarkdownParserTest.htmlContent -- Time elapsed: 0.869 s <<< FAILURE! org.opentest4j.AssertionFailedError
  • Reproduce: mvn -pl doxia-modules/doxia-module-markdown edu.illinois:nondex-maven-plugin:2.1.7:nondex -Dtest=org.apache.maven.doxia.module.markdown.MarkdownParserTest#htmlContent

Follow this checklist to help us incorporate your
contribution quickly and easily:

  • Your pull request should address just one issue, without pulling in other changes.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.
    Note that commits might be squashed by a maintainer on merge.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied.
    This may not always be possible but is a best-practice.
  • Run mvn verify to make sure basic checks pass.
    A more thorough check will be performed on your pull request automatically.
  • You have run the integration tests successfully (mvn -Prun-its verify).

If your pull request is only about 20 lines of code, you don't need to sign an
Individual Contributor License Agreement; if you are unsure,
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004,
you have to acknowledge this by using the following checkbox.

chihyu0917 changed the title from "test(markdown): stabilize htmlContent by normalizing tableRows whitespace/boundary events" to "fix bugs: stabilize htmlContent by normalizing tableRows whitespace/boundary events" on Nov 8, 2025
@chihyu0917
Author

I reran mvn spotless:apply so that the formatting check does not fail in CI.

@kwin
Member

kwin commented Nov 28, 2025

@chihyu0917 Where exactly does it fail? At least in our CI it succeeds reproducibly: https://github.com/apache/maven-doxia/actions/runs/19250318647...

@chihyu0917
Author

mvn verify and mvn test pass because they run with a deterministic order. I used NonDex, a Maven plugin that shuffles the order of nondeterministically specified operations using a random seed, to find order-dependent test failures. The test failed with seeds 933178 and 974622; the full failure output is below. Likely cause: iteration over HTML table children (or attributes) relies on an unspecified order, so NonDex shuffling exposes it. Some renderers also emit intermittent whitespace as text events and boundary markers as unknown events, which changes the event stream around tableRows.
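A generic sketch of the suspected failure mode (not taken from the Doxia or Markdown parser code): a result whose order comes from HashSet iteration depends on an order the Set contract leaves unspecified, and NonDex deliberately perturbs orders like this one. Preserving insertion order with LinkedHashSet, or sorting before comparing, removes the dependence. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

public final class IterationOrderDemo {

    // Fragile: the returned order follows HashSet iteration order, which is
    // unspecified and may change between JDKs or under NonDex shuffling.
    public static List<String> dedupFragile(List<String> names) {
        return new ArrayList<>(new HashSet<>(names));
    }

    // Deterministic: LinkedHashSet iterates in insertion order.
    public static List<String> dedupOrdered(List<String> names) {
        return new ArrayList<>(new LinkedHashSet<>(names));
    }

    // Order-independent check: compare sorted copies instead of raw order.
    public static boolean sameElements(List<String> a, List<String> b) {
        List<String> x = new ArrayList<>(a);
        List<String> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }

    public static void main(String[] args) {
        List<String> attributes = Arrays.asList("id", "class", "border", "class");
        // Only the ordered variant is safe to assert on position by position.
        System.out.println(dedupOrdered(attributes)); // prints [id, class, border]
        // The fragile variant is safe only with an order-independent comparison.
        System.out.println(sameElements(dedupFragile(attributes), dedupOrdered(attributes))); // prints true
    }
}
```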
My env:
Apache Maven 3.8.7
Maven home: /usr/share/maven
Java version: 1.8.0_472, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "6.8.0-88-generic", arch: "amd64", family: "unix"

org.apache.maven.doxia.module.markdown.MarkdownParserTest.testHtmlContent -- Time elapsed: 0.850 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 
expected: <head
head_
body
division
text
paragraph
inline
text
inline_
text
inline
text
inline_
text
paragraph_
text
division_
text
horizontalRule
section1
sectionTitle1
text
sectionTitle1_
paragraph
text
paragraph_
text
table
tableRows
text
unknown
tableRow
tableHeaderCell
text
tableHeaderCell_
tableRow_
text
tableRow
tableCell
text
tableCell_
tableRow_
text
unknown
tableRows_
table_
text
section1_
body_
> but was: <head
head_
body
division
text
paragraph
inline
text
inline_
text
inline
text
inline_
text
paragraph_
text
division_
text
horizontalRule
section1
sectionTitle1
text
sectionTitle1_
paragraph
text
paragraph_
text
table
tableRows
unknown
tableRow
tableHeaderCell
text
tableHeaderCell_
tableRow_
tableRow
tableCell
text
tableCell_
tableRow_
unknown
tableRows_
table_
section1_
body_
>
	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)
	at org.apache.maven.doxia.parser.AbstractParserTest.assertSinkEquals(AbstractParserTest.java:305)
	at org.apache.maven.doxia.module.markdown.MarkdownParserTest.testHtmlContent(MarkdownParserTest.java:599)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
