Fixes HTML to plain text conversion... again #593

Merged
idlira merged 7 commits into main from ili/OX-12298-2
Jan 19, 2026

@idlira idlira commented Jan 16, 2026

Description

Note in advance:
We should evaluate using jsoup or a similar library for this kind of operation. Since that requires an analysis (impact and feasibility), I only tried to improve the current code, which has no real recognition of things like inline vs. block elements.

The HTML code might contain line breaks. These are not considered for display; actual br and p tags must be used to really display a line break.

In addition, HTML collapses consecutive spaces. A string like `<b>Hello  </b>   world` is displayed as "Hello world".
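
A minimal sketch of the whitespace behavior described above, for illustration only. The class and the `collapse` helper are hypothetical and are not part of this change; the tag matcher is a naive regex, not a real HTML parser.

```java
import java.util.regex.Pattern;

class WhitespaceCollapseSketch {
    // Naive tag matcher, for illustration only (assumption: no '>' inside attributes).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Any run of whitespace, including line breaks in the HTML source.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    /** Hypothetical helper: strips tags, then collapses whitespace like a browser does for inline text. */
    static String collapse(String html) {
        String withoutTags = TAG.matcher(html).replaceAll("");
        return WHITESPACE_RUN.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(collapse("<b>Hello  </b>   world")); // Hello world
        System.out.println(collapse("Line 1\nLine 2"));         // Line 1 Line 2
    }
}
```

Note that this sketch treats every tag as inline, so it would not produce line breaks for br or p.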

Additional Notes

Checklist

  • Code change has been tested and works locally
  • Code was formatted via IntelliJ and follows SonarLint & best practices

the HTML code might have line breaks.

Fixes: OX-12298
First of all, we drop all line breaks from the original HTML code. These are ignored by any browser on display, so we want to imitate the same behavior.

Later, each line is additionally trimmed, as leading/trailing spaces in a line would not be displayed by a browser either.

Finally, we cannot simply rely on whether the builder is empty, as code starting with line breaks would trick that check.
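
The first two steps could be sketched roughly as follows. This is not the project's `Strings.cleanup`; `HtmlToTextSketch` and `toPlainText` are hypothetical names, the tag handling is a naive regex, and treating source line breaks as spaces (rather than removing them outright) is an assumption made so that adjacent words stay separated.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class HtmlToTextSketch {
    /** Hypothetical stand-in for the conversion; covers only line breaks and per-line trimming. */
    static String toPlainText(String html) {
        // Step 1: line breaks in the HTML source are not rendered, so turn them into spaces.
        String text = html.replaceAll("\\R", " ");
        // Only <br> and opening <p> tags produce real line breaks in this sketch.
        text = text.replaceAll("(?i)<br\\s*/?>|<p(\\s[^>]*)?>", "\n");
        // Drop every remaining tag (naive; no inline vs. block distinction).
        text = text.replaceAll("<[^>]*>", "");
        // Step 2: collapse runs of spaces and trim each resulting line.
        return Arrays.stream(text.split("\n", -1))
                .map(line -> line.replaceAll("[ \\t]+", " ").trim())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Prints an empty line followed by "Line 1 Line 2 Line 3".
        System.out.println(toPlainText("<p>Line 1\nLine 2\nLine 3</p>"));
    }
}
```

With these assumptions, `<p>Line 1\nLine 2\nLine 3</p>` yields a leading newline followed by `Line 1 Line 2 Line 3`, which is the same shape as the expectation in the test quoted in the review below.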

Fixes: OX-12298
as HTML like `<b>Hello  </b>   world` is rendered with a single space between Hello and world

Fixes: OX-12298
@idlira idlira added the 🐛 Bugfix Contains only a small fix for an existing bug label Jan 16, 2026
@idlira idlira added the 🖐 Keep open Should not be merged label Jan 16, 2026
@idlira idlira removed the 🖐 Keep open Should not be merged label Jan 16, 2026

@jakobvogel jakobvogel left a comment

Thank you, @idlira!

I would still prefer to do all of that via jsoup; in particular, the conversion of <p> seems strange, judging from the tests that require trimming. I would also still prefer parameterized tests. But I agree: for now, this is fine.

Extra spaces are dropped later when reduceWhitespace is used

Fixes: OX-12298
@idlira idlira merged commit 88aa88f into main Jan 19, 2026
4 checks passed
@idlira idlira deleted the ili/OX-12298-2 branch January 19, 2026 08:24
assertEquals(
"\nLine 1 Line 2 Line 3",
Strings.cleanup(
"<p>Line 1\nLine 2\nLine 3</p>",

I know this seems to be legacy behaviour, and we don't need to change it now. But for future extensions, we should consider whether <p>…</p> should really cause a leading newline character in the plain text version. IMHO, it should only be there if there is other content before the <p>.
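
One way the suggestion above might look, as a sketch only. `ParagraphBreakSketch`, `hasVisibleContent`, and `startParagraph` are hypothetical names and not existing project code; the idea is to check for visible text rather than for an empty builder before emitting the line break.

```java
class ParagraphBreakSketch {
    /** Hypothetical helper: true if the builder holds anything other than whitespace. */
    static boolean hasVisibleContent(StringBuilder out) {
        for (int i = 0; i < out.length(); i++) {
            if (!Character.isWhitespace(out.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Starts a paragraph: appends a line break only when visible text precedes it. */
    static void startParagraph(StringBuilder out) {
        if (hasVisibleContent(out)) {
            out.append('\n');
        }
    }

    public static void main(String[] args) {
        StringBuilder leading = new StringBuilder();
        startParagraph(leading);            // nothing before the <p>: no newline
        leading.append("Line 1");

        StringBuilder following = new StringBuilder("Intro");
        startParagraph(following);          // content before the <p>: newline added
        following.append("Line 1");

        System.out.println(leading);
        System.out.println(following);
    }
}
```

Checking for non-whitespace characters instead of `length() == 0` also avoids being tricked by a builder that so far contains only line breaks or spaces.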

Labels

🐛 Bugfix Contains only a small fix for an existing bug

5 participants