Conversation
including combined extra tags Fixes: OX-12298
Fixes: OX-12298
Fixes: OX-12298
preserving the line breaks converted previously Fixes: OX-12298
There was a problem hiding this comment.
Pull request overview
This PR fixes an issue where StringCleanup#htmlToPlainText incorrectly handled line breaks when HTML tags preceded <br /> or <p> tags. The fix processes each line individually to preserve line breaks that would otherwise be removed by the XML stripping regex pattern.
Changes:
- Added line-by-line processing in
htmlToPlainTextto prevent the XML stripping pattern from removing intentional line breaks - Introduced
Strings.iterateLinesutility method to support line-by-line iteration with platform-independent line ending handling - Added test case to verify the fix for HTML with inline tags followed by line breaks
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/main/java/sirius/kernel/commons/StringCleanup.java | Modified htmlToPlainText to process lines individually, preserving converted line breaks |
| src/main/java/sirius/kernel/commons/Strings.java | Added iterateLines utility method for platform-independent line iteration |
| src/test/kotlin/sirius/kernel/commons/StringsTest.kt | Added test case verifying correct handling of inline tags followed by <br /> |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fixes: OX-12298
Fixes: OX-12298
| assertEquals( | ||
| "bold\nmove", | ||
| Strings.cleanup( | ||
| "<strong>bold</strong><br />move", | ||
| StringCleanup::htmlToPlainText | ||
| ) | ||
| ) |
There was a problem hiding this comment.
As mentioned in my other comment: Maybe also add special cases as tests, such as the following one.
<br />
<br />
Hello<br />
World
And I know, the test method was already like that before, but IMHO this is a case for a parameterized test. Right now, the list of unrelated tests in one method gives less reliable statistics if one of the tests fails.
There was a problem hiding this comment.
These exist already:
assertEquals(
"""
first
second
""".trimIndent(),
Strings.cleanup("<p>first<br><br/>second</p>", StringCleanup::htmlToPlainText, StringCleanup::trim)
)
There was a problem hiding this comment.
That test is not equivalent to my example, as it does not contain line breaks, but it includes trimming. Therefore, it does not test the behaviour of your new approach. 🙃
I played around with this a little:
- An extension of your test case by adding a line break:
<strong>bold</strong><br />\nmovegivesbold\n move(with a space in front ofmove) - My example
<br />\n<br />\nHello<br />\nWorldgives\n Hello\n World(with a space at the front of each of the three lines!)
I am not sure whether these really are the results we want. Therefore, I would appreciate to have respective additional test cases defining the gold standard.
There was a problem hiding this comment.
the logic in this class is so broken... will need review and posterior PR
Description
StringCleanup#htmlToPlainTextis expected to convertpandbrtags to line breaks.This works if these tags were not followed by other tags. For example:
➡️ Input:
Hello<br />World✅ Output:
but
➡️ Input:
<strong>Hello</strong><br />World🚫 Output:
HelloWorldThis happens because
StringCleanup.PATTERN_STRIP_XMLwill look for white-spaces leading or trailing a tag, effectively replacing</strong>\nin the example above by an empty string.To not disrupt this pattern, and possible cause a breaking change/behavioral change, which is used in several other places, we iterate and clean each line.
Additional Notes
Checklist