HTML Entities are being decoded, possibly leading to html tags being returned #109

@Wertisdk

When a string containing HTML entities is passed to Html2Text::convert(), the method decodes the entities and returns the resulting characters as-is.
This leads to unintended behaviour when htmlentities() has been used to escape text that contains legitimate HTML code: the escaped markup is returned as raw HTML tags.
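
To illustrate the mechanism in isolation, here is a minimal sketch that uses only PHP built-ins and does not touch html2text at all. It assumes the conversion behaves like a plain html_entity_decode() on the escaped text, which is an assumption about the effect and not a statement about the library's internals. The variable names are made up for the example.

```php
<?php
// Text a user typed; the markup is meant to be displayed literally.
$userText = '<script>alert("nope!")</script>';

// Escape it so it can be embedded safely in an HTML document.
$escaped = htmlentities($userText, ENT_QUOTES, 'UTF-8');
// $escaped: &lt;script&gt;alert(&quot;nope!&quot;)&lt;/script&gt;

// Decoding the entities (assumed to mirror what the conversion does)
// turns the escaped text back into raw markup.
$decoded = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

var_dump($decoded === $userText); // bool(true)
```

The round trip shows why decoded output can no longer be treated as escaped text: the distinction between "markup" and "text that looks like markup" is lost once the entities are decoded.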

See this test:

use PHPUnit\Framework\TestCase;
use Soundasleep\Html2Text;

class Html2TextTest extends TestCase
{
  public function testKeepsHtmlEntities()
  {
      $html = '<p>Test <b>bold</b> &lt;script&gt;alert(&quot;nope!&quot;)&lt;/script&gt; <script>alert(\'text\')</script> <i>italic</i> <u>underline</u> <s>strikethrough</s> <a href="http://example.com">link</a></p>';
      $expected = "Test bold &lt;script&gt;alert(&quot;nope!&quot;)&lt;/script&gt;  italic underline strikethrough [link](http://example.com)";

      $this->assertEquals($expected, Html2Text::convert($html));
  }
}

Output:

Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Test bold &lt;script&gt;alert(&quot;nope!&quot;)&lt;/script&gt;  italic underline strikethrough [link](http://example.com)'
+'Test bold <script>alert("nope!")</script>  italic underline strikethrough [link](http://example.com)'

I believe this is unintentional behaviour, because we end up with raw HTML code - the exact opposite of the intention when calling convert.
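
Until the behaviour is changed, one possible mitigation is to escape the converted text again at the point where it is placed into an HTML page. The sketch below is only that, a sketch: escapeForHtml() is a hypothetical helper and not part of html2text, and the input string is the "Actual" value from the failing test above.

```php
<?php
/**
 * Hypothetical helper (not part of html2text): escape already-converted
 * plain text before it is embedded in an HTML document.
 */
function escapeForHtml(string $plainText): string
{
    return htmlspecialchars($plainText, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The "Actual" string reported by the failing test.
$converted = 'Test bold <script>alert("nope!")</script>  italic underline strikethrough [link](http://example.com)';

echo escapeForHtml($converted), PHP_EOL;
// Test bold &lt;script&gt;alert(&quot;nope!&quot;)&lt;/script&gt;  italic underline strikethrough [link](http://example.com)
```

In real use the argument would be the return value of Html2Text::convert($html). This only helps when the result is rendered as HTML; it does not change what convert itself returns.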
