I have a page fetched with cURL whose charset is windows-1250, and its doctype is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
I convert the string to UTF-8, check the result, and replace the charset declared in the meta tag:
$html = str_replace('windows-1250', 'UTF-8', mb_convert_encoding($result, 'UTF-8'));
var_dump(mb_detect_encoding($html, "UTF-8, ASCII, ISO-8859-1, windows-1250"));
$Doc = \phpQuery::newDocumentHTML($html, 'UTF-8');
echo pq($Doc)->html();
All the non-ASCII characters come out garbled. var_dump reports UTF-8, and the content type is "text/plain; charset=UTF-8".
When I var_dump($Doc), I see that the DOMDocument encoding and xmlEncoding values are null.
But if I use:
$Dom = new \DOMDocument();
$Dom->loadHTML($html);
and var_dump it, everything is fine and the characters are correct.
I've checked createDocumentWrapper, and the $contentType is correct.
If I set the static $debug to true, I get this:
`string 'Load markup for content type text/html;charset=utf-8' (length=52)
string 'Loading HTML, content type 'text/html;charset=utf-8'' (length=52)
string 'Full markup load (HTML):
' (length=275)
string 'DOC: UTF-8 REQ: UTF-8' (length=21)
string 'Full markup load (HTML), documentCreate('utf-8')' (length=48)
string 'Selecting document '52280a0c077ec7c5fb2f2350db12f22c' as default one' (length=68)`
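
For comparison, the conversion step described above can also be written with the source encoding named explicitly, so it does not depend on mbstring's internal encoding setting. This is only a minimal sketch and not a confirmed fix for the phpQuery output: the sample bytes in $result are hypothetical, and it assumes the local iconv build supports the WINDOWS-1250 code page.

```php
<?php
// Hypothetical sample: "čaj" as Windows-1250 bytes (0xE8 is "č" in that code page),
// followed by a meta-style charset declaration.
$result = "\xE8aj <meta charset=windows-1250>";

// Name the source encoding explicitly instead of relying on a default.
// Assumption: the local iconv build knows WINDOWS-1250.
$html = iconv('WINDOWS-1250', 'UTF-8', $result);
if ($html === false) {
    throw new RuntimeException('Conversion from Windows-1250 failed');
}

// Keep the declared charset in step with the converted bytes.
$html = str_ireplace('windows-1250', 'UTF-8', $html);

// Validate rather than detect: true only if the string is well-formed UTF-8.
var_dump(mb_check_encoding($html, 'UTF-8'));
```

If the validation prints true here but the phpQuery output is still garbled, the conversion itself is probably not the step at fault.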