Chromium Embedded Framework Forum

by **djrecipe** » Tue Aug 24, 2021 12:13 am

First of all, I just want to point out that "Where to post what" (viewtopic.php?f=4&t=2) is out of date and should point to https://bitbucket.org/chromiumembedded/cef/issues instead.

Issue
Chrome's auto-detection of character encoding only checks the first X characters of the HTML string. If the only non-ASCII characters (for example, Chinese text) are at the end of a long HTML string, these characters will not render properly in Chrome/CEF. They render properly if the HTML is shortened.

Example
Given this 180,000-character HTML file: https://pastebin.com/LjtHdDs2 , if you open it in the Chrome browser or render it via CEF, the Chinese text at the end of the HTML string will be garbled. The Chinese characters will render properly if any one of the following is done:
1. Many of the <rect></rect> elements are removed, resulting in a shorter HTML string overall.
2. A single non-ASCII character is added somewhere towards the beginning of the HTML string.
3. <meta charset="utf-16"/> is added at the beginning of the file.

Theory
Chrome checks the first X characters of the HTML string to auto-detect the encoding. X appears to be somewhere around the maximum value of an unsigned short (65535). If no special characters are found, it defaults to UTF-8 (?)
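To check whether a given file is likely to hit this, something like the following rough Python sketch could be used. It finds the byte offset of the first non-ASCII byte and compares it with a detection window. The 65535 default is only the guess from the theory above, not a documented Chrome value, and the function names are made up for illustration.

```python
def first_non_ascii_offset(data: bytes) -> int:
    """Return the byte offset of the first non-ASCII byte, or -1 if there is none."""
    for i, b in enumerate(data):
        if b >= 0x80:
            return i
    return -1


def may_miss_detection(data: bytes, window: int = 65535) -> bool:
    """True if the first non-ASCII byte lies at or beyond the assumed window.

    `window` is an assumption taken from the theory in this thread,
    not a value confirmed by Chromium documentation.
    """
    return first_non_ascii_offset(data) >= window
```

For a file like the pastebin example, where the Chinese text only appears after roughly 180,000 ASCII characters, `may_miss_detection` would return True under this assumption.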

I'm not sure this is really a "bug" per se, but it is somewhat strange behavior, with no warning, that can cause confusion for developers.

by **magreenblatt** » Tue Aug 24, 2021 9:49 am

Your analysis is likely correct. Evaluating the whole contents of a large HTML file before parsing/rendering would be bad for performance. I suggest using one of the many available techniques for explicitly specifying the character encoding.
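As one illustration of specifying the encoding explicitly, here is a minimal Python sketch that encodes generated HTML as UTF-8 and prepends a UTF-8 byte order mark before the bytes are handed to the browser. This assumes the HTML is built as a string in your own code; the function name is hypothetical, and a BOM is just one option alongside a `<meta charset>` tag near the top of the document or a `charset` parameter on the Content-Type header.

```python
import codecs


def encode_with_bom(html: str) -> bytes:
    """Encode HTML as UTF-8 with a leading BOM so the encoding is declared up front."""
    # Drop any BOM character already at the start of the string to avoid emitting two.
    text = html.lstrip("\ufeff")
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

The BOM sits before the doctype, so the markup itself does not need to change.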

by **djrecipe** » Thu Oct 21, 2021 6:50 am

Yeah, in the end we are treating it as something that needs to be communicated and worked around. Thanks for your input, @magreenblatt.

Encoding Autodetection Fails With Large HTML String