A potential CEF cache corruption scenario
Posted: Fri Jul 03, 2020 11:14 pm
We have a .NET WPF container app in which we host several web apps using CEFSharp.WinForms control. At times, we see that for some users, some JavaScript resource requests fail with the ERR_CONTENT_DECODING_FAILED error message. This issue gets resolved if we reload the app after either clearing the CEF cache or after disabling the cache from the network tab in the developer toolbar window. Please note that this issue isn't confined to a specific subset of resource files: instead, we have seen it happening sporadically for a variety of JavaScript resource files (some hosted on Apache while the others hosted on IIS servers).
While a possible cause for usual ERR_CONTENT_DECODING_FAILED error is a server-side content-encoding issue, in this specific case, we believe this could potentially be related to the CEF browser caching. Please see the analysis section below for the reasons we believe so.
Application Setup
When we initialize CEF settings, we set MultiThreadedMessageLoop setting to true and set CachePath property to a location under %localappdata% on windows 10 machine. When the container app starts, it creates three CEF web browser controls and launches web apps in them. All three apps load concurrently. After that, more CEF web browsers are created as the user visits more apps. The user also reloads some of these apps over time. All the web apps are internal apps sharing the same domain but physically hosted on different web servers. The JavaScript resource files in question usually have caching policy set to allow them to be cached for a week.
CEFSharp version - 79.1.360.0
CEF version - r79.1.36+g90301bd+chromium-79.0.3945.130
Chromium version - 79.0.3945.130
Our Analysis so far
All of the above points lead us to believe that, JavaScript resource files were downloaded (with correct encoding) and cached in CEF cache. The user was also able to use the apps for some time. After that, however, in certain scenarios, some of these files potentially got corrupted in CEF cache, leading to the content decoding error.
We tried using CEF response filter mechanism as explained here to capture the bad response when content decoding error occurs. Unfortunately, we observed that dataIn stream which gets passed to filter function is null when the response fails with this error.
Summary and Questions
This is a sporadic issue which our users are facing. We haven't found a way to deterministically recreate this problem. However based on our analysis so far, we believe some JavaScript files may be getting corrupted in CEF cache over time. We are not sure if the fact that we host several CEF web browsers and load them concurrently could be playing some role in causing this issue.
Has anyone else observed/reported a similar issue? Do you have any idea if we are missing or overlooking something here or going in the wrong direction? Any pointers will be greatly appreciated.
While a possible cause for usual ERR_CONTENT_DECODING_FAILED error is a server-side content-encoding issue, in this specific case, we believe this could potentially be related to the CEF browser caching. Please see the analysis section below for the reasons we believe so.
Application Setup
When we initialize CEF settings, we set MultiThreadedMessageLoop setting to true and set CachePath property to a location under %localappdata% on windows 10 machine. When the container app starts, it creates three CEF web browser controls and launches web apps in them. All three apps load concurrently. After that, more CEF web browsers are created as the user visits more apps. The user also reloads some of these apps over time. All the web apps are internal apps sharing the same domain but physically hosted on different web servers. The JavaScript resource files in question usually have caching policy set to allow them to be cached for a week.
CEFSharp version - 79.1.360.0
CEF version - r79.1.36+g90301bd+chromium-79.0.3945.130
Chromium version - 79.0.3945.130
Our Analysis so far
- 1. We checked the web-server logs for the failing JavaScript resources. We observed that in most cases, the server requests for those resource files (by the impacted user) were made a few days ago. The users are usually able to use the application well for some days before they sporadically start getting this error.
- 2. We checked the network logs (*.HAR file). We see that for the failing JavaScript resource, _transferSize is 0 (which seems to indicate that response was served from the cache as indicated here and here)
- 3. When the error occurs, it gets resolved when we reload the app after either clearing the cache or disabling the cache from the network tab.
- 4. We tried artificially simulating this error. We used Fiddler's autoresponder feature to deliberately respond with a bad server response (the content was 'gzip' encoded however we changed Content-Encoding header to indicate 'br'). We could simulate the ERR_CONTENT_DECODING_FAILED error. In network logs, we could see that _tranferSize was a non-zero value. We also observed that chrome did not cache the bad response (when we turned off the auto-responder, it again made a fresh server request). This test indicates that when the original JavaScript response was cached by the browser, it must have been a correctly encoded response, or else the browser would not have cached it.
All of the above points lead us to believe that, JavaScript resource files were downloaded (with correct encoding) and cached in CEF cache. The user was also able to use the apps for some time. After that, however, in certain scenarios, some of these files potentially got corrupted in CEF cache, leading to the content decoding error.
We tried using CEF response filter mechanism as explained here to capture the bad response when content decoding error occurs. Unfortunately, we observed that dataIn stream which gets passed to filter function is null when the response fails with this error.
Summary and Questions
This is a sporadic issue which our users are facing. We haven't found a way to deterministically recreate this problem. However based on our analysis so far, we believe some JavaScript files may be getting corrupted in CEF cache over time. We are not sure if the fact that we host several CEF web browsers and load them concurrently could be playing some role in causing this issue.
Has anyone else observed/reported a similar issue? Do you have any idea if we are missing or overlooking something here or going in the wrong direction? Any pointers will be greatly appreciated.