CEF 119: dlopen failed in Linux.

Post by salvadordf » Tue Nov 14, 2023 10:46 am

Hi,

I'm the CEF4Delphi maintainer, and we are hitting an issue when we call dlopen to load libcef.so.

On Linux with Intel CPUs this issue started with CEF 119.3.1, but we had already seen it on Ubuntu for Raspberry Pi several months ago.

This is the smallest C program that reproduces the issue in a console application:
Code: Select all
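#include <stdio.h>
#include <dlfcn.h>

int main (void)
{
  /* Replace with the full path to libcef.so. */
  void* g_libcef_handle = dlopen("/path/to/libcef.so", RTLD_LAZY);
  if (!g_libcef_handle) {
    /* dlerror() describes why the most recent dlopen call failed. */
    fprintf(stderr, "dlerror %s\n", dlerror());
    return 1;
  }
  return 0;
}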


dlopen returns NULL, and this is the output:
dlerror /path/to/libcef.so: cannot allocate memory in static TLS block


We could load libcef.so in CEF 118 or older for Intel without problems.

The following LD_PRELOAD workaround works, but please let me know if anyone knows a better solution:
export LD_PRELOAD=<FULL-PATH-TO-libcef.so>
Maintainer of the CEF4Delphi, WebView4Delphi, WebUI4Delphi and WebUI4CSharp projects.

Re: CEF 119: dlopen failed in Linux.

Post by magreenblatt » Tue Nov 14, 2023 11:14 am

Perhaps Chromium has changed the default TLS model. For background: https://github.com/jemalloc/jemalloc/issues/1237 Possibly related: https://bugs.chromium.org/p/chromium/is ... id=1416182

Re: CEF 119: dlopen failed in Linux.

Post by Slartie » Mon Dec 04, 2023 4:47 am

I ran into a very similar problem with CEF 119 and JCEF, and ultimately solved it by adding a patch to our custom CEF branch. We build CEF and JCEF for all platforms from customized branches because we apply some non-standard changes and extensions to both, so adding another patch wasn't a big deal. That said, this particular patch might be worth upstreaming to CEF: without it, CEF 119+ is no longer easily dynamically loadable on Linux, and dynamic loading is something a lot of wrappers like JCEF or your Delphi wrapper do for good (bundling) reasons. The patch disables static TLS usage in libxml2, which is the actual culprit behind the sudden increase in TLS size requirements from 118 to 119.

Here's a copy of the documentation I wrote into our internal Wiki to log my investigations:

The TLS segment size of Chromium 119 increases by over 700 bytes, as can be seen here:
Code: Select all
[xxxx@localhost jcef]$ readelf -Wl libcef.so | grep -E 'PhysAddr|TLS'
  Type  Offset    VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  TLS   0xc8aae40 0x000000000c8ace40 0x000000000c8ace40 0x00008c 0x0006d0 R   0x40

The 0x0006d0 in the MemSiz column is the segment size in hex, which is 1744 bytes. Earlier versions only had 0x3e8, or 1000 bytes, of TLS segment size.

The effective surplus allocated by default on x64 Linux (by ld-linux, which is the library responsible for this allocation) is sufficient for about 1600 bytes or so. The 1744 bytes alone already exceed this limit, and that is before even considering other dynamically loaded libraries which might also need a byte or two.
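To repeat this check against another build, the MemSiz field can be pulled out of the readelf output and converted to decimal. This is only a minimal sketch: the function name tls_memsz is made up for illustration, and it assumes the column layout shown in the readelf output above (MemSiz as the sixth field of the TLS line).

```shell
# Hypothetical helper: reads `readelf -Wl` output on stdin and prints the
# MemSiz of the TLS program header in decimal bytes.
# Assumes the column order: Type Offset VirtAddr PhysAddr FileSiz MemSiz ...
tls_memsz() {
  awk '$1 == "TLS" { print $6; exit }' | {
    # printf accepts the 0x-prefixed hex value and prints it as decimal.
    read -r hex && printf '%d\n' "$hex"
  }
}

# Usage: readelf -Wl libcef.so | tls_memsz
```

Fed the TLS line quoted above, this prints 1744. If the input has no TLS line, nothing is printed and the function returns a non-zero status.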

Further investigation found the following usage. The .tdata and .tbss sections are the ones of interest when looking for the actual users of all that TLS space; this article (https://fasterthanli.me/articles/a-dyna ... er-mystery) was very helpful for learning interesting tidbits about TLS, and for commands to investigate the situation:
Code: Select all
[xxxx@localhost jcef]$ llvm-objdump -C -t libcef.so | grep -F '.tdata'
0000000000000000 l .tdata 0000000000000008 .hidden absl::cord_internal::cordz_next_sample
0000000000000008 l .tdata 0000000000000004 partition_alloc::internal::base::(anonymous namespace)::g_thread_id
000000000000000c l .tdata 0000000000000001 partition_alloc::internal::base::(anonymous namespace)::g_is_main_thread
0000000000000010 l .tdata 0000000000000004 base::(anonymous namespace)::current_sequence_token
0000000000000014 l .tdata 0000000000000004 base::(anonymous namespace)::current_task_token
0000000000000018 l .tdata 0000000000000001 base::internal::(anonymous namespace)::task_priority_for_current_thread
000000000000001c l .tdata 0000000000000004 base::(anonymous namespace)::current_thread_type
0000000000000020 l .tdata 0000000000000008 base::(anonymous namespace)::thread_name
0000000000000028 l .tdata 0000000000000004 base::(anonymous namespace)::g_thread_id
000000000000002c l .tdata 0000000000000001 base::(anonymous namespace)::g_is_main_thread
0000000000000040 l .tdata 0000000000000040 .hidden google::protobuf::internal::ThreadSafeArena::thread_cache_
0000000000000080 l .tdata 0000000000000004 simd_support
0000000000000084 l .tdata 0000000000000004 simd_huffman
0000000000000088 l .tdata 0000000000000004 blink::next_world_id

[xxxx@localhost jcef]$ llvm-objdump -C -t libcef.so | grep -F '.tbss'
0000000000000090 l .tbss 0000000000000008 .hidden perfetto::DataSource<base::perfetto_track_event::TrackEvent, perfetto::internal::TrackEventDataSourceTraits>::tls_state_
0000000000000098 l .tbss 0000000000000008 (anonymous namespace)::g_isolate_manager
00000000000000a0 l .tbss 0000000000000018 absl::cord_internal::cordz_should_profile_slow()::exponential_biased_generator
00000000000000b8 l .tbss 0000000000000008 .hidden partition_alloc::internal::g_thread_cache
00000000000000c0 l .tbss 0000000000000001 mojo::core::(anonymous namespace)::is_extracting_handles_from_message
00000000000000c8 l .tbss 0000000000000008 mojo::core::(anonymous namespace)::current_context
00000000000000d8 l .tbss 0000000000000001 guard variable for SkStrikeCache::GlobalStrikeCache()::cache
00000000000000d0 l .tbss 0000000000000008 SkStrikeCache::GlobalStrikeCache()::cache
00000000000000e0 l .tbss 0000000000000008 dawn::native::(anonymous namespace)::tlDevice
00000000000000e8 l .tbss 0000000000000008 .hidden perfetto::DataSource<v8::perfetto_track_event::TrackEvent, perfetto::internal::TrackEventDataSourceTraits>::tls_state_
00000000000000f0 l .tbss 0000000000000001 v8::internal::(anonymous namespace)::tls_singleton_taken
00000000000000f8 l .tbss 0000000000000018 v8::internal::(anonymous namespace)::tls_singleton_storage
0000000000000110 l .tbss 0000000000000004 .hidden v8::internal::RwxMemoryWriteScope::code_space_write_nesting_level_
0000000000000118 l .tbss 0000000000000008 v8::internal::g_current_per_isolate_thread_data_
0000000000000120 l .tbss 0000000000000008 .hidden v8::internal::g_current_isolate_
0000000000000128 l .tbss 0000000000000004 v8::internal::(anonymous namespace)::thread_id
0000000000000130 l .tbss 0000000000000008 v8::internal::(anonymous namespace)::current_marking_barrier
0000000000000138 l .tbss 0000000000000008 v8::internal::(anonymous namespace)::pending_layout_change_object_address
0000000000000140 l .tbss 0000000000000008 v8::internal::(anonymous namespace)::current_local_heap
0000000000000148 l .tbss 0000000000000008 v8::internal::maglev::labeller_
0000000000000150 l .tbss 0000000000000008 .hidden v8::base::ContextualVariable<v8::internal::compiler::turboshaft::PipelineData, v8::internal::compiler::turboshaft::PipelineData>::top_
0000000000000158 l .tbss 0000000000000004 .hidden v8::internal::trap_handler::g_thread_in_wasm_code
0000000000000160 l .tbss 0000000000000008 v8::internal::wasm::(anonymous namespace)::current_code_refs_scope
0000000000000168 l .tbss 0000000000000008 .hidden v8::base::ContextualVariable<v8::internal::compiler::turboshaft::Tracing, v8::internal::compiler::turboshaft::Tracing>::top_
0000000000000170 l .tbss 0000000000000008 .hidden v8::base::ContextualVariable<v8::internal::compiler::turboshaft::TypeInferenceReducerArgs, v8::internal::compiler::turboshaft::TypeInferenceReducerArgs>::top_
0000000000000178 l .tbss 0000000000000008 ppapi::(anonymous namespace)::ppapi_globals_for_test
0000000000000180 l .tbss 0000000000000001 ppapi::disable_locking_for_thread
0000000000000181 l .tbss 0000000000000001 ppapi::proxy_locked_on_thread
0000000000000188 l .tbss 0000000000000008 content::(anonymous namespace)::notification_service
0000000000000190 l .tbss 0000000000000008 content::media_stream_manager
00000000000001a0 l .tbss 0000000000000008 openscreen::internal::ScopedTraceOperation::root_node_
0000000000000198 l .tbss 0000000000000008 .hidden openscreen::internal::ScopedTraceOperation::traces_
00000000000001a8 l .tbss 0000000000000008 content::(anonymous namespace)::utility_thread
00000000000001b0 l .tbss 0000000000000008 blink::g_thread_specific_
00000000000001b8 l .tbss 0000000000000008 blink::(anonymous namespace)::current_thread
00000000000001c0 l .tbss 0000000000000008 webrtc::(anonymous namespace)::jingle_thread_wrapper
00000000000001c8 l .tbss 0000000000000008 extensions::(anonymous namespace)::contexts
00000000000001d0 l .tbss 0000000000000008 extensions::(anonymous namespace)::service_worker_data
00000000000001d8 l .tbss 0000000000000008 extensions::worker_thread_util::(anonymous namespace)::worker_context_proxy
00000000000001e0 l .tbss 0000000000000008 mojo::internal::(anonymous namespace)::g_thread_local_node
00000000000001e8 l .tbss 0000000000000008 base::internal::current_notification
00000000000001f0 l .tbss 0000000000000008 base::(anonymous namespace)::delegate
00000000000001f8 l .tbss 0000000000000008 base::(anonymous namespace)::run_loop_timeout
0000000000000200 l .tbss 0000000000000008 base::(anonymous namespace)::UpdateAndGetThreadName(char const*)::thread_name
0000000000000208 l .tbss 0000000000000008 base::(anonymous namespace)::scoped_defer_task_posting
0000000000000210 l .tbss 0000000000000008 base::(anonymous namespace)::current_pending_task
0000000000000220 l .tbss 0000000000000008 base::(anonymous namespace)::current_long_task_tracker
0000000000000218 l .tbss 0000000000000008 base::(anonymous namespace)::current_scoped_ipc_hash
0000000000000228 l .tbss 0000000000000008 base::sequence_manager::(anonymous namespace)::thread_local_sequence_manager
0000000000000230 l .tbss 0000000000000008 base::(anonymous namespace)::current_default_handle
0000000000000238 l .tbss 0000000000000008 base::(anonymous namespace)::current_default_handle
0000000000000240 l .tbss 0000000000000001 base::internal::(anonymous namespace)::fizzle_block_shutdown_tasks
0000000000000248 l .tbss 0000000000000008 base::internal::(anonymous namespace)::current_thread_group
0000000000000250 l .tbss 0000000000000008 base::(anonymous namespace)::hang_watch_state
0000000000000258 l .tbss 0000000000000008 base::internal::(anonymous namespace)::blocking_observer
0000000000000260 l .tbss 0000000000000008 base::internal::(anonymous namespace)::last_scoped_blocking_call
0000000000000268 l .tbss 0000000000000008 base::internal::(anonymous namespace)::current_sequence_local_storage
0000000000000270 l .tbss 0000000000000008 base::(anonymous namespace)::fd_watcher
0000000000000278 l .tbss 0000000000000008 base::trace_event::(anonymous namespace)::thread_local_event_buffer
0000000000000280 l .tbss 0000000000000001 base::trace_event::(anonymous namespace)::thread_blocks_message_loop
0000000000000281 l .tbss 0000000000000001 base::trace_event::(anonymous namespace)::thread_is_in_trace_event
0000000000000288 l .tbss 0000000000000008 base::trace_event::TraceLog::ShouldAddAfterUpdatingState(char, unsigned char const*, char const*, unsigned long, int, base::TimeTicks, base::trace_event::TraceArguments*)::current_thread_name
0000000000000290 l .tbss 0000000000000001 base::tracing::GetThreadIsInTraceEvent()::thread_is_in_trace_event
00000000000002c0 l .tbss 0000000000000001 guard variable for quiche::(anonymous namespace)::Xoshiro256PlusPlus()::rng_state
00000000000002a0 l .tbss 0000000000000020 quiche::(anonymous namespace)::Xoshiro256PlusPlus()::rng_state
00000000000002c8 l .tbss 0000000000000008 quic::(anonymous namespace)::current_context
00000000000002d0 l .tbss 0000000000000001 IPC::(anonymous namespace)::off_sequence_binding_allowed
00000000000002d8 l .tbss 0000000000000008 IPC::(anonymous namespace)::received_queue
00000000000002e0 l .tbss 0000000000000008 perfetto::DataSource<tracing::PerfettoTracedProcess::DataSourceProxy<tracing::TraceEventMetadataSource>, perfetto::DefaultDataSourceTraits>::tls_state_
00000000000002e8 l .tbss 0000000000000008 perfetto::DataSource<tracing::PerfettoTracedProcess::DataSourceProxy<tracing::(anonymous namespace)::TracingSamplerProfilerDataSource>, perfetto::DefaultDataSourceTraits>::tls_state_
00000000000002f0 l .tbss 0000000000000008 webrtc::(anonymous namespace)::current
00000000000002f8 l .tbss 0000000000000008 SkSL::sMemPool
0000000000000300 l .tbss 0000000000000008 SkSL::sInstance
0000000000000308 l .tbss 0000000000000008 skgpu::ganesh::gCache
0000000000000310 l .tbss 0000000000000004 localRngInitialized
0000000000000314 l .tbss 0000000000000008 localRngState
0000000000000320 l .tbss 00000000000002c8 globalState
00000000000005e8 l .tbss 0000000000000008 ui::(anonymous namespace)::event_source
00000000000005f0 l .tbss 0000000000000008 gl::(anonymous namespace)::current_context
00000000000005f8 l .tbss 0000000000000008 gl::(anonymous namespace)::current_real_context
0000000000000600 l .tbss 0000000000000008 gl::ThreadLocalCurrentGL()::current_gl
0000000000000608 l .tbss 0000000000000008 gl::(anonymous namespace)::current_surface
0000000000000610 l .tbss 0000000000000008 .hidden re2::hooks::context
0000000000000618 l .tbss 0000000000000008 gpu::(anonymous namespace)::current_task_runner
0000000000000620 l .tbss 0000000000000001 variations::(anonymous namespace)::in_set_field_trial_group_from_browser
0000000000000628 l .tbss 0000000000000001 guard variable for WTF::CurrentThread()::g_id
0000000000000624 l .tbss 0000000000000004 WTF::CurrentThread()::g_id
0000000000000629 l .tbss 0000000000000001 .hidden WTF::g_is_main_thread
0000000000000630 l .tbss 0000000000000008 v8::base::(anonymous namespace)::thread_stack_start
0000000000000638 l .tbss 0000000000000008 .hidden perfetto::DataSource<tracing::PerfettoTracedProcess::DataSourceProxy<memory_instrumentation::TracingObserver>, perfetto::DefaultDataSourceTraits>::tls_state_
0000000000000640 l .tbss 0000000000000008 metrics::(anonymous namespace)::provider
0000000000000648 l .tbss 0000000000000008 gpu::webgpu::(anonymous namespace)::parent_decoder
0000000000000650 l .tbss 0000000000000008 content::(anonymous namespace)::child_process
0000000000000658 l .tbss 0000000000000008 content::(anonymous namespace)::child_thread_impl
0000000000000690 l .tbss 0000000000000008 guard variable for blink::HeapSizeCache::ForCurrentThread()::heap_size_cache
0000000000000660 l .tbss 0000000000000030 blink::HeapSizeCache::ForCurrentThread()::heap_size_cache
0000000000000698 l .tbss 0000000000000004 blink::script_forbidden_counter
00000000000006a0 l .tbss 0000000000000008 content::(anonymous namespace)::render_thread
00000000000006a8 l .tbss 0000000000000008 content::(anonymous namespace)::render_thread
00000000000006b0 l .tbss 0000000000000008 content::(anonymous namespace)::worker_data
00000000000006b8 l .tbss 0000000000000008 .hidden gwp_asan::internal::ThreadLocalState<gwp_asan::internal::ThreadLocalRandomBitGenerator>::state_
00000000000006c0 l .tbss 0000000000000008 gwp_asan::internal::ThreadLocalState<gwp_asan::internal::SamplingState<(gwp_asan::internal::ParentAllocator)0>>::state_
00000000000006c8 l .tbss 0000000000000008 gwp_asan::internal::ThreadLocalState<gwp_asan::internal::SamplingState<(gwp_asan::internal::ParentAllocator)1>>::state_

That is a lot of entries, but most of them consume just 4 or 8 bytes. There is one outlier, though:

Code: Select all
0000000000000320 l .tbss 00000000000002c8 globalState

This is a single entity allocating 0x2c8 = 712 bytes! A comparison with earlier Chromium binaries showed that this "globalState" didn't exist back then; it entered the picture with Chromium 119.
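Spotting such an outlier by eye in a listing of that length is tedious, so here is a rough sketch of how the symbol table could be ranked by size instead. The function name tls_top is hypothetical. It looks for the section name field rather than counting columns, because the number of flag columns in llvm-objdump output can vary, and it relies on the size column being fixed-width hex so that a plain reverse sort orders it correctly.

```shell
# Hypothetical helper: reads `llvm-objdump -C -t` output on stdin and prints
# the N largest TLS symbols (default 5) as "size_in_bytes name".
tls_top() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == ".tdata" || $i == ".tbss") {
        # The size follows the section name; skip an optional ".hidden" marker.
        start = ($(i + 2) == ".hidden") ? i + 3 : i + 2
        name = $start
        for (j = start + 1; j <= NF; j++) name = name " " $j
        print $(i + 1), name
        break
      }
    }
  }' |
    LC_ALL=C sort -r |      # zero-padded hex sorts correctly as text
    head -n "${1:-5}" |
    while read -r hex name; do
      printf '%d %s\n' "0x$hex" "$name"
    done
}

# Usage: llvm-objdump -C -t libcef.so | tls_top 3
```

Applied to the listing above, the first line it prints is "712 globalState".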

Since there was no namespace info to help, it was pretty likely that this "globalState" didn't belong to Chromium itself (the Chromium code uses namespaces almost throughout) but to some third-party library. A full-text search over the source code of all third-party libraries revealed that "globalState" comes from libxml, which, in a recent update (https://gitlab.gnome.org/GNOME/libxml2/ ... es/v2.12.0), has begun to use compiler-allocated TLS by default (instead of dynamically allocating it) to store global variables. Apparently that library has A LOT of global variables, and it's not exactly nice behavior to consume 700 bytes PER THREAD, regardless of whether the thread does XML parsing or not! Chromium uses the default configuration and thus gets the large TLS increase.

The use of compiler-allocated TLS is configurable, however, so the solution was relatively obvious: change the Chromium build to disable it in libxml. This is now done by the following small patch, which we added to the patch stack that our CEF codebase applies to Chromium before building:
Code: Select all
diff --git third_party/libxml/linux/config.h third_party/libxml/linux/config.h
index c064071ce1545..65110af9a78f5 100644
--- third_party/libxml/linux/config.h
+++ third_party/libxml/linux/config.h
@@ -171,7 +171,7 @@
 /* #undef XML_SOCKLEN_T */
 
 /* TLS specifier */
-#define XML_THREAD_LOCAL _Thread_local
+/* #undef XML_THREAD_LOCAL */
 
 /* Define for Solaris 2.5.1 so the uint32_t typedef from <sys/synch.h>,
    <pthread.h>, or <semaphore.h> is not used. If the typedef were allowed, the

Tada! TLS size used by libcef.so is back down to a much more manageable 1012 bytes, and the "cannot allocate memory in static TLS block" message is gone.

It should be noted that there is an alternative, though less attractive, solution: keep the high TLS usage, but increase the TLS space surplus allocated by ld-linux. A few years ago, a glibc "tunable" was introduced for exactly this purpose: glibc.rtld.optional_static_tls, as documented here: https://www.gnu.org/software/libc/manua ... ables.html

The default value of 512 effectively leads to the roughly 1600 bytes of TLS surplus mentioned above (don't ask me why 512 bytes suddenly grow into three times as much, but there's some complex calculation logic for effective TLS size allocation in the linking code that somehow adds additional space). By increasing the tunable by just another 512 bytes to 1024, you can successfully load the "large" libcef.so into a running JVM. However, the tunable is set through an environment variable (GLIBC_TUNABLES) that must be in place before the JVM is executed, because static TLS can't be increased once the process is running. That is an additional burden. Also, the tunable only exists in relatively new Linux distributions: CentOS 7, for example, does not yet support it, while Rocky Linux 9 does. I know of no way to increase static TLS on old Linux systems other than preloading libraries like libcef.so using LD_PRELOAD (the linker is then able to incorporate the significant static TLS needs of the lib into the amount allocated at process start). That, however, is no real solution if you want to dynamically extract and load CEF and JCEF in a running Java app.
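For anyone who wants to try the tunable route, a launcher along these lines might do. It is a sketch only: the function name with_static_tls and the java command line are placeholders, and it assumes a glibc that supports glibc.rtld.optional_static_tls. The tunable is passed through the GLIBC_TUNABLES environment variable, and any tunables already set there are kept by appending with a colon.

```shell
# Hypothetical wrapper: runs a command with glibc.rtld.optional_static_tls set
# to the given size, keeping any tunables already present in GLIBC_TUNABLES.
with_static_tls() {
  tls_size=$1
  shift
  # The assignment prefix only affects the environment of the launched command.
  GLIBC_TUNABLES="${GLIBC_TUNABLES:+$GLIBC_TUNABLES:}glibc.rtld.optional_static_tls=$tls_size" "$@"
}

# Usage (placeholder command line):
#   with_static_tls 1024 java -jar app.jar
```

Because the variable is only set for the launched process, the rest of the shell session is unaffected. On systems whose glibc does not know the tunable, this will not help.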
Last edited by Slartie on Mon Dec 04, 2023 5:48 am, edited 1 time in total.

Re: CEF 119: dlopen failed in Linux.

Post by salvadordf » Mon Dec 04, 2023 5:22 am

Thank you very much Slartie ! :D

I hope this patch is included in the automated build server at Spotify since this issue affects JCEF, Energy and CEF4Delphi wrappers.

Re: CEF 119: dlopen failed in Linux.

Post by magreenblatt » Mon Dec 04, 2023 10:34 am

@Slartie Please add a New issue with your analysis at https://github.com/chromiumembedded/cef/issues

Re: CEF 119: dlopen failed in Linux.

Post by Slartie » Tue Dec 05, 2023 6:49 am
