[CH] Crash on gluten exit #5741

Closed · zhanglistar opened this issue May 14, 2024 · 4 comments · Fixed by #5787
Labels: bug (Something isn't working), triage
Backend

CH (ClickHouse)

Bug description

[screenshot of the crash output]

But when I open the core with gdb, I get:

BFD: Warning: /data/corefiles/core.VM Thread.78310.1715039755 is truncated: expected core file size >= 1950076928, found: 1391566848.

The core file is incomplete, so the crash cannot be traced from it; I need to dig into why it is truncated.

We first saw this core dump on 2024.2.12.
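A quick way to narrow down why a core gets cut short is to compare its on-disk size with what BFD expects and to check how the kernel writes cores. A minimal sketch (the path comes from the BFD warning above; the sysctl and ulimit knobs are standard Linux settings, not anything specific to this host):

ls -l "/data/corefiles/core.VM Thread.78310.1715039755"   # BFD expected >= 1950076928 bytes
sysctl kernel.core_pattern    # a pipe handler (systemd-coredump/apport) can impose its own size cap
ulimit -c                     # core size limit for new shells on this host (should be unlimited)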

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

No response

zhanglistar added the bug (Something isn't working) and triage labels on May 14, 2024

zhanglistar (Contributor, Author) commented May 14, 2024

ulimit -a

yarn@hello-dn119:/data/corefiles$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1030220
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

So it is not a ulimit problem. The disk also has enough free space, so it is not a disk-space problem either.

zhanglistar (Contributor, Author) commented:

The truncated core is caused by YARN's yarn.nodemanager.sleep-delay-before-sigkill.ms parameter: the NodeManager sends SIGTERM, waits only this long (250 ms by default), and then SIGKILLs the container, so the JVM is killed before it finishes writing the ~2 GB core.
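A sketch of how to check and, if needed, raise that delay on the NodeManager hosts (the 60000 ms value below is only illustrative, not a number taken from this issue):

grep -A1 'sleep-delay-before-sigkill' "$HADOOP_CONF_DIR/yarn-site.xml"   # unset means the 250 ms default applies

# To let the JVM finish writing the core before the SIGKILL arrives, set this in yarn-site.xml
# on the NodeManager hosts and restart them:
#   <property>
#     <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
#     <value>60000</value>
#   </property>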

zhanglistar (Contributor, Author) commented May 14, 2024

(gdb) bt
#0  0x00007fa458537428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fa45853902a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fa456a8fc15 in __gnu_cxx::__verbose_terminate_handler() ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#3  0x00007fa456a8de36 in __cxxabiv1::__terminate(void (*)()) ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#4  0x00007fa456a8de81 in std::terminate() ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#5  0x00007fa456a841cf in __cxa_pure_virtual ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#6  0x00007fa4568a76b0 in outputStream::print_cr(char const*, ...) ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#7  0x00007fa456a5d323 in VMError::report(outputStream*) ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#8  0x00007fa456a5ef1f in VMError::report_and_die() ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#9  0x00007fa4568a04e5 in JVM_handle_linux_signal ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#10 0x00007fa456892f48 in signalHandler(int, siginfo*, void*) ()
   from /data6/hadoop/yarn/local/filecache/187148/jdk-8u391-linux-x64.tar.gz/jdk1.8.0_391/jre/lib/amd64/server/libjvm.so
#11 <signal handler called>
#12 0x00007fa46b2bff2a in std::__1::allocator<Poco::AutoPtr<Poco::Notification>*>::allocate[abi:v15000](unsigned long) (__n=0, this=<optimized out>)
    at ../contrib/llvm-project/libcxx/include/__memory/allocator.h:107
#13 std::__1::allocator<Poco::AutoPtr<Poco::Notification>*>::allocate_at_least[abi:v15000](unsigned long) (__n=0, this=<optimized out>)
    at ../contrib/llvm-project/libcxx/include/__memory/allocator.h:119
#14 std::__1::allocate_at_least[abi:v15000]<std::__1::allocator<Poco::AutoPtr<Poco::Notification>*> >(std::__1::allocator<Poco::AutoPtr<Poco::Notification>*>&, unsigned long) (__n=0, __alloc=...) at ../contrib/llvm-project/libcxx/include/__memory/allocate_at_least.h:33
#15 std::__1::__allocate_at_least[abi:v15000]<std::__1::allocator<Poco::AutoPtr<Poco::Notification>*> >(std::__1::allocator<Poco::AutoPtr<Poco::Notification>*>&, unsigned long) (__n=0, __alloc=...) at ../contrib/llvm-project/libcxx/include/__memory/allocate_at_least.h:42
#16 std::__1::__split_buffer<Poco::AutoPtr<Poco::Notification>*, std::__1::allocator<Poco::AutoPtr<Poco::Notification>*>&>::__split_buffer (__cap=0,
    __start=0, this=<optimized out>, __a=...) at ../contrib/llvm-project/libcxx/include/__split_buffer:316
#17 std::__1::__split_buffer<Poco::AutoPtr<Poco::Notification>*, std::__1::allocator<Poco::AutoPtr<Poco::Notification>*> >::push_back[abi:v15000](Poco::AutoPtr<Poco::Notification>* const&) (this=0x7fa46c9879a0 <absl::synch_event+7664>, __x=<optimized out>)
    at ../contrib/llvm-project/libcxx/include/__split_buffer:559
#18 std::__1::deque<Poco::AutoPtr<Poco::Notification>, std::__1::allocator<Poco::AutoPtr<Poco::Notification> > >::__add_back_capacity (
    this=0x7fa46c9879a0 <absl::synch_event+7664>) at ../contrib/llvm-project/libcxx/include/deque:2597
#19 0x0000000000000000 in ?? ()
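
The application frames (#12 to #18) show a Poco::NotificationQueue growing its internal deque during shutdown, while the null frame #19 and the this pointer resolving to absl::synch_event+7664 hint that the stack or the queue's memory is no longer intact at the time of the crash. A few gdb commands (a sketch; the frame number and address are taken from the trace above) can help judge how much of the truncated core is still trustworthy:

(gdb) frame 18                      # the deque::__add_back_capacity frame
(gdb) info symbol 0x7fa46c9879a0    # what the queue's `this` actually points at
(gdb) thread apply all bt           # which threads were still running at exit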

zhanglistar (Contributor, Author) commented:

#5762
