main Segfault using cmake & -march=armv8.4a flag #6990

Closed
Jeximo opened this issue Apr 29, 2024 · 2 comments
Jeximo (Contributor) commented Apr 29, 2024

main crashes when built with -DCMAKE_C_FLAGS=-march=armv8.4a. Here's the trace:

llama_new_context_with_model: CPU  output buffer size = 0.49 MiB                         
llama_new_context_with_model: CPU compute buffer size = 3.53 MiB                         
llama_new_context_with_model: graph nodes  = 1030 
llama_new_context_with_model: graph splits = 1  
[New Thread 0x682f (LWP 26671)]
[New Thread 0x6830 (LWP 26672)]
[New Thread 0x6831 (LWP 26673)]

Thread 4 "main" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x6831 (LWP 26673)]          
0x00000055556c4724 in ggml_graph_compute_thread (data=0x7fffffc730) at /data/data/com.termux/files/home/llama.cpp/ggml.c:18361
18361 atomic_store(&state->shared->n_active,  n_threads);                            
(gdb) bt

#0  0x00000055556c4724 in ggml_graph_compute_thread (data=0x7fffffc730) at /data/data/com.termux/files/home/llama.cpp/ggml.c:18361
#1  0x0000007fbb306e48 in __pthread_start(void*) () from /apex/com.android.runtime/lib64/bionic/libc.so
#2  0x0000007fbb2a3458 in __start_thread () from /apex/com.android.runtime/lib64/bionic/libc.so
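
For anyone hitting a similar SIGILL, it can help to disassemble the exact instruction the thread died on; if it decodes to an instruction from a newer ISA level than the core implements (the lscpu flags further down list ARMv8.2-era features, nothing ARMv8.4-specific), that would explain the illegal-instruction signal. A minimal gdb sketch (standard gdb commands; the output is omitted and will vary):

(gdb) x/1i $pc            # disassemble the faulting instruction
(gdb) info proc mappings  # confirm which object file the address falls in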
build/run log
cmake -B build -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod+i8mm && cd build && cmake --build . --config Release --target server --target main && cd bin/

-- The C compiler identification is Clang 18.1.4
-- The CXX compiler identification is Clang 18.1.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /data/data/com.termux/files/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /data/data/com.termux/files/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (3.2s)
-- Generating done (0.3s)
-- Build files have been written to: /data/data/com.termux/files/home/llama.cpp/build

[ 6%] Generating build details from Git
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
[ 12%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 12%] Built target build_info
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
/data/data/com.termux/files/home/llama.cpp/ggml.c:1564:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1564 | GGML_F16_VEC_REDUCE(sumf, sum);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:1612:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1612 | GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
2 warnings generated.
[ 18%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3412:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3412 | const block_q4_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3415:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3415 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3779:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3779 | const block_q4_1 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3781:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3781 | const block_q8_1 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4592:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4592 | const block_q8_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4594:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4594 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
6 warnings generated.
[ 31%] Building CXX object CMakeFiles/ggml.dir/sgemm.cpp.o
[ 31%] Built target ggml
[ 31%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[ 37%] Building CXX object CMakeFiles/llama.dir/unicode.cpp.o
[ 43%] Building CXX object CMakeFiles/llama.dir/unicode-data.cpp.o
[ 43%] Linking CXX static library libllama.a
[ 43%] Built target llama
[ 43%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 50%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 62%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 75%] Linking CXX static library libcommon.a
[ 75%] Built target common
[ 81%] Generating json-schema-to-grammar.mjs.hpp
[ 87%] Generating completion.js.hpp
[ 93%] Generating index.html.hpp
[ 93%] Generating index.js.hpp
[ 93%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[100%] Linking CXX executable ../../bin/server
[100%] Built target server
[ 15%] Built target build_info
[ 38%] Built target ggml
[ 53%] Built target llama
[ 92%] Built target common
[ 92%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/main
[100%] Built target main

./main -m ~/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r "<|eot_id|>" --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"

Log start
main: build = 2768 (b8c1476)
main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
main: seed = 1714423770
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 27
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 68 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_loader: - type iq3_s: 157 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 3602.02 MiB
.....................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 7
llama_new_context_with_model: n_ubatch = 7
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1

fish: Job 1, './main -m ~/Meta-Llama-3-8B-Ins…' terminated by signal SIGILL (Illegal instruction)

main.log
[1714423770] Log start [1714423770] Cmd: ./main -m /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r <|eot_id|> --in-prefix \n<|start_header_id|>user<|end_header_id|>\n\n --in-suffix <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"

[1714423770] main: build = 2768 (b8c1476)
[1714423770] main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
[1714423770] main: seed = 1714423770
[1714423770] main: llama backend init
[1714423770] main: load the model and apply lora adapter, if any
[1714423770] llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
[1714423770] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1714423770] llama_model_loader: - kv 0: general.architecture str = llama
[1714423770] llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
[1714423770] llama_model_loader: - kv 2: llama.block_count u32 = 32
[1714423770] llama_model_loader: - kv 3: llama.context_length u32 = 8192
[1714423770] llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
[1714423770] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
[1714423770] llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
[1714423770] llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
[1714423770] llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
[1714423770] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[1714423770] llama_model_loader: - kv 10: general.file_type u32 = 27
[1714423770] llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
[1714423770] llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
[1714423770] llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
[1714423770] llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
[1714423771] llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
[1714423771] llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1714423771] llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
[1714423771] llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
[1714423771] llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
[1714423771] llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
[1714423771] llama_model_loader: - kv 21: general.quantization_version u32 = 2
[1714423771] llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
[1714423771] llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
[1714423771] llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
[1714423771] llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
[1714423771] llama_model_loader: - type f32: 65 tensors
[1714423771] llama_model_loader: - type q4_K: 68 tensors
[1714423771] llama_model_loader: - type q6_K: 1 tensors
[1714423771] llama_model_loader: - type iq3_s: 157 tensors
[1714423772] llm_load_vocab: special tokens definition check successful ( 256/128256 ).
[1714423772] llm_load_print_meta: format = GGUF V3 (latest)
[1714423772] llm_load_print_meta: arch = llama
[1714423772] llm_load_print_meta: vocab type = BPE
[1714423772] llm_load_print_meta: n_vocab = 128256
[1714423772] llm_load_print_meta: n_merges = 280147
[1714423772] llm_load_print_meta: n_ctx_train = 8192
[1714423772] llm_load_print_meta: n_embd = 4096
[1714423772] llm_load_print_meta: n_head = 32
[1714423772] llm_load_print_meta: n_head_kv = 8
[1714423772] llm_load_print_meta: n_layer = 32
[1714423772] llm_load_print_meta: n_rot = 128
[1714423772] llm_load_print_meta: n_embd_head_k = 128
[1714423772] llm_load_print_meta: n_embd_head_v = 128
[1714423772] llm_load_print_meta: n_gqa = 4
[1714423772] llm_load_print_meta: n_embd_k_gqa = 1024
[1714423772] llm_load_print_meta: n_embd_v_gqa = 1024
[1714423772] llm_load_print_meta: f_norm_eps = 0.0e+00
[1714423772] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[1714423772] llm_load_print_meta: f_clamp_kqv = 0.0e+00
[1714423772] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1714423772] llm_load_print_meta: f_logit_scale = 0.0e+00
[1714423772] llm_load_print_meta: n_ff = 14336
[1714423772] llm_load_print_meta: n_expert = 0
[1714423772] llm_load_print_meta: n_expert_used = 0
[1714423772] llm_load_print_meta: causal attn = 1
[1714423772] llm_load_print_meta: pooling type = 0
[1714423772] llm_load_print_meta: rope type = 0
[1714423772] llm_load_print_meta: rope scaling = linear
[1714423772] llm_load_print_meta: freq_base_train = 500000.0
[1714423772] llm_load_print_meta: freq_scale_train = 1
[1714423772] llm_load_print_meta: n_yarn_orig_ctx = 8192
[1714423772] llm_load_print_meta: rope_finetuned = unknown
[1714423772] llm_load_print_meta: ssm_d_conv = 0
[1714423772] llm_load_print_meta: ssm_d_inner = 0
[1714423772] llm_load_print_meta: ssm_d_state = 0
[1714423772] llm_load_print_meta: ssm_dt_rank = 0
[1714423772] llm_load_print_meta: model type = 8B
[1714423772] llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
[1714423772] llm_load_print_meta: model params = 8.03 B
[1714423772] llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
[1714423772] llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
[1714423772] llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
[1714423772] llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
[1714423772] llm_load_print_meta: LF token = 128 'Ä'
[1714423772] llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
[1714423772] llm_load_tensors: ggml ctx size = 0.15 MiB
[1714423776] llm_load_tensors: CPU buffer size = 3602.02 MiB
[1714423776] .....................................................................................
[1714423776] llama_new_context_with_model: n_ctx = 2048
[1714423776] llama_new_context_with_model: n_batch = 7
[1714423776] llama_new_context_with_model: n_ubatch = 7
[1714423776] llama_new_context_with_model: freq_base = 500000.0
[1714423776] llama_new_context_with_model: freq_scale = 1
[1714423776] llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
[1714423776] llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
[1714423776] llama_new_context_with_model: CPU output buffer size = 0.49 MiB
[1714423776] llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
[1714423776] llama_new_context_with_model: graph nodes = 1030
[1714423776] llama_new_context_with_model: graph splits = 1
[1714423776] warming up the model with an empty run

uname -a:
Linux localhost 4.14.190-23725627-abG975WVLS8IWD1 #2 SMP PREEMPT Mon Apr 10 18:16:39 KST 2023 aarch64 Android

clang --version:
clang version 18.1.4
Target: aarch64-unknown-linux-android24

cmake --version:
cmake version 3.29.2

lscpu:

Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              Qualcomm
  Model name:           Kryo-4XX-Silver
    Model:              14
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           0xd
    CPU(s) scaling MHz: 62%
    CPU max MHz:        1785.6000
    CPU min MHz:        300.0000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Kryo-4XX-Gold
    Model:              14
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          2
    Stepping:           0xd
    CPU(s) scaling MHz: 71%
    CPU max MHz:        2841.6001
    CPU min MHz:        710.4000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Vulnerable
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; Branch predictor hardening
  Srbds:                Not affected
  Tsx async abort:      Not affected

make builds and runs as expected. cmake also works if I remove -DCMAKE_C_FLAGS=-march=armv8.4a. Finally, -DLLAMA_SANITIZE_ADDRESS=ON lets me build and run with all flags included, but that's less than ideal.
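
For reference, the Flags lists above top out at ARMv8.2-era features (plus lrcpc and dcpop), with nothing ARMv8.4-specific, which suggests these cores cannot execute ARMv8.4 codegen. A configure line capped at what the hardware actually reports might look like this (a sketch, not official project guidance; the exact -march string is an assumption based on the flags shown):

cmake -B build -DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod" -DCMAKE_CXX_FLAGS="-march=armv8.2-a+dotprod"
cmake --build build --config Release --target main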

Thanks.


Manamama commented Apr 29, 2024

Not sure if it helps, but it has always compiled and worked on my box (for 4+ months now), which is very similar:

clang version 18.1.4
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /data/data/com.termux/files/usr/bin
~ $ cmake --version
cmake version 3.28.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).
~ $ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A55
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 6
    Socket(s):          1
    Stepping:           r1p0
    CPU(s) scaling MHz: 54%
    CPU max MHz:        2000.0000
    CPU min MHz:        500.0000
    BogoMIPS:           26.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Cortex-A76
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p0
    CPU(s) scaling MHz: 64%
    CPU max MHz:        2050.0000
    CPU min MHz:        774.0000
    BogoMIPS:           26.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected
~ $ uname -a
Linux localhost 4.14.186+ #1 SMP PREEMPT Thu Mar 17 16:28:22 CST 2022 aarch64 Android
~ $

If something crashes like in your case, I recompile with this one-liner:

alias cmakeinstall='rm -f CMakeCache.txt; export CFLAGS="-fuse-ld=lld -pthread -g -march=armv8-a -mtune=cortex-a53 -Wall -Wextra" && export CXXFLAGS="-pthread -g -march=armv8-a -mtune=cortex-a53 -Wall -Wextra" && cmake -DCMAKE_INSTALL_PREFIX=$PREFIX . && time make -j4 && make install'
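
The load-bearing part of that alias is pinning the baseline ISA: -march=armv8-a restricts codegen to ARMv8.0 instructions that every core can execute. An equivalent one-off configure without the alias might be (a sketch; note that CMake only picks up CFLAGS/CXXFLAGS from the environment on the first configure of a fresh build tree):

CFLAGS="-march=armv8-a -mtune=cortex-a53" CXXFLAGS="-march=armv8-a -mtune=cortex-a53" cmake -B build
cmake --build build -j4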

Jeximo (Contributor, Author) commented May 7, 2024

@Manamama thanks, your fix did help.

I found that replacing the flag with -DCMAKE_CXX_FLAGS:STRING=-march=armv8.4a also works.
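
For completeness, the working configure/build pair with the C++ flag substituted in would be (a sketch mirroring the original command at the top of the issue; whether adding the C flag back alongside it reintroduces the crash is untested here):

cmake -B build -DCMAKE_CXX_FLAGS:STRING=-march=armv8.4a
cmake --build build --config Release --target main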

Jeximo closed this as completed May 7, 2024