main Segfault using cmake & -march=armv8.4a flag #6990

Closed
Jeximo opened this issue Apr 29, 2024 · 2 comments
Jeximo (Contributor) commented Apr 29, 2024

main crashes when built with -DCMAKE_C_FLAGS=-march=armv8.4a. Here's the trace:

llama_new_context_with_model: CPU  output buffer size = 0.49 MiB                         
llama_new_context_with_model: CPU compute buffer size = 3.53 MiB                         
llama_new_context_with_model: graph nodes  = 1030 
llama_new_context_with_model: graph splits = 1  
[New Thread 0x682f (LWP 26671)]
[New Thread 0x6830 (LWP 26672)]
[New Thread 0x6831 (LWP 26673)]

Thread 4 "main" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x6831 (LWP 26673)]          
0x00000055556c4724 in ggml_graph_compute_thread (data=0x7fffffc730) at /data/data/com.termux/files/home/llama.cpp/ggml.c:18361
18361 atomic_store(&state->shared->n_active,  n_threads);                            
(gdb) bt

#0  0x00000055556c4724 in ggml_graph_compute_thread (data=0x7fffffc730) at /data/data/com.termux/files/home/llama.cpp/ggml.c:18361
#1  0x0000007fbb306e48 in __pthread_start(void*) () from /apex/com.android.runtime/lib64/bionic/libc.so
#2  0x0000007fbb2a3458 in __start_thread () from /apex/com.android.runtime/lib64/bionic/libc.so
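
For anyone hitting a similar SIGILL, it can help to disassemble the exact instruction the thread died on; if it decodes to an instruction from a newer ISA level than the core implements (the lscpu flags further down list ARMv8.2-era features, nothing ARMv8.4-specific), that would explain the illegal-instruction signal. A minimal gdb sketch (standard gdb commands; the output is omitted and will vary):

(gdb) x/1i $pc            # disassemble the faulting instruction
(gdb) info proc mappings  # confirm which object file the address falls in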
build/run log
cmake -B build -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod+i8mm && cd build && cmake --build . --config Release --target server --target main && cd bin/

-- The C compiler identification is Clang 18.1.4
-- The CXX compiler identification is Clang 18.1.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /data/data/com.termux/files/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /data/data/com.termux/files/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (3.2s)
-- Generating done (0.3s)
-- Build files have been written to: /data/data/com.termux/files/home/llama.cpp/build

[ 6%] Generating build details from Git
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
[ 12%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 12%] Built target build_info
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
/data/data/com.termux/files/home/llama.cpp/ggml.c:1564:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1564 | GGML_F16_VEC_REDUCE(sumf, sum);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:1612:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1612 | GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
2 warnings generated.
[ 18%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3412:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3412 | const block_q4_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3415:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3415 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3779:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3779 | const block_q4_1 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3781:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3781 | const block_q8_1 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4592:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4592 | const block_q8_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4594:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4594 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
6 warnings generated.
[ 31%] Building CXX object CMakeFiles/ggml.dir/sgemm.cpp.o
[ 31%] Built target ggml
[ 31%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[ 37%] Building CXX object CMakeFiles/llama.dir/unicode.cpp.o
[ 43%] Building CXX object CMakeFiles/llama.dir/unicode-data.cpp.o
[ 43%] Linking CXX static library libllama.a
[ 43%] Built target llama
[ 43%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 50%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 62%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 75%] Linking CXX static library libcommon.a
[ 75%] Built target common
[ 81%] Generating json-schema-to-grammar.mjs.hpp
[ 87%] Generating completion.js.hpp
[ 93%] Generating index.html.hpp
[ 93%] Generating index.js.hpp
[ 93%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[100%] Linking CXX executable ../../bin/server
[100%] Built target server
[ 15%] Built target build_info
[ 38%] Built target ggml
[ 53%] Built target llama
[ 92%] Built target common
[ 92%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/main
[100%] Built target main

./main -m ~/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r "<|eot_id|>" --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"

Log start
main: build = 2768 (b8c1476)
main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
main: seed = 1714423770
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 27
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 68 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_loader: - type iq3_s: 157 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 3602.02 MiB
.....................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 7
llama_new_context_with_model: n_ubatch = 7
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1

fish: Job 1, './main -m ~/Meta-Llama-3-8B-Ins…' terminated by signal SIGILL (Illegal instruction)

main.log
[1714423770] Log start [1714423770] Cmd: ./main -m /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r <|eot_id|> --in-prefix \n<|start_header_id|>user<|end_header_id|>\n\n --in-suffix <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"

[1714423770] main: build = 2768 (b8c1476)
[1714423770] main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
[1714423770] main: seed = 1714423770
[1714423770] main: llama backend init
[1714423770] main: load the model and apply lora adapter, if any
[1714423770] llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
[1714423770] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1714423770] llama_model_loader: - kv 0: general.architecture str = llama
[1714423770] llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
[1714423770] llama_model_loader: - kv 2: llama.block_count u32 = 32
[1714423770] llama_model_loader: - kv 3: llama.context_length u32 = 8192
[1714423770] llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
[1714423770] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
[1714423770] llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
[1714423770] llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
[1714423770] llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
[1714423770] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[1714423770] llama_model_loader: - kv 10: general.file_type u32 = 27
[1714423770] llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
[1714423770] llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
[1714423770] llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
[1714423770] llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
[1714423771] llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
[1714423771] llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1714423771] llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
[1714423771] llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
[1714423771] llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
[1714423771] llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
[1714423771] llama_model_loader: - kv 21: general.quantization_version u32 = 2
[1714423771] llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
[1714423771] llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
[1714423771] llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
[1714423771] llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
[1714423771] llama_model_loader: - type f32: 65 tensors
[1714423771] llama_model_loader: - type q4_K: 68 tensors
[1714423771] llama_model_loader: - type q6_K: 1 tensors
[1714423771] llama_model_loader: - type iq3_s: 157 tensors
[1714423772] llm_load_vocab: special tokens definition check successful ( 256/128256 ).
[1714423772] llm_load_print_meta: format = GGUF V3 (latest)
[1714423772] llm_load_print_meta: arch = llama
[1714423772] llm_load_print_meta: vocab type = BPE
[1714423772] llm_load_print_meta: n_vocab = 128256
[1714423772] llm_load_print_meta: n_merges = 280147
[1714423772] llm_load_print_meta: n_ctx_train = 8192
[1714423772] llm_load_print_meta: n_embd = 4096
[1714423772] llm_load_print_meta: n_head = 32
[1714423772] llm_load_print_meta: n_head_kv = 8
[1714423772] llm_load_print_meta: n_layer = 32
[1714423772] llm_load_print_meta: n_rot = 128
[1714423772] llm_load_print_meta: n_embd_head_k = 128
[1714423772] llm_load_print_meta: n_embd_head_v = 128
[1714423772] llm_load_print_meta: n_gqa = 4
[1714423772] llm_load_print_meta: n_embd_k_gqa = 1024
[1714423772] llm_load_print_meta: n_embd_v_gqa = 1024
[1714423772] llm_load_print_meta: f_norm_eps = 0.0e+00
[1714423772] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[1714423772] llm_load_print_meta: f_clamp_kqv = 0.0e+00
[1714423772] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1714423772] llm_load_print_meta: f_logit_scale = 0.0e+00
[1714423772] llm_load_print_meta: n_ff = 14336
[1714423772] llm_load_print_meta: n_expert = 0
[1714423772] llm_load_print_meta: n_expert_used = 0
[1714423772] llm_load_print_meta: causal attn = 1
[1714423772] llm_load_print_meta: pooling type = 0
[1714423772] llm_load_print_meta: rope type = 0
[1714423772] llm_load_print_meta: rope scaling = linear
[1714423772] llm_load_print_meta: freq_base_train = 500000.0
[1714423772] llm_load_print_meta: freq_scale_train = 1
[1714423772] llm_load_print_meta: n_yarn_orig_ctx = 8192
[1714423772] llm_load_print_meta: rope_finetuned = unknown
[1714423772] llm_load_print_meta: ssm_d_conv = 0
[1714423772] llm_load_print_meta: ssm_d_inner = 0
[1714423772] llm_load_print_meta: ssm_d_state = 0
[1714423772] llm_load_print_meta: ssm_dt_rank = 0
[1714423772] llm_load_print_meta: model type = 8B
[1714423772] llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
[1714423772] llm_load_print_meta: model params = 8.03 B
[1714423772] llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
[1714423772] llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
[1714423772] llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
[1714423772] llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
[1714423772] llm_load_print_meta: LF token = 128 'Ä'
[1714423772] llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
[1714423772] llm_load_tensors: ggml ctx size = 0.15 MiB
[1714423776] llm_load_tensors: CPU buffer size = 3602.02 MiB
[1714423776] .....................................................................................
[1714423776] llama_new_context_with_model: n_ctx = 2048
[1714423776] llama_new_context_with_model: n_batch = 7
[1714423776] llama_new_context_with_model: n_ubatch = 7
[1714423776] llama_new_context_with_model: freq_base = 500000.0
[1714423776] llama_new_context_with_model: freq_scale = 1
[1714423776] llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
[1714423776] llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
[1714423776] llama_new_context_with_model: CPU output buffer size = 0.49 MiB
[1714423776] llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
[1714423776] llama_new_context_with_model: graph nodes = 1030
[1714423776] llama_new_context_with_model: graph splits = 1
[1714423776] warming up the model with an empty run

uname -a:
Linux localhost 4.14.190-23725627-abG975WVLS8IWD1 #2 SMP PREEMPT Mon Apr 10 18:16:39 KST 2023 aarch64 Android

clang --version:
clang version 18.1.4
Target: aarch64-unknown-linux-android24

cmake --version:
cmake version 3.29.2

lscpu:

Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              Qualcomm
  Model name:           Kryo-4XX-Silver
    Model:              14
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           0xd
    CPU(s) scaling MHz: 62%
    CPU max MHz:        1785.6000
    CPU min MHz:        300.0000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Kryo-4XX-Gold
    Model:              14
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          2
    Stepping:           0xd
    CPU(s) scaling MHz: 71%
    CPU max MHz:        2841.6001
    CPU min MHz:        710.4000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Vulnerable
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; Branch predictor hardening
  Srbds:                Not affected
  Tsx async abort:      Not affected

make builds and runs as expected. cmake also works if I remove -DCMAKE_C_FLAGS=-march=armv8.4a. Finally, -DLLAMA_SANITIZE_ADDRESS=ON lets me build and run with all flags included, but that's less than ideal.
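
For reference, the Flags lists above top out at ARMv8.2-era features (plus lrcpc and dcpop), with nothing ARMv8.4-specific, which suggests these cores cannot execute ARMv8.4 codegen. A configure line capped at what the hardware actually reports might look like this (a sketch, not official project guidance; the exact -march string is an assumption based on the flags shown):

cmake -B build -DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod" -DCMAKE_CXX_FLAGS="-march=armv8.2-a+dotprod"
cmake --build build --config Release --target main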

Thanks.


Manamama commented Apr 29, 2024

Not sure if it helps, but it has always compiled and worked on my box (for 4+ months now), which is very similar:

clang version 18.1.4
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /data/data/com.termux/files/usr/bin
~ $ cmake --version
cmake version 3.28.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).
~ $ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A55
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 6
    Socket(s):          1
    Stepping:           r1p0
    CPU(s) scaling MHz: 54%
    CPU max MHz:        2000.0000
    CPU min MHz:        500.0000
    BogoMIPS:           26.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Cortex-A76
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p0
    CPU(s) scaling MHz: 64%
    CPU max MHz:        2050.0000
    CPU min MHz:        774.0000
    BogoMIPS:           26.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                        fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected
~ $ uname -a
Linux localhost 4.14.186+ #1 SMP PREEMPT Thu Mar 17 16:28:22 CST 2022 aarch64 Android
~ $

If something crashes like in your case, I recompile with this one-liner:

alias cmakeinstall='rm -f CMakeCache.txt; export CFLAGS="-fuse-ld=lld -pthread -g -march=armv8-a -mtune=cortex-a53 -Wall -Wextra" && export CXXFLAGS="-pthread -g -march=armv8-a -mtune=cortex-a53 -Wall -Wextra" && cmake -DCMAKE_INSTALL_PREFIX=$PREFIX . && time make -j4 && make install'
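
The load-bearing part of that alias is pinning the baseline ISA: -march=armv8-a restricts codegen to ARMv8.0 instructions that every core can execute. An equivalent one-off configure without the alias might be (a sketch; note that CMake only picks up CFLAGS/CXXFLAGS from the environment on the first configure of a fresh build tree):

CFLAGS="-march=armv8-a -mtune=cortex-a53" CXXFLAGS="-march=armv8-a -mtune=cortex-a53" cmake -B build
cmake --build build -j4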

Jeximo (Contributor, Author) commented May 7, 2024

@Manamama thanks, your fix did help.

I found that replacing the flag with -DCMAKE_CXX_FLAGS:STRING=-march=armv8.4a also works.
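
For completeness, the working configure/build pair with the C++ flag substituted in would be (a sketch mirroring the original command at the top of the issue; whether adding the C flag back alongside it reintroduces the crash is untested here):

cmake -B build -DCMAKE_CXX_FLAGS:STRING=-march=armv8.4a
cmake --build build --config Release --target main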

Jeximo closed this as completed May 7, 2024