main: Segfault using cmake & -march=armv8.4a flag #6990
main crashes with the -DCMAKE_C_FLAGS=-march=armv8.4a flag. Here's the trace:

build/run log
cmake -B build -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod+i8mm && cd build && cmake --build . --config Release --target server --target main && cd bin/
-- The C compiler identification is Clang 18.1.4
-- The CXX compiler identification is Clang 18.1.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /data/data/com.termux/files/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /data/data/com.termux/files/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (3.2s)
-- Generating done (0.3s)
-- Build files have been written to: /data/data/com.termux/files/home/llama.cpp/build
[ 6%] Generating build details from Git
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.44.0")
[ 12%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 12%] Built target build_info
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
/data/data/com.termux/files/home/llama.cpp/ggml.c:1564:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1564 | GGML_F16_VEC_REDUCE(sumf, sum);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:1612:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
1612 | GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:984:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
984 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:974:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
974 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/data/data/com.termux/files/home/llama.cpp/ggml.c:904:11: note: expanded from macro 'GGML_F32x4_REDUCE'
904 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/data/com.termux/files/home/llama.cpp/ggml.c:889:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
889 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
2 warnings generated.
[ 18%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[ 25%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3412:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3412 | const block_q4_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3415:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3415 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3779:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3779 | const block_q4_1 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:3781:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
3781 | const block_q8_1 * restrict vy1 = vy + by;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4592:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4592 | const block_q8_0 * restrict vx1 = vx + bx;
| ~~ ^
/data/data/com.termux/files/home/llama.cpp/ggml-quants.c:4594:46: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
4594 | const block_q8_0 * restrict vy1 = vy + by;
| ~~ ^
6 warnings generated.
[ 31%] Building CXX object CMakeFiles/ggml.dir/sgemm.cpp.o
[ 31%] Built target ggml
[ 31%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[ 37%] Building CXX object CMakeFiles/llama.dir/unicode.cpp.o
[ 43%] Building CXX object CMakeFiles/llama.dir/unicode-data.cpp.o
[ 43%] Linking CXX static library libllama.a
[ 43%] Built target llama
[ 43%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 50%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 56%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 62%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 68%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 75%] Linking CXX static library libcommon.a
[ 75%] Built target common
[ 81%] Generating json-schema-to-grammar.mjs.hpp
[ 87%] Generating completion.js.hpp
[ 93%] Generating index.html.hpp
[ 93%] Generating index.js.hpp
[ 93%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[100%] Linking CXX executable ../../bin/server
[100%] Built target server
[ 15%] Built target build_info
[ 38%] Built target ggml
[ 53%] Built target llama
[ 92%] Built target common
[ 92%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/main
[100%] Built target main
./main -m ~/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r "<|eot_id|>" --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"
Log start
main: build = 2768 (b8c1476)
main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
main: seed = 1714423770
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 27
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 68 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_loader: - type iq3_s: 157 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 3602.02 MiB
.....................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 7
llama_new_context_with_model: n_ubatch = 7
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
fish: Job 1, './main -m ~/Meta-Llama-3-8B-Ins…' terminated by signal SIGILL (Illegal instruction)
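SIGILL here typically means the binary contains instructions (dotprod/i8mm in this build) that this particular core does not implement, so it is worth asking the kernel what it actually reports before compiling with those -march extensions. Below is a minimal sketch, not part of llama.cpp; it assumes an aarch64 Linux/Android libc that provides getauxval() and that <asm/hwcap.h> defines the HWCAP_ASIMDDP (dotprod) and HWCAP2_I8MM bits:

/* hwcap_probe.c - print which of the extensions used by this build
 * the kernel says the CPU supports. Build: cc -o hwcap_probe hwcap_probe.c */
#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

int main(void) {
    unsigned long hwcap  = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
#ifdef HWCAP_ASIMDDP
    printf("dotprod: %s\n", (hwcap & HWCAP_ASIMDDP) ? "yes" : "no");
#else
    (void)hwcap;
#endif
#ifdef HWCAP2_I8MM
    printf("i8mm:    %s\n", (hwcap2 & HWCAP2_I8MM) ? "yes" : "no");
#else
    (void)hwcap2;
#endif
    return 0;
}

If i8mm prints "no" here, any binary compiled with +i8mm can trap exactly like the run above.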
main.log
[1714423770] Log start [1714423770] Cmd: ./main -m /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -i --color --penalize-nl -e --temp 0 -t 4 -b 7 -c 2048 -r <|eot_id|> --in-prefix \n<|start_header_id|>user<|end_header_id|>\n\n --in-suffix <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"
[1714423770] main: build = 2768 (b8c1476)
[1714423770] main: built with clang version 18.1.4 for aarch64-unknown-linux-android24
[1714423770] main: seed = 1714423770
[1714423770] main: llama backend init
[1714423770] main: load the model and apply lora adapter, if any
[1714423770] llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /data/data/com.termux/files/home/Meta-Llama-3-8B-Instruct-IQ3_M.gguf (version GGUF V3 (latest))
[1714423770] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1714423770] llama_model_loader: - kv 0: general.architecture str = llama
[1714423770] llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
[1714423770] llama_model_loader: - kv 2: llama.block_count u32 = 32
[1714423770] llama_model_loader: - kv 3: llama.context_length u32 = 8192
[1714423770] llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
[1714423770] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
[1714423770] llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
[1714423770] llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
[1714423770] llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
[1714423770] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[1714423770] llama_model_loader: - kv 10: general.file_type u32 = 27
[1714423770] llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
[1714423770] llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
[1714423770] llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
[1714423770] llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
[1714423771] llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
[1714423771] llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1714423771] llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
[1714423771] llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
[1714423771] llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
[1714423771] llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
[1714423771] llama_model_loader: - kv 21: general.quantization_version u32 = 2
[1714423771] llama_model_loader: - kv 22: quantize.imatrix.file str = /models/Meta-Llama-3-8B-Instruct-GGUF...
[1714423771] llama_model_loader: - kv 23: quantize.imatrix.dataset str = /training_data/groups_merged.txt
[1714423771] llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 224
[1714423771] llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 88
[1714423771] llama_model_loader: - type f32: 65 tensors
[1714423771] llama_model_loader: - type q4_K: 68 tensors
[1714423771] llama_model_loader: - type q6_K: 1 tensors
[1714423771] llama_model_loader: - type iq3_s: 157 tensors
[1714423772] llm_load_vocab: special tokens definition check successful ( 256/128256 ).
[1714423772] llm_load_print_meta: format = GGUF V3 (latest)
[1714423772] llm_load_print_meta: arch = llama
[1714423772] llm_load_print_meta: vocab type = BPE
[1714423772] llm_load_print_meta: n_vocab = 128256
[1714423772] llm_load_print_meta: n_merges = 280147
[1714423772] llm_load_print_meta: n_ctx_train = 8192
[1714423772] llm_load_print_meta: n_embd = 4096
[1714423772] llm_load_print_meta: n_head = 32
[1714423772] llm_load_print_meta: n_head_kv = 8
[1714423772] llm_load_print_meta: n_layer = 32
[1714423772] llm_load_print_meta: n_rot = 128
[1714423772] llm_load_print_meta: n_embd_head_k = 128
[1714423772] llm_load_print_meta: n_embd_head_v = 128
[1714423772] llm_load_print_meta: n_gqa = 4
[1714423772] llm_load_print_meta: n_embd_k_gqa = 1024
[1714423772] llm_load_print_meta: n_embd_v_gqa = 1024
[1714423772] llm_load_print_meta: f_norm_eps = 0.0e+00
[1714423772] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[1714423772] llm_load_print_meta: f_clamp_kqv = 0.0e+00
[1714423772] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1714423772] llm_load_print_meta: f_logit_scale = 0.0e+00
[1714423772] llm_load_print_meta: n_ff = 14336
[1714423772] llm_load_print_meta: n_expert = 0
[1714423772] llm_load_print_meta: n_expert_used = 0
[1714423772] llm_load_print_meta: causal attn = 1
[1714423772] llm_load_print_meta: pooling type = 0
[1714423772] llm_load_print_meta: rope type = 0
[1714423772] llm_load_print_meta: rope scaling = linear
[1714423772] llm_load_print_meta: freq_base_train = 500000.0
[1714423772] llm_load_print_meta: freq_scale_train = 1
[1714423772] llm_load_print_meta: n_yarn_orig_ctx = 8192
[1714423772] llm_load_print_meta: rope_finetuned = unknown
[1714423772] llm_load_print_meta: ssm_d_conv = 0
[1714423772] llm_load_print_meta: ssm_d_inner = 0
[1714423772] llm_load_print_meta: ssm_d_state = 0
[1714423772] llm_load_print_meta: ssm_dt_rank = 0
[1714423772] llm_load_print_meta: model type = 8B
[1714423772] llm_load_print_meta: model ftype = IQ3_S mix - 3.66 bpw
[1714423772] llm_load_print_meta: model params = 8.03 B
[1714423772] llm_load_print_meta: model size = 3.52 GiB (3.76 BPW)
[1714423772] llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
[1714423772] llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
[1714423772] llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
[1714423772] llm_load_print_meta: LF token = 128 'Ä'
[1714423772] llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
[1714423772] llm_load_tensors: ggml ctx size = 0.15 MiB
[1714423776] llm_load_tensors: CPU buffer size = 3602.02 MiB
[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776] .[1714423776]
[1714423776] llama_new_context_with_model: n_ctx = 2048
[1714423776] llama_new_context_with_model: n_batch = 7
[1714423776] llama_new_context_with_model: n_ubatch = 7
[1714423776] llama_new_context_with_model: freq_base = 500000.0
[1714423776] llama_new_context_with_model: freq_scale = 1
[1714423776] llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
[1714423776] llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
[1714423776] llama_new_context_with_model: CPU output buffer size = 0.49 MiB
[1714423776] llama_new_context_with_model: CPU compute buffer size = 3.53 MiB
[1714423776] llama_new_context_with_model: graph nodes = 1030
[1714423776] llama_new_context_with_model: graph splits = 1
[1714423776] warming up the model with an empty run
make builds/runs as expected. Also, cmake works by removing -DCMAKE_C_FLAGS=-march=armv8.4a. Finally, -DLLAMA_SANITIZE_ADDRESS=ON allows me to build/run including all flags, but that's less than ideal.

Thanks.
Comments

Dunno if it helps, but it has always (+4 months) worked and compiled on my box, very similar:
If smth crashes like in yours, I recompile with this one-liner:

@Manamama thanks, your fix did help. I found replacing the instruction with