With:

```sh
./Meta-Llama-3-8B-Instruct.Q2_K.llamafile -ngl 9999
```
I get this error at the first prompt, whatever I prompt:
```
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
get_rocm_bin_path: note: rocminfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/rocminfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/rocminfo does not exist
get_amd_offload_arch_flag: warning: can't find hipInfo/rocminfo commands for AMD GPU detection
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=native -march=native -mtune=native -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/yo/.llamafile/ggml-rocm.so.vadb38 /home/yo/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
hipcc: No such file or directory
extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
link_cuda_dso: note: dynamically linking /home/yo/.llamafile/ggml-cuda.so
ggml_cuda_link: welcome to CUDA SDK with cuBLAS
link_cuda_dso: GPU support loaded
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2839,"msg":"build info","tid":"8545344","timestamp":1714201880}
{"function":"server_cli","level":"INFO","line":2842,"msg":"system info","n_threads":6,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"8545344","timestamp":1714201880,"total_threads":12}
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from Meta-Llama-3-8B-Instruct.Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture                     str = llama
llama_model_loader: - kv   1: general.name                             str = .
llama_model_loader: - kv   2: llama.vocab_size                         u32 = 128256
llama_model_loader: - kv   3: llama.context_length                     u32 = 8192
llama_model_loader: - kv   4: llama.embedding_length                   u32 = 4096
llama_model_loader: - kv   5: llama.block_count                        u32 = 32
llama_model_loader: - kv   6: llama.feed_forward_length                u32 = 14336
llama_model_loader: - kv   7: llama.rope.dimension_count               u32 = 128
llama_model_loader: - kv   8: llama.attention.head_count               u32 = 32
llama_model_loader: - kv   9: llama.attention.head_count_kv            u32 = 8
llama_model_loader: - kv  10: llama.attention.layer_norm_rms_epsilon   f32 = 0.000010
llama_model_loader: - kv  11: llama.rope.freq_base                     f32 = 500000.000000
llama_model_loader: - kv  12: general.file_type                        u32 = 10
llama_model_loader: - kv  13: tokenizer.ggml.model                     str = gpt2
llama_model_loader: - kv  14: tokenizer.ggml.tokens                    arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15: tokenizer.ggml.scores                    arr[f32,128256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16: tokenizer.ggml.token_type                arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17: tokenizer.ggml.merges                    arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18: tokenizer.ggml.bos_token_id              u32 = 128000
llama_model_loader: - kv  19: tokenizer.ggml.eos_token_id              u32 = 128001
llama_model_loader: - kv  20: tokenizer.chat_template                  str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21: general.quantization_version             u32 = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q2_K:  129 tensors
llama_model_loader: - type q3_K:   64 tensors
llama_model_loader: - type q4_K:   32 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q2_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 2.95 GiB (3.16 BPW)
llm_load_print_meta: general.name     = .
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1050 Ti, compute capability 6.1, VMM: yes
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size   =  164.39 MiB
llm_load_tensors: CUDA0 buffer size = 2859.99 MiB
...................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 64.00 MiB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size  = 258.50 MiB
llama_new_context_with_model: CUDA0 compute buffer size     = 258.50 MiB
llama_new_context_with_model: CUDA_Host compute buffer size =   9.00 MiB
llama_new_context_with_model: graph nodes  = 1060
llama_new_context_with_model: graph splits = 2
{"function":"initialize","level":"INFO","line":481,"msg":"initializing slots","n_slots":1,"tid":"8545344","timestamp":1714201882}
{"function":"initialize","level":"INFO","line":490,"msg":"new slot","n_ctx_slot":512,"slot_id":0,"tid":"8545344","timestamp":1714201882}
{"function":"server_cli","level":"INFO","line":3060,"msg":"model loaded","tid":"8545344","timestamp":1714201882}

llama server listening at http://127.0.0.1:8080

opening browser tab... (pass --nobrowser to disable)
{"function":"server_cli","hostname":"127.0.0.1","level":"INFO","line":3183,"msg":"HTTP server listening","port":"8080","tid":"8545344","timestamp":1714201882}
{"function":"validate_model_chat_template","level":"ERR","line":470,"msg":"The chat template comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses","tid":"8545344","timestamp":1714201882}
{"function":"update_slots","level":"INFO","line":1619,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"8545344","timestamp":1714201882}
Opening in existing browser session.
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/","remote_addr":"127.0.0.1","remote_port":54622,"status":200,"tid":"17594341382800","timestamp":1714201882}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/completion.js","remote_addr":"127.0.0.1","remote_port":54628,"status":200,"tid":"17594341384672","timestamp":1714201882}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/json-schema-to-grammar.mjs","remote_addr":"127.0.0.1","remote_port":54636,"status":200,"tid":"17594335595984","timestamp":1714201882}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/index.js","remote_addr":"127.0.0.1","remote_port":54622,"status":200,"tid":"17594341382800","timestamp":1714201882}
parse: error parsing grammar: expecting ::= at me how to draw a the mount Fuji. Detailed art, no color, clear weather. Then, do:
- Create a html skeleton
- Add a canvas HTML tag in the middle. Then
- Read again your previous answer where your listed all steps to draw the mount Fuji.
- For every steps, create the perfect code to draw exactly what is describe in the step.
- Check if the draw is beautiful or not.
llama_sampling_init: failed to parse grammar
{"function":"launch_slot_with_data","level":"INFO","line":871,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"8545344","timestamp":1714201886}

error: Uncaught SIGSEGV (SEGV_MAPERR) at 0x128 on Yocom pid 40372 tid 40372
 ./Meta-Llama-3-8B-Instruct.Q2_K.llamafile
 No such file or directory
 Linux Cosmopolitan 3.3.3 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Wed Apr 10 20:11:08 UTC 2024 Yocom 6.6.26-1-MANJARO

RAX 00001000814da5d0 RBX 00000000000007ec RDI 0000000000000000
RCX 0000000000000000 RDX 00000000000007ec RSI 0000100086c90010
RBP 00007ffd730c44f0 RSP 00007ffd730c4480 RIP 000000000056eaf0
 R8 0000100080040000  R9 00001000814da900 R10 00001000814e9360
R11 0000000000000080 R12 0000000000000000 R13 0000000000000000
R14 00007ffd730c7ce8 R15 00007ffd730c7f10 TLS 0000000000704e40
XMM0  00000000000000000000000000000000 XMM8  00007fbc3662301800007fbc36623020
XMM1  00001000814da2a000001000814da2a0 XMM9  00007fbc3662302800007fbc36623030
XMM2  222c3137383a22656e696c222c224f46 XMM10 00007fbc3662303800007fbc36623040
XMM3  4e49223a226c6576656c222c22617461 XMM11 00007fbc3662304800007fbc36623050
XMM4  6e656577746562206e6f697461737265 XMM12 00007fbc3662305800007fbc36623060
XMM5  5f6b736174222c303a2264695f746f6c XMM13 00007fbc3662306800007fbc36623070
XMM6  61645f687469775f746f6c735f68636e XMM14 00007fbc3662307800007fbc36623080
XMM7  75616c223a226e6f6974636e7566227b XMM15 00000000000000000000000000000000

cosmoaddr2line /media/Qemu/Model/Meta-Llama-3-8B-Instruct.Q2_K.llamafile 56eaf0 48c6a3 48abc5 43fadc 401b81 410a03 4015fb

0x000000000056eaf0: ?? ??:0
0x000000000048c6a3: ?? ??:0
0x000000000048abc5: ?? ??:0
0x000000000043fadc: ?? ??:0
0x0000000000401b81: ?? ??:0
0x0000000000410a03: ?? ??:0
0x00000000004015fb: ?? ??:0

10008004-10008009 rw-pa- 6x automap 384kB w/ 128kB hole
1000800c-1000800f rw-Sa- 4x automap 256kB
10008010-1000801f rw-pa- 16x automap 1024kB
10008020-1000803f rw-Sa- 32x automap 2048kB
10008040-10008077 rw-pa- 56x automap 3584kB
10008078-10008083 rw-Sa- 12x automap 768kB w/ 8256kB hole
10008105-100086cb rw-pa- 1'479x automap 92mB w/ 3328kB hole
10008700-100087e6 rw-pa- 231x automap 14mB w/ 1927mB hole
1001005a-1001bef9 r--s-- 48'800x automap 3050mB w/ 1040mB hole
10020000-1002bd85 r--s-- 48'518x automap 3032mB w/ 96tB hole
6fc00004-6fc00004 rw-paF 1x nsync 64kB w/ 64gB hole
6fd00004-6fd0000f rw-paF 12x zipos 768kB w/ 64gB hole
6fe00004-6fe00004 rw-paF 1x g_fds 64kB
# 6198mB total mapped memory
./Meta-Llama-3-8B-Instruct.Q2_K.llamafile -m Meta-Llama-3-8B-Instruct.Q2_K.gguf -ngl 9999
zsh: segmentation fault (core dumped)  ./Meta-Llama-3-8B-Instruct.Q2_K.llamafile -ngl 9999
```

`llamafile --version`:

```bash
llamafile v0.8.1
```
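One observation on the log: the `parse: error parsing grammar: expecting ::=` line comes from llama.cpp's GBNF grammar parser, and the text it chokes on is the prompt itself, which suggests the prompt was handed to the grammar parser rather than (or in addition to) the prompt field. A GBNF grammar must consist of `name ::= ...` rules; a minimal sketch of a well-formed one is below (the `check.gbnf` filename is purely illustrative):

```bash
# Write a minimal, well-formed llama.cpp GBNF grammar for comparison.
# GBNF requires every rule to use "::=", starting from a "root" rule;
# free-form prose like the prompt above fails with "expecting ::=".
cat > check.gbnf <<'EOF'
root ::= "yes" | "no"
EOF
```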
CUDA version:
```
Version        : 12.3.2-1
Description    : NVIDIA's GPU programming toolkit
Architecture   : x86_64
URL            : https://developer.nvidia.com/cuda-zone
Licenses       : LicenseRef-NVIDIA-CUDA
Groups         : None
Provides       : cuda-toolkit  cuda-sdk  libcudart.so=12-64  libcublas.so=12-64  libcusolver.so=11-64  libcusparse.so=12-64
```
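For completeness, the driver/toolkit pairing can be sanity-checked with the stock NVIDIA tools (assuming they are installed alongside this package):

```bash
# Driver view: should list the GeForce GTX 1050 Ti and driver 550.67.
nvidia-smi
# Toolkit view: should report CUDA release 12.3.
nvcc --version
```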
My machine:
```
System:
  Kernel: 6.6.26-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
  Desktop: GNOME v: 45.4 tk: GTK v: 3.24.41 Distro: Manjaro base: Arch Linux
Machine:
  Type: Laptop System: HP product: HP Pavilion Gaming Laptop 15-cx0xxx
Memory:
  System RAM: total: 32 GiB available: 31.24 GiB used: 4.16 GiB (13.3%)
CPU:
  Info: model: Intel Core i7-8750H bits: 64 type: MT MCP arch: Coffee Lake
    gen: core 8 level: v3 note:
Graphics:
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] vendor: Hewlett-Packard
    driver: nvidia v: 550.67 alternate: nouveau,nvidia_drm non-free: 545.xx+
    status: current (as of 2024-04; EOL~2026-12-xx) arch: Pascal code: GP10x
    process: TSMC 16nm built: 2016-2021 pcie: gen: 1 speed: 2.5 GT/s lanes: 16
    link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.0 chip-ID: 10de:1c8c
    class-ID: 0300
```