
GPU offloading doesn't seem to be working #384

Open
v4u6h4n opened this issue Apr 28, 2024 · 7 comments

Comments

v4u6h4n commented Apr 28, 2024

Hey everyone, awesome project :-) I'm having fun playing around with it, but I think my GPU isn't being utilised: my CPU maxes out while my GPU usage barely changes, so I'm wondering what the issue is. Here's the output in the terminal:

/media/storage/Software/AI/Meta-Llama-3-70B-Instruct.Q4_0.llamafile -ngl 9999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
get_rocm_bin_path: note: rocminfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/rocminfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/rocminfo does not exist
get_amd_offload_arch_flag: warning: can't find hipInfo/rocminfo commands for AMD GPU detection
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=native -march=native -mtune=native -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/v4u6h4n/.llamafile/ggml-rocm.so.dhsn3g /home/v4u6h4n/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
hipcc: Permission denied
extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
get_nvcc_path: note: nvcc not found on $PATH
get_nvcc_path: note: $CUDA_PATH/bin/nvcc does not exist
get_nvcc_path: note: /opt/cuda/bin/nvcc does not exist
get_nvcc_path: note: /usr/local/cuda/bin/nvcc does not exist
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
{"function":"server_params_parse","level":"WARN","line":2384,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"8545344","timestamp":1714335027}
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2839,"msg":"build info","tid":"8545344","timestamp":1714335027}
{"function":"server_cli","level":"INFO","line":2842,"msg":"system info","n_threads":16,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"8545344","timestamp":1714335027,"total_threads":32}
llama_model_loader: loaded meta data with 22 key-value pairs and 723 tensors from Meta-Llama-3-70B-Instruct.Q4_0.gguf (version GGUF V3 (latest))

...and my system specs:

OS: Arch Linux x86_64
Kernel: 6.8.7-arch1-2
CPU: AMD Ryzen 9 7950X3D (32) @ 5.759GHz
GPU: AMD ATI Radeon RX 7900 XT/7900 XTX/7900M
GPU: AMD ATI 13:00.0 Raphael
Memory: 14430MiB / 63427MiB
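
The log above is the key clue: every get_rocm_bin_path probe failed, so llamafile never found a ROCm toolchain, and the hipcc it finally invoked was not executable ("hipcc: Permission denied"). A rough sanity check, assuming the Arch packages rocm-hip-sdk and rocminfo provide the tools llamafile looks for (package names differ on other distros):

# Sketch only: Arch package names assumed; adjust for your distro.
sudo pacman -S --needed rocm-hip-sdk rocminfo   # provides hipcc, hipblas, rocblas, rocminfo
/opt/rocm/bin/rocminfo | grep -i gfx            # should list the GPU, e.g. gfx1100 for an RX 7900 XT/XTX
export PATH="/opt/rocm/bin:$PATH"               # one of the locations llamafile probes
hipcc --version                                 # must run without "Permission denied"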

ahonnecke commented Apr 29, 2024

Same here, with a Radeon Pro W5700:

llava-v1.5-7b-q4.llamafile --version
llamafile v0.8.0

ahonnecke commented

relevant perhaps: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html

v4u6h4n (Author) commented Apr 29, 2024

> relevant perhaps: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html

Hey :-)

Did it fix anything for you?

ahonnecke commented

Doesn't seem to have, but I'm not sure that it installed properly.

fcrisciani commented

I was able to make it work by changing the base image of my container to FROM nvcr.io/nvidia/pytorch:24.03-py3.

That base image is gigantic (~14.6 GB), so the best option would probably be a Docker multi-stage build that extracts nvcc and its dependencies into a smaller image.
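
An untested sketch of that multi-stage idea. The nvidia/cuda tag and the /usr/local/cuda paths are assumptions (they match NVIDIA's official images); nvcc is not a single binary, it drives helpers like cicc and ptxas and needs the CUDA headers, so whole directories are copied:

# Stage 1: only used as a source of the CUDA toolkit files.
FROM nvcr.io/nvidia/pytorch:24.03-py3 AS toolkit

# Stage 2: a much smaller runtime base (tag is an assumption; match your driver).
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
COPY --from=toolkit /usr/local/cuda/bin /usr/local/cuda/bin
COPY --from=toolkit /usr/local/cuda/include /usr/local/cuda/include
COPY --from=toolkit /usr/local/cuda/nvvm /usr/local/cuda/nvvm
ENV PATH=/usr/local/cuda/bin:$PATH

If that's too fiddly, an nvidia/cuda:*-devel image already ships nvcc and is still several gigabytes smaller than the PyTorch image.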

v4u6h4n (Author) commented May 7, 2024

@fcrisciani Unfortunately I'm enough of an amateur Linux user that I don't know what that means lol, but happy you got it working ;-)

fcrisciani commented

I was referring to creating a Docker image (https://docs.docker.com/engine/install/).

My Dockerfile looks like:

FROM nvcr.io/nvidia/pytorch:24.03-py3

RUN apt update && apt install -y wget

COPY start.sh /
RUN chmod +x /start.sh

CMD /start.sh

the start file looks like:

#!/bin/bash

echo "Download llamafile..."
wget 'https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile?download=true' -O /tmp/llava-v1.5-7b-q4.llamafile

echo "Start serving the llamafile"
chmod +x /tmp/llava-v1.5-7b-q4.llamafile
/tmp/llava-v1.5-7b-q4.llamafile -ngl 999 --gpu nvidia --nobrowser --host 0.0.0.0

you can:

  1. install docker
  2. create a folder with the two files above: Dockerfile and start.sh
  3. build the container image: docker build -t my_gpu_test .
  4. run it: docker run --rm -it --gpus=all my_gpu_test
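
If the container still doesn't touch the GPU, first confirm Docker can see it at all. This assumes the NVIDIA Container Toolkit is installed on the host:

docker run --rm --gpus=all nvcr.io/nvidia/pytorch:24.03-py3 nvidia-smi
# expected: the driver/CUDA version banner and the GPU listed; an error here
# means the host's container toolkit, not llamafile, is what needs fixing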
