feat: enable OLLAMA Arc GPU support with SYCL backend #3796
base: main
Conversation
In the current state, this code always seems to use the CPU for inference on my system. I followed the build instructions from the other pull request. It seems to detect the GPU and prints out some relevant messages, but doesn't actually use it. In addition to my dGPU, I also have a Level-Zero-compatible iGPU; disabling it in the BIOS did not help. Compiling llama.cpp from source with SYCL support works fine on the same system, as does ComfyUI using IPEX for acceleration. I'm on Arch Linux.
To get this working, I needed to do two things:
Here's the full console output with debug enabled, both with and without the change: Debug output with SYCL acceleration working after removing the elif
Debug output before removing the elif, resulting in running on the CPU (I also had it print out `memInfo` and `memInfo.count` before the other print line)
So the detection of compatible integrated GPUs doesn't seem to be working as intended; in my case, it was excluding a dGPU. Let me know if there are any extra commands I can run to help debug this. |
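For readers following along, here is a minimal Go sketch of the kind of filtering branch being described. All names (`syclDevice`, `pickDevices`) are hypothetical, not taken from the actual ollama source:

```go
package main

import "fmt"

// Illustrative device record; not the real ollama type.
type syclDevice struct {
	Name        string
	Integrated  bool
	TotalMemMiB uint64
}

// pickDevices filters devices for offload. A guard like the elif below,
// intended to skip unsupported iGPUs, can reject every device when the
// Integrated flag is mis-detected on a dGPU, so inference falls back to CPU.
func pickDevices(devices []syclDevice, minMemMiB uint64) []syclDevice {
	var picked []syclDevice
	for _, d := range devices {
		if d.TotalMemMiB < minMemMiB {
			continue // not enough memory for offload
		} else if d.Integrated {
			continue // suspect branch: the kind that excluded the Arc dGPU above
		}
		picked = append(picked, d)
	}
	return picked
}

func main() {
	devs := []syclDevice{{Name: "Intel Arc A770", Integrated: false, TotalMemMiB: 16384}}
	fmt.Println(pickDevices(devs, 2048))
}
```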
Sorry, I'm an idiot and I was accidentally comparing a different quantization of the model. This is probably not important. |
@mlim15 Yes, I could reproduce the problem with an Intel i5 and Arc A770 GPU. I previously tested with the AMD + Arc machine. I will work on both issues and run a benchmark. |
- Update the Intel oneAPI base image version to 2024.0.1
- Copy oneAPI build artifacts to the correct path in the final image
- Update gen_linux.sh to build in ../build/linux instead of llama.cpp/build
- Remove the -g flag when building with the Intel oneAPI compiler
I managed to fix the Linux build script. The oneAPI libraries are too large to be bundled in, so they need to be available in the runtime environment. This is required before running the server:
source /opt/intel/oneapi/setvars.sh
I haven't tested this for Arm64 as I don't have a device to test it on. SYCL log:
I ran a benchmark, and these are the results I got.
|
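Since the oneAPI libraries must come from the runtime environment, a startup check along these lines could fail fast when setvars.sh hasn't been sourced. This is a hypothetical sketch assuming the default /opt/intel/oneapi install prefix, not code from this PR:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// oneapiEnvLooksSet is a rough heuristic: setvars.sh prepends the oneAPI
// library directories to LD_LIBRARY_PATH, so their absence is a strong hint
// the environment was never sourced.
func oneapiEnvLooksSet() bool {
	return strings.Contains(os.Getenv("LD_LIBRARY_PATH"), "/opt/intel/oneapi")
}

func main() {
	if !oneapiEnvLooksSet() {
		fmt.Fprintln(os.Stderr, "warning: oneAPI runtime not detected; run `source /opt/intel/oneapi/setvars.sh` first")
	}
}
```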
Hello, I've tested the branch in my Debian environment, and ollama fails to run a model. I'm using an Intel Arc A770 16 GB.
This may be related to the Debian libraries, or specific to my environment, but I'd rather report it now in case it hides a more global issue. |
@Chimrod, for some reason, llama.cpp is not picking up the GPUs. I'm currently looking into it to see what I can do. |
Hello, the latest commit gives me a segfault when I start. I don't know the Go language, but tell me if I need to enable any switch during compilation, or add any trace to the code, to track down the cause of this.
- Fix GPU ID formatting issue in the oneapi_check_vram function
- Update GPU detection logic for oneAPI devices
- Update gen_linux.sh to remove the LLAMA_SYSCTL_F16 default flag
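For illustration, here is a minimal Go sketch of the kind of ID-formatting pitfall the first bullet describes; the function name and shape are hypothetical, not the actual oneapi_check_vram code:

```go
package main

import (
	"fmt"
	"strings"
)

// formatGPUIDs builds a comma-separated ID list such as "0,1".
// A classic formatting bug is using the wrong fmt verb, e.g. %s on an int,
// which yields "%!s(int=0)" and is silently rejected downstream.
func formatGPUIDs(ids []int) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprintf("%d", id)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatGPUIDs([]int{0, 1})) // prints "0,1"
}
```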
@Chimrod I only used the Docker build for testing. Are you using Debian or a variant of RHEL?
Build:
source /opt/intel/oneapi/setvars.sh
export BUILD_ARCH=amd64
./scripts/build_linux.sh

Run:
source /opt/intel/oneapi/setvars.sh
export ZES_ENABLE_SYSMAN=1 # optional; enables the Level Zero Sysman API used for VRAM queries
./dist/ollama-linux-amd64 serve |
Hi, I'm the original author of this patch, thanks for continuing to work on it! I'm finally able to play with this stuff again, so let's see if we can get the kinks figured out! Just to note, I'm trying this on a fresh Ubuntu 24.04 install, using oneAPI installed through Intel's APT repository (when I was first working on this, I was on Arch Linux and it seemed to work fine there, so now I want to make sure it works on Ubuntu). I pointed out a small mistake that was causing a segfault. With that fixed, I'm now getting this error when llama.cpp tries to load the model:
Currently looking into this. |
@felipeagc, that's great that you're able to work on this again. I'm on Ubuntu 24.04 as well. I took a detour from the original work you did. I'm new to oneAPI and C, so if you feel that my approach might not be the right one, please feel free to incorporate any relevant code into your PR. |
"I'm currently attempting to execute the code on an Intel i7 1185g7 processor running Arch Linux. It seems that SYCL_DEVICE_ALLOWLIST isn't being detected properly, if my understanding is correct. The error message Update. |
I've been able to build the Docker image. The Arc A370M dGPU was detected (the iGPU of my i7-12700H was too), but I'm having a hard time running any models. Log:
I've tried EDIT: Here's a recording of me trying to run it. Here is an asciicast file in case the player breaks or the recording is deleted from the website. EDIT: From using |
Why is the integrated GPU skipped? I'm trying to run ollama on a
Running the llama2 example with
|
@jiriks74 I haven't looked into the Docker builds since the recent changes I made; it looks like the library paths are not properly set. I plan to get the Docker images working next. |
@tristan-k The log messages are from one of the previous iterations of this implementation. Could you check if you have the latest code from this PR? |
@gamunu It looks like You can take a look at how it basically crashes the same way, and how it works with the iGPU, but once the dGPU is passed through it breaks down even if the iGPU is still present. |
Thanks. Your main branch works fine on a bare-metal Ubuntu 22.04 server but for some reason fails to pick up the integrated GPU in an Ubuntu 22.04 LXC on Proxmox 8. With the LXC it falls back to
Running the llama2 example with
|
I was slightly confused. I presume @felipeagc managed to get the changes into main? @dhiltgen We may have to do some refactoring.
|
@dhiltgen, it may be good to credit the original author of the changes in the release notes; that PR doesn't reflect @felipeagc's commit history. |
This is based on the original PR created by @felipeagc:main #2458.
Work on that pull request seems to have come to a halt. I would like to work on this over the next few days and accelerate progress. I have tested the build on Ubuntu LTS with an Arc A770 GPU.
I'm happy to move the PR forward with community feedback.