
[Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only #7351

Open
egeoz opened this issue May 17, 2024 · 3 comments


egeoz commented May 17, 2024

I have managed to get Vulkan working in the Termux environment on my Samsung Galaxy S24+ (Exynos 2400 and Xclipse 940), and I have been experimenting with LLMs in llama.cpp. While the performance improvement is excellent for both inference and prompt processing, I am seeing significantly higher RAM usage with Vulkan enabled, to the point where the device starts aggressively swapping out anything it can. The output is not garbled with Vulkan, so I do not think the issue is with my device's Vulkan drivers. Since my phone is not rooted, I am unable to see the memory usage of individual processes, but both instances were run with nothing in the background and right after one another.
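That said, /proc entries for Termux's own processes should still be readable without root, so something like the following might give a rough per-process number for the llama.cpp binary itself from a second session (an untested sketch; the pgrep pattern is an assumption and may need adjusting):

$ pgrep -f ./main                              # find the PID of the running main binary
$ grep -e VmRSS -e VmSwap /proc/<PID>/status   # resident vs. swapped-out size for that PID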

Vulkan

Run command:
$ ./main -m ../models/gemma-1.1-2b-it-Q6_K.gguf -ngl 50 -c 4096 --no-mmap -i

Memory:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            10Gi       9.9Gi       203Mi       3.0Mi       915Mi       894Mi
Swap:          8.0Gi       1.6Gi       6.4Gi

Benchmark with -n 100:

llama_print_timings:        load time =    9958.81 ms
llama_print_timings:      sample time =      51.08 ms /   100 runs   (    0.51 ms per token,  1957.64 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)                                       
llama_print_timings:        eval time =    5877.33 ms /   100 runs   (   58.77 ms per token,    17.01 tokens per second)
llama_print_timings:       total time =    6266.68 ms /   100 tokens

CPU

Run command:
$ ./main -m ../models/gemma-1.1-2b-it-Q6_K.gguf -c 4096 --no-mmap -i

Memory:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            10Gi       6.0Gi       204Mi       8.0Mi       4.7Gi       4.7Gi
Swap:          8.0Gi       458Mi       7.6Gi

Benchmark with -n 100:

llama_print_timings:        load time =    1545.39 ms
llama_print_timings:      sample time =      14.47 ms /   100 runs   (    0.14 ms per token,  6912.76 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =   12535.73 ms /   100 runs   (  125.36 ms per token,     7.98 tokens per second)
llama_print_timings:       total time =   12666.80 ms /   100 tokens
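For comparison, the eval numbers above work out to roughly a 2.1x speedup with Vulkan on this device (125.36 ms per token on CPU vs. 58.77 ms per token with Vulkan), so the backend itself is clearly working; the problem is only the memory footprint.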

Please let me know if I can provide any other information.

@smilingOrange
Can you show the steps you used to get llama.cpp with Vulkan working in Termux?

egeoz (Author) commented May 19, 2024

Can you show the steps you used to get llama.cpp with Vulkan working in Termux?

I've downloaded the latest artifact from the following link, installed mesa-zink from tur-repo, and enabled Zink with the GALLIUM_DRIVER=zink environment variable.
https://github.com/termux/termux-packages/actions?query=branch%3Adev%2Fsysvk++
Though I suspect it only worked properly for me because of the Xclipse GPU; I recall seeing some issues here regarding the Adreno Vulkan implementation.
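To spell that out, the full sequence was roughly the following (a sketch from memory; exact package names may differ, and the llama.cpp binary is the prebuilt CI artifact from the link above rather than anything installed via pkg):

$ pkg install tur-repo       # enable the Termux User Repository
$ pkg install mesa-zink      # Mesa build that ships the Zink driver
$ export GALLIUM_DRIVER=zink
$ ./main -m ../models/gemma-1.1-2b-it-Q6_K.gguf -ngl 50 -c 4096 --no-mmap -i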

Jeximo (Contributor) commented May 20, 2024

I recall seeing some issues here regarding the Adreno Vulkan implementation.

It's not implemented.

Related: #6395 (comment)
