Less than ideal GPU usage on high-res VR video #709
Comments
In short: Python's threads don't work like typical Windows threads, less is more, and never trust the Windows Task Manager performance screen. As you already noticed, once VRAM is full, processing gets really slow, no matter how many threads you're using. And onboard system RAM doesn't matter; your video card's RAM does.
Thank you for the reply. I understand that this issue looks like VRAM exhaustion, but that isn't happening here. I believe this is something else that isn't covered by the general troubleshooting. I have tested thread counts of 1, 2, 3, 4, 6, 8, 10, 12, 16, 24, and 32. With a 24GB RTX 4090, on a 5400x2700 VR video, I can run up to 12 threads initially without exhausting VRAM, but that's not guaranteed to last for the entire run, so I backed it off to 8 threads to leave plenty of room. The screenshot above is ~36 hours into the run, with 4+GB free at all times. The difference in performance between 1 and 8 threads is barely noticeable, but 8 threads give me ~3-5% better performance. It fluctuates so much that it's hard to make that determination. At 1080p, I get around 30 fps, and that can go as high as 40-60. At 2700p I get ~1 fps. That resolution has about 7x as many pixels, so I could imagine 4-6 fps, and if the GPU were steady at 90+% usage, I wouldn't worry about it even at 1 fps. I've been able to recreate this behavior on two different systems. I wonder if you see something similar on hi-res VR video?
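For reference, the 4-6 fps expectation can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes throughput scales linearly with pixel count, which may not hold for this pipeline, and the function names are made up for illustration.

```python
def pixel_count(width, height):
    """Total pixels in one frame."""
    return width * height


def expected_fps(base_fps, base_res, target_res):
    """Scale a measured frame rate by the pixel ratio between two resolutions.

    Assumption: cost per frame is proportional to the number of pixels.
    """
    ratio = pixel_count(*target_res) / pixel_count(*base_res)
    return base_fps / ratio, ratio


# 30 fps measured at 1080p, extrapolated to the 5400x2700 VR video
fps, ratio = expected_fps(30.0, (1920, 1080), (5400, 2700))
print(f"pixel ratio: {ratio:.2f}x, expected fps: {fps:.2f}")
# prints: pixel ratio: 7.03x, expected fps: 4.27
```

Under that assumption, 30-40 fps at 1080p would land at roughly 4-6 fps at 5400x2700, so the observed ~1 fps is several times lower than pixel scaling alone would suggest.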
Firstly, everything works as expected with both standard video and low-res VR video. In that case, there's barely any CPU usage and nearly perfect 100% GPU usage for the entire process. I'm running 'all faces', VR Mode and GFPGAN.
Things change at 5400x2700 or higher resolution. GPU usage fluctuates wildly between 0% and 100%. I've tried thread counts from 1 to 32. Performance is best at around 8-10 threads and falls off a cliff at 12 threads, at which point VRAM is exhausted, but the behavior is more or less the same no matter the thread count. I've had this problem for several months, including with the most recent 4.0 release.
At first, I thought perhaps it was a CPU bottleneck because the first system had an older 6-core processor, but now I have access to an AMD Epyc 9654 (96 cores, 12 memory channels, 768GB RAM), and the behavior is the same.
I'm not entirely sure where to dig in to uncover the bottleneck.
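One way to start narrowing this down would be to time each stage of the per-frame loop and see where the wall-clock time goes while the GPU sits idle. The sketch below is not taken from this project's code: `StageTimer` and the stage names (`decode`, `swap`) are hypothetical, and the `time.sleep` calls only stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, seconds, percent of total) rows, slowest first."""
        total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, 100.0 * secs / total) for name, secs in rows]


timer = StageTimer()
for _ in range(3):
    with timer.stage("decode"):  # placeholder for reading/decoding a frame
        time.sleep(0.005)
    with timer.stage("swap"):    # placeholder for the GPU-side processing
        time.sleep(0.05)

for name, secs, pct in timer.report():
    print(f"{name:>8}: {secs:.3f}s ({pct:.1f}%)")
```

If a CPU-side stage such as decoding or frame copying dominates at 5400x2700, that would be consistent with the GPU waiting on input. Note that GPU work can be launched asynchronously, so timings wrapped around GPU calls may need an explicit synchronization point to be meaningful.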