Ensure safety of indirect dispatch #5714
base: trunk
Conversation
Hmm, I'm not sure how we should deal with the …
Let's just require it in core, honestly. We could build a naga module in a build script, serialize it, and store that in the binary, but that's way too much work for no real gain.
That's what I thought as well. Requiring it in … That PR was prefaced with:
This was also mentioned in #2549 (comment):
@daxpedda were the benefits of #2890 substantial in terms of compile time and binary size?
In our office hours meeting this morning, we came up with a better way to test, by requesting a device with a lower limit than the adapter offers. Testing this way would give a more positive indication that the limit is being imposed.
I just tried a minimal example when using GLSL shaders on Wasm with WebGL.
Compile times are similar. Total compile time with O3 goes from 12.21s to 13.48s, so again around a 10% increase. Keep in mind that this is a minimal example and might not be representative of a full-blown application.
I would say 10% is significant, especially for web apps, but I'd be curious how much it decreases with larger applications. It's unfortunate, though, that since WebGL doesn't support indirect calls, this increase doesn't bring anything of value for this use case. Maybe we can feature gate the validation on the … The downside being that …
Force-pushed 018b23b to a5bebb0
@jimblandy I lowered the required limit to 10 since the testing infra is not set up to require limits based on what the adapter supports. I think 10 should be safe; the default value is 65535.
Force-pushed ac3f089 to 36281af
I added another test for the …
I wanted to get CI working so I just implemented #5714 (comment) for now.
Filed #5739 for the Metal issue, which I've worked around in this PR.
The … And … is because the internal call to … I was hoping to get away with calling …
Force-pushed b57350e to e0bd41a
Actually, I don't think the test is correct. Shouldn't our … The … The test was recently added in #5570.
@Wumpf could you chime in on this? Did we use to crash or raise a validation error?
No, drop just removes the user's handle to it; it doesn't make access to the resource invalid.
Some thoughts so far:
I see, I will try to convert the code doing bind group creation to use …
Performance/code simplification thoughts.
Dispatches are unfortunately not trivial to validate all at once :(
wgpu-core/src/command/compute.rs
Outdated
let (dst_buffer_id, error) = self.device_create_buffer::<A>(
    device_id,
    &crate::resource::BufferDescriptor {
        label: None,
        size: 4 * 3,
        usage: wgt::BufferUsages::INDIRECT | wgt::BufferUsages::STORAGE,
        mapped_at_creation: false,
    },
    None,
);
Creating a buffer and bind group every dispatch is going to be very expensive. In this case we can make a single buffer/bind group per queue and a single bind group per indirect command buffer. We'd trade one bind group for two, but it means we never have to create resources at runtime.
I really wish we had root-uavs/bda so we didn't need a bind group here.
Wouldn't a single dst buffer (with its own bind group) per queue become a bottleneck in cases where a user issues 2 dispatchIndirect commands back to back, each with their own indirect buffer? By only having one dst buffer, the 2 internal dispatches we do for validation can't run in parallel, and I think neither can the 2nd dispatchIndirect start before the first dispatchIndirect finished.
We discussed this at our maintainers meeting and decided to go with Connor's suggestion. My concerns about not being able to run dispatches in parallel were not well-founded as most (if not all) IHVs will treat buffer barriers as "wait for idle" anyway.
by injecting a compute shader that validates the content of the indirect buffer; also adds missing indirect buffer offset validation
Force-pushed cf022cb to d25ed4b
Ensure safety of indirect dispatch by injecting a compute shader that validates the content of the indirect buffer.
Part of #2431.