You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: _posts/2023-06-08-shader-converter.md
+3-1Lines changed: 3 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -18,7 +18,7 @@ The GPU ecosystem exists at the knife edge of being strangled by complexity. A b
18
18
19
19
The widespread use of shader translation makes the situation even worse. When writing HLSL that will be translated into other shader languages, it's no longer sufficient to consider [Shader Model 5] to be a baseline, but rather the developer needs to keep in mind all the features that don't translate to other languages. In some cases, the semantics change subtly (the rules for the various flavors "count leading zeros" when the input is 0 vary), and in other cases, like these device scoped barriers.
20
20
21
-
A separate category is things technically forbidden by the spec, but expected to work in practice. A good example here is the mixing of atomic and non-atomic memory operations (see [gpuweb#2229]). The spirv-cross shader translation tool casts non-atomic pointers to atomic pointers to support this common pattern, which is technically undefined behavior in C++, but in practice lots of people would be unhappy if the Metal shader compiler did anything other than the reasonable thing. Since Metal's semantics are based on C++, I'd personally love to see this resolved by adopting std::atomic_ref from C++20 (Metal is still based on C++14). I'll also note that the official Metal shader compiler tool generates [reasonable IR] for this pattern. It's concerning that using open source tools such as spirv-cross triggers technical undefined behavior, but it's probably not a big problem in practice.
21
+
A separate category is things technically forbidden by the spec, but expected to work in practice. A good example here is the mixing of atomic and non-atomic memory operations (see [gpuweb#2229]). The spirv-cross shader translation tool casts non-atomic pointers to atomic pointers to support this common pattern, which is technically undefined behavior in C++, but in practice lots of people would be unhappy if the Metal shader compiler did anything other than the reasonable thing. Since Metal's semantics are based on C++, I'd personally love to see this resolved by adopting [std::atomic_ref] from C++20 (Metal is still based on C++14). I'll also note that the official Metal shader compiler tool generates [reasonable IR] for this pattern. It's concerning that using open source tools such as spirv-cross triggers technical undefined behavior, but it's probably not a big problem in practice.
22
22
23
23
I understand the incentives, but overall I find it disappointing that Metal chases shiny new features like ray-tracing, while failing to provide a solid, spec-compliant foundation for GPU compute.
24
24
@@ -32,6 +32,8 @@ For two, we can cheer on the work of Asahi Linux. They have recently announced [
32
32
33
33
I have a recommendations for Apple as well. I hope that they document which HLSL features are expected to work and which are not. Currently in their documentation (which is admittedly beta), it just says "Some features not supported," which I personally find not very useful. I would also like to give them credit for clarifying the [Metal Shading Language Specification] with respect to the scope of the `mem_device` flag to `threadgroup_barrier`. It now says, "The flag ensures the GPU correctly orders the memory operations to device memory for threads in the threadgroup," which to a very careful reader does indicate threadgroup scope and no guarantee at device scope. Previously it [said][gpuweb#2297] "Ensure correct ordering of memory operations to device memory," which could easily be misinterpreted as providing a device scope guarantee.
34
34
35
+
I am optimistic in the long term about having really good, portable infrastructure for GPU compute, but it is clear that we have a long way to go.
0 commit comments