DXC Cookbook: HLSL Coding Patterns for SPIR-V
=============================================

Author: Steven Perron

Date: Oct 22, 2018

Introduction
============

This document provides a set of examples that demonstrate what will and will
not be accepted by the DXC compiler when generating SPIR-V. The difficulty in
defining what is acceptable is that it cannot be specified by a grammar. The
entire program must be taken into consideration. Hopefully this will be useful.

We are interested in how global resources are used. For a SPIR-V shader to be
valid, accesses to global resources like structured buffers and images must be
done directly on the global resources. They cannot be copied or have their
address returned from functions. However, in HLSL, it is possible to copy a
global resource or to pass it by reference to a function. Since this can be
arbitrarily complex, DXC can generate valid SPIR-V only if the compiler is able
to remove all of these copies. The transformations that are used to remove the
copies will be the same for both structured buffers and images, so we have
chosen to focus on structured buffers. The process of transforming the code in
this way is called *legalization*.

Support evolves over time as the optimizations in SPIRV-Tools are improved. At
GDC 2018, Greg Fischer from LunarG presented earlier results in this space. The
DXC, Glslang, and SPIRV-Tools maintainers work together to handle new HLSL code
patterns. This document represents the state of the DXC compiler in October
2018.

Glslang does legalization as well. However, what it is able to legalize is
different from DXC because of the features it chooses to support and the
optimizations from SPIRV-Tools it chooses to run. For example, Glslang does not
support structured buffer aliasing yet, so many of these examples will not work
with Glslang.
All of the examples are available in the DXC repository, at
https://github.com/Microsoft/DirectXShaderCompiler/tree/main/tools/clang/test/CodeGenSPIRV/legal-examples .
To open an example in Tim Jones' Shader Playground, follow the URL in the
comment at the top of that example.

Examples for structured buffers
===============================

Desired code
------------

.. code-block:: hlsl

  // 0-copy-sbuf-ok.hlsl
  // http://shader-playground.timjones.io/e6af2bdce0c61ed07d3a826aa8a95d45

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  void main() {
    gRWSBuffer[i] = gSBuffer[i];
  }

This example shows code that directly translates to valid SPIR-V. In this case,
we have two structured buffers. When one of their elements is accessed, it is
done by naming the resource from which to get the element. Note that it is fine
to copy an element of the structured buffer.

Single copy to a local
----------------------

Cases that can be easily legalized are those where there is exactly one
assignment to the local copy of the structured buffer. In this context, a local
is either a global static or a function scope symbol, that is, something that
can be accessed by only a single instance of the shader.

When you have a single copy to a local, it is obvious which global is actually
being used. This allows the compiler to replace a reference to the local symbol
with the global resource.

Initialization of a static
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: hlsl

  // 1-copy-global-static-ok.hlsl
  // http://shader-playground.timjones.io/815543dc91a4e6855a8d0c6a345d4a5a

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  static StructuredBuffer<S> sSBuffer = gSBuffer;

  void main() {
    gRWSBuffer[i] = sSBuffer[i];
  }

This example shows an implicitly addressed structured buffer ``gSBuffer``
assigned to a static ``sSBuffer``. This copy is treated like a shallow copy.
This is implemented by making ``sSBuffer`` a pointer to ``gSBuffer``.
This example can be legalized because the compiler is able to see that
``sSBuffer`` points to ``gSBuffer``, which does not move, so uses of
``sSBuffer`` can be replaced by ``gSBuffer``.

.. code-block:: hlsl

  // 2-write-global-static-ok.hlsl
  // http://shader-playground.timjones.io/1c65c467e395383945d219a60edbe10c

  struct S {
    float4 f;
  };

  int i;

  RWStructuredBuffer<S> gRWSBuffer;

  static RWStructuredBuffer<S> sRWSBuffer = gRWSBuffer;

  void main() {
    sRWSBuffer[i].f = 0.0;
  }

This example is similar to the previous example, except in this case the
shallow copy becomes important. ``sRWSBuffer`` is treated like a pointer to
``gRWSBuffer``. As before, the references to ``sRWSBuffer`` can be replaced by
``gRWSBuffer``. This means that the write that occurs will be visible outside
of the shader.

Copy to function scope
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: hlsl

  // 3-copy-local-struct-ok.hlsl
  // http://shader-playground.timjones.io/77dd20774e4943044c2f1b630c539f07

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  void main() {
    CombinedBuffers cb;
    cb.SBuffer = gSBuffer;
    cb.RWSBuffer = gRWSBuffer;
    cb.RWSBuffer[i] = cb.SBuffer[i];
  }

It is also possible to copy a structured buffer to a function scope symbol.
This is similar to a copy to a static scope symbol. The local copy is really a
pointer to the original. This example demonstrates that DXC can legalize the
copy even if it is a copy to part of a structure.

There are no specific restrictions on the structure. The structured buffers can
be anywhere in the structure, and there can be any number of members.
Structured buffers can be in nested structures of any depth. The following is a
more complicated example.

.. code-block:: hlsl

  // 4-copy-local-nested-struct-ok.hlsl
  // http://shader-playground.timjones.io/14f59ff2a28c0a0180daf6ce4393cf6b

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  struct S2 {
    CombinedBuffers cb;
  };

  struct S1 {
    S2 s2;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  void main() {
    S1 s1;
    s1.s2.cb.SBuffer = gSBuffer;
    s1.s2.cb.RWSBuffer = gRWSBuffer;
    s1.s2.cb.RWSBuffer[i] = s1.s2.cb.SBuffer[i];
  }

Function parameters
~~~~~~~~~~~~~~~~~~~

.. code-block:: hlsl

  // 5-func-param-sbuf-ok.hlsl
  // http://shader-playground.timjones.io/aeb06f527c5390d82d63bdb4eafc9ae7

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  void foo(StructuredBuffer<S> pSBuffer) {
    gRWSBuffer[i] = pSBuffer[i];
  }

  void main() {
    foo(gSBuffer);
  }

It is possible to pass a structured buffer as a parameter to a function. As
with the copies in the previous section, it is a pointer to the structured
buffer that is actually being passed to ``foo``. This is the same way that
arrays work in C/C++.

.. code-block:: hlsl

  // 6-func-param-rwsbuf-ok.hlsl
  // http://shader-playground.timjones.io/f4e0194ce78118c0a709d85080ccea93

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  void foo(RWStructuredBuffer<S> pRWSBuffer) {
    pRWSBuffer[i] = gSBuffer[i];
  }

  void main() {
    foo(gRWSBuffer);
  }

The same is true for RW structured buffers. So in this case, the write to
``pRWSBuffer`` is changing ``gRWSBuffer``. This means that the write to
``pRWSBuffer`` will be visible outside of the function, and outside of the
shader.

Return values
~~~~~~~~~~~~~

The next two examples show that structured buffers can be a function's return
value. As before, the return value of ``foo`` is really a pointer to the global
resource.

.. code-block:: hlsl

  // 7-func-ret-tmp-var-ok.hlsl
  // http://shader-playground.timjones.io/d6b706423f02dad58fbb01841282c6a1

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  RWStructuredBuffer<S> foo() {
    return gRWSBuffer;
  }

  void main() {
    RWStructuredBuffer<S> lRWSBuffer = foo();
    lRWSBuffer[i] = gSBuffer[i];
  }

In this case, the compiler will replace ``lRWSBuffer`` by ``gRWSBuffer``.

.. code-block:: hlsl

  // 8-func-ret-direct-ok.hlsl
  // http://shader-playground.timjones.io/6edbbc1aa6c6b6533c5a728135f87fb9

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer;

  StructuredBuffer<S> foo() {
    return gSBuffer;
  }

  void main() {
    gRWSBuffer[i] = foo()[i];
  }

This example is similar to the previous one, but shows that you do not have to
use an explicit temporary value.

Conditional control flow
------------------------

The examples so far do not have any conditional control flow. This makes it
obvious which resources are being used. The introduction of conditional control
flow makes the job of the compiler much harder, and in some cases impossible.
Remember that the compiler is trying to determine at compile time which
resource will be used at run time. In this section, we will look at how control
flow affects the compiler's ability to do this.

The bottom line is that the compiler has to be able to turn all of the
conditional control flow that affects which resources are used into straight
line code.

Inputs in if-statement
~~~~~~~~~~~~~~~~~~~~~~

The first example is one where the compiler cannot determine which resource is
actually being accessed.

.. code-block:: hlsl

  // 9-if-stmt-select-fail.hlsl
  // http://shader-playground.timjones.io/2896e95627fd8a6689ca96c81a5c7c68

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;
    if (constant > i) {       // Condition can't be computed at compile time.
      lSBuffer = gSBuffer1;   // Will produce invalid SPIR-V for Vulkan.
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[i] = lSBuffer[i];
  }

In this example, ``lSBuffer`` could be either ``gSBuffer1`` or ``gSBuffer2``.
It depends on the value of ``i``, which is a parameter to the shader and cannot
be known at compile time. At this time, the compiler is not able to convert
this code into something that drivers will accept.

If your code follows this pattern, I would suggest rewriting it as follows:

.. code-block:: hlsl

  // 10-if-stmt-select-ok.hlsl
  // http://shader-playground.timjones.io/5063d8a0a7ad1f9d0839cd34a6d94dd2

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;
    if (constant > i) {
      lSBuffer = gSBuffer1;
      gRWSBuffer[i] = lSBuffer[i];
    } else {
      lSBuffer = gSBuffer2;
      gRWSBuffer[i] = lSBuffer[i];
    }
  }

Notice that this involves replicating code. If the code that follows the
if-statement is long, you could consider moving it to a function, and having
two calls to that function.

If-statements with constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not all control flow is a problem. There are situations where the compiler is
able to determine that a condition is always true or always false. For example,
in the following code, the compiler looks at ``0 > 2`` and knows that it is
always false.

.. code-block:: hlsl

  // 11-if-stmt-const-ok.hlsl
  // http://shader-playground.timjones.io/7ef5b89b3ec3d56c22e1bca45b40516a

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;
    if (constant > 2) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[i] = lSBuffer[i];
  }

The compiler will turn this code into

.. code-block:: hlsl

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    gRWSBuffer[i] = gSBuffer2[i];
  }

The two previous examples show that handling control flow depends on what the
compiler can do. This depends on the amount of optimization that is done, and
which optimizations are done. In general, when you are writing code that will
select a resource, keep the conditions as simple as possible to make it as easy
as possible for the compiler to determine which path is taken.

Switch statements
~~~~~~~~~~~~~~~~~

Switch statements are similar to if-statements. If the selector is a constant,
then the compiler will be able to propagate the copies.

.. code-block:: hlsl

  // 12-switch-stmt-select-fail.hlsl
  // http://shader-playground.timjones.io/b079f878daeba5d77842725b90a476ca

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;
    switch(i) {                   // Compiler can't determine which case will run.
      case 0:
        lSBuffer = gSBuffer1;     // Will produce invalid SPIR-V for Vulkan.
        break;
      default:
        lSBuffer = gSBuffer2;
        break;
    }
    gRWSBuffer[i] = lSBuffer[i];
  }

The compiler is not able to remove the copies in this example because it does
not know the value of ``i`` at compile time.

.. code-block:: hlsl

  // 13-switch-stmt-const-ok.hlsl
  // http://shader-playground.timjones.io/a46dd1f1a84eba38c047439741ec08ab

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  const static int constant = 0;

  void main() {
    StructuredBuffer<S> lSBuffer;
    switch(constant) {
      case 0:
        lSBuffer = gSBuffer1;
        break;
      default:
        lSBuffer = gSBuffer2;
        break;
    }
    gRWSBuffer[i] = lSBuffer[i];
  }

However, if the selector is turned into a constant, the compiler can replace
uses of ``lSBuffer`` by ``gSBuffer1``.

Loop Induction Variables in conditions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Besides inputs, another type of variable that hinders the compiler is the loop
induction variable. These are variables that change value for each iteration
of the loop. Consider this example.

.. code-block:: hlsl

  // 14-loop-var-fail.hlsl
  // http://shader-playground.timjones.io/8df364770e3f425e6321e71f817bcd1a

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;

    for( int j = 0; j < 2; j++ ) {
      if (constant > j) {       // Condition is different for different iterations
        lSBuffer = gSBuffer1;   // Will produce invalid SPIR-V for Vulkan.
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

In this example, ``j`` is an induction variable. It takes on the values ``0``
and ``1``. The information needed to determine which path is taken in each
iteration is there, but the compiler does not figure this out by default.

If you want the compiler to be able to legalize this code, then you will have
to direct the compiler to unroll this loop using the unroll attribute. The
following example can be legalized by the compiler:

.. code-block:: hlsl

  // 15-loop-var-unroll-ok.hlsl
  // http://shader-playground.timjones.io/3d0f6f830fc4a5102714e19c748e81c7

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;

    [unroll]
    for( int j = 0; j < 2; j++ ) {
      if (constant > j) {
        lSBuffer = gSBuffer1;
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

Variable iteration counts
~~~~~~~~~~~~~~~~~~~~~~~~~

Adding the unroll attribute to loops does not guarantee that the compiler is
able to legalize the code. The compiler has to be able to fully unroll the
loop. That means the compiler will have to create a copy of the body of the
loop for each iteration so that there is no loop anymore. That can only be done
if the number of iterations can be known at compile time. This means that the
compiler must be able to determine the initial value, the final value, and the
step for the induction variable (``j`` in the example).

None of ``foo1``, ``foo2``, or ``foo3`` can be legalized because the number of
iterations cannot be known at compile time.

.. code-block:: hlsl

  // 16-loop-var-range-fail.hlsl
  // http://shader-playground.timjones.io/376f5f985c3ceceea004ab58edb336f2

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  int i;

  #define constant 0

  void foo1() {
    StructuredBuffer<S> lSBuffer;

    [unroll]
    for( int j = i; j < 2; j++ ) {      // Compiler can't determine the initial value
      if (constant > j) {
        lSBuffer = gSBuffer1;
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

  void foo2() {
    StructuredBuffer<S> lSBuffer;

    [unroll]
    for( int j = 0; j < i; j++ ) {      // Compiler can't determine the end value
      if (constant > j) {
        lSBuffer = gSBuffer1;
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

  void foo3() {
    StructuredBuffer<S> lSBuffer;

    [unroll]
    for( int j = 0; j < 2; j += i ) {   // Compiler can't determine the step
      if (constant > j) {
        lSBuffer = gSBuffer1;
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

  void main() {
    foo1();
    foo2();
    foo3();
  }

As before, the compiler will try to simplify expressions to determine their
value at compile time, but it may not always be successful. We recommend that
you keep the expressions for the loop bounds as simple as possible to increase
the chances that the compiler can figure them out.

Other restrictions on unrolling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not being able to determine the iteration count at compile time is a
fundamental problem. No matter how good the compiler is, it will never be able
to fully unroll such a loop. However, because of internal details of the
algorithms in the SPIRV-Tools optimizer, some other cases cannot be handled
either. The most notable one is that the induction variable must be an integral
type.

.. code-block:: hlsl

  // 17-loop-var-float-fail.hlsl
  // http://shader-playground.timjones.io/d5d2598699378688684a4a074553dddf

  struct S {
    float4 f;
  };

  struct CombinedBuffers {
    StructuredBuffer<S> SBuffer;
    RWStructuredBuffer<S> RWSBuffer;
  };

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  void main() {
    StructuredBuffer<S> lSBuffer;

    [unroll]
    for( float j = 0; j < 2; j++ ) {    // Can't infer floating point induction values
      if (constant > j) {
        lSBuffer = gSBuffer1;
      } else {
        lSBuffer = gSBuffer2;
      }
      gRWSBuffer[j] = lSBuffer[j];
    }
  }

This example cannot be legalized because ``j`` is a ``float``.

Other interesting cases
-----------------------

Multiple calls to a function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: hlsl

  // 18-multi-func-call-ok.hlsl
  // http://shader-playground.timjones.io/e7b3ac1262a291c92902fd3f1fd3343c

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer1;
  RWStructuredBuffer<S> gRWSBuffer2;

  void foo(RWStructuredBuffer<S> pRWSBuffer) {
    pRWSBuffer[i] = gSBuffer[i];
  }

  void main() {
    foo(gRWSBuffer1);
    foo(gRWSBuffer2);
  }

In this example, we see the same function is called twice. Each call has a
different parameter. This can look like a problem because ``pRWSBuffer`` could
be either ``gRWSBuffer1`` or ``gRWSBuffer2``. However, the compiler is able to
work around this by creating a separate copy of ``foo`` for each call site. In
fact, these copies will be placed inline.

Multiple returns
~~~~~~~~~~~~~~~~

As we have already seen, a return from a function is a copy. At this point, it
would be fair to ask what happens if there are multiple returns.

.. code-block:: hlsl

  // 19-multi-func-ret-fail.hlsl
  // http://shader-playground.timjones.io/922facb688a5ba09b153d64cf1fc4557

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer;
  RWStructuredBuffer<S> gRWSBuffer1;
  RWStructuredBuffer<S> gRWSBuffer2;

  RWStructuredBuffer<S> foo(int l) {
    if (l == 0) {       // Compiler does not know which branch will be taken:
                        // Branch taken depends on input i.
      return gRWSBuffer1;
    } else {
      return gRWSBuffer2;
    }
  }

  void main() {
    RWStructuredBuffer<S> lRWSBuffer = foo(i);
    lRWSBuffer[i] = gSBuffer[i];
  }

The compiler is not able to legalize this example because it does not know
which value will be returned. However, if the compiler is able to determine
which path will be taken, then it can be legalized.

.. code-block:: hlsl

  // 20-multi-func-ret-const-ok.hlsl
  // http://shader-playground.timjones.io/84b093c7cf9e3932c5f0d9691533bafe

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer1;
  RWStructuredBuffer<S> gRWSBuffer2;

  StructuredBuffer<S> foo(int l) {
    if (l == 0) {
      return gSBuffer1;
    } else {
      return gSBuffer2;
    }
  }

  void main() {
    gRWSBuffer1[i] = foo(0)[i];
    gRWSBuffer2[i] = foo(1)[i];
  }

For each call to ``foo``, the compiler is able to determine which value will be
returned. In this case, the code can be legalized.

Combining elements
~~~~~~~~~~~~~~~~~~

Individually, these examples are simple; however, these elements can be
combined in arbitrary ways. As one last example, consider this HLSL source
code.

.. code-block:: hlsl

  // 21-combined-ok.hlsl
  // http://shader-playground.timjones.io/9f00d2d359da0731cdf8d0b68520e2c4

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer1;
  RWStructuredBuffer<S> gRWSBuffer2;

  #define constant 0

  StructuredBuffer<S> bar() {
    if (constant > 2) {
      return gSBuffer1;
    } else {
      return gSBuffer2;
    }
  }

  void foo(RWStructuredBuffer<S> pRWSBuffer) {
    StructuredBuffer<S> lSBuffer = bar();
    pRWSBuffer[i] = lSBuffer[i];
  }

  void main() {
    foo(gRWSBuffer1);
    foo(gRWSBuffer2);
  }

The compiler will do all of the transformations mentioned earlier to identify a
single resource for each load and store from a resource.

Conclusion
==========

It is impossible to enumerate all of the possible code sequences that work or
do not work, but hopefully this will give a guide as to what is possible or
not. The general rule of thumb is that there must be a straightforward way to
transform the code so that there are no copies of global resources.
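As one more illustration of that rule of thumb, the sketch below shows the
refactoring suggested in the "Inputs in if-statement" section: the code that
followed the if-statement is moved into a helper, and each branch calls the
helper with a different global resource. This is a hypothetical sketch and not
one of the files in the legal-examples directory. The helper name
``copyElement`` is an assumption made for illustration. It is expected to
legalize for the same reasons as ``10-if-stmt-select-ok.hlsl`` and
``18-multi-func-call-ok.hlsl``, because each call site passes exactly one
global, but it has not been checked against a particular DXC build.

.. code-block:: hlsl

  // Hypothetical sketch; not part of the DXC legal-examples directory.

  struct S {
    float4 f;
  };

  int i;

  StructuredBuffer<S> gSBuffer1;
  StructuredBuffer<S> gSBuffer2;
  RWStructuredBuffer<S> gRWSBuffer;

  #define constant 0

  // Holds the code that would otherwise be replicated in both branches.
  // copyElement is an illustrative name, not an existing API.
  void copyElement(StructuredBuffer<S> pSBuffer) {
    gRWSBuffer[i] = pSBuffer[i];
  }

  void main() {
    if (constant > i) {
      copyElement(gSBuffer1);   // This call site only ever sees gSBuffer1.
    } else {
      copyElement(gSBuffer2);   // This call site only ever sees gSBuffer2.
    }
  }

The branch on ``i`` remains in the generated code, but it no longer decides
which resource a shared load uses; each load names its resource directly once
the calls are placed inline.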