This commit has two changes:
* Revert "[spirv] do not set result type for OpImageWrite (#3757)"
* Do not set RelaxedPrecision for OpImageWrite
The AST type of `OpImageWrite` can be used by other components.
For example, [OpImageWrite](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpImageWrite)
has `Texel` operand whose type must be the same with `Sampled Type` of `Image` operand.
If we use `1` for the `Texel` operand, we have to determine if it should be `uint` or `sint`.
It is done by `LiteralTypeVisitor` (tools/clang/lib/SPIRV/LiteralTypeVisitor.cpp) using the AST
type of `OpImageWrite`.
Therefore, we should keep its AST type. To prevent `EmitVisitor` from setting
`OpDecorate RelaxedPrecision` for `OpImageWrite`, this commit lets RelaxedPrecisionVisitor
not set RelaxedPrecision for `OpImageWrite`.
For example, following textures have types with minimum precision scalar
as template parameters. We support minimum precision scalar types for
images.
```
RWTexture2D<min10float4> tex0;
RWTexture2D<min16uint4> tex1;
RWTexture2D<min12int4> tex2;
```
Mostly reverts the change that added an error when offsets were not
immediate or out of range for gather operations. More research suggests
this error was not required and it broke some existing shaders
Fixes#3738
Needed to pick up new DCE opportunities and fully eliminate useless
loops. We ran it after the total runs anyway. Now we run it after every
repetition
Correct a variable name typo
Add a test where loop deletion requires instruction simplification to
completely eliminate the loops without effect
Two things:
First recurse down type hierarchy. The previous code only went down 2 levels...
Second, derive base type from enum classes when populating fake debug-store locations
The flat conversion method `SpirvEmitter::processFlatConversion()`
conducts bit-casts, composite construction, ... to covert the target to
the specific type. Since we cannot convert a pointer to other types, we
should make sure the target is r-value. In particular, it fixes bugs
when the target is a constant or texture buffer. The existing code just
returns `OpVariable` for the constant or texture buffer conversion, but
`OpVariable` is not r-value and it results in a spirv-val error.
In high level, we did
```
llvm::StringRef choppedSrcCode;
{
std::string text ...
choppedSrcCode = text ...
}
// use choppedSrcCode
```
`choppedSrcCode` points a freed string.
The value-to-declare pass replaces dbg.value with dbg.declare and some getElementPtr/store pairs that the PIX instrumentation uses to discover values. The problem here was that the dbg.value could reside in the program prior to the value it was referring to, and this seems to be legal. It's not correct once the dbg.value has been replaced with the GEP/store pairs, since the store was trying to operate on a value that had not yet been created. The solution is to move the builder's insertion point to that of the value itself, rather than that of the dbg.value.
Also in this change: a simple pair of copy-paste typos in the OffsetManager class.
When a CTBuffer contains a matrix and we use FXC memory layout for it i.e.,
`-fvk-use-dx-layout`, the memory layout of the generated struct is different
from what FXC generates. FXC memory layout rule for matrices or array of
matrices are:
1. `floatMxN` means `N` float vectors and each vector has `M` elements.
- How to calculate size: 16 * (N - 1) + 4 * M bytes
- How to calculate offset:
- If the size is greater than or equal to 16 bytes: the offset must be aligned to 16 bytes
- Otherwise (less than 16): it cannot be split into multiple 16 bytes slots.
- For example, float2x3 has 16 * (3 - 1) + 4 * 2 = 40 bytes as its size. Since its size 40 bytes is greater than 16 bytes, it must be aligned to 16 bytes.
2. `floatMxN[K]` means an array of `floatMxN` with `K` elements.
- size: (K - 1) * N * 16 + 16 * (N - 1) + 4 * M
- offset:
- If K > 1, it must be aligned to 16 bytes
- If K == 1, it is the same with floatMxN.
- For example, the size of float3x2 foo[7]; is (7 - 1) * 2 * 16 + 16 * (2 - 1) + 4 * 3 = 220.
The non-trivial case is `float1xN` which is a matrix with `N` vectors and each vector has 1 element.
Its size should be `16 * (N - 1) + 4` based on the FXC memory layout rule.
For example, the size of `float1x2` must be 20 in bytes, which means we want to put the first float value of
`float1x2` at the offset 0 in bytes and the second float value at the offset 16 in bytes.
It means we must not generate it as a SPIR-V vector type because setting it as a SPIR-V vector results in
putting the first at the offset 0 in bytes and the second at the offset 4 in bytes.
In addition, we cannot set it as a SPIR-V matrix type because SPIR-V does not allow a matrix with a single
row and a vector with a single element.
The only available option is to set it as a SPIR-V array with `ArrayStride 16`.
Since we currently consider `float1xN` as an `OpTypeVector` and generate all SPIR-V code based on the assumption.
Changing the type of `float1xN` to `OpTypeArray` needs huge engineering costs to handle all the cases.
For example, in many places e.g., addition, subtraction, multiplication, we use `OpVectorShuffle` for `float1xN` because we consider it as `OpTypeArray`.
Our solution is to create two variables for CTBuffer including `type1xN` with FXC memory layout:
1. Original: One with correct subtypes and memory layouts i.e., `OpTypeArray` for `type1xN`
2. Clone: One with Private storage class i.e., without physical memory layout
- `OpTypeVector` for `type1xN` as the current DXC does.
The Original variable is in charge of getting CTBuffer data from CPU.
We create a module initialization function to copy the Original variable to the Clone variable.
We insert `OpFunctionCall` for the module initialization function into all entry points.
We use the Clone variable for the CTBuffer in all places.
```
void getResource(out Texture2D<float4> result) {
result = global_resource_variable;
}
void main() {
Texture2D<float4> x;
getResource(x);
...
}
```
SPIR-V backend currently creates a temporary variable to pass `x` as an
argument. In pseudo code:
```
temp_var = x
call getResource temp_var
// use x in main()
```
Since `getResource` will set `temp_var = global_resource_variable`,
we have to use `temp_var` for `x` in `main()`.
However, in `main()`, it always just uses `x`, not `temp_var`.
Therefore, setting `x` as `global_resource_variable` does not work.
This CL lets SPIR-V backend not create a temporary variable for the
local variable with a resource type for argument passing with `out` or
`inout` keyword.
Note that we preserve the behavior of a global variable with a resource type i.e., use a temporary variable. It is because we interpret a local variable or a function parameter with some resource types as a pointer to the resource type while we interpret a global variable with a resource type just as the resource not pointer.
For example, we interpret a local variable or a function parameter with `RWStructuredBuffer<float4>` type as `%_ptr_Function__ptr_StorageBuffer_type_RWStructuredBuffer_v4float` while a global variable with `RWStructuredBuffer<float4>` type as `%_ptr_StorageBuffer_type_RWStructuredBuffer_v4float`. Because of this difference, before we use a local variable or a function parameter with the type, we always dereference it exactly once more than a global variable. If we pass a function argument with `RWStructuredBuffer<float4>` type without a temporary parameter variable, we pass a global variable directly and conduct one more dereferencing that results in the incorrect SPIR-V for both syntax and semantics.
These shouldn't ever really get hit, but there is a chance that the
exception is thrown inside a block that returns the hresult and discards
the exception with its message. We'd rather preserve the message, but as
a last resort, these messages are better than nothing
- In constant GEP case, the outer structure would be reported as used by
the intrinsic, preventing SROA of outer struct and resulting in the wrong
type for the intrinsic.
When converting the internal gatherimm intrinsic to the official dxil
variant after validation has been performed, user iteration went over
the edge, resulting in an app verifier problem. This corrects the
iteration.
Fixes#3674
When a diagnostic that should be emitted with DiagnoceOnce has already been emitted, an invalid DiagnosticsBuilder is returned to prohibit another emittance of the diagnostic. The current implementation, however, does not set IsActive=false and the DiagnosticsEngine tries to emit a diagnostic with an invalid DiagID. This function is currently only used with payload access qualifiers to emit a warning only once if qualifiers are dropped because PAQs are disabled because of the SM version or the user opted-out. (Update the corresponding test accordingly).
When calling a method using an object, we have to correctly pass "this"
object. When the class of the object used for the method call has
multiple base classes, it does not use the correct base class. It simply
uses "this" object. This commit fixes the issue.
Fixes#3644
Change the mapping of SV_ShadingRate from FragSizeEXT (VK_EXT_fragment_density_map) to PrimitiveShadingRateKHR/ShadingRateKHR (VK_KHR_variable_rate_fragment_shading).
spirv-val reports a validation error when we use variable offset for
OpImage*. We have to rely on spirv-val for the validation instead of
doing it in DXC. In particular, when the HLSL code uses the loop
invariant variable for the Sample() intrinsics, it should be a constant
value after conducting the loop unrolling. Since we rely on spirv-opt
for the loop unrolling, we should not report the validation error in
DXC.
Fixes#3575
dbg.value can occasionally return a null value. (Hit this in a customer (343) shader via PIX.) This is expected. From IntrinsicInst.cpp:
// When the value goes to null, it gets replaced by an empty MDNode.