Commit graph

3508 commits

Author SHA1 Message Date
Jeff Noyle 36454e5de2
PIX: Partial cherry pick of validator version fix (#6908)
Partial cherry-pick from commit 0fd84aa814

0fd84aa814
Original PR:
https://github.com/microsoft/DirectXShaderCompiler/pull/6901
2024-09-09 15:21:02 -07:00
Jeff Noyle cf285db4ab
Cherry-Pick: PIX: Rationalize UAV generation (#6883) (#6897)
The impetus for these changes was unexplained crashes in a display
driver while attempting to create a debug-instrumented shader for PIX.
The heart of it is the new test in pixtest.cpp: use the compiler to
generate a raw UAV, and then compare the generated DXIL with what PIX
generates for the same purpose.
Some of the PIX passes need only one UAV for a module, but some need two
or more. In the latter case, the previous code was a bit loose about
what it was doing with respect to adding the UAV resource, and creating
its handles for each interested function. Most of the actual changes
herein are to do with that.
Lastly, the PIX UAV is raw at the D3D API level, but the instrumentation
had been doing non-raw writes. No driver seemed to care, but I've fixed
it anyway.

(cherry picked from commit bf24b7a54d)
2024-09-05 13:21:34 -07:00
Jeff Noyle c2d5eb1941
Cherry-Pick: Fix debug info offsets for vectors with 16-bit types (#6775) (#6893)
This fixes a bug where the offsets for elements in vectors with 16-bit
types didn't take alignment bits into account, so PIX wouldn't display
vector element values correctly in the shader debugger. E.g., if
`-enable-16bit-types` wasn't set, the offsets for a min16float4 would be
0, 16, 32, 48 instead of 0, 32, 64, 96.
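
The offset arithmetic in that example can be sketched as follows. This is only an illustration of the numbers quoted above, not the code from the fix: the helper name and the assumption that each element is padded up to a multiple of the alignment are hypothetical.

```python
def vector_element_offsets(count, elem_bits, align_bits):
    """Bit offset of each vector element (hypothetical helper).

    Assumes every element occupies its size rounded up to a multiple
    of align_bits, which is what the quoted offsets suggest.
    """
    stride = -(-elem_bits // align_bits) * align_bits  # round up
    return [index * stride for index in range(count)]


# Packed offsets (the buggy values quoted in the commit message).
packed = vector_element_offsets(4, elem_bits=16, align_bits=16)
# Aligned offsets (the expected values quoted in the commit message).
aligned = vector_element_offsets(4, elem_bits=16, align_bits=32)
```

With a 16-bit element and a 32-bit alignment the stride doubles, which is the difference between the two offset lists in the message.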

Also removed the assert in PopulateAllocaMap_StructType that was
checking whether the calculated aligned offset matches the packed offset
(from SortedMembers) because it was false for members with sizes smaller
than the alignment size.

(cherry picked from commit 84c0a09557)

Co-authored-by: gracezhang72 <gracezhang@microsoft.com>
2024-09-04 15:02:26 -07:00
Xiang Li 416fab6b5c
Fix warning when use implicit launch type. (#6837) (#6841)
By default, the launch type should be set to ‘Broadcast’ when diagnosing
barriers. However, the current behavior sets the default launch type to
‘Invalid,’ resulting in warnings when the launch type is not explicitly
specified as an attribute.

To address this issue, we’ll adjust the default setting to ‘Broadcast’
and thereby resolve the problem.

Fixes #6836

---------

Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
(cherry picked from commit ef043e90f3)
2024-07-31 02:03:01 +00:00
Xiang Li af9a0ba8a5
Fix CB vector to scalar array translation generating invalid IR (#6777) (#6800)
In the special code to handle the memcpy pattern where a constant buffer
contains a vector array that initializes a local (or static global)
scalar array for use by the shader, an invalid assumption was made that
if the memcpy dest was global, that the src is global as well.

This was not the case, and when expecting to generate constant
expressions to index the src, these generated orphaned instructions
instead, leading to invalid IR.

This fixes the issue by leveraging ReplaceConstantWithInst, and setting
the insertion point for the Builder. Now, replacement *could* fail, if
src instructions don't dominate replacement uses, so bool for replaced
all is returned from replaceScalarArrayWithVectorArray.

Another issue was that it would replace the dest for the original memcpy
with src along the way. Now, if we don't replace all uses, this turns
the memcpy into a no-op and any remaining uses are no longer coming from
src, but an undef dest instead. This was also fixed to skip this
replacement, then clean up this use if all other uses have been
successfully replaced.

Fixes #6510

(cherry picked from commit 5cfefc7d0b)

Co-authored-by: Tex Riddell <texr@microsoft.com>
2024-07-22 12:24:57 -07:00
Raed Rizqie 4a81bdfa70
Avoid NOMINMAX redefinition warnings (#6729)
On MinGW, NOMINMAX is defined by default, so guard it to avoid
redefinition warnings.
2024-07-02 15:37:54 -07:00
Jeff Noyle 1fefbc429b
Fix a PIX pass's attempt to set the validator version (#6707)
This pass was attempting to compare different things. The return values
of GetDxilVersion are not shader models, but... dxil version. Since the
code is trying to upgrade the validator version, I changed this to
GetValidatorVersion, to pair with SetValidatorVersion.
The previous code was breaking the nvidia driver on workgraphs.
2024-07-01 12:58:55 -07:00
David Neto 206133c9e7
HLMatrixLower: allow exceptions to propagate out (#6710)
If an exception is thrown, don't block it in the TempOverloadPool
destructor. Allow it to propagate out as a user-visible error.

Explicitly clear the TempOverloadPool before returning from the
HLMatrixLowerPass::runOnModule. In the normal case, when no exception is
thrown, that will still verify that all the overloads actually have been
lowered, and will assert out if they aren't.
2024-06-26 11:57:18 -04:00
Steven Perron e5183a06b9
Set the layout rule explicitly for raw buffer operations (#6701)
The first fix in #5392 was not correct. It relied on the layout
rule for the address to be the correct layout rule, but that is not
always the case. The address is just an integer that could exist in any
storage class. The correct solution is to explicitly set the layout rule
for the BitCast operation when expanding the RawBuffer* functions. We
know that the result of the BitCast is a pointer to the physical storage
buffer storage class, so we know the layout needs to be the storage
buffer layout.

Fixes #6554
2024-06-25 10:01:49 -04:00
Adam Yang 74205f8c19
Fix non-determinism in Reassociate caused by address coincidences (#6717)
Originally @lizhengxing's PR.  Retargeting main.

This PR pulls the upstream change, Fix non-determinism in Reassociate
caused by address coincidences
(ef8761fd3b),
into DXC.

Here's the summary of the change:

Between building the pair map and querying it there are a few places
that erase and create Values. It's rare but the address of these newly
created Values is occasionally the same as a
just-erased Value that we already have in the pair map. These
coincidences should be accounted for to avoid non-determinism.
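
A toy model of the problem, under loud assumptions: the `Arena` allocator, the serial numbers, and both lookup helpers are hypothetical and only mimic address reuse. They are not how LLVM or DXC implement the pair map, and the serial check is just one possible way to account for the coincidence.

```python
class Arena:
    """Toy allocator (hypothetical) that hands freed addresses back out first."""

    def __init__(self):
        self.free_list = []
        self.next_address = 0
        self.next_serial = 0

    def create(self):
        # Reuse a freed address if one exists, otherwise take a fresh one.
        if self.free_list:
            address = self.free_list.pop()
        else:
            address = self.next_address
            self.next_address += 1
        self.next_serial += 1
        return address, self.next_serial  # serial is unique per created value

    def erase(self, address):
        self.free_list.append(address)


def naive_lookup(pair_map, address):
    # Keyed by address only: a stale entry looks like a hit.
    return address in pair_map


def checked_lookup(pair_map, address, serial):
    # Also require that the entry was recorded for this exact value.
    entry = pair_map.get(address)
    return entry is not None and entry[0] == serial


arena = Arena()
old_address, old_serial = arena.create()
pair_map = {old_address: (old_serial, "pair score")}
arena.erase(old_address)
new_address, new_serial = arena.create()  # lands on the erased address
```

Here the new value reuses the erased value's address, so the address-only lookup reports a hit for a value that was never inserted, while the checked lookup does not.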

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for #6659.

Co-authored-by: Zhengxing Li <zhengxingli@microsoft.com>
2024-06-24 17:40:12 -07:00
Raed Rizqie 0eebec69d4
CMake: Add initial support for MinGW-w64 (#6715)
2024-06-24 15:25:14 -07:00
Zhengxing li b79169bc85
[llc/opt] Add an option to run all passes twice (#6666)
This PR pulls the following upstream changes into DXC:

[llc/opt] Add an option to run all passes twice
(04464cf731)
> Lately, I have submitted a number of patches to fix bugs that only
> occurred when using the same pass manager to compile multiple modules
> (generally these bugs are failure to reset some persistent state).
>
> Unfortunately I don't think there is currently a way to test that from
> the command line. This adds a very simple flag to both llc and opt,
> under which the tools will simply re-run their respective pass
> pipelines using the same pass manager on (a clone of the same module).
> Additionally, we verify that both outputs are bitwise the same.
> 
>   Reviewers: yaron.keren
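
The idea of running the pipeline twice and comparing the outputs can be sketched like this. Everything here is a hypothetical stand-in (a module is just a list of names, and `PassManager`, `LeakyRenamer`, and `sort_pass` are invented for illustration); it does not reflect the actual llc/opt implementation.

```python
import copy


class PassManager:
    """Hypothetical pass manager; its passes may keep state between runs."""

    def __init__(self, passes):
        self.passes = passes

    def run(self, module):
        for run_pass in self.passes:
            module = run_pass(module)
        return module


def run_twice_and_compare(pass_manager, module):
    # Run the same pass manager on two clones of the same module and
    # report whether both outputs are identical.
    first = pass_manager.run(copy.deepcopy(module))
    second = pass_manager.run(copy.deepcopy(module))
    return first == second


class LeakyRenamer:
    """A pass with a bug: its counter is never reset between modules."""

    def __init__(self):
        self.counter = 0

    def __call__(self, module):
        renamed = []
        for name in module:
            renamed.append(f"{name}.{self.counter}")
            self.counter += 1
        return renamed


def sort_pass(module):
    # A stateless pass: always produces the same output for the same input.
    return sorted(module)


stateless_ok = run_twice_and_compare(PassManager([sort_pass]), ["b", "a"])
leaky_ok = run_twice_and_compare(PassManager([LeakyRenamer()]), ["b", "a"])
```

The stateless pipeline produces identical output on both runs, while the pass that fails to reset its counter is caught by the comparison.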

[opt] Fix sanitizer complaints about r254774
(38707c45be)
> `Out` can be null if no output is requested, so move any access
> to it inside the conditional. Thanks to Justin Bogner for finding
> this.

This is for the test of this change
(ef8761fd3b)
to fix #6659.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Yang <31109344+adam-yang@users.noreply.github.com>
2024-06-24 15:03:08 -07:00
Steven Perron 8b18659aef
Avoid adding types to default namespace (#6700)
Some of the types that have been added to the vk namespace were being
added to the default namespace when compiling for DXIL. The if
conditions were such that they would fall through to a default case.

The solution is to add code that explicitly skips adding those
builtin types when the vk namespace is not defined.

Fixes #6646.
2024-06-21 15:20:16 +00:00
David Neto 1f8f79688e
CMake: Use find_package(Python3) (#6675)
Fixes the CMake CMP0148 warning at configure time.

find_package(Python3) is supported since CMake 3.12, and DXC requires at
least CMake 3.17
2024-06-20 13:11:49 -04:00
Zhengxing li 45018c752d
Rename WeakVH to WeakTrackingVH; NFC (#6663)
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC
(e6bca0eecb),
into DXC.

Here's the summary of the change:

> I plan to use WeakVH to mean "nulls itself out on deletion, but does
> not track RAUW" in a subsequent commit.
> 
>   Reviewers: dblaikie, davide
> 
>   Reviewed By: davide
> 
> Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb,
> llvm-commits, nhaehnle
> 
>   Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-18 12:34:55 -07:00
Steven Perron 4295b25934
Add error for invalid arguments to GetDimension (#6698)
When processing the GetDimension member function for textures, we do not
emit an error if the output variable is not an l-value. This change will
add this error.

Fixes #6689
2024-06-18 09:05:27 -07:00
Nathan Gauër 129c9f8760
[SPIR-V] Re-enable rich debug instructions for objects (#6696)
This commit bumps the SPIR-V tools version and re-adds support for object
debug instructions when using Vulkan's debug instructions.

Because OpenCL debug instructions are not a non-semantic set, the SPIR-V
spec would need to be modified, as today it does not allow forward
references.

Fixes #6691

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-06-18 10:15:29 +00:00
Steven Perron 8c3f40c0ae
Add warning when vk::offset is not correctly aligned (#6694)
We will start issuing a warning when `vk::offset` is not correctly
aligned to make it easier for users to understand why their SPIR-V will
not validate. Note that we do not treat this as an error because we want
to allow someone to have the flexibility to do other things. For
example, they could be targeting an API that does not follow any of
the existing rules, which is why they are using `vk::offset`.

Fixes #6171
2024-06-17 13:19:14 +02:00
Nathan Gauër 56f3c40381
[SPIR-V] Emit OpUndef for undefined values (#6686)
Before this change, OpConstantNull was emitted when an undef value was
required.
This causes an issue for some types which cannot have the OpConstantNull
value.

In addition, it mixed well-defined values with undefined values, which
prevents any kind of optimization/analysis later on.

Fixes #6653

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-06-13 09:21:34 +02:00
Helena Kotas 84d39b66cf
ExecutionTest::UnaryHalfOpTest#AcosHalf: Update tolerance (#6690)
Update tolerance for ExecutionTest::UnaryHalfOpTest#AcosHalf test.

Enables implementations to calculate `acos` for fp16 type by converting
to fp32, doing fp32 math, and then converting back to fp16 using
round-to-nearest-even (RTNE) conversion per the D3D11 spec. For more details
please see issue #6179.

As mentioned in the linked issue, for these floating point tests a fixed
point tolerance does not really make sense. It should vary based on the
magnitude of the expected value. But we are already using this approach
in many similar test cases and the simplest fix now is to update the
tolerance to accommodate the fp32-to-fp16 conversion.

Fixes #6179
2024-06-12 16:14:31 -07:00
Steven Perron 4353db3983
Make the location map run per entry point (#6688)
The code that adds the input and output decorations to the entry point's
inputs and outputs assumes that there is a single entry point in the
module. When using the `lib` profile, that is not true.

This commit modifies the code so that it groups the stage variables by
entry point, and runs the current code on each group separately.

I hesitate to make this change because it will change the locations for
code that currently works, and will force users to update their
applications accordingly. Or they could modify their shaders
to use explicit location attributes. Neither is great.

However, the advantage is that this allows the implicit locations to
match what would happen if the shader were compiled individually. It
also makes the locations more predictable, because a change in one
shader no longer changes the locations in every shader after it. This is
a better design, and worth the breakage.
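
A minimal sketch of the grouping idea, assuming a much simpler model than the real compiler: stage variables are `(entry_point, name)` pairs and each variable takes exactly one location. The function name and data shapes are hypothetical.

```python
from collections import defaultdict


def assign_locations(stage_vars):
    """Assign implicit locations per entry point (hypothetical sketch).

    stage_vars is a list of (entry_point, variable_name) pairs in
    declaration order; numbering restarts at 0 for each entry point.
    """
    groups = defaultdict(list)
    for entry_point, name in stage_vars:
        groups[entry_point].append(name)
    return {
        entry_point: {name: location for location, name in enumerate(names)}
        for entry_point, names in groups.items()
    }


locations = assign_locations(
    [("vsMain", "pos"), ("vsMain", "uv"), ("psMain", "color")]
)
```

Because numbering restarts for each entry point, adding a variable to `vsMain` leaves the locations in `psMain` unchanged, which matches the predictability argument above.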

Fixes #6678
Fixes #5213
2024-06-12 15:26:34 +00:00
Nathan Gauër 80f6e46bf8
[SPIR-V] Fix GroupNonUniform capabilities+ext (#6687)
[SPIR-V] Fix GroupNonUniform capabilities+ext

Fixes emission of GroupNonUniform capabilities and related extensions,
in particular SPV_NV_shader_subgroup_partitioned.

Since this PR bumps SPIR-V headers + tools, some test changes are
required due to opcode changes. Those are in a separate commit, but same
PR.

Fixes #6672

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-06-12 09:20:39 +02:00
Antonio Maiorano 1c7cb4ffb8
Fix instcombine overflow check inserting inst at wrong place (#6679)
When optimizing an overflow check of an add followed by a compare, the
new instruction was being inserted at the compare, and the add removed.
This produced invalid IR in cases where there were other uses of the
former add between it and the compare. This fix makes sure to insert the
new instruction at the old add location, rather than at the compare.

Note that this was also fixed in LLVM:

6f5dca70ed
2024-06-11 19:26:31 +00:00
Antonio Maiorano 3dc67421ba
Revert "Fix RemoveUnstructuredLoopExits when an exiting edge jumps out multiple levels of loops. (#6668)" (#6685)
This reverts commit 8206fbdc7f.

Reason for revert: since landing this, new asserts/crashes have been
found.
2024-06-11 18:39:57 +00:00
Natalie Chouinard 40c76f7585
Fix another UAF in SimplifyCFG (#6680)
In certain cases of unreachable code, SimplifyCFG could try to replace a
phi node with a select where the phi node itself was the select's
condition. This resulted in an ASAN use-after-free during SimplifyCFG.

The test case added isn't quite ideal because by the end of the
SimplifyCFG pass, the phi node is restored to its original state both
before and after this fix. However, an ASAN build of `dxopt` or
`check-clang-dxc` will identify a heap-use-after-free failure in the
intermediary steps of this test without this patch and succeeds with it.

This was also fixed in upstream LLVM:
602ab24833
2024-06-11 13:58:30 -04:00
Steven Perron 4b7993c78b
Add option to set the max id (#6654)
Vulkan implementations can have different limits on the maximum value
used as an id in a SPIR-V binary. SPIRV-Tools generally assumes this
limit is 0x3FFFFF because all implementations must support at least that
value for an id. Since many implementations can support larger values,
the tools allow an option that will set a different limit. This commit
adds an option to DXC to do the same.

Fixes #6636
2024-06-10 11:13:01 -07:00
Antonio Maiorano 1d196655b6
Fix crash in scalarrepl-param-hlsl when dynamically indexing a GEP of a constant indexed GEP (#6670)
When processing global values to determine when to flatten vectors, this
pass was only checking the immediate users of the value for non-dynamic
indexing of the vector. But this would fail in the case of a dynamic
indexed GEP of a constant indexed GEP (e.g. h[0][a]) because the first
level GEP was constant indexed, but not the second. We fix this by
checking the full User tree of the value in `hasDynamicVectorIndexing`.
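
The difference between checking immediate users and walking the full user tree can be sketched as below. The `Node` type and both functions are hypothetical stand-ins for LLVM values and the pass's check, and the sketch assumes the user graph is a tree with no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an LLVM value and its users."""
    kind: str
    dynamic_index: bool = False
    users: list = field(default_factory=list)


def has_dynamic_indexing_immediate(value):
    # Old behaviour as described: only the direct users are inspected.
    return any(u.kind == "gep" and u.dynamic_index for u in value.users)


def has_dynamic_indexing_full(value):
    # Described fix: recurse through users of users as well.
    for user in value.users:
        if user.kind == "gep" and user.dynamic_index:
            return True
        if has_dynamic_indexing_full(user):
            return True
    return False


# Models h[0][a]: a constant-indexed GEP whose user is a dynamic GEP.
inner = Node("gep", dynamic_index=True)
outer = Node("gep", dynamic_index=False, users=[inner])
global_value = Node("global", users=[outer])
```

For this shape the immediate-users check misses the dynamic index, while the recursive walk finds it.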
2024-06-06 14:56:20 -04:00
David Neto 8206fbdc7f
Fix RemoveUnstructuredLoopExits when an exiting edge jumps out multiple levels of loops. (#6668)
Before doing any major surgery on an exit from loop L, ensure that if an
exit edge from L goes to block X, then X is in L's parent loop or no
loop at all.

Add test cases:
- a reduced test case where the exiting block does not dominate its own
loop latch.
- a reduced test case where the exiting block is the latch for its own
loop. This reproduces the assert triggered by the original HLSL.
- the original HLSL that triggered this bug fix.
- the intermediate module from the original HLSL, taken just before the
attempt to remove unstructured loop exits.
2024-06-06 10:56:13 -05:00
David Neto 3a78b67849
NFC: Comment, refactor, and test hlsl::RemoveUnstructuredLoopExits (#6655)
Add a pass to run hlsl::RemoveUnstructuredLoopExits in isolation

  Example: opt -dxil-r-u-l-e a.ll -S

Add some basic tests.

No functional change to the pass itself.
2024-05-30 09:43:39 -04:00
Natalie Chouinard cdc56031b5
[SPIR-V] Add error message for SamplerFeedback (#6640)
Sampler feedback resource types are not supported by the SPIR-V backend,
but they would previously fail silently until a function was called on
them. This change makes the error message more explicit on the type.

Related to #6614
2024-05-24 16:05:53 +02:00
David Neto a1b945c1a3
Loop exit restructurizer: don't iterate over uses while mutating them (#6644)
The SkipBlockWithBranch function does the following:
- Splits the block into three blocks with an if-then-endif structure.
- Moves most instructions from the original block into the "then" block
- If any of those values are used outside the original block, they are
propagated through newly-constructed phis in the 'endif' block.

This algorithm had a bug where the uses of a value were being scanned
while the uses were also being updated. In some cases a downstream
out-of-block use could be skipped. That results in an invalid module
because the original definition is now in the 'then' block, which
does not dominate the downstream out-of-block use.
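
The general hazard (scanning a use list while it is being updated) can be reproduced in miniature. This is an analogy using a Python list, not the LLVM use-list code; the function names and the string-based "uses" are hypothetical.

```python
def rewrite_uses_buggy(uses):
    # Iterates the live list while removing from it, so the element
    # that slides into a removed slot is never visited.
    for use in uses:
        if use.startswith("outside"):
            uses.remove(use)
    return uses


def rewrite_uses_fixed(uses):
    # Iterate over a snapshot, then mutate the real list.
    for use in list(uses):
        if use.startswith("outside"):
            uses.remove(use)
    return uses


buggy_result = rewrite_uses_buggy(["outside1", "outside2", "inside"])
fixed_result = rewrite_uses_fixed(["outside1", "outside2", "inside"])
```

In the buggy version the second out-of-block use is skipped, which mirrors the downstream use that was left pointing at a definition that no longer dominates it.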

Add a test that demonstrates the problem.
2024-05-23 16:00:37 +00:00
Antonio Maiorano b41d8a9478
Fix LoopDeletion incorrectly updating PHI with multiple duplicate inputs (#6643)
LoopDeletion was incorrectly updating PHI nodes in the target block when
it had duplicate input edges. This happens, for example, when deleting a
loop that uses a switch with multiple cases that exit the same way.

After determining that this was the bug, I found this fix in LLVM:
https://reviews.llvm.org/D34516 and applied it here.
2024-05-23 10:21:30 -04:00
Greg Roth a6f4025957
Calculate preferred alignment when lowering groupshared matrices (#6589)
When flattening the global for a groupshared matrix, the alignment
information was getting lost. As a result, the alignments of the loads
and stores were calculating their own alignment based on preferred
alignment and trailing zeros of the index. The preferred alignment
switched to 16 when the type size was over 128 bits due to a heuristic
whose rationale is lost to time. When the global has its own alignment,
that gets used, so by calculating it at lowering, the alignments are
consistent and reliable.

Includes testing for a few matrix variants and a pass test.

fixes #6416
2024-05-22 13:38:51 -07:00
Steven Perron 86da226c4f
Fix aligment for empty structs (#6635)
We have a special case so that the size and alignment for an empty
struct is `{1,0}`. However, that is not correct. See

https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-alignment-requirements.

> An empty structure has a base alignment equal to the size of the
> smallest scalar type permitted by the capabilities declared in the
> SPIR-V module. (e.g., for a 1 byte aligned empty struct in the
> StorageBuffer storage class, StorageBuffer8BitAccess or
> UniformAndStorageBuffer8BitAccess must be declared in the SPIR-V
> module.)

I'm not 100% sure how DXC handles this minimum alignment, but I figured I
would initialize the alignment to 1. If there are no members, then it
will remain 1, and I would let the rest of the logic happen. No special
case.
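
A minimal sketch of that approach, assuming a struct's alignment is simply the largest alignment among its members (the real layout rules are more involved, and the function name is hypothetical):

```python
def struct_alignment(member_alignments):
    """Alignment of a struct, starting from 1 (hypothetical sketch)."""
    alignment = 1  # an empty struct keeps this value, no special case
    for member_alignment in member_alignments:
        alignment = max(alignment, member_alignment)
    return alignment


empty_alignment = struct_alignment([])
mixed_alignment = struct_alignment([4, 16, 8])
```

Starting from 1 means the empty struct falls out of the general loop rather than needing its own branch.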

Fixes #2882
2024-05-22 12:44:02 -04:00
Nathan Gauër 9ad095dfe6
[SPIR-V] Add support for SampleCmpLevel (#6618)
SampleCmpLevel is similar to SampleCmpLevel0, except the LOD level can
be specified using either a const-offset, or a variable. This should be
available starting SM6.7

Fixes #6613

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-05-22 10:59:45 +02:00
Zhengxing li 9ee3f23d9c
More aggressive reassociations (#6626)
Although DXC applied the upstream change, Reassociate: add global
reassociation algorithm
(https://github.com/llvm/llvm-project/commit/b8a330c) in this PR
(https://github.com/microsoft/DirectXShaderCompiler/pull/6598), it still
might overlook some obvious common factors.

One case that has been observed is:
```
  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since they invoke cbufferLoadLegacy with the same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 

  /* %Float2_1 is redundant with %Float2_0 since they invoke cbufferLoadLegacy with the same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y 

(%Float4_0.w * %Float2_0.y) equals (%Float4_1.w * %Float2_1.y), so they should be reassociated to a common factor

```
The upstream change can't identify this common factor because DXC
doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y)
are redundant when running Reassociate pass. Those redundancies will be
eliminated in GVN pass.

So that DXC can identify more common factors, this PR aggressively runs
the Reassociate pass again after the GVN pass, and then runs the GVN pass
again to remove the redundancies generated by this run of the Reassociate
pass.

Changing the order of floating-point operations can cause precision
issues. In case some shaders get unexpected results due to this PR, use
"-opt-disable aggressive-reassociation" to disable this change and roll
back.

This is part 3 of the fix for
https://github.com/microsoft/DirectXShaderCompiler/issues/6593.
2024-05-21 15:37:04 -07:00
Zhengxing li 1ee70fdc64
Add a flag for the upstream global reassociation algorithm change (#6625)
This PR (https://github.com/microsoft/DirectXShaderCompiler/pull/6598)
pulls the upstream global reassociation algorithm change into DXC and can
noticeably reduce redundant calculations.

However, from the testing result of a large offline suite of shaders,
some shaders got worse compilation results and couldn't benefit from
this upstream change.

This PR adds a flag for the upstream global reassociation change. It
makes it easier to roll back if a shader gets a worse compilation result
due to this upstream change.

This is part 2 of the fix for #6593.
2024-05-21 13:45:38 -07:00
Natalie Chouinard 66acf8de22
[SPIR-V] Remove always disabled test (#6634)
This test was marked DISABLED_ in gtest at the time it was added in PR
#3155, so it appears that it was never passing. Specifically, the CHECK
for `DebugFunction [[func1]]` fails. I don't think it's a priority to
implement debug info for unreferenced functions at this point, so opting
to simply remove it.

This is the last test in the unsupported directory so it can now be
removed entirely.

Fixes #6616
2024-05-21 11:47:22 +02:00
Natalie Chouinard 1658b068b5
[SPIR-V] Enable more unsupported tests (#6630)
The following tests only required minor test syntax changes to pass:
- tools/clang/test/CodeGenSPIRV/cast.2float.interlocked.hlsl
- tools/clang/test/CodeGenSPIRV/meshshading.nv.error.fncall.amplification.vulkan1.2.hlsl
  (+ replacing NV ext with EXT)
- tools/clang/test/CodeGenSPIRV/var.init.extvector.hlsl

Issue #6621 has been filed to track the failure of
tools/clang/test/CodeGenSPIRV/oo.class.static.member.hlsl.

Related to #6616
2024-05-17 15:06:11 +00:00
Natalie Chouinard 2432517221
[SPIR-V] Implement WaveMutliPrefix* (#6608)
Implements the Shader Model 6.5 WaveMultiPrefix* intrinsic functions
using the group operation from SPV_NV_shader_subgroup_partitioned,
PartitionedExclusiveScanNV, which performs a partitioned exclusive scan
operation across a subset of invocations ("lanes") in a subgroup
("wave"). The subset of the partition is determined by the provided
ballot ("mask") parameter, which follows the same requirements for
valid partitioning and active invocations/lanes as the HLSL parameter.
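
The semantics of a partitioned exclusive scan can be sketched for the sum case. This is a CPU-side illustration under stated assumptions: all lanes are active, each lane supplies a bitmask naming the lanes in its partition, and the function name is hypothetical. It is not the SPIR-V codegen.

```python
def wave_multi_prefix_sum(values, masks):
    """Partitioned exclusive prefix sum across lanes (hypothetical sketch).

    values[i] is lane i's input; masks[i] is a bitmask of the lanes in
    lane i's partition. Each lane sums the values of lower-indexed
    lanes that belong to its partition.
    """
    results = []
    for lane, mask in enumerate(masks):
        total = 0
        for other in range(lane):
            if (mask >> other) & 1:
                total += values[other]
        results.append(total)
    return results


# Lanes 0 and 2 form one partition, lanes 1 and 3 form another.
prefix = wave_multi_prefix_sum([1, 2, 3, 4], [0b0101, 0b1010, 0b0101, 0b1010])
```

Lane 2 only sees lane 0's value and lane 3 only sees lane 1's value, since the scan never crosses partition boundaries.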

Note that WaveMultiPrefixCountBits remains unimplemented because it does
not directly map to a SPIR-V GroupNonUniformArithmetic instruction that
accepts the PartitionedExclusiveScanNV Group Operation.

DirectX Spec:
https://microsoft.github.io/DirectX-Specs/d3d/HLSL_ShaderModel6_5.html#wavemultiprefix-functions
SPIR-V Extension:
https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/NV/SPV_NV_shader_subgroup_partitioned.html

Depends on #6596
Fixes #6600
2024-05-17 09:59:35 -04:00
Natalie Chouinard d64f5f2699
[SPIR-V] Enable rayquery tests (#6627)
These tests were previously marked as "unsupported" because they were
misconfigured at the time they were added and never run. Minor changes
have been made to make them passing tests.

Note that rayquery_assign.cs.hlsl has been removed because it no longer
produces an error. A similar non-erroring check exists in
rayquery_init_expr.hlsl

Related to #6616
2024-05-17 09:33:42 -04:00
Natalie Chouinard 83ba82e1f3
[SPIR-V] Remove bad test (#6632)
This test is marked as unsupported (ignored), but it currently fails to
compile both for SPIR-V and DXIL with the same Sema error: "cannot
implicitly convert from 'SecondStruct' to 'FirstStruct'". If it would
have succeeded for SPIR-V at some point in the past it's no longer valid
anyways, so removing it.

Related to #6616
2024-05-17 09:07:59 -04:00
Antonio Maiorano 348040254e
Fix use-after-free in SimplifyCFG (#6628)
When SimplifySwitchOnSelect calls SimplifyTerminatorOnSelect, it holds
onto the select's condition value to use for the conditional branch it
replaces the switch with. When removing the switch's unused
predecessors, it must make sure not to delete PHIs in case one of them
is used by the condition value, otherwise the condition value itself may
get deleted, resulting in a use-after-free.

Note that this was fixed in LLVM as well:

dc3b67b4ca
2024-05-17 03:22:52 +00:00
Antonio Maiorano cf566e1f3f
Fix dxil-remove-dead-blocks removing switch with multiple same successor (#6610)
Given a switch with a constant condition and all cases the same
(branching to the same successor), dxil-remove-dead-blocks would
incorrectly remove the switch when replacing it with a branch, by
forgetting to remove the N-1 incoming values to the PHIs in the
successor block.
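
The bookkeeping that was missing can be sketched with a PHI modelled as a list of `(predecessor, value)` pairs. The function and block names are hypothetical, and this is not the pass's actual code.

```python
def collapse_switch_edges(phi_incoming, switch_block):
    """Keep a single PHI entry for switch_block (hypothetical sketch).

    A switch whose N cases all branch to the same successor contributes
    N incoming entries to each PHI there; a plain branch contributes
    one, so the other N-1 entries must be dropped.
    """
    kept = []
    seen_switch_edge = False
    for predecessor, value in phi_incoming:
        if predecessor == switch_block:
            if seen_switch_edge:
                continue  # drop the duplicate edge from the old switch
            seen_switch_edge = True
        kept.append((predecessor, value))
    return kept


collapsed = collapse_switch_edges(
    [("sw", "%a"), ("sw", "%a"), ("sw", "%a"), ("other", "%b")], "sw"
)
```

After the rewrite the PHI has exactly one entry per remaining predecessor edge, which is what a well-formed module requires.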
2024-05-16 21:13:01 -04:00
Jeff Noyle fd7e54bcd5
Pix: Cope with group shared AS->MS payload (#6619)
This change copes with the AS->MS payload being placed in group-shared
by the application (and MSFT's samples do indeed do this). (TIL, thanks
to pow2clk, that the spec says that the payload counts against the
group-shared total, implying, if not explicitly stating, that at least
on some platforms, the payload will be in group-shared anyway.)

The MS pass needs to be given data from the AS about the AS's thread
group topology, and this is done by extending the payload struct to add
three uints. This can't be done when the payload is resident in
group-shared, of course, because that would change the layout of
group-shared memory.
So the new approach here is to copy the payload to a new alloca (in the
default address space) struct with the members of the base struct plus
the extended data the MS needs, and then to copy piece-wise because
llvm.memcpy isn't appropriate for group-shared-to-normal address space
copies.
2024-05-15 14:49:22 -07:00
Natalie Chouinard d9caef5289
[NFC][SPIR-V] Refactor SpirvGroupNonUniformOps (#6596)
A follow-up change will use the PartitionedExclusiveScanNV
GroupOperation, which requires that an additional operand is added to
all GroupNonUniformArithmetic instructions. This means that some of the
SPIR-V opcodes which are currently categorized as unary will become
either unary or binary depending on the GroupOp. Since the arity
distinctions between the OpGroupNonUniform* instructions were already
somewhat arbitrary, I'm prefacing that change by refactoring them into a
single SpirvGroupNonUniformOp instruction type for better reusability.

Follow up: #6608
2024-05-15 10:10:53 -04:00
Jeff Noyle ff623f8a74
PIX: Emit correct input sig elements and view ID state into PSV0 when input sigs are changed by debug instrumentation (#6351)
Two intertwined issues:

1:
The shader-debug-instrumentation and pixel-hit-counter passes add
SV_VertexId+SV_InstanceId to the input sig for a VS if they are not
present, and SV_Position for the PS. The changes in PixPassHelpers.cpp
centralize the SV_Position case and fix up a few missing fields whose
absence was breaking PIX shader debugging on WARP. The caller now has
to pass the upstream shader's SV_Position row, if it exists. (PIX has
been updated to do this.)

2:
However, independent of these tweaks, adding these new values means that
the view id metadata would be incorrect. A long-standing assert would
fire, but is herein fixed, since the above work exacerbated this
problem. The assert in question is this one in
CopyViewIDStateForOutputToPSV that previously fired when a
debug-instrumented shader was subsequently wrapped into a container:
```
DXASSERT_NOMSG((InputScalars <= IOTable.InputVectors * 4) &&
               (IOTable.InputVectors * 4 - InputScalars < 4));
```
(InputScalars, which comes from the serialized state in PSV0, was too
small, since it did not include the added system values.)

To fix up these data, the caller now has to invoke the
generate-view-id-state pass, and follow that with the emit-metadata
pass, since emitting the metadata in the debug pass itself is now too
early (and that call has been deleted). (PIX has been updated to
invoke these passes.)

These changes together are sufficient to allow PIX shader debugging to
operate on WARP in those cases where these system-values were not
included by the application, with correct view-id metadata, and the
assert is happy.
2024-05-14 17:09:48 -07:00
Natalie Chouinard 8fe99be640
[SPIR-V] Fix firstbit{high,low} elem check (#6607)
The code gen for firstbithigh and -low was incorrectly checking the size
of the full (possibly composite) type rather than the element size. This
is now fixed, and I've also switched the error check from whether the
element type is == 64-bit to != 32-bit, so that it matches the current
limitations of the GLSL extended instructions.
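
The two checks described above can be contrasted in a small sketch. Both functions are hypothetical reconstructions from the description (the old one tests the full type's size against 64 bits, the new one requires a 32-bit element); they are not the actual code.

```python
def old_check_accepts(elem_bits, elem_count):
    # Described old behaviour: reject when the full type is 64 bits wide.
    return elem_bits * elem_count != 64


def new_check_accepts(elem_bits, elem_count):
    # Described new behaviour: accept only 32-bit elements, regardless
    # of how many elements the (possibly composite) type has.
    return elem_bits == 32


uint2_old = old_check_accepts(32, 2)  # a vector of two 32-bit elements
uint2_new = new_check_accepts(32, 2)
```

Under the old check a two-element vector of 32-bit values is rejected because its total size is 64 bits, while the element-based check accepts it and rejects 16-bit and 64-bit elements.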


https://registry.khronos.org/SPIR-V/specs/unified1/GLSL.std.450.html#:~:text=%3Cid%3E%0AValue-,FindSMsb,-Signed%2Dinteger%20most

Related to #4702
2024-05-14 13:57:22 -04:00
Natalie Chouinard 4273354acf
[SPIR-V] Fix InterlockedMin/Max codegen (#6609)
RWByteAddressBuffer has overloads for InterlockedMin and InterlockedMax
for signed ints that were failing to compile due to mismatched types in
the generated SPIR-V instruction. This adds the missing cast if
necessary.

At the same time, some redundant code is removed from the
InterlockedMin/Max intrinsic non-member functions' codegen to modify the
opcode. If it was necessary in the past, the frontend has since been
fixed and it is no longer necessary. Tests to verify these combinations
and the necessary implicit casts have also been added.

Fixes #3196
Related to #4189, #6254, #5707
2024-05-14 13:27:01 -04:00
Natalie Chouinard e61ea5041f
[SPIR-V] Allow const-evaluatable spec constants (#6606)
Fixes #2957
2024-05-14 13:24:23 -04:00