Commit graph

1669 commits

Author SHA1 Message Date
Zhengxing Li 38830016ad Fix non-determinism in Reassociate caused by address coincidences
This PR pulls the upstream change, Fix non-determinism in Reassociate caused by address coincidences (ef8761fd3b), into DXC.

Here's the summary of the change:

  Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a
  just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism.

  Thanks to Roman Tereshin for the test case.

This is part 6 (the last part) of the fix for #6659.
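
The address-coincidence hazard described above can be illustrated with a toy model. This is only a sketch in Python, not the LLVM or DXC code: `Arena`, `lookup_unsafe`, and `lookup_checked` are hypothetical names, and the generation counter is an assumed stand-in for whatever mechanism the real fix uses to notice that a recorded value was erased.

```python
class Arena:
    """Toy allocator that hands a freed address to the next allocation."""

    def __init__(self):
        self.free_addresses = []
        self.next_address = 0
        self.generation = {}  # address -> how many times it has been allocated

    def create(self):
        if self.free_addresses:
            address = self.free_addresses.pop()
        else:
            address = self.next_address
            self.next_address += 1
        self.generation[address] = self.generation.get(address, 0) + 1
        return address

    def erase(self, address):
        self.free_addresses.append(address)


def lookup_unsafe(pair_map, address):
    """Keyed on the address alone, so a stale entry can match a new value."""
    entry = pair_map.get(address)
    return None if entry is None else entry["score"]


def lookup_checked(pair_map, arena, address):
    """Also checks that the entry was recorded for the value now living there."""
    entry = pair_map.get(address)
    if entry is None or entry["generation"] != arena.generation[address]:
        return None
    return entry["score"]


arena = Arena()
old_value = arena.create()
pair_map = {old_value: {"generation": arena.generation[old_value], "score": 7}}

arena.erase(old_value)
new_value = arena.create()  # lands on the just-erased address
```

In this model the unchecked lookup returns the stale score for the unrelated new value, while the checked lookup rejects it. Whether a stale hit happens in a real compiler depends on the heap, which is why the resulting output can differ from run to run.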
2024-05-31 09:24:14 -07:00
Zhengxing Li e5d8f7e91e Add a new WeakVH value handle; NFC
This PR pulls the upstream change, Add a new WeakVH value handle; NFC (f1c0eafd5b), into DXC.

Here's the summary of the change:

  WeakVH nulls itself out if the value it was tracking gets deleted, but it does not track RAUW.

  Reviewers: dblaikie, davide

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32267

This is part 4 of the fix for #6659.
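
The difference between the two handle kinds named in this series (WeakVH nulls itself on deletion but ignores RAUW, while WeakTrackingVH also follows RAUW) can be sketched as a small Python model. The classes below only borrow the LLVM names for readability; the method names and the bookkeeping are assumptions made for illustration, not the LLVM implementation.

```python
class Value:
    """Toy IR value that notifies its handles about deletion and RAUW."""

    def __init__(self, name):
        self.name = name
        self.handles = []

    def replace_all_uses_with(self, new_value):
        for handle in list(self.handles):
            handle.on_rauw(new_value)

    def delete(self):
        for handle in list(self.handles):
            handle.on_delete()


class WeakVH:
    """Nulls itself out when its value is deleted; ignores RAUW."""

    def __init__(self, value):
        self.value = None
        self._attach(value)

    def _attach(self, value):
        if self.value is not None:
            self.value.handles.remove(self)
        self.value = value
        if value is not None:
            value.handles.append(self)

    def on_delete(self):
        self._attach(None)

    def on_rauw(self, new_value):
        pass  # deliberately does not follow the replacement


class WeakTrackingVH(WeakVH):
    """Like WeakVH, but follows the value across RAUW."""

    def on_rauw(self, new_value):
        self._attach(new_value)


old, new = Value("old"), Value("new")
weak, tracking = WeakVH(old), WeakTrackingVH(old)
old.replace_all_uses_with(new)
after_rauw = (weak.value, tracking.value)
old.delete()
after_delete = (weak.value, tracking.value)
```

After the RAUW step the weak handle still points at the old value and the tracking handle points at the new one; after the old value is deleted, only the weak handle becomes null.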
2024-05-30 21:48:24 -07:00
Zhengxing Li e77fa1919c Rename WeakVH to WeakTrackingVH; NFC
This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC (e6bca0eecb), into DXC.

Here's the summary of the change:

  I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit.

  Reviewers: dblaikie, davide

  Reviewed By: davide

  Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

  Differential Revision: https://reviews.llvm.org/D32266

This is part 3 of the fix for #6659.
2024-05-30 21:20:35 -07:00
Zhengxing Li 8d4df0d4e0 Emulate TrackingVH using WeakVH
This PR pulls the upstream change, Emulate TrackingVH using WeakVH (8a6238201f), into DXC.

Here's the summary of the change:

  This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle.  The size of the HandleBaseKind enum is important because we store a HandleBaseKind in
  the low two bits of a (in the worst case) 4 byte aligned pointer.

  Reviewers: davide, chandlerc

  Subscribers: mcrosier, llvm-commits

  Differential Revision: https://reviews.llvm.org/D32634

This is part 2 of the fix for #6659.
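
The size constraint mentioned above comes from pointer tagging: a pointer that is at least 4-byte aligned always has its two low bits clear, so those bits can carry a four-valued enum. The following Python sketch shows the arithmetic only; the enum member names and the `pack`/`unpack` helpers are illustrative assumptions, not the LLVM declarations.

```python
from enum import IntEnum


class HandleKind(IntEnum):
    """Illustrative kinds; at most four values fit in two bits."""
    ASSERT = 0
    CALLBACK = 1
    WEAK = 2
    WEAK_TRACKING = 3


KIND_BITS = 2
KIND_MASK = (1 << KIND_BITS) - 1  # 0b11


def pack(pointer, kind):
    """Store the kind in the low bits of an aligned pointer value."""
    if pointer & KIND_MASK:
        raise ValueError("pointer must be at least 4-byte aligned")
    return pointer | int(kind)


def unpack(tagged):
    """Recover the original pointer and the kind."""
    return tagged & ~KIND_MASK, HandleKind(tagged & KIND_MASK)


tagged = pack(0x1000, HandleKind.WEAK)
pointer, kind = unpack(tagged)
```

A fifth kind would need a third bit, which a 4-byte aligned pointer cannot guarantee, so freeing an existing slot is the only way to add a new kind without changing the layout.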
2024-05-30 20:38:49 -07:00
Zhengxing Li 2378f9b705 Use accessors for ValueHandleBase::V; NFC
This PR pulls the upstream change, Use accessors for ValueHandleBase::V; NFC (6f08789d30), into DXC.

Here's the summary of the change:
  This changes code that touches ValueHandleBase::V to go through getValPtr and (newly added) setValPtr.  This functionality will be used later, but also seemed like a generally good cleanup.

  I also renamed the field to Val, but that's just to make it obvious that I fixed all the uses.

This is part 1 of the fix for #6659.
2024-05-30 13:38:27 -07:00
David Neto 3a78b67849
NFC: Comment, refactor, and test hlsl::RemoveUnstructuredLoopExits (#6655)
Add a pass to run hlsl::RemoveUnstructuredLoopExits in isolation

  Example: opt -dxil-r-u-l-e a.ll -S

Add some basic tests.

No functional change to the pass itself.
2024-05-30 09:43:39 -04:00
David Neto a1b945c1a3
Loop exit restructurizer: don't iterate over uses while mutating them (#6644)
The SkipBlockWithBranch function does the following:
- Splits the block into three blocks with an if-then-endif structure.
- Moves most instructions from the original block into the "then" block
- If any of those values are used outside the original block, they are
propagated through newly-constructed phis in the 'endif' block.

This algorithm had a bug where the uses of a value were being scanned
while the uses were also being updated. In some cases a downstream
out-of-block use could be skipped. That results in an invalid module
because the original definition is now in the 'then' block, which
does not dominate the downstream out-of-block use.

Add a test that demonstrates the problem.
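
The class of bug described above (scanning a use list while updating it) can be reproduced in a few lines of Python. This is an analogy rather than the DXC code: the use list is a plain Python list, and `retarget_buggy` and `retarget_fixed` are hypothetical helpers that move out-of-block users onto a new phi.

```python
def retarget_buggy(uses, users_in_block, phi_users):
    """Removes entries from `uses` while iterating it, so an entry is skipped."""
    for user in uses:
        if user not in users_in_block:
            uses.remove(user)       # shifts later entries left by one
            phi_users.append(user)


def retarget_fixed(uses, users_in_block, phi_users):
    """Iterates over a snapshot, so mutation cannot disturb the scan."""
    for user in list(uses):
        if user not in users_in_block:
            uses.remove(user)
            phi_users.append(user)


users_in_block = {"in1", "in2"}

buggy_uses, buggy_phi = ["in1", "out1", "out2", "in2"], []
retarget_buggy(buggy_uses, users_in_block, buggy_phi)

fixed_uses, fixed_phi = ["in1", "out1", "out2", "in2"], []
retarget_fixed(fixed_uses, users_in_block, fixed_phi)
```

In the buggy variant `out2` is never visited because removing `out1` moves it into the slot the iterator has already passed, which mirrors the skipped downstream use in the commit description.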
2024-05-23 16:00:37 +00:00
Antonio Maiorano b41d8a9478
Fix LoopDeletion incorrectly updating PHI with multiple duplicate inputs (#6643)
LoopDeletion was incorrectly updating PHI nodes in the target block when
it had duplicate input edges. This happens, for example, when deleting a
loop that uses a switch with multiple cases that exit the same way.

After determining that this was the bug, I found this fix in LLVM:
https://reviews.llvm.org/D34516 and applied it here.
2024-05-23 10:21:30 -04:00
Greg Roth a6f4025957
Calculate preferred alignment when lowering groupshared matrices (#6589)
When flattening the global for a groupshared matrix, the alignment
information was getting lost. As a result, the loads and stores were
each calculating their own alignment based on the preferred
alignment and the trailing zeros of the index. The preferred alignment
switched to 16 when the type size was over 128 bits due to a heuristic
whose rationale is lost to time. When the global has its own alignment,
that gets used, so by calculating it at lowering, the alignments are
consistent and reliable.

Includes testing for a few matrix variants and a pass test.

fixes #6416
2024-05-22 13:38:51 -07:00
Zhengxing li 9ee3f23d9c
More aggressive reassociations (#6626)
Although DXC applied the upstream change, Reassociate: add global
reassociation algorithm
(https://github.com/llvm/llvm-project/commit/b8a330c) in this PR
(https://github.com/microsoft/DirectXShaderCompiler/pull/6598), it still
might overlook some obvious common factors.

One case that has been observed is:
```
  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since both invoke cbufferLoadLegacy with the same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 

  /* %Float2_1 is redundant with %Float2_0 since both invoke cbufferLoadLegacy with the same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y 

(%Float4_0.w * %Float2_0.y) equals (%Float4_1.w * %Float2_1.y), so they should be reassociated into a common factor

```
The upstream change can't identify this common factor because DXC
doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y)
are redundant when running Reassociate pass. Those redundancies will be
eliminated in GVN pass.

So that DXC can identify more common factors, this PR runs the
Reassociate pass again after the GVN pass, and then runs GVN again to
remove the redundancies generated by this second run of Reassociate.

Changing the order of floating point operations can change precision.
In case some shaders get unexpected results due to this PR, use
"-opt-disable aggressive-reassociation" to disable this change and roll
back.

This is part 3 of the fix for
https://github.com/microsoft/DirectXShaderCompiler/issues/6593.
2024-05-21 15:37:04 -07:00
Zhengxing li 1ee70fdc64
Add a flag for the upstream global reassociation algorithm change (#6625)
This PR (https://github.com/microsoft/DirectXShaderCompiler/pull/6598)
pulls the upstream global reassociation algorithm change into DXC and
noticeably reduces redundant calculations.

However, from the testing result of a large offline suite of shaders,
some shaders got worse compilation results and couldn't benefit from
this upstream change.

This PR adds a flag for the upstream global reassociation change. It
makes it easier to roll back if a shader gets a worse compilation result
due to this upstream change.

This is part 2 of the fix for #6593.
2024-05-21 13:45:38 -07:00
Zhengxing li 6f9c107b78
Reassociate: add global reassociation algorithm (#6598)
This PR pulls the upstream change, Reassociate: add global reassociation
algorithm
(b8a330c42a),
into DXC with minimal changes.

For the code below:
  foo = (a * b) * c
  bar = (a * d) * c

As the upstream change states, it can identify that a*c is a common
factor that would otherwise be computed redundantly.

This is part 1 of the fix for #6593.
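
One way to picture the idea behind the foo/bar example is to count how often each operand pair occurs across all products and group the most shared pair first. The Python sketch below is a simplified illustration under that assumption; it is not the upstream algorithm, and `most_shared_pair` and `regroup` are hypothetical helpers.

```python
from collections import Counter
from itertools import combinations


def most_shared_pair(products):
    """Return the operand pair that appears in the most products, if shared."""
    counts = Counter()
    for operands in products.values():
        for pair in combinations(sorted(set(operands)), 2):
            counts[pair] += 1
    if not counts:
        return None
    pair, count = max(counts.items(), key=lambda item: (item[1], item[0]))
    return pair if count > 1 else None


def regroup(operands, pair):
    """Move the shared pair to the front so it forms one reusable subexpression."""
    if pair is None or not set(pair) <= set(operands):
        return tuple(operands)
    rest = [op for op in operands if op not in pair]
    return (pair, *rest)


# foo = (a * b) * c and bar = (a * d) * c, written as operand lists.
products = {"foo": ["a", "b", "c"], "bar": ["a", "d", "c"]}
shared = most_shared_pair(products)
regrouped = {name: regroup(ops, shared) for name, ops in products.items()}
```

Here the pair (a, c) occurs in both products, so both are regrouped as (a * c) * b and (a * c) * d, and a later redundancy-elimination pass can compute a * c once.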
2024-05-21 11:55:40 -07:00
Antonio Maiorano 348040254e
Fix use-after-free in SimplifyCFG (#6628)
When SimplifySwitchOnSelect calls SimplifyTerminatorOnSelect, it holds
onto the select's condition value to use for the conditional branch it
replaces the switch with. When removing the switch's unused
predecessors, it must make sure not to delete PHIs in case one of them
is used by the condition value, otherwise the condition value itself may
get deleted, resulting in a use-after-free.

Note that this was fixed in LLVM as well:

dc3b67b4ca
2024-05-17 03:22:52 +00:00
Antonio Maiorano cf566e1f3f
Fix dxil-remove-dead-blocks removing switch with multiple same successor (#6610)
Given a switch with a constant condition and all cases the same
(branching to the same successor), dxil-remove-dead-blocks would
incorrectly remove the switch when replacing it with a branch, by
forgetting to remove the N-1 incoming values to the PHIs in the
successor block.
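
The PHI bookkeeping described above can be sketched as follows. A PHI carries one incoming entry per incoming edge, so a switch with N cases that all branch to the same successor contributes N entries for the same predecessor; a single branch contributes exactly one. This Python model is an illustration only: the list-of-tuples PHI representation and the `collapse_duplicate_edges` helper are assumptions.

```python
def collapse_duplicate_edges(phi_incoming, predecessor):
    """Keep one (block, value) entry for `predecessor` and drop the other N-1."""
    kept_one = False
    result = []
    for block, value in phi_incoming:
        if block == predecessor:
            if kept_one:
                continue  # one of the N-1 entries that must go away
            kept_one = True
        result.append((block, value))
    return result


# Three switch cases from block "sw" reach the successor, plus one other edge.
phi_before = [("sw", "x"), ("sw", "x"), ("sw", "x"), ("other", "y")]
phi_after = collapse_duplicate_edges(phi_before, "sw")
```

Leaving the extra entries in place would give the PHI more incoming values than the block has predecessors, which is the invalid state the fix avoids.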
2024-05-16 21:13:01 -04:00
Jeff Noyle fd7e54bcd5
Pix: Cope with group shared AS->MS payload (#6619)
This change copes with the AS->MS payload being placed in group-shared
by the application (and MSFT's samples do indeed do this). (TIL, thanks
to pow2clk, that the spec says that the payload counts against the
group-shared total, implying, if not explicitly stating, that at least
on some platforms, the payload will be in group-shared anyway.)

The MS pass needs to be given data from the AS about the AS's thread
group topology, and this is done by extending the payload struct to add
three uints. This can't be done when the payload is resident in
group-shared, of course, because that would change the layout of
group-shared memory.
So the new approach here is to copy the payload to a new alloca (in the
default address space) struct with the members of the base struct plus
the extended data the MS needs, and then to copy piece-wise because
llvm.memcpy isn't appropriate for group-shared-to-normal address space
copies.
2024-05-15 14:49:22 -07:00
Jeff Noyle ff623f8a74
PIX: Emit correct input sig elements and view ID state into PSV0 when input sigs are changed by debug instrumentation (#6351)
Two intertwined issues:

1:
The shader-debug-instrumentation and pixel-hit-counter passes add
SV_VertexId+SV_InstanceId to the input sig for a VS if they are not
present, and SV_Position for the PS. The changes in PixPassHelpers.cpp
centralize the SV_Position case and fix up a few missing fields whose
absence was breaking PIX shader debugging on WARP. The caller now has
to pass the upstream shader's SV_Position row, if it exists. (PIX has
been updated to do this.)

2:
However, independent of these tweaks, adding these new values means that
the view id metadata would be incorrect. A long-standing assert would
fire, but is herein fixed, since the above work exacerbated this
problem. The assert in question is this one in
CopyViewIDStateForOutputToPSV that previously fired when a
debug-instrumented shader was subsequently wrapped into a container:
```
DXASSERT_NOMSG((InputScalars <= IOTable.InputVectors * 4) &&
               (IOTable.InputVectors * 4 - InputScalars < 4));
```
(InputScalars, which comes from the serialized state in PSV0, was too
small, since it did not include the added system values.)
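
The condition in that assert says that the vector count must be the scalar count rounded up to a multiple of four, with fewer than four unused slots. A small Python sketch of the same arithmetic is shown below; the function names are hypothetical and the sample numbers are made up for illustration.

```python
def vectors_needed(input_scalars):
    """Number of 4-component vectors needed to hold the scalars."""
    return (input_scalars + 3) // 4


def view_id_state_consistent(input_scalars, input_vectors):
    """Mirrors the two clauses of the assert quoted above."""
    return (input_scalars <= input_vectors * 4
            and input_vectors * 4 - input_scalars < 4)


# A stale scalar count of 4 against 2 vectors (8 slots) leaves 4 unused slots.
stale = view_id_state_consistent(4, 2)
# Once the added system values are counted (say 6 scalars), the check holds.
fresh = view_id_state_consistent(6, vectors_needed(6))
```

With the stale count the second clause fails, which matches the description of the serialized InputScalars being too small after instrumentation added system values.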

To fix up these data, the caller now has to invoke the
generate-view-id-state pass, and follow that with the emit-metadata
pass, since emitting the metadata in the debug pass itself is now too
early (and that call has been deleted). (PIX has been updated to
invoke these passes.)

These changes together are sufficient to allow PIX shader debugging to
operate on WARP in those cases where these system-values were not
included by the application, with correct view-id metadata, and the
assert is happy.
2024-05-14 17:09:48 -07:00
Jeff Noyle f9389db7c3
PIX: Correct the disambiguation of AS+MS threads for mesh shader output (#6592)
PIX requires that all vertex information generated by these passes be
uniquely identified by vertex id and MS thread id.
This change fixes the MS thread id part in two places: the amplification
shader and the mesh shader.
To be unique across an entire DispatchMesh call, we must uniquify the AS
thread group, the AS thread, the MS thread group and the MS thread. This
is a lot of multiplying and adding, and there wasn't quite enough math
going on here before.
In the AS case, we now generate a unique "flat" thread id from the
flat-thread-id-in-group (the already-available system value) and the
"flat group id", which we synthesize by multiplying together the group
id components with the DispatchMesh API's thread group counts, and then
multiplying that by the number of threads each AS group launches, then
add the flat-thread-id-in-group. (This flat id then goes into an
expanded version of the AS->MS payload, the code for which was
pre-existing.)
When an AS is active, the MS will treat the incoming AS thread id as its
unique thread-group-within-the-whole-dispatch id.
If the AS is not active, the instrumentation herein will synthesize a
flat id in the same way as the AS did before it passed that id through
the payload, again from the DispatchMesh parameters (newly-added params
to that pass) and the flat-thread-in-group.
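
The flattening arithmetic described above can be sketched in Python. The x-fastest ordering of the group id components is an assumption made for this example, and the function names are hypothetical; the commit text only states that the group id components are combined with the DispatchMesh thread group counts, scaled by the threads per group, and offset by the flat thread id in the group.

```python
from itertools import product


def flat_group_id(group_id, group_counts):
    """Flatten a 3D group id using the dispatch's group counts (x fastest)."""
    gx, gy, gz = group_id
    count_x, count_y, _count_z = group_counts
    return gx + gy * count_x + gz * count_x * count_y


def flat_thread_id(group_id, group_counts, threads_per_group, thread_in_group):
    """Unique id across the whole dispatch for one thread."""
    return (flat_group_id(group_id, group_counts) * threads_per_group
            + thread_in_group)


def all_thread_ids(group_counts, threads_per_group):
    """Enumerate the flat id of every thread in a dispatch."""
    count_x, count_y, count_z = group_counts
    ids = []
    for gz, gy, gx in product(range(count_z), range(count_y), range(count_x)):
        for thread in range(threads_per_group):
            ids.append(flat_thread_id((gx, gy, gz), group_counts,
                                      threads_per_group, thread))
    return ids


ids = all_thread_ids((2, 3, 2), 4)
```

For a 2 x 3 x 2 dispatch with 4 threads per group, the 48 ids cover 0 through 47 exactly once, which is the uniqueness property the instrumentation needs.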

In addition to the new filecheck tests for this, there is also a new
filecheck test to cover coercion of non-i32 types to i32 before being
written to PIX's output UAV, which I happened to notice wasn't
adequately tested.
2024-05-13 13:08:26 -07:00
Antonio Maiorano 773b012727
Fix invalid IR from scalarrepl-param-hlsl in ReplaceConstantWithInst (#6556)
ReplaceConstantWithInst(C, V) replaces uses of C in the current function
with V. If such a use of C is an instruction I, then it replaces uses of C
in I with V. However, this function did not make sure to only perform
this replacement if V dominates I. As a result, it may end up replacing
uses of C in instructions before the definition of V.

The fix is to lazily compute the dominator tree in
ReplaceConstantWithInst so that we can guard the replacement with that
dominance check.
2024-05-06 16:55:20 -04:00
Jeff Noyle 381750e25b
PIX: Cope with arbitrary <32 integer widths in debugger pass (#6551)
Who knew codegen could generate i25 as part of a switch argument? Not
me!
2024-05-02 13:36:48 -07:00
Xiang Li 00cd8233d5
[DxilOp] Allow generation of illegal DXIL operations. (#6543)
This change removes the IsOverloadLegal check in OP::GetOpFunc.

It will permit the generation of illegal DXIL operations. Subsequently,
the validation should catch these illegal DXIL operations if they are
not optimized later in SimplifyDxilCall.


Fixes #6410
2024-04-26 20:52:46 +00:00
Greg Roth 1c66c4d798
Disable non-legacy cbuffer flags (#6559)
These flags had some effect on load code generation, but never worked
right and were not tested in the DXC compiler let alone with drivers.
This removes any notion of the flags beyond option processing and
produces a warning message that the flag is not supported and will be
ignored. Neither flag is listed in help anymore. A simple
OptionsTest and lit test verify that the warning message is produced for either
flag and existing tests that used the flag are updated or removed.

Incidentally fixed the capitalization of an existing options warning message.

Fixes #6306
2024-04-23 11:59:49 -07:00
Adam Yang 4242b576ed
Fixed crash in loop unroll caused by bug in structurize loop exits (#6548)
Fixed a bug in `hlsl::RemoveUnstructuredLoopExits` where a new
exiting block created by splitting was added only to the current
loop being processed, even though it could also be part of an inner loop.
Not adding the new block to the inner loops that it is part of makes those
inner loops malformed and causes a crash.

This fix adds the new block to the innermost loop that it should be
part of. Also adds the `StructurizeLoopExits` option to `loop-unroll`
pass, which was missing before.
2024-04-23 00:17:48 +00:00
Jannik Silvanus e2f0b98440
Don't reuse WaveGetLaneIndex result across thread repacking points (#5607)
Wave intrinsics such as `WaveGetLaneIndex()` are invalidated at DXR
thread repacking points such as `CallShader()`. We were, however,
reusing the result of `WaveGetLaneIndex()`.

Fix this by marking it as `Readonly` instead of `Readnone`.

Add a test case that also covers other wave intrinsics, which are handled
correctly.

Fixes #5034.

---------

Co-authored-by: Jannik Silvanus <jasilvanus@users.noreply.github.com>
2024-04-18 13:02:34 -07:00
David Neto dc84d72d18
Fix recursive zero replacement bottoming out in scalars. (#6516)
During memcpy replacement, the scalarrepl-param-hlsl pass will replace
certain uses of a zero-initialized global variable. The pass uses a
recursive algorithm to replace uses of that global by a zero value. The
current code expects the recursion to bottom out in an aggregate type.
But it can bottom out in a scalar integer or scalar float.

This patch fixes the scalar cases.

Lit-based pass tests are included.
2024-04-12 11:02:20 -04:00
Cooper Partin 422a60e6d0
Update locale setting logic to support UTF-8 only for Apple and Linux distros (#6488)
This commit fixes the setlocale( ) logic to include the UTF-8 supported
string value for Mariner distros.

It also introduces a new RAII class `ScopedLocale` which ensures that the
locale setting is set/reset during string conversion operations.

Fixes: #6201

---------

Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com>
Co-authored-by: cooppunix <cooppunix&mariner.com>
2024-04-10 10:04:32 -07:00
Xiang Li 8322cabfc0
[ClangCL] fix clang-cl build warning. (#6492)
Fix sign-compare warnings.
2024-04-03 12:20:49 -07:00
Fumitoshi Ukai 773246eddc
fix #include path separator from backslash to slash (#6379)
backslash is not portable, so use slash instead.
2024-03-28 09:34:03 -07:00
Greg Roth dd14fe5a53
Allow disabling payload access qualifiers in 6.8+ (#6468)
The original payload access qualifiers change disallowed disabling them
on 6.8+ versions. This wasn't an intended feature of 6.8. We'd prefer to
maintain the ability to disable these when needed to avoid issues that
might crop up.

This change extends respect of the disable-payload-qualifiers flag for
all versions. It rewords the warning that is produced when they are
encountered when disabled to be more applicable. Tests are also added to
verify that the implicit enabling in 6.7+ works in the basic cases and
also that disabling in the 6.7+ cases works as well.

Fixes #6462

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-03-28 01:38:05 -07:00
Antonio Maiorano 63ebbb1430
Fix ASAN use-after-free on unreferenced self-assignment of struct instance (#6466)
When deleting an unused memcpy, ScalarReplAggregatesHLSL was attempting
to delete both the target and the source of the memcpy without first
checking whether they were the same, resulting in a double-delete.
2024-03-26 21:48:40 +00:00
Fumitoshi Ukai e4fa1c5184
fix non-portable include path (#6380)
d3dcommon.h seems to be canonical path name in Win SDK

Windows Kits/10/Include/10.0.22621.0/um/d3dcommon.h

so use d3dcommon.h rather than D3Dcommon.h
2024-03-26 09:44:44 -07:00
Natalie Chouinard 122464aeef
Require valid outputcontrolpoints on Hull shader (#6196)
According to the documentation: "A hull shader is implemented with an
HLSL function, and has the following properties: [...] The shader output
is between 1 and 32 control points".


https://learn.microsoft.com/en-us/windows/win32/direct3d11/direct3d-11-advanced-stages-tessellation#hull-shader-stage

This change adds a check to verify that the outputcontrolpoints
attribute is set on Hull shaders and has an argument in the valid range.

Fixes #3733 (by making the error consistent between backends)
2024-03-25 13:41:12 -04:00
Natalie Chouinard 8de82c9415
[SPIR-V] Add support for Denorm Mode (#6435)
The -denorm option allows the shader to select the desired behavior with
respect to denormal values.

Note that this can't take advantage of the SPIRV-Tools capability
trimming pass for the same reasons as described in:
https://github.com/microsoft/DirectXShaderCompiler/pull/6248#discussion_r1481073348

Fixes #6434
2024-03-25 11:42:53 +01:00
Chris B 42294bff98
Fix HLMatrixLowerPass leaving call to dangling FunctionVal (#6441)
When lowering an hl.cast, when the operand was an undef matrix, the pass
would insert a call to a mat2vec stub, but since the undef value is not
an alloca, it never gets handled, and the call to the temporary stub
remains. Since the stub FunctionVal gets deleted, when the instruction
is accessed in a future pass, it reads a dangling pointer.

The fix is to handle undef similarly to how constant 0 is handled, and
to return an undef vector from lowerHLCast.

---------

Signed-off-by: Chris Bieneman <cbieneman@microsoft.com>
Co-authored-by: Antonio Maiorano <amaiorano@google.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-03-21 00:03:41 +00:00
Antonio Maiorano 227ea3d488
Fix ubsan failure on uninitialized read of bool param to WideCharToMultiByte (#6429)
If last arg lpUsedDefaultChar is non-null, the function must set this
value to true or false. WideToEncodedString conditionally passes in a
pointer to 'usedDefaultChar', which was not getting written to, and
causing ubsan to fail on reading an invalid boolean value:
```
.../lib/DxcSupport/Unicode.cpp:156:14: runtime error: load of value 208, which is not a valid value for type 'BOOL' (aka 'bool')
```
2024-03-20 15:42:01 -04:00
Cooper Partin 3704efb2bc
Implement -ftime-trace-granularity option to set time granularity value for the time profiler (#6381)
This commit adds a new optional -ftime-trace-granularity option that is
already implemented in llvm-project. This change is a surgical port of
an existing feature from the upstream llvm-project repo into the DXC
codebase.

The following commits in the llvm-project repo were copied and followed
for this change. Clean cherry-picks were not possible due to
differences between the repos, such as changed file locations and other
dependent changes.

*** Adds the granularity configuration setting ***
'Time profiler: small fixes and optimizations'
Commit: 26536728591d5fdac373ef535ae122b873f73292

*** Wires up the commandline option -ftime-trace-granularity to the
TraceProfiler code ***
'[Support] Fix -ftime-trace-granularity option'
Commit: 4fdcabf259c4ab94654e6cd5d95d0e0313159c70

Fixes #6372

---------

Signed-off-by: Cooper Partin <coopp@ntdev.microsoft.com>
Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-03-05 09:02:15 -08:00
Tex Riddell 66ba5a1529
Add alignment to NodeRecordType including DXIL metadata update (#6279)
This change adds NodeRecordType alignment field to RDAT to make it
possible to validate pointer and stride alignment in the runtime.

This includes a change to DXIL metadata to preserve the record alignment
without requiring recovery by looking for GetNodeRecordPtr.

Fixes #6270
2024-02-29 11:22:56 -08:00
Jeff Noyle 5a31785905
PIX: Preclude debug output of ray query handle from phi or select (since the value isn't really an int, it's a handle) (#6309)
AllocateRayQuery returns an int, but it's not really an int: it's a
handle to the query. Thus, it's not sensible for the PIX debugging
instrumentation to attempt to write it out to the debug-data-UAV. The
previous code did an explicit check against the type of the value to be
written, but if that value is actually a phi itself, then that check
would fail. So now we recursively run through the phi values looking to
see if any of its antecedents are a phi, and if so, refuse to send its
value to the UAV.
2024-02-27 11:00:23 -08:00
Jonas Meyer f0970beaf8
Expose clang option `-fdiagnostics-format=` in dxc (#6357)
This change exposes the clang 3.7 `-fdiagnostics-format=` flag in DXC with supported values `clang`, `msvc`, `msvc-fallback`, and `vi`.

This option allows having DXC output diagnostic messages in string formats that are recognized natively by Visual Studio.

Resolves #6160 & #1811, both of which had been closed as not planned.
2024-02-27 12:18:42 -06:00
Tex Riddell 696a13a2a8
DxilValidation: add ShaderCompatInfo call graph checking (#6332)
ShaderCompatInfo identifies compatibility for functions that could be
called from an entry point.
Currently, this checking detects compatibility problems between entry
points and internal function calls that the validator otherwise misses.

This change adds a check for ShaderCompatInfo, recursing into called
functions looking for a source of conflict when not compatible with the
current entry point properties.
Errors are emitted for the entry point and at one source of each type of
conflict that ShaderCompatInfo detects.
A function is considered the source of a conflict when it has the
conflict but none of the functions called from this function have this
conflict.
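
The "source of a conflict" rule above (a function has the conflict, but none of its callees do) can be sketched as a walk over the call graph. This Python model is an illustration under stated assumptions: the dictionary-based call graph, the boolean conflict flags, and the `conflict_sources` helper are hypothetical, and it assumes a conflict in a callee is always reflected in its callers.

```python
def conflict_sources(call_graph, has_conflict, entry):
    """Return functions reachable from `entry` that originate the conflict."""
    sources = []
    visited = set()

    def visit(function):
        if function in visited:
            return
        visited.add(function)
        if not has_conflict.get(function, False):
            return  # nothing below this function can conflict (assumption)
        conflicting_callees = [callee
                               for callee in call_graph.get(function, [])
                               if has_conflict.get(callee, False)]
        if not conflicting_callees:
            sources.append(function)  # has the conflict, no callee has it
        for callee in conflicting_callees:
            visit(callee)

    visit(entry)
    return sources


call_graph = {"main": ["a", "b"], "a": ["c"], "b": [], "c": []}
has_conflict = {"main": True, "a": True, "b": False, "c": True}
sources = conflict_sources(call_graph, has_conflict, "main")
```

In this example only `c` is reported as a source, because `main` and `a` merely inherit the conflict from a callee.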

Removed early exit for ShaderFlags validation when module is a library,
since these flags should be validated for libraries, and running
CollectShaderFlagsForModule fills in the ShaderCompatInfo data we need
for validation.

Also fixed tests for illegal barrier and derivative use, as well as
replacing the ignored MaxOutputRecords with the correct MaxRecords
attribute.

Fixes #6292.
2024-02-26 15:02:28 -08:00
Tex Riddell 22bb0786b6
Fix downlevel validator compatibility with DXR 1.1 flag (#6333)
It turns out that in the prior validator version, if a subobject
required DXR 1.1, the DXR 1.1 flag would be set on each function in RDAT
output, as well as the global flags. It didn't appear this was the case
through a D3DReflect test because that goes through disassembly to
assembly step, where subobjects are lost because they aren't in the llvm
IR. The previous change assumed this was not the case when the
subobjects were removed, but this removal occurs after the RDATWriter
constructor, which does the full RDAT serialization since the size is
required right away.

This restores the detection code and hooks it into
DxilModule::ComputeShaderCompatInfo when validator version is in range
[1.5, 1.7]. DXR 1.1 was introduced in 1.5, and we no longer tag every
function as requiring DXR 1.1 based on subobjects in 1.8.

At the moment, there is no way to CHECK the subobject RDAT in D3DReflect
because they get stripped from the module (even reflection and debug
modules) before serialization, and the test path for D3DReflect goes
through a disassemble/re-assemble step between the prior stage and the
D3DReflect stage.

Fixes #6321 (really).
2024-02-20 17:58:45 -08:00
Xiang Li 5dee81ce1d
Fix crash in DXIL.dll caused by illegal DXIL intrinsic. (#6302)
Replace assert on illegal DXIL op with return illegal value. Check the
illegal cases in validation.

Fixes #6168
2024-02-20 11:07:16 -08:00
Tex Riddell 2314d06cfe
New barrier DXIL Op fixes for Validation and RDAT function compatibility info (#6291)
Fix barrier allowed ops and flags by shader kind

New barrier operations lacked validation and, in RDAT info, had
incorrect min target and shader stage flags.

- Identify barrier DXIL operations with new `is_barrier` in `hctdb.py`
and generated `OP::IsDxilOpBarrier`.
- Identify when a barrier op requires shader stage with group
(compute-like stage), or when it requires node memory.
- Add new `OptFeatureInfo_RequiresGroup` to identify function only
compatible with a shader stage with a visible group for access to
groupshared memory or use of group sync.
- Translate to original `BarrierMode` when compatible; adds
`BarrierMode::Invalid` to identify invalid cases.
- Account for `DXIL::MemoryTypeFlags::AllMemory` being allowed and
auto-masked by driver.
- Properly set min shader model and compatible shader stage flags.
- Validate barrier for shader stage.
- Added new barriers to counters which were missing.

Addressing parts of: #6256 and #6292
Fixes #6266
2024-02-20 10:44:34 -08:00
Tex Riddell fafbc42725
Don't set RaytracingTier1_1 based on subobjects; enable flag validation (#6320)
In validator version 1.7 and below, the RaytracingTier1_1 module flag
was set on every function if any StateObjectConfig subobject had the
AllowStateObjectAdditions flag set. This was incorrect, as the
requirement is validated in the runtime based on the use of the
subobject instead. Subobjects are removed from the module and placed in
RDAT during container serialization, so the requirement would be lost
when recomputing the flags in validation. This didn't break validation
because flag validation was completely disabled for libraries!

This change fixes this problem, and allows DxilValidation to validate
ShaderFlags because they will no longer mismatch due to this issue.
Running CollectShaderFlagsForModule is also necessary for collecting
per-function shader flags which will be used by call graph validation in
a subsequent change, so enabling flag validation unblocks that as well.

Fixes #6321
2024-02-16 23:16:14 +00:00
Xiang Li 823125b32e
[DXIL generation] Merge GepUse last to avoid crash in EmitGetNodeRecordPtrAndUpdateUsers (#6314)
In EmitGetNodeRecordPtrAndUpdateUsers, the type will be mutated, and the
GEP user of the RecordPtr will be merged at the same time. This makes things
complex because the GEP index needs to be updated once the type is mutated.

To make things easier, merge the GepUse after mutating the type.

Fixes #6223
2024-02-16 13:34:57 -08:00
Jeff Noyle 64cdb9cfc0
PIX: Deduplicate globals when referenced in multiple library fns (#6305)
PIX's code for parsing debug data operates at the module level. When the
same global is referenced by multiple functions in a module, that
variable is referred to by multiple dbg.value/dbg.declare statements,
and those are mapped (by the PIX passes) to multiple fake allocas using
its usual scheme. This code was written before libraries were a thing,
and wasn't expecting this duplication. A little more attention to the
variable's scope fixes the issue.
Also, the changed code's original "return false" broke the whole process
of discovering variables, with the result that PIX's shader debugger
locals window was completely empty. It makes more sense to ignore the one
variable and keep going.
2024-02-14 12:37:11 -08:00
Jeff Noyle 446da195b4
PIX: Optimize debug instrumentation for fewest emitted instructions (#6281)
This change switches from instrumentation per instruction to
instrumentation per basic block. Furthermore, not every instruction in a
basic block needs to write debug data to the output UAV- it's enough to
know that the basic block was entered, as long as the calling
application can figure out what instructions were in that block. To
support that knowledge, the pass now emits a text "precis" of each basic
block.

Also, the previous branchless UAV bounds enforcement was replaced with
something similar that emits fewer instructions at the cost of a larger
UAV. This tradeoff is WELL worth it.

Additionally, the debug pass used to add extra blocks in order to
solidify the arguments to phi instructions. This work was unnecessary,
and added a lot of complexity to the resulting instrumented shader. The
debugger application is only interested in the value of the phi itself
and the actual value produced via the actual preceding edge.
 
Full details are in the comments in the code.

This change reduces driver-side compilation overhead from "overnight" to
2 minutes on a 160k-instruction shader.
2024-02-13 08:08:22 -08:00
Tex Riddell 627e400dea
NFC: Fix switch with missing default case in DxilShaderModel.cpp (#6288)
This missing default case statement would cause a build break when
compiling with additional warning settings, such as /W4 /WX in VC++.

Fixes #6287.
2024-02-13 01:42:59 +00:00
Tex Riddell b3be6f2522
Correct metadata load for NodeRecordType for robustness (#6277)
Metadata loading for node record properties assumed fixed positions for
values in a (key,value) property list, skipping and ignoring the keys.

This was incorrect and fragile, and it was not the intended implementation of
the key,value property pattern.

There is no functional change, but this makes the loader robust to
additional properties instead of crashing or loading the wrong thing.
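
The difference between position-based and key-based loading of a (key, value) property list can be sketched in Python. The string keys and both loader functions are hypothetical and chosen only for readability; the real metadata encoding is not shown in this log.

```python
def load_by_position(properties):
    """Fragile: assumes the values always sit at fixed indices."""
    return {"size": properties[1], "alignment": properties[3]}


def load_by_key(properties, known_keys=("size", "alignment")):
    """Walks (key, value) pairs and ignores keys it does not recognize."""
    if len(properties) % 2:
        raise ValueError("property list must hold (key, value) pairs")
    record = {}
    for index in range(0, len(properties), 2):
        key, value = properties[index], properties[index + 1]
        if key in known_keys:
            record[key] = value
    return record


# A list with an extra, unknown property placed in front of the known ones.
extended = ["future_property", 1, "size", 16, "alignment", 4]
by_key = load_by_key(extended)
by_position = load_by_position(extended)
```

With the extra property present, the key-based loader still finds the right values, while the position-based loader silently reads the wrong ones.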

Fixes #6276
2024-02-12 19:26:48 +00:00
Joshua Batista cadf3bfed1
[Sema] Add final touches to wave size range implementation (#6202)
Upon further discussion, the team has agreed that in certain degenerate
cases, the current diagnostics are insufficient.
In the case that min == max in the wave size range attribute, a
defaultError warning should be emitted. Additionally, there should be an
explicit way to handle the case where 0,0,0 is passed to the wavesize
range attribute.
This PR directly handles and tests both of these cases.
Fixes #6161

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-02-10 02:15:59 +00:00
Tex Riddell 9c518dbd94
RDAT: Check flags and WaveSize for min SM; fix flag detection and merging (#6207)
Add ShaderKind::Last_1_8 for shader mask
Add shader model comments before flag groupings in DxilConstants.h and DxilShaderFlags.

Add missing flag checks for min shader model in RDAT function info. Move ShaderCompatInfo computation into DxilModule, propagate callee info.

Move computation of shader model requirements based on flags into DxilModule. Finalize requirements for entry functions in AdjustMinimumShaderModelAndFlags.

Fixes for function level flag tracking:
- DerivativesInMeshAndAmpShaders: use flag to track deriv use, then adjust for entry functions.
- hasUAVs: based on resource use in function instead of global resources.
- WriteableMSAATextures: based on use in function instead of global resources.
  - Also catch cases for dynamic res from any use by looking at create/annotate handle, not just the TextureStoreSample op.
- RaytracingTier1_1: move module-level detection to CollectShaderFlagsForModule
- Marked deriv and quad ops as being supported in node.
- Fixed SampleCmpBias to be considered gradient op.
- Update RDAT definitions to dump more useful info for testing

Fixes #6218.
2024-02-09 17:14:12 -08:00