DirectXShaderCompiler

Commit Graph

Author	SHA1	Message	Date
Zhengxing Li	38830016ad	Fix non-determinism in Reassociate caused by address coincidences This PR pulls the upstream change, Fix non-determinism in Reassociate caused by address coincidences (`ef8761fd3b`), into DXC. Here's the summary of the change: Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism. Thanks to Roman Tereshin for the test case. This is part 6 (the last part) of the fix for #6659.	2024-05-31 09:24:14 -07:00
Zhengxing Li	e5d8f7e91e	Add a new WeakVH value handle; NFC This PR pulls the upstream change, Add a new WeakVH value handle; NFC (`f1c0eafd5b`), into DXC. Here's the summary of the change: WeakVH nulls itself out if the value it was tracking gets deleted, but it does not track RAUW. Reviewers: dblaikie, davide Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32267 This is part 4 of the fix for #6659.	2024-05-30 21:48:24 -07:00
Zhengxing Li	e77fa1919c	Rename WeakVH to WeakTrackingVH; NFC This PR pulls the upstream change, Rename WeakVH to WeakTrackingVH; NFC (`e6bca0eecb`), into DXC. Here's the summary of the change: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 This is part 3 of the fix for #6659.	2024-05-30 21:20:35 -07:00
Zhengxing Li	8d4df0d4e0	Emulate TrackingVH using WeakVH This PR pulls the upstream change, Emulate TrackingVH using WeakVH (`8a6238201f`), into DXC. Here's the summary of the change: This frees up one slot in the HandleBaseKind enum, which I will use later to add a new kind of value handle. The size of the HandleBaseKind enum is important because we store a HandleBaseKind in the low two bits of a (in the worst case) 4 byte aligned pointer. Reviewers: davide, chandlerc Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32634 This is part 2 of the fix for #6659.	2024-05-30 20:38:49 -07:00
Zhengxing Li	2378f9b705	Use accessors for ValueHandleBase::V; NFC This PR pulls the upstream change, Use accessors for ValueHandleBase::V; NFC (`6f08789d30`), into DXC. Here's the summary of the change: This changes code that touches ValueHandleBase::V to go through getValPtr and (newly added) setValPtr. This functionality will be used later, but also seemed like a generally good cleanup. I also renamed the field to Val, but that's just to make it obvious that I fixed all the uses. This is part 1 of the fix for #6659.	2024-05-30 13:38:27 -07:00
David Neto	3a78b67849	NFC: Comment, refactor, and test hlsl::RemoveUnstructuredLoopExits (#6655 ) Add a pass to run hlsl::RemoveUnstructuredLoopExits in isolation Example: opt -dxil-r-u-l-e a.ll -S Add some basic tests. No functional change to the pass itself.	2024-05-30 09:43:39 -04:00
David Neto	a1b945c1a3	Loop exit restructurizer: don't iterate over uses while mutating them (#6644 ) The SkipBlockWithBranch function does the following: - Splits the block into three blocks with an if-then-endif structure. - Moves most instructions from the original block into the "then" block - If any of those values are used outside the original block, they are propagated through newly-constructed phis in the 'endif' block. This algorithm had a bug where the uses of a value were being scanned while the uses were also being updated. In some cases a downstream out-of-block use could be skipped. That results in an invalid module because the original definition is now in the 'then' block, which does not dominate the downstream out-of-block use. Add a test that demonstrates the problem.	2024-05-23 16:00:37 +00:00
Antonio Maiorano	b41d8a9478	Fix LoopDeletion incorrectly updating PHI with multiple duplicate inputs (#6643 ) LoopDeletion was incorrectly updating PHI nodes in the target block when it had duplicate input edges. This happens, for example, when deleting a loop that uses a switch with multiple cases that exit the same way. After determining that this was the bug, I found this fix in LLVM: https://reviews.llvm.org/D34516 and applied it here.	2024-05-23 10:21:30 -04:00
Greg Roth	a6f4025957	Calculate preferred alignment when lowering groupshared matrices (#6589 ) When flattening the global for a groupshared matrix, the alignment information was getting lost. As a result, the alignments of the loads and stores were calculating their own alignment based on preferred alignment and trailing zeros of the index. The preferred alignment switched to 16 when the type size was over 128 bits due to a heuristic whose rationale is lost to time. When the global has its own alignment, that gets used, so by calculating it at lowering, the alignments are consistent and reliable. Includes testing for a few matrix variants and a pass test. fixes #6416	2024-05-22 13:38:51 -07:00
Zhengxing li	9ee3f23d9c	More aggressive reassociations (#6626 ) Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (https://github.com/llvm/llvm-project/commit/b8a330c) in this PR (https://github.com/microsoft/DirectXShaderCompiler/pull/6598), it still might overlook some obvious common factors. One case that has been observed is: ``` %Float4_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3 %Float2_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1 /* %Float4_1 is redundant with %Float4_0 since they invoke cbufferLoadLegacy with the same parameters */ %Float4_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) /* %Float4_1.w is redundant with %Float4_0.w */ %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 /* %Float2_1 is redundant with %Float2_0 since they invoke cbufferLoadLegacy with the same parameters */ %Float2_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) /* %Float2_1.y is redundant with %Float2_0.y */ %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1 .... %11 = fmul fast float %Float4_0.w, %10 %12 = fmul fast float %11, %Float2_0.y .... %14 = fmul fast float %Float4_1.w, %13 %15 = fmul fast float %14, %Float2_1.y (%Float4_0.w * %Float2_0.y) equals (%Float4_1.w * %Float2_1.y), so they should be reassociated to a common factor ``` The upstream change can't identify this common factor because DXC doesn't know that (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running the Reassociate pass. Those redundancies are eliminated in the GVN pass. So that DXC can identify more common factors, this PR aggressively runs the Reassociate pass again after the GVN pass and then runs the GVN pass again to remove the redundancies generated in this run of the Reassociate pass. Changing the order of floating point operations can cause precision issues. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back. This is part 3 of the fix for https://github.com/microsoft/DirectXShaderCompiler/issues/6593.	2024-05-21 15:37:04 -07:00
Zhengxing li	1ee70fdc64	Add a flag for the upstream global reassociation algorithm change (#6625 ) This PR (https://github.com/microsoft/DirectXShaderCompiler/pull/6598) pulls the upstream global reassociation algorithm change into DXC and noticeably reduces redundant calculations. However, testing on a large offline suite of shaders showed that some shaders got worse compilation results and did not benefit from this upstream change. This PR adds a flag for the upstream global reassociation change, which makes it easier to roll back if a shader gets a worse compilation result due to this upstream change. This is part 2 of the fix for #6593.	2024-05-21 13:45:38 -07:00
Zhengxing li	6f9c107b78	Reassociate: add global reassociation algorithm (#6598 ) This PR pulls the upstream change, Reassociate: add global reassociation algorithm (`b8a330c42a`), into DXC with minimal changes. For the code below: foo = (a * b) * c bar = (a * d) * c As the upstream change states, it can identify that a*c is a common, redundant factor. This is part 1 of the fix for #6593.	2024-05-21 11:55:40 -07:00
Antonio Maiorano	348040254e	Fix use-after-free in SimplifyCFG (#6628 ) When SimplifySwitchOnSelect calls SimplifyTerminatorOnSelect, it holds onto the select's condition value to use for the conditional branch it replaces the switch with. When removing the switch's unused predecessors, it must make sure not to delete PHIs in case one of them is used by the condition value, otherwise the condition value itself may get deleted, resulting in a use-after-free. Note that this was fixed in LLVM as well: `dc3b67b4ca`	2024-05-17 03:22:52 +00:00
Antonio Maiorano	cf566e1f3f	Fix dxil-remove-dead-blocks removing switch with multiple same successor (#6610 ) Given a switch with a constant condition and all cases the same (branching to the same successor), dxil-remove-dead-blocks would incorrectly remove the switch when replacing it with a branch, by forgetting to remove the N-1 incoming values to the PHIs in the successor block.	2024-05-16 21:13:01 -04:00
Jeff Noyle	fd7e54bcd5	Pix: Cope with group shared AS->MS payload (#6619 ) This change copes with the AS->MS payload being placed in group-shared by the application (and MSFT's samples do indeed do this). (TIL, thanks to pow2clk, that the spec says that the payload counts against the group-shared total, implying, if not explicitly stating, that at least on some platforms, the payload will be in group-shared anyway.) The MS pass needs to be given data from the AS about the AS's thread group topology, and this is done by extending the payload struct to add three uints. This can't be done when the payload is resident in group-shared, of course, because that would change the layout of group-shared memory. So the new approach here is to copy the payload to a new alloca (in the default address space) struct with the members of the base struct plus the extended data the MS needs, and then to copy piece-wise because llvm.memcpy isn't appropriate for group-shared-to-normal address space copies.	2024-05-15 14:49:22 -07:00
Jeff Noyle	ff623f8a74	PIX: Emit correct input sig elements and view ID state into PSV0 when input sigs are changed by debug instrumentation (#6351 ) Two intertwined issues: 1: The shader-debug-instrumentation and pixel-hit-counter passes add SV_VertexId+SV_InstanceId to the input sig for a VS if they are not present, and SV_Position for the PS. The changes in PixPassHelpers.cpp centralize the SV_Position case and fix up a few missing fields whose absence was breaking PIX shader debugging on WARP. The caller now has to pass the upstream shader's SV_Position row, if it exists. (PIX has been updated to do this.) 2: However, independent of these tweaks, adding these new values means that the view id metadata would be incorrect. A long-standing assert would fire, but is herein fixed, since the above work exacerbated this problem. The assert in question is this one in CopyViewIDStateForOutputToPSV that previously fired when a debug-instrumented shader was subsequently wrapped into a container: ``` DXASSERT_NOMSG((InputScalars <= IOTable.InputVectors * 4) && (IOTable.InputVectors * 4 - InputScalars < 4)); ``` (InputScalars, which comes from the serialized state in PSV0, was too small, since it did not include the added system values.) To fix up these data, the caller now has to invoke the generate-view-id-state pass, and follow that with the emit-metadata pass, since emitting the metadata in the debug pass itself is now too early (and that call has been deleted). (PIX has been updated to invoke these passes.) These changes together are sufficient to allow PIX shader debugging to operate on WARP in those cases where these system-values were not included by the application, with correct view-id metadata, and the assert is happy.	2024-05-14 17:09:48 -07:00
Jeff Noyle	f9389db7c3	PIX: Correct the disambiguation of AS+MS threads for mesh shader output (#6592 ) PIX requires that all vertex information generated by these passes be uniquely identified by vertex id and MS thread id. This change fixes the MS thread id part in two places: the amplification shader and the mesh shader. To be unique across an entire DispatchMesh call, we must uniquify the AS thread group, the AS thread, the MS thread group and the MS thread. This is a lot of multiplying and adding, and there wasn't quite enough math going on here before. In the AS case, we now generate a unique "flat" thread id from the flat-thread-id-in-group (the already-available system value) and the "flat group id", which we synthesize by multiplying together the group id components with the DispatchMesh API's thread group counts, and then multiplying that by the number of threads each AS group launches, then add the flat-thread-id-in-group. (This flat id then goes into an expanded version of the AS->MS payload, the code for which was pre-existing.) The MS will either treat the incoming AS thread id as its unique thread-group-within-the-whole-dispatch id or, if the AS is not active, synthesize a flat id in the same way as the AS did before it passed that id through the payload, again from the DispatchMesh parameters (newly-added params to that pass) and the flat-thread-in-group. In addition to the new filecheck tests for this, there is also a new filecheck test to cover coercion of non-i32 types to i32 before being written to PIX's output UAV, which I happened to notice wasn't adequately tested.	2024-05-13 13:08:26 -07:00
Antonio Maiorano	773b012727	Fix invalid IR from scalarrepl-param-hlsl in ReplaceConstantWithInst (#6556 ) ReplaceConstantWithInst(C, V) replaces uses of C in the current function with V. If such a use of C is an instruction I, then it replaces uses of C in I with V. However, this function did not make sure to only perform this replacement if V dominates I. As a result, it may end up replacing uses of C in instructions before the definition of V. The fix is to lazily compute the dominator tree in ReplaceConstantWithInst so that we can guard the replacement with that dominance check.	2024-05-06 16:55:20 -04:00
Jeff Noyle	381750e25b	PIX: Cope with arbitrary <32 integer widths in debugger pass (#6551 ) Who knew codegen could generate i25 as part of a switch argument? Not me!	2024-05-02 13:36:48 -07:00
Xiang Li	00cd8233d5	[DxilOp] Allow generation of illegal DXIL operations. (#6543 ) This change removes the IsOverloadLegal check in OP::GetOpFunc. It will permit the generation of illegal DXIL operations. Subsequently, the validation should catch these illegal DXIL operations if they are not optimized later in SimplifyDxilCall. Fixes #6410	2024-04-26 20:52:46 +00:00
Greg Roth	1c66c4d798	Disable non-legacy cbuffer flags (#6559 ) These flags had some effect on load code generation, but never worked right and were not tested in the DXC compiler let alone with drivers. This removes any notion of the flags beyond option processing and produces a warning message that the flag is not supported and will be ignored. The flags are both not listed in help anymore either. A simple OptionsTest and lit test verify that the warning message is produced for either flag and existing tests that used the flag are updated or removed. Incidentally fixed the capitalization of an existing options warning message. Fixes #6306	2024-04-23 11:59:49 -07:00
Adam Yang	4242b576ed	Fixed crash in loop unroll caused by bug in structurize loop exits (#6548 ) Fixed a bug in `hlsl::RemoveUnstructuredLoopExits` where, when a new exiting block is created from splitting, it was added to the current loop being processed, when it could also be part of an inner loop. Not adding the new block to inner loops that it's part of makes the inner loops malformed, and causes a crash. This fix adds the new block to the innermost loop that it should be part of. Also adds the `StructurizeLoopExits` option to the `loop-unroll` pass, which was missing before.	2024-04-23 00:17:48 +00:00
Jannik Silvanus	e2f0b98440	Don't reuse WaveGetLaneIndex result across thread repacking points (#5607 ) Wave intrinsics such as `WaveGetLaneIndex()` are invalidated at DXR thread repacking points such as `CallShader()`. We were however reusing the result of `WaveGetLaneIndex()`. Fix this by marking it as `Readonly` instead of `Readnone`. Add a test case that also covers other wave intrinsics, which are handled correctly. Fixes #5034. --------- Co-authored-by: Jannik Silvanus <jasilvanus@users.noreply.github.com>	2024-04-18 13:02:34 -07:00
David Neto	dc84d72d18	Fix recursive zero replacement bottoming out in scalars. (#6516 ) During memcpy replacement, the scalarrepl-param-hlsl pass will replace certain uses of a zero-initialized global variable. The pass uses a recursive algorithm to replace uses of that global by a zero value. The current code expects the recursion to bottom out in an aggregate type. But it can bottom out in a scalar integer or scalar float. This patch fixes the scalar cases. Lit-based pass tests are included.	2024-04-12 11:02:20 -04:00
Cooper Partin	422a60e6d0	Update locale setting logic to support UTF-8 only for Apple and Linux distros (#6488 ) This commit fixes the setlocale( ) logic to include the UTF-8 supported string value for Mariner distros. It also introduces a new RAII class `ScopedLocale` which ensures that the locale setting is set/reset during string conversion operations. Fixes: #6201 --------- Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com> Co-authored-by: cooppunix <cooppunix&mariner.com>	2024-04-10 10:04:32 -07:00
Xiang Li	8322cabfc0	[ClangCL] fix clang-cl build warning. (#6492 ) Fix sign-compare warnings.	2024-04-03 12:20:49 -07:00
Fumitoshi Ukai	773246eddc	fix #include path separator from backslash to slash (#6379 ) backslash is not portable, so use slash instead.	2024-03-28 09:34:03 -07:00
Greg Roth	dd14fe5a53	Allow disabling payload access qualifiers in 6.8+ (#6468 ) The original payload access qualifiers change disallowed disabling them on 6.8+ versions. This wasn't an intended feature of 6.8. We'd prefer to maintain the ability to disable these when needed to avoid issues that might crop up. This change extends respect of the disable-payload-qualifiers flag for all versions. It rewords the warning that is produced when they are encountered when disabled to be more applicable. Tests are also added to verify that the implicit enabling in 6.7+ works in the basic cases and also that disabling in the 6.7+ cases works as well. Fixes #6462 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-03-28 01:38:05 -07:00
Antonio Maiorano	63ebbb1430	Fix ASAN use-after-free on unreferenced self-assignment of struct instance (#6466 ) When deleting an unused memcpy, ScalarReplAggregatesHLSL was attempting to delete both the target and the source of the memcpy without first checking if they were the same, resulting in a double-delete.	2024-03-26 21:48:40 +00:00
Fumitoshi Ukai	e4fa1c5184	fix non-portable include path (#6380 ) d3dcommon.h seems to be canonical path name in Win SDK Windows Kits/10/Include/10.0.22621.0/um/d3dcommon.h so use d3dcommon.h rather than D3Dcommon.h	2024-03-26 09:44:44 -07:00
Natalie Chouinard	122464aeef	Require valid outputcontrolpoints on Hull shader (#6196 ) According to the documentation: "A hull shader is implemented with an HLSL function, and has the following properties: [...] The shader output is between 1 and 32 control points". https://learn.microsoft.com/en-us/windows/win32/direct3d11/direct3d-11-advanced-stages-tessellation#hull-shader-stage This change adds a check to verify that the outputcontrolpoints attribute is set on Hull shaders and has an argument in the valid range. Fixes #3733 (by making the error consistent between backends)	2024-03-25 13:41:12 -04:00
Natalie Chouinard	8de82c9415	[SPIR-V] Add support for Denorm Mode (#6435 ) The -denorm option allows the shader to select the desired behavior with respect to denormal values. Note that this can't take advantage of the SPIRV-Tools capability trimming pass for the same reasons as described in: https://github.com/microsoft/DirectXShaderCompiler/pull/6248#discussion_r1481073348 Fixes #6434	2024-03-25 11:42:53 +01:00
Chris B	42294bff98	Fix HLMatrixLowerPass leaving call to dangling FunctionVal (#6441 ) When lowering an hl.cast, when the operand was an undef matrix, the pass would insert a call to a mat2vec stub, but since the undef value is not an alloca, it never gets handled, and the call to the temporary stub remains. Since the stub FunctionVal gets deleted, when the instruction is accessed in a future pass, it reads a dangling pointer. The fix is to handle undef similarly to how constant 0 is handled, and to return an undef vector from lowerHLCast. --------- Signed-off-by: Chris Bieneman <cbieneman@microsoft.com> Co-authored-by: Antonio Maiorano <amaiorano@google.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-03-21 00:03:41 +00:00
Antonio Maiorano	227ea3d488	Fix ubsan failure on uninitialized read of bool param to WideCharToMultiByte (#6429 ) If last arg lpUsedDefaultChar is non-null, the function must set this value to true or false. WideToEncodedString conditionally passes in a pointer to 'usedDefaultChar', which was not getting written to, and causing ubsan to fail on reading an invalid boolean value: ``` .../lib/DxcSupport/Unicode.cpp:156:14: runtime error: load of value 208, which is not a valid value for type 'BOOL' (aka 'bool') ```	2024-03-20 15:42:01 -04:00
Cooper Partin	3704efb2bc	Implement -ftime-trace-granularity option to set time granularity value for the time profiler (#6381 ) This commit adds a new optional -ftime-trace-granularity option that is already implemented in llvm-project. This change is a surgical port of an existing feature from the upstream llvm-project repo into the DXC codebase. The following commits in the llvm-project repo were copied and followed for this change. Clean cherry picks were not possible due to the differences in repos like change of file locations and other dependent changes made in the repo. * Adds the granularity configuration setting * 'Time profiler: small fixes and optimizations' Commit: 26536728591d5fdac373ef535ae122b873f73292 * Wires up the commandline option -ftime-trace-granularity to the TraceProfiler code * '[Support] Fix -ftime-trace-granularity option' Commit: 4fdcabf259c4ab94654e6cd5d95d0e0313159c70 Fixes #6372 --------- Signed-off-by: Cooper Partin <coopp@ntdev.microsoft.com> Co-authored-by: Cooper Partin <coopp@ntdev.microsoft.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-03-05 09:02:15 -08:00
Tex Riddell	66ba5a1529	Add alignment to NodeRecordType including DXIL metadata update (#6279 ) This change adds NodeRecordType alignment field to RDAT to make it possible to validate pointer and stride alignment in the runtime. This includes a change to DXIL metadata to preserve the record alignment without requiring recovery by looking for GetNodeRecordPtr. Fixes #6270	2024-02-29 11:22:56 -08:00
Jeff Noyle	5a31785905	PIX: Preclude debug output of ray query handle from phi or select (since the value isn't really an int, it's a handle) (#6309 ) AllocateRayQuery returns an int, but it's not really an int: it's a handle to the query. Thus, it's not sensible for the PIX debugging instrumentation to attempt to write it out to the debug-data-UAV. The previous code did an explicit check against the type of the value to be written, but if that value is actually a phi itself, then that check would fail. So now we recursively run through the phi values looking to see if any of its antecedents are a phi, and if so, refuse to send its value to the UAV.	2024-02-27 11:00:23 -08:00
Jonas Meyer	f0970beaf8	Expose clang option `-fdiagnostics-format=` in dxc (#6357 ) This change exposes the clang 3.7 `-fdiagnostics-format=` flag in DXC with supported values `clang`, `msvc`, `msvc-fallback`, and `vi`. This option allows having DXC output diagnostic messages in string formats that are recognized natively by Visual Studio. Resolves #6160 & #1811, both of which had been closed as not planned.	2024-02-27 12:18:42 -06:00
Tex Riddell	696a13a2a8	DxilValidation: add ShaderCompatInfo call graph checking (#6332 ) ShaderCompatInfo identifies compatibility for functions that could be called from an entry point. Currently, this checking detects compatibility problems between entry points and internal function calls that the validator otherwise misses. This change adds a check for ShaderCompatInfo, recursing into called functions looking for a source of conflict when not compatible with the current entry point properties. Errors are emitted for the entry point and at one source of each type of conflict that ShaderCompatInfo detects. A function is considered the source of a conflict when it has the conflict but none of the functions called from this function have this conflict. Removed early exit for ShaderFlags validation when module is a library, since these flags should be validated for libraries, and running CollectShaderFlagsForModule fills in the ShaderCompatInfo data we need for validation. Also fixed tests for illegal barrier and derivative use, as well as replacing the ignored MaxOutputRecords with the correct MaxRecords attribute. Fixes #6292.	2024-02-26 15:02:28 -08:00
Tex Riddell	22bb0786b6	Fix downlevel validator compatibility with DXR 1.1 flag (#6333 ) It turns out that in the prior validator version, if a subobject required DXR 1.1, the DXR 1.1 flag would be set on each function in RDAT output, as well as the global flags. It didn't appear this was the case through a D3DReflect test because that goes through a disassembly-to-assembly step, where subobjects are lost because they aren't in the llvm IR. The previous change assumed this was not the case when the subobjects were removed, but this removal occurs after the RDATWriter constructor, which does the full RDAT serialization since size is required right away. This restores the detection code and hooks it into DxilModule::ComputeShaderCompatInfo when validator version is in range [1.5, 1.7]. DXR 1.1 was introduced in 1.5, and we no longer tag every function as requiring DXR 1.1 based on subobjects in 1.8. At the moment, there is no way to CHECK the subobject RDAT in D3DReflect because they get stripped from the module (even reflection and debug modules) before serialization, and the test path for D3DReflect goes through a disassemble/re-assemble step between the prior stage and the D3DReflect stage. Fixes #6321 (really).	2024-02-20 17:58:45 -08:00
Xiang Li	5dee81ce1d	Fix crash in DXIL.dll caused by illegal DXIL intrinsic. (#6302 ) Replace assert on illegal DXIL op with return illegal value. Check the illegal cases in validation. Fixes #6168	2024-02-20 11:07:16 -08:00
Tex Riddell	2314d06cfe	New barrier DXIL Op fixes for Validation and RDAT function compatibility info (#6291 ) Fix barrier allowed ops and flags by shader kind New barrier operations lacked validation and for RDAT info: had incorrect min target and shader stage flags. - Identify barrier DXIL operations with new `is_barrier` in `hctdb.py` and generated `OP::IsDxilOpBarrier`. - Identify when a barrier op requires shader stage with group (compute-like stage), or when it requires node memory. - Add new `OptFeatureInfo_RequiresGroup` to identify function only compatible with a shader stage with a visible group for access to groupshared memory or use of group sync. - Translate to original `BarrierMode` when compatible; adds `BarrierMode::Invalid` to identify invalid cases. - Account for `DXIL::MemoryTypeFlags::AllMemory` being allowed and auto-masked by driver. - Properly set min shader model and compatible shader stage flags. - Validate barrier for shader stage. - Added new barriers to counters which were missing. Addressing parts of: #6256 and #6292 Fixes #6266	2024-02-20 10:44:34 -08:00
Tex Riddell	fafbc42725	Don't set RaytracingTier1_1 based on subobjects; enable flag validation (#6320 ) In validator version 1.7 and below, the RaytracingTier1_1 module flag was set on every function if any StateObjectConfig subobject had the AllowStateObjectAdditions flag set. This was incorrect, as the requirement is validated in the runtime based on the use of the subobject instead. Subobjects are removed from the module and placed in RDAT during container serialization, so the requirement would be lost when recomputing the flags in validation. This didn't break validation because flag validation was completely disabled for libraries! This change fixes this problem, and allows DxilValidation to validate ShaderFlags because they will no longer mismatch due to this issue. Running CollectShaderFlagsForModule is also necessary for collecting per-function shader flags which will be used by call graph validation in a subsequent change, so enabling flag validation unblocks that as well. Fixes #6321	2024-02-16 23:16:14 +00:00
Xiang Li	823125b32e	[DXIL generation] Merge GepUse last to avoid crash in EmitGetNodeRecordPtrAndUpdateUsers (#6314 ) In EmitGetNodeRecordPtrAndUpdateUsers, the type is mutated, and the GEP user of the RecordPtr is merged at the same time. This makes things complex because the GEP index needs to be updated once the type is mutated. To make things easier, merge the GepUse after mutating the type. Fixes #6223	2024-02-16 13:34:57 -08:00
Jeff Noyle	64cdb9cfc0	PIX: Deduplicate globals when referenced in multiple library fns (#6305 ) PIX's code for parsing debug data operates at the module level. When the same global is referenced by multiple functions in a module, that variable is referred to by multiple dbg.value/dbg.declare statements, and those are mapped (by the PIX passes) to multiple fake allocas using its usual scheme. This code was written before libraries were a thing, and wasn't expecting this duplication. A little more attention to the variable's scope fixes the issue. Also, the changed code's original "return false" broke the whole process of discovering variables with the result that PIX's shader debugger locals window was completely empty. Makes more sense to ignore the one variable and keep going.	2024-02-14 12:37:11 -08:00
Jeff Noyle	446da195b4	PIX: Optimize debug instrumentation for fewest emitted instructions (#6281 ) This change switches from instrumentation per instruction to instrumentation per basic block. Furthermore, not every instruction in a basic block needs to write debug data to the output UAV- it's enough to know that the basic block was entered, as long as the calling application can figure out what instructions were in that block. To support that knowledge, the pass now emits a text "precis" of each basic block. Also, the previous branchless UAV bounds enforcement was replaced with something similar that emits fewer instructions at the cost of a larger UAV. This tradeoff is WELL worth it. Additionally, the debug pass used to add extra blocks in order to solidify the arguments to phi instructions. This work was unnecessary, and added a lot of complexity to the resulting instrumented shader. The debugger application is only interested in the value of the phi itself and the actual value produced via the actual preceding edge. Full details are in the comments in the code. This change reduces driver-side compilation overhead from "overnight" to 2 minutes on a 160k-instruction shader.	2024-02-13 08:08:22 -08:00
Tex Riddell	627e400dea	NFC: Fix switch with missing default case in DxilShaderModel.cpp (#6288 ) This missing default case statement would cause a build break when compiling with additional warning settings, such as /W4 /WX in VC++. Fixes #6287.	2024-02-13 01:42:59 +00:00
Tex Riddell	b3be6f2522	Correct metadata load for NodeRecordType for robustness (#6277 ) Metadata loading for node record properties assumed fixed positions for values in a (key,value) property list, skipping and ignoring the keys. This was not correct, fragile, and not the intended implementation of the key,value property pattern. There is no functional change, but this makes the loader robust to additional properties instead of crashing or loading the wrong thing. Fixes #6276	2024-02-12 19:26:48 +00:00
Joshua Batista	cadf3bfed1	[Sema] Add final touches to wave size range implementation (#6202 ) Upon further discussion, the team has agreed that in certain degenerate cases, the current diagnostics are insufficient. In the case that min == max in the wave size range attribute, a defaultError warning should be emitted. Additionally, there should be an explicit way to handle the case where 0,0,0 is passed to the wavesize range attribute. This PR directly handles and tests both of these cases. Fixes #6161 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-02-10 02:15:59 +00:00
Tex Riddell	9c518dbd94	RDAT: Check flags and WaveSize for min SM; fix flag detection and merging (#6207 ) Add ShaderKind::Last_1_8 for shader mask Add shader model comments before flag groupings in DxilConstants.h and DxilShaderFlags. Add missing flag checks for min shader model in RDAT function info. Move ShaderCompatInfo computation into DxilModule, propagate callee info. Move computation of shader model requirements based on flags into DxilModule. Finalize requirements for entry functions in AdjustMinimumShaderModelAndFlags. Fixes for function level flag tracking: - DerivativesInMeshAndAmpShaders: use flag to track deriv use, then adjust for entry functions. - hasUAVs: based on resource use in function instead of global resources. - WriteableMSAATextures: based on use in function instead of global resources. - Also catch cases for dynamic res from any use by looking at create/annotate handle, not just the TextureStoreSample op. - RaytracingTier1_1: move module-level detection to CollectShaderFlagsForModule - Marked deriv and quad ops as being supported in node. - Fixed SampleCmpBias to be considered gradient op. - Update RDAT definitions to dump more useful info for testing Fixes #6218.	2024-02-09 17:14:12 -08:00
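
The common-factor example from 6f9c107b78 (foo = (a * b) * c, bar = (a * d) * c) can be illustrated with a small pair-counting sketch. This is not the DXC or LLVM implementation: the function names and the list-of-operand-names representation are assumptions made purely for illustration. It only shows the idea of counting how often two operands occur in the same multiplication chain, so that the most frequently shared pair can be grouped first.

```python
from collections import Counter
from itertools import combinations


def count_operand_pairs(chains):
    """Count how often each unordered operand pair shares a chain.

    `chains` is a list of operand-name lists, one per multiplication chain
    (a hypothetical stand-in for the expression trees a compiler would use).
    """
    pairs = Counter()
    for operands in chains:
        # Sort so that (a, c) and (c, a) are counted as the same pair.
        for left, right in combinations(sorted(set(operands)), 2):
            pairs[(left, right)] += 1
    return pairs


def most_shared_pair(chains):
    """Return the pair that occurs in the most chains, or None if no pair is shared."""
    pairs = count_operand_pairs(chains)
    if not pairs:
        return None
    # Break ties by name so the result never depends on iteration order.
    pair, count = min(pairs.items(), key=lambda item: (-item[1], item[0]))
    return pair if count > 1 else None


# foo = (a * b) * c and bar = (a * d) * c share the factor a * c.
print(most_shared_pair([["a", "b", "c"], ["a", "d", "c"]]))
```

Keying the pairs by operand name instead of by object address is a deliberate choice in this sketch: it avoids the kind of address-dependent ordering that 38830016ad describes as a source of non-determinism.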


1669 Commits