This change applies clang-format to all of the sources added in DXC that
were not part of the original LLVM/Clang 3.7 release when DXC forked.
This provides consistent code formatting across the codebase.
Fixes#5591.
Two related things:
-Move value/store annotations before instruction number annotations so
that we can avoid giving instruction numbers to the allocas that the
value-to-declare pass adds, since they are only there to implement PIX
shader debugging and do not need to be debugged by the PIX end-user.
And, otherwise, adding them messes up the instruction numbering for the
next part:
-Add parameters that allow PIX to limit instrumentation to a smaller
range of instruction numbers. This is a crude way to make
instrumentation workable for very large (>100k or so instructions)
shaders which otherwise will break the GPU driver either via out of
memory or via TDRs when the instrumentation is run.
This PR adds a new PIX pass that instruments DXR 1.0 shaders
(intersection, any/closest hit and miss). The new pass emits each shader
invocation's [ray system
values](https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#system-value-intrinsics)
and DispatchRaysIndex into a UAV that can be read back later. This will
enable new DXR debugging features in PIX.
* remove warnings on dxc
* address feedback
* remove removed functions' declarations
* get rid of another unused function
* use maybe_unused instead of if 0 when removing unused functions
* address commenting issues
* Revert "use maybe_unused instead of if 0 when removing unused functions" because c++17 extensions aren't enabled
This reverts commit 3a00d8eb84.
* remove incorrect scope
In some cases, arrays embedded within an HLSL global static will be backed by an llvm global variable, rather than an alloca or a set of discrete values.
In those cases, there will be no dbg.value or dbg.declare statements that will allow PIX to associate stores to those globals with the HLSL variable.
But it's possible to stitch the storage to the HLSL variable and emit the dbg.value (and "fake" alloca/stores that PIX's debug instrumentation uses for tracking changes to values during execution).
Prior to this work, PIX had no visibility into non-const global statics at all. Now, PIX can see most things, inlcluding the above-described class of embedded arrays. However, more work remains:
-dynamic indexing of the arrays is not considered
-arrays that have been reduced to 1-dimensional equivalents are not covered
The first two will be tackled next. The last will probably be left until and unless requested by a PIX customer.
In brief, this change walks the global variables and compares types, offsets and sizes to determine where arrays might be embedded within other global variables. Maps are constructed to go from a global member's name to the actual member's offset/size, and from the HLSL DIGlobalVariable to the per-function DILocalVariable "alias" or "mirror" of that global (which are denoted by the "global." name prefix) . This allows the "fake" alloca/store and dbg.declares to be emitted for any store to a GEP operand that references those embedded arrays. (If no other members of the HLSL global are accessed, then there will be no DILocalVariable for the new dbg.declares to be associated with, so a new DILocalVariable must be emitted.)
In some cases structs like the RayDesc built-in type (one of the arguments to TraceRays) will be placed in an alloca.
The PIX annotation code wasn't properly counting all the leaf-node values of aggregates within such structs when determining offsets for later members.
The result of this was erroneous attribution of writes to those members during PIX shader debugging.
* PIX:Descend Type Hierarchy for Alloca'd structs
* Add test, vectors are scalar-only
* New file-check for metadata nodes
The previous formulation had a single member variable for several Values which would be used in multiple functions of a lib, which of course means that in all but one of those functions, the Values' creation function was not the right one. Funnily enough, I had got this right in the access-tracking pass but somehow neglected to do it in the debugging pass. (Also need to audit the other passes to make sure I haven't been repeating the mistake.)
These fixes were the result of a single shader from a well-known game engine, all of which resulted in mismatched offsets between PIX's offsets-into-fake-allocas and the offsets into aggregates for dbg.value statements.
-Base class sizes weren't being added to derived classes
-Samplers+Resources weren't being given any size at all
-Raygen payloads (referred to by pointer) were triggering an assert
Added a new set of tests for the type system to exercise these.
* Clean up and rework DXIL library depencencies
This change reworks the dependency specifications for the DXIL*
libraries. Many of these libraries had over and under specified
dependencies.
This also collapses the DxilRDATBuidlder into the DxilContainer library
to remove the cyclic dependency between the two libraries.
* Break assert dependency between DXIL and Analysis
In assert builds the DXILModule constructor was creating forced bindings
to debugging methods that are often used from the debugger. This forced
binding is useful, but doesn't need to live in the DXIL library.
To break the dependency I've moved the code into Analysis. I've also
made that code only included when building with MSVC. When using other
compilers `__attribute__((used))` can be applied to the function via the
`LLVM_DUMP_METHOD` annotation to have the same effect.
PIX is unique in that it needs to deserialize, modify and then reserialize root sigs. The focus of this checkin is adding such modifications to PixPassHelpers.cpp. (This closes a gap in PIX support: PIX can now support shaders that use dxil-defined and attribute-style root signatures.)
But this required some work outside of the purely PIX-focused areas. Deserialized root sigs are described by a C-like structure with embedded arrays that are new/delete-ed by routines in DxilRootSignature.cpp. Since modifying these structs requires more new/delete calls, I chose to add a new entry point in DxilRootSignature.cpp to do the bare minimum that PIX needs: extend root params by one descriptor. This approach keeps all those raw new/deletes in one file.
I found a leak in DxilRootSignatureSerialzier.cpp, which I fixed. There appear to be no unit tests that exercise this path. I welcome feedback on adding one.
There were other leaks in class CVersionedRootSignatureDeserializer, but this class is unused so I deleted it.
Oh, and there are bazillions of commits because I was cherry-picking from a recent change (#4845) as it eveolved, since I needed that change and this to test PIX.
Building DXC with another build system revealed that some includes
were dependent on headers only available by chance. Adding missing
headers.
Signed-off-by: Nathan Gauër <brioche@google.com>
Signed-off-by: Nathan Gauër <brioche@google.com>
This mistake was upsetting a certain driver. The UAV being declared is always u0 in space -2. Also changed the name of one struct type to make it more obvious what PIX had added.
authored-by: Jeff Noyle <jeffno@ntdev.microsoft.com>
A couple of minor changes to support PIX shader debugging for hull shaders generated by the 11on12 mapping layer for DXBC originals.
Co-authored-by: Jeff Noyle <jeffno@ntdev.microsoft.com>
HLSL 2017's enum and enum class compile to a type without an integer base.
So in this case we'll call it an unknown type for the purposes of PIX debug instrumentation.
Further work will be required to return the enumerants to PIX and to record an actual value during shader debug instrumentation.
These changes add a couple of new fields to the shader-access-tracking pass. The new fields are the shader kind and the instruction ordinal (from the dxil-annotate-with-virtual-regs pass). This will allow PIX to report richer feedback to the user about out-of-bounds access.
This revision adds out-of-bounds checking for SM6.6-style dynamic resources and samplers, a feature heretofore missing in PIX.
Fortunately, there was a bit spare (InstructionOrdinalndicator) that allows PIX to detect the new fields and thereby continue to operate with older versions of dxcompiler, a frequent customer scenario.
This fixes a regression introduced by #3985
One IHV's driver compiler was upset by the placement of the allocas generated by this builder (m_B, that is).
Restoring the insertion point to the start of the function makes the driver happy again.
* Automate generating of intrinsic code
This change removes generated code for HLSL and DXIL intrinsics and
sets up the build system so that it will generate the code and
automatically regenerate it on changes to any of the generation scripts
or input files.
* Covering the last few generated files
* Move namespace priting into hctgen
I somehow missed that the open namespace was in manual code and the
close was in the generated code...
* Fix pix passes generator
* Work around missing autocrlf on Windows checkout
*shakes fist* AppVeyor...
* Actually pass through --force-lf
These changes are a result of prototype work in PIX to support shader debugging for DXR,, which was an exercise to figure out how to support libraries in general in PIX. These changes are the result. It's all about iterating over all the library functions, rather than "the" entry point function.
In addition, to have something concrete to submit to a driver, there's the addition of codegen for the shader debugging "prolog", which chooses at runtime whether or not to emit instrumentation data for a given shader invocation based on its (in this case) thread id. This last is not yet ready for production in PIX, in that it is necessary but not sufficient to enable DXR shader debugging.
PIX's mesh shader output instrumentation needs to disambiguate groups of mesh shader invocations, so that PIX can reconstruct the intended mesh after the fact. In the case where an amplification shader is present, the previous code was incomplete: it included only the X value of the mesh shader's group ID relative to the AS DispatchMesh call. The full XYZ value is needed. Unfortunately, this value does not necessarily fit into a single DWORD for returning to PIX via the instrumentation, but various spec constraints mean that the product of the group ID cube's dimensions will fit in a DWORD.
So this change expands the custom data added to the AS->MS payload to include the Y and Z group counts that the AS passed to DispatchMesh. The mesh shader can then multiply its group ID's component values by these counts in order to come up with a single unique group ID value.
This change is required to enable PIX to reconstruct mesh shader output data correctly. Without knowing the thread id of the AS thread that launched the mesh shader threads, we don't know which probably-a-mesh to attribute each MS's output to.
Background: this pass is trying to find all dbg.value and replace them with dbg.declare. In the code being changed, the pass is trying to seek a valid location at which to insert the dbg.declare. It has to come after the value to which it applies (which isn't true of the dbg.value). So there's this little loop trying to move forward to find the right instruction before which to insert new stuff.
I was expecting getNextNode to return null when there is no next node. When called on a terminator, it actually returns a non-null but malformed instruction pointer. So we have to explicitly check for terminators in this loop.
This really short basic block tripped up the pass:
; <label>:274 ; preds = %.lr.ph55
%RawBufferLoad = call %dx.types.ResRet.i32 @dx.op.rawBufferLoad.i32(i32 139, %dx.types.Handle %lightBuffer_texture_structbuf, i32 %lightIndex.0, i32 28, i8 1, i32 4), !dbg !384
%275 = extractvalue %dx.types.ResRet.i32 %RawBufferLoad, 0, !dbg !384
switch i32 %275, label %288 [
i32 0, label %276
i32 1, label %280
i32 2, label %284
], !dbg !397
I think the pass could be smarter about seeking the right insertion point for the dbg.declare. It's currently assuming that the dbg.value always succeeds the value to which it refers, but as in this case, that's not always true. But that's a project for another day.
While working on another issue I noticed that I had forgotten these cases.
(PIX uses this pass to detect non-constant indexing into resource arrays in order to decide if it should run the whole "shader access tracking" pass to check for out-of-range access etc. So it was failing to detect dynamic indexing and so could choose not to run the SAT pass.)
This is a bit of a corner-case, but PIX tripped up on it while capturing an app under d3d11on12.
PIX is probably the only client who cares about this case, since it's only relevant when DXBC->DXIL has been run (e.g. 11on12), and then examined via reflection and/or the PIX-specific passes.
Here's the entry-point metadata provided by DXBC->DXIL for a representative case. It has a null entry point, but a non-null patch-constant function:
!dx.entryPoints = !{!7}
!7 = !{null, !"", !8, !2, !24}
!24 = !{i32 0, i64 256, i32 3, !25}
!25 = !{void ()* @pc_main, i32 3, i32 3, i32 2, i32 3, i32 3, float 0.000000e+00}
(Documentation of DXBC->DXIL's operation in this case is here: https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst#hull-shader-representation)
(PIX bug #32030124)
This fixes two issues found in a very large shader from a partner ISV.
Some background: PIX runs this value-to-declare pass that turns dbg.value into dbg.declare plus a "fake" store to an alloca that PIX can later note as part of its debug instrumentation. This requires building up a map of struct members to these allocas.
First issue: during this pass, any dbg.value that referred to a type tagged with DW_TAG_member was assuming that the type's base type was actually the base type. It's not: it's the containing type, together with an offset and bit-size. So this change adds code to go find the actual type at that offset in the containing type. Previously, the pass was breaking in the validator because it was creating bit-pieces (in the newly-added dbg.declares) that overlapped the end of the containing type.
Second issue: a previous PR (#3746) fixed a bug wherein the new fake stores were being inserted before the values were actually defined. That PR failed to maintain the convention that phi instructions must always be listed in a contiguous section at the start of any basic block. While this convention is not enforced by the validator, the nvidia driver has a hard dependency on this convention and will crash if it is violated.
Lastly, figuring out these sorts of things always takes days and days of staring at 680k instructions (as in the shader that provoked these bugs) trying to find the wrong one. I know I'll be doing this all again, so I've added some in-but-off logging and struct dumping that is much easier to follow and to use than calling .dump() from the immediate window in VS.
Two things:
First recurse down type hierarchy. The previous code only went down 2 levels...
Second, derive base type from enum classes when populating fake debug-store locations
The value-to-declare pass replaces dbg.value with dbg.declare and some getElementPtr/store pairs that the PIX instrumentation uses to discover values. The problem here was that the dbg.value could reside in the program prior to the value it was referring to, and this seems to be legal. It's not correct once the dbg.value has been replaced with the GEP/store pairs, since the store was trying to operate on a value that had not yet been created. The solution is to move the builder's insertion point to that of the value itself, rather than that of the dbg.value.
Also in this change: a simple pair of copy-paste typos in the OffsetManager class.
dbg.value can occasionally return a null value. (Hit this in a customer (343) shader via PIX.) This is expected. From IntrinsicInst.cpp:
// When the value goes to null, it gets replaced by an empty MDNode.
This is little more than a move of the "create-uav" code in lib/DxilPIXPasses/DxilShaderAccessTracking.cpp to PixPassHelpers.cpp, followed by a factoring-out of the parts that create a handle (either pre-SM6.6 fashion, or with the newer create-from-binding etc.).
All the other passes' near-identical code was then deleted and made to call the centralized function.
This change updates the existing PIX resource-tracking code to handle dynamic resources (and SM6.6's resource binding apparatus in general).
The output UAV is now segmented into three parts: the original formatted buffer at the beginning, for old-style createHandle resources, followed by a block for resource (texture, buffer etc) access, followed by a block for sampler access.
The latter two are divided into 8-byte records. The first dword records writes to a resource/sampler, the second reads.
The writes are encoded bit fields denoting the access performed by the shader.
A customer couldn't recompile their large shader with -Od for debugging in PIX.
-The symbol manager would quit early, so now it continues to discover other variables
-The value-to-declare pass was adding an incorrect debug location, which tripped up the verifier.
A customer provided a shader that uses many language features that the value-to-declare pass couldn't handle:
-member functions
-resource variables
-arguments passed by pointer
-struct inheritance
Note the highly "creative" method used to determine if a type is a resource- by name string compare. Unfortunately there isn't a cleaner way to do this based on type alone.
These changes aren't the end of this road: GS output payloads aren't covered. Resource types are merely worked-around, not turned into debuggable variables that PIX can display.
Also, these changes disturbed the DIA debug lines reporting, which isn't used by PIX anymore so I just deleted that test case rather than fix it. (A future checkin will delete the DIA implementation properly.)