The mark convergent pass is meant to prevent unwanted moving of
operations on derivative op input. It was previously only run on pixel
shaders. Because derivatives are supported in CS/MS/AS shaders as part
of shader model 6.6, it needs to be run on these stages for that target
too.
PR #3636 (commit 6b44d611f7) changes the definition of
`llvm_unreachable(msg)`. It now evaluates `msg` even when the
NDEBUG flag is defined.
For the following code,
```
static const char *const BundlingNotImplementedMsg =
    "Aligned bundling is not implemented for this object format";

void MCObjectStreamer::EmitBundleAlignMode(unsigned AlignPow2) {
  llvm_unreachable(BundlingNotImplementedMsg);
}
```
Before PR #3636, `llvm_unreachable(BundlingNotImplementedMsg)` did not
evaluate `BundlingNotImplementedMsg`. After PR #3636, it is evaluated,
which results in "undefined object" for `BundlingNotImplementedMsg`.
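The difference can be sketched with two hypothetical macros. The names
`UNREACHABLE_DROPS_MSG`, `UNREACHABLE_USES_MSG`, `reportUnreachable` and
`nextMessage` are invented for this illustration and are not the LLVM
definitions. A macro that discards its argument never compiles or
evaluates the expression, while one that forwards it requires the
expression to be valid in every build configuration.

```cpp
#include <string>

// Hypothetical stand-in for the handler; records the last message it saw.
static std::string g_lastMessage;
// Counts how often the message expression is actually evaluated.
static int g_messageEvaluations = 0;

static void reportUnreachable(const char *msg) {
  g_lastMessage = msg ? msg : "";
}

static const char *nextMessage() {
  ++g_messageEvaluations;
  return "not implemented";
}

// Assumed old-style shape: the argument is dropped by the preprocessor,
// so the expression passed as msg is never compiled or evaluated.
#define UNREACHABLE_DROPS_MSG(msg) reportUnreachable(nullptr)

// Assumed new-style shape: the argument is always evaluated, so it must
// name something that exists in every configuration.
#define UNREACHABLE_USES_MSG(msg) reportUnreachable(msg)
```

With the first macro, passing an identifier that is not declared still
compiles; with the second it does not.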
The marker is marked as having side effects so that it can prevent
unwanted code movement. As a result it was not removed as dead, and
because the marker isn't otherwise removed until after dead code
elimination, trivially dead code was retained unnecessarily. By allowing
the marker's removal once the operation that needed it has been removed
and it has no users, this dead code can be eliminated.
Instead of raising structured exceptions for unreachable and fatal
errors, raising C++ exceptions allows returning an error code and
getting a useful message instead of requiring a structured exception
handler to catch it.
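A minimal sketch of the pattern, assuming a hypothetical exception type
`FatalCompilerError` and hypothetical functions `compileStep` and
`runCompile` (none of these names come from the actual change): the
fatal path throws a C++ exception, and the caller turns it into an error
code plus a message with an ordinary `try`/`catch`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for an unreachable/fatal error.
struct FatalCompilerError : std::runtime_error {
  explicit FatalCompilerError(const std::string &what)
      : std::runtime_error(what) {}
};

// Hypothetical compile step that may hit a fatal error.
static void compileStep(bool fail) {
  if (fail)
    throw FatalCompilerError("unreachable executed: bad opcode");
}

// Converts the exception into an error code and a message for the caller.
static int runCompile(bool fail, std::string &message) {
  try {
    compileStep(fail);
    return 0;
  } catch (const FatalCompilerError &e) {
    message = e.what();
    return 1; // hypothetical failure code
  }
}
```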
Add cast failure assert
This is little more than a move of the "create-uav" code in lib/DxilPIXPasses/DxilShaderAccessTracking.cpp to PixPassHelpers.cpp, followed by a factoring-out of the parts that create a handle (either pre-SM6.6 fashion, or with the newer create-from-binding etc.).
All the other passes' near-identical code was then deleted and made to call the centralized function.
When compiling a library without 16-bit support, certain struct types
containing either min precision types or matrices must be saved in
reflection data for conversion after linking. However, this is only
necessary when the types are used by a resource.
Instead of evaluating all matrix types and saving those that meet these
criteria, only the types used by resources are evaluated and possibly
preserved. This significantly shrinks the reflection size in this case.
- Move validator/dxil version checks up-front
These should fail first rather than side effects of trying to validate
details of a version we don't support.
- Improve message for unsupported validator or dxil version
These errors are most likely if compiling separately from validation
and failing to override the validator version properly, or running on
an external validator that doesn't support a newer dxil.
- Use dxil version from metadata for DxilModule when loading,
rather than just setting it to minimum based on shader model.
- Remove TODO from validator messages that shouldn't be there
This change updates the existing PIX resource-tracking code to handle dynamic resources (and SM6.6's resource binding apparatus in general).
The output UAV is now segmented into three parts: the original formatted buffer at the beginning, for old-style createHandle resources, followed by a block for resource (texture, buffer etc) access, followed by a block for sampler access.
The latter two are divided into 8-byte records. The first dword records writes to a resource/sampler, the second reads.
The writes are encoded bit fields denoting the access performed by the shader.
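The segmented layout can be sketched as simple offset arithmetic. The struct `TrackingLayout`, its fields, and the assumption that records are indexed densely from the start of each block are hypothetical; only the three-part ordering, the 8-byte record size, and the write-then-read dword order come from the description above.

```cpp
#include <cstdint>

// Hypothetical description of the segments of the output UAV.
struct TrackingLayout {
  std::uint32_t formattedBufferBytes; // original block for createHandle resources
  std::uint32_t resourceRecordCount;  // number of records in the resource block
};

static const std::uint32_t kRecordBytes = 8;      // two dwords per record
static const std::uint32_t kWriteDwordOffset = 0; // first dword: writes
static const std::uint32_t kReadDwordOffset = 4;  // second dword: reads

// Byte offset of a record in the resource block (assumed dense indexing).
static std::uint32_t resourceRecordOffset(const TrackingLayout &layout,
                                          std::uint32_t index) {
  return layout.formattedBufferBytes + index * kRecordBytes;
}

// Byte offset of a record in the sampler block, which follows the
// resource block.
static std::uint32_t samplerRecordOffset(const TrackingLayout &layout,
                                         std::uint32_t index) {
  return layout.formattedBufferBytes +
         layout.resourceRecordCount * kRecordBytes + index * kRecordBytes;
}
```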
A module with stripped reflection has no resource names to use in error
messages. Try to get the name from the debug module.
Failing that, use ID, type and binding info.
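The fallback order can be sketched as follows. `ResourceInfo`, its
fields, and the exact format of the synthesized string are hypothetical
and only illustrate the order of preference described above.

```cpp
#include <string>

// Hypothetical summary of what is known about a resource.
struct ResourceInfo {
  std::string reflectionName; // empty when reflection was stripped
  std::string debugName;      // empty when no debug module is available
  unsigned id;
  std::string typeName;       // e.g. "UAV"
  unsigned space;
  unsigned lowerBound;
};

// Picks the best available name for use in an error message.
static std::string nameForDiagnostics(const ResourceInfo &r) {
  if (!r.reflectionName.empty())
    return r.reflectionName;
  if (!r.debugName.empty())
    return r.debugName;
  // Last resort: describe the resource by ID, type and binding.
  return r.typeName + " ID " + std::to_string(r.id) + " (space " +
         std::to_string(r.space) + ", register " +
         std::to_string(r.lowerBound) + ")";
}
```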
The most common causes of internal compiler errors are access violations
or stack overflows. This registers an exception handler in dxc.exe for
these cases that are otherwise unhandled. It prints a simple message
for these errors and passes the exception along.
In case this is unwanted for some reason, a hidden disabling flag is
added as well.
Adds LLVM builtin exceptions for assert, fatal, and unreachable. Adds a
default message for exceptions not explicitly addressed.
Alters behavior of llvm_unreachable so it always raises an exception
regardless of compiler support for unreachable hints.
Reports errors using fputs instead of std::cerr to ensure that no
allocation is necessary. Custom output is performed in a static array
that is output with fputs.
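A minimal sketch of allocation-free reporting, assuming a hypothetical
256-byte buffer and hypothetical helpers `buildErrorMessage` and
`reportError`: the message is assembled in a static array and written
with `fputs`, so nothing is allocated on the error path.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Fixed-size static storage, so building the message never allocates.
static char g_errorBuffer[256];

// Copies prefix, detail and a newline into the static buffer, truncating
// whatever does not fit.
static const char *buildErrorMessage(const char *prefix, const char *detail) {
  std::size_t used = 0;
  const char *parts[] = {prefix, detail, "\n"};
  for (const char *part : parts) {
    std::size_t len = std::strlen(part);
    std::size_t room = sizeof(g_errorBuffer) - 1 - used;
    if (len > room)
      len = room;
    std::memcpy(g_errorBuffer + used, part, len);
    used += len;
  }
  g_errorBuffer[used] = '\0';
  return g_errorBuffer;
}

// Writes the message to stderr without going through iostreams.
static void reportError(const char *prefix, const char *detail) {
  std::fputs(buildErrorMessage(prefix, detail), stderr);
}
```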
This extension adds qualifiers for payload structures accompanied with semantic checks and code generation. This feature is opt-in for SM 6.6 libraries. The information added by the developer is stored in the DXIL type system and a new metadata node is emitted during code generation. The metadata is not necessary for correct translation of DXIL, so it may be safely ignored, but it provides hints to unlock potential optimizations in payload storage between DXR shader stages.
Cycles in the dependency graph can result in phis and other instructions
failing to get a wave sensitivity result after analysis. This can only
happen when nothing in the loop has wave sensitive operations.
By collecting phis that are unknown and marking them as non-sensitive
after analysis is complete and rerunning it, we can correctly evaluate
their dependents too.
Because incompletely processed blocks might still be wave-sensitive,
unknown phis are only added to be reprocessed when their preds have all
been processed fully.
In addition to cycles preventing wave sensitivity determination on
dependent operations, they can prevent entering blocks that depend on
values with cyclic dependencies. Even when reprocessing all unknown
phis, this can result in not all values getting a determination.
To resolve this, all blocks are added to the block list from the start
to ensure that they are all processed eventually. Since instruction
processing is more intelligent and targeted, this change is accompanied
by a change that only processes a single block from the worklist before
attempting to address queued instructions again.
To catch wave-sensitive instructions sooner, when visiting an
instruction, all arguments are checked for wave sensitivity even after
an unknown argument is found whereas before an unknown argument ended
the search.
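The idea of resolving unknown phis can be sketched on a toy dependency
graph. This is a simplified illustration, not the pass itself: `Node`,
`Sens`, `propagate` and `analyze` are hypothetical, there are no basic
blocks or worklists, and the sketch simply iterates to a fixed point
before touching the unknown phis. It does check every operand for
sensitivity even after an unknown operand has been seen.

```cpp
#include <cstddef>
#include <vector>

enum class Sens { Unknown, No, Yes };

// Hypothetical instruction: operands are indices into the node list.
struct Node {
  bool isPhi;
  bool isSensitiveSource; // seeds wave sensitivity
  std::vector<std::size_t> operands;
};

// Propagates known results until nothing changes. A node is sensitive if
// any operand is, non-sensitive if all operands are known non-sensitive,
// and stays unknown otherwise.
static void propagate(const std::vector<Node> &nodes,
                      std::vector<Sens> &state) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      if (state[i] != Sens::Unknown)
        continue;
      bool anyYes = nodes[i].isSensitiveSource;
      bool anyUnknown = false;
      // Check all operands, even after an unknown one is found.
      for (std::size_t op : nodes[i].operands) {
        if (state[op] == Sens::Yes)
          anyYes = true;
        else if (state[op] == Sens::Unknown)
          anyUnknown = true;
      }
      if (anyYes) {
        state[i] = Sens::Yes;
        changed = true;
      } else if (!anyUnknown) {
        state[i] = Sens::No;
        changed = true;
      }
    }
  }
}

static std::vector<Sens> analyze(const std::vector<Node> &nodes) {
  std::vector<Sens> state(nodes.size(), Sens::Unknown);
  propagate(nodes, state);
  // Phis still unknown here sit on cycles with no sensitive input, so
  // mark them non-sensitive and rerun to resolve their dependents.
  bool resolvedAny = false;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (state[i] == Sens::Unknown && nodes[i].isPhi) {
      state[i] = Sens::No;
      resolvedAny = true;
    }
  }
  if (resolvedAny)
    propagate(nodes, state);
  return state;
}
```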
Add some tricky loop tests that failed in ToT and various intermediate
stages of the change. Add a more aggressive assert to check for failures
to find the wave-sensitivity.
Fixed #2556 (again)
An unrelated earlier change mistakenly removed the return on a line
calling the Scatterer constructor for an argument value. As a result,
the constructor is called, but the resulting object which would insert
new extraction ops in the entry block is not used at all. Instead,
the default case is encountered where the vector PHI op is used as an
insertion point, which places the extractions after the new scalar PHIs
that use them. This is caught in the loop case by an assert because the
assumptions that caused LCSSA to be skipped for this BB are faulty.
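The shape of the bug can be sketched with a toy type. The `Scatterer`
struct, the insertion point constants, and both functions below are
hypothetical stand-ins for illustration, not the scalarizer code: without
the `return`, the object built for the argument case is discarded and
the default case is reached.

```cpp
// Hypothetical stand-in for the Scatterer: remembers where extraction
// ops would be inserted.
struct Scatterer {
  int insertionPoint;
};

static const int kEntryBlock = 0;     // hypothetical insertion point ids
static const int kAfterVectorPhi = 1;

// Buggy shape: the temporary for the argument case is built and dropped.
static Scatterer scatterBuggy(bool isArgument) {
  if (isArgument)
    Scatterer{kEntryBlock};          // missing 'return': result is unused
  return Scatterer{kAfterVectorPhi}; // default case is reached instead
}

// Fixed shape: the argument case returns the entry-block Scatterer.
static Scatterer scatterFixed(bool isArgument) {
  if (isArgument)
    return Scatterer{kEntryBlock};
  return Scatterer{kAfterVectorPhi};
}
```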
Gather operations that take four separate offsets are meant to allow for
programmable, non-immediate values. This removes them from the immediate
and range validations by giving the immediate versions their own opcodes.
For these intrinsics and loads, errors are generated when offsets are non-literal
or out of range. These largely use the slightly altered validation paths that the
sample intrinsics use.
Reword error when texture access offsets are not immediates to be
more all-encompassing and grammatical.
Incidentally, remove the duplicate shaders in the old directory that the
validation test was using while identical copies in the validation
directory went unused. Redirect the validation test to the appropriate
copies. This is consistent with the test shader re-org's stated intent.
Sample operations have a maximum of 3 offset args, but gather has a
maximum of two since it always operates on 2D images. So gather
operations only pass in two offset args to the offset validation, which
assumed three and resulted in an invalid access.
Rather than adding a specialized condition to evade this, just iterate
over the number of elements in the array. For sample it will be 3 and
for gather 2 and it will still check for expected undefined args
appropriately.
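A minimal sketch of validating by array size instead of a fixed count.
`OffsetArg`, `validateOffsets`, the [-8, 7] range, and the rule that a
defined offset may not follow an undefined one are assumptions made for
this illustration rather than the validator's actual code.

```cpp
#include <vector>

// Hypothetical offset argument: either undefined or an immediate value.
struct OffsetArg {
  bool defined;
  int value;
};

// Validates however many offsets the caller passes: 3 for sample, 2 for
// gather. Nothing past the end of the array is ever read.
static bool validateOffsets(const std::vector<OffsetArg> &offsets) {
  bool sawUndefined = false;
  for (const OffsetArg &offset : offsets) {
    if (!offset.defined) {
      sawUndefined = true;
      continue;
    }
    if (sawUndefined)
      return false; // a defined offset after an undefined one
    if (offset.value < -8 || offset.value > 7)
      return false; // outside the assumed immediate range
  }
  return true;
}
```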
For the offset legalization pass, the opcode is used to determine the
start and end of the offset args.
Only produce the loop unroll suggestion when within a loop
Base error line on call instruction instead of source of the offset
Sort by location in source when possible and remove duplicates
Adapt tests to verify and match these changes
Fixes #2590
Fixes #2713
When a resource comes from a binding, we can't determine if it is on a
heap or not, so we set a flag on it to let the runtime sort it out.
When a resource comes from the heap directly, we know it's on the heap
so we set the shader flag to check for platform support.
Any extension function that includes the special string "$o" in its name
needs to have the $o replaced with the type name of the overload. Previously
we used a default heuristic to select the overload type from a function
argument.
This commit adds support for explicitly setting the argument to use for the
overload name by appending a ":<ArgIndex>" to the overload marker. For example,
using a name like "my_special_function.$o:3" would take the overload type from
the third function argument.
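A sketch of how such a marker could be parsed and replaced.
`OverloadMarker`, `findOverloadMarker` and `applyOverloadName` are
hypothetical names, and dropping the ":<ArgIndex>" suffix from the final
name is an assumption of this sketch. How the parsed index maps onto the
function's arguments is left to the caller.

```cpp
#include <cstddef>
#include <string>

// Result of looking for the "$o" marker in an extension function name.
struct OverloadMarker {
  bool found;            // "$o" is present
  bool hasExplicitIndex; // a ":<ArgIndex>" suffix follows the marker
  unsigned argIndex;     // valid only when hasExplicitIndex is true
  std::size_t begin;     // position of '$'
  std::size_t length;    // characters covered by the marker and suffix
};

static OverloadMarker findOverloadMarker(const std::string &name) {
  OverloadMarker m = {false, false, 0, 0, 0};
  std::size_t pos = name.find("$o");
  if (pos == std::string::npos)
    return m;
  m.found = true;
  m.begin = pos;
  m.length = 2;
  std::size_t colon = pos + 2;
  if (colon < name.size() && name[colon] == ':') {
    std::size_t end = colon + 1;
    unsigned value = 0;
    while (end < name.size() && name[end] >= '0' && name[end] <= '9') {
      value = value * 10 + static_cast<unsigned>(name[end] - '0');
      ++end;
    }
    if (end > colon + 1) { // at least one digit followed the colon
      m.hasExplicitIndex = true;
      m.argIndex = value;
      m.length = end - pos;
    }
  }
  return m;
}

// Replaces the marker (and any index suffix) with the overload type name.
static std::string applyOverloadName(const std::string &name,
                                     const std::string &typeName) {
  OverloadMarker m = findOverloadMarker(name);
  if (!m.found)
    return name;
  std::string result = name;
  result.replace(m.begin, m.length, typeName);
  return result;
}
```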
- SROA was not skipping handle type, now present with dynamic resources
changes, or ConstantBuffer<> / TextureBuffer<> types, as it should.
- Updated unit tests to add combinations of -Od and -Zi.
The 64UAV flag is meant to indicate that 9 or more UAVs are in use.
Previously, each UAV was considered as one regardless of range. By
adding up the ranges, we get a more accurate determination.
Only use the new counting when using the latest validator to maintain
compatibility with the old ones.
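The two counting schemes can be sketched as follows. `UavRange` and
`needs64UavFlag` are hypothetical names, and the `sumRanges` parameter
stands in for the validator version check; the threshold of more than 8
follows from the "9 or more" rule above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical UAV description: number of registers the binding covers.
struct UavRange {
  std::uint32_t rangeSize;
};

// Returns true when 9 or more UAVs are in use. With sumRanges false, each
// UAV counts as one (old behavior); with it true, range sizes are added.
static bool needs64UavFlag(const std::vector<UavRange> &uavs,
                           bool sumRanges) {
  std::uint64_t count = 0; // 64-bit so very large ranges cannot overflow
  for (const UavRange &uav : uavs)
    count += sumRanges ? uav.rangeSize : 1u;
  return count > 8;
}
```

For example, two UAV arrays of five elements each do not set the flag
under the old counting but do under the new one.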
Fixes: #2964
groupshared variables shouldn't be allowed outside of compute and
compute-like shaders. This adds a validation error when such
variables are used.
Add shader mask where groupshared are used
Correct several tests that used groupshared incorrectly
Incidental fix to atomics tests where the special case initializations
might have allowed the threads that failed to write to proceed
before the write took place and read the improperly initialized values
Changed up the assignments to avoid write collisions and be more
readable
Fixes #2603, #2677
A Dxil shader flag should be added when the SV_Barycentrics semantic or
AttributeAtVertex() is used. It was defined and member variables were
created for it, but it wasn't actually set.
When trying to fix up line numbers when inlining a function, if the call
instruction had no debug location, the DISubprogram was queried. That
query took a very long time and was pointless when there is no debug
info.
The code now checks for debug metadata before trying to get the
DISubprogram.
Fixes #3130
Under certain rare circumstances, it is possible for errors to be
reported for function or global variables that are null.
This eliminates the absolute necessity for a non-null function or GV by
passing in the context needed to report any error. It also avoids some
of the cases where trying to get a debug function would give a null
which would replace the non-null function.