The code `int foo; .. sizeof(float) * foo` causes failure because
`getElementSpirvBitwidth()` in AstTypeProbe.cpp does not handle the type
of `sizeof()` which is the "unsigned long" type. This commit handles it.
Fixes #2814
The lifetime test had parameters for UAV creation in the wrong order so
they were misassigned. The use of lib creation also didn't depend on
raytracing support as it should.
Correct some other SM 6.6 capability bit checks too.
Various resources have been created inconsistently with the shader
representation. After running out of bandaids, this refactors that code
to give the caller the tools necessary to create the resources
correctly.
Builtin object types such as Texture2D add all their object methods
to the associated decl context when they are first seen. However,
previously, if the code meant to effect that was found in a namespace
it would skip this addition. This removes that restriction so that
the methods can be added and then found when needed.
Fixes #2472
Previously, codegen attempted to skip the alloca creation that would
be encountered below and just emit the constant init. However, the
code in EmitAutoVarDecl() first checks whether the Init is null using
isTrivialInitializer() and just bails if it is.
This adds the isTrivialInitializer() call to the conditions here
since we fall back to the alloca if there is no constant initializer.
The ShaderOpTest code for mesh shaders had a lot of problems. This
corrects the initialization of the pipeline and the dispatch.
In addition, there were various problems with ExecutionTests for atomics
where mesh shaders were used. This adds testing for mesh results that
was being skipped. This required adjustment of the indices.
Also required was proper accumulation of groupshared values between
amplification and mesh shaders.
Various and sundry indexing corrections were also required.
* Fixed a bug in loop exit structurizer where exit values don't dominate users. Replaced the undef with 0 for numeric types. Enabled it by default.
* No longer using a separate list
* [spirv] Support for SM 6.6 pack_u8 and pack_s8.
* [spirv] Add support for SM 6.6 pack_clamp_{u|s}8.
* [spirv] Support SM 6.6 unpack intrinsics.
* [spirv] Remove unused variable.
* Remove comment.
* SPIR-V: Emit RayTmaxKHR builtin for RayTCurrent IOP
HitTKHR was [removed] from the spec as it merely shadows RayTmaxKHR.
Emitting it results in invalid SPIR-V:
generated SPIR-V is invalid: Operand 3 of Decorate requires one of these capabilities: RayTracingNV
OpDecorate %6 BuiltIn HitTNV
Replace the builtin by RayTmaxKHR if NV_ray_tracing enabled to resolve
this.
[removed]: bdd2aa34c4
* test/CodeGenSPIRV: Test RayTCurrent for RayTmax or HitT builtin
Corrected disabling compute shader and incorrect indices.
Removed test that attempted to confirm that the independent samples would
match where the derivatives were the same. First of all, they weren't
the same as intended. Second, even with that corrected, position in
quads affected the LOD calculation. The check was beyond the intended
scope of this test anyway.
Improved comments describing the function of this test
Corrected quadread index calculation which was not fully updated
Also raises the test priority, skipping only when WARP is used, so this won't be
missed again, and improves the error output by using equals macros.
A resource being treated like a texture in the shader was being created
as a buffer. This caused problems on some platforms including warp,
which required an ugly workaround that this removes
* [SPIR-V] Support KHR_Ray_tracing terminate Ops
Add OpIgnoreIntersectionKHR/OpTerminateRayKHR. Related:
https://github.com/microsoft/DirectXShaderCompiler/issues/3285
* Remove redundant message in the header of check hlsl
* rebase against master (fix merge conflict)
* Create a new basic block after termination.
* Don't change the SPIRV-Headers hash.
* Add unit tests and add assertion.
* Fix missing ==
Co-authored-by: Ehsan Nasiri <ehsannas@gmail.com>
This is the work of many contributors from Microsoft and our partners.
Thank you all!
Add support for 64-bit atomics, compute shader derivatives, dynamic
resources (from heap), 8-bit Packed Operations, and Wave Size.
All of these require compiling with 6_6 targets with just a few
exceptions. Each of these features includes unit tests and execution
tests.
64-bit atomics add 64-bit variants of all Interlocked* intrinsic
operations. This involves changing some of the code that matches
intrinsic overloads to call instructions. Also adds a few float
intrinsics for compare and exchange interlocked ops which are available
for all shader models 6.0 and up.
Compute shader derivatives adds dd[x|y], CalculateLevelOfDetail, and
Sample operations that require derivatives to compute. QuadRead
operations have been allowed in compute from 6.0+ and tests are added
for them here.
Dynamic resources introduce global arrays that represent the resource
and sampler heaps that can be indexed without requiring root signature
representations. This involves a new way of creating and annotating
resource handles.
8-bit Packed operations introduces a set of intrinsics to pack and
unpack 8-bit values into and out of new 32-bit unsigned types that can
be trivially converted to and from uints.
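As a rough host-side model of what the packed operations compute, the sketch below packs four lanes into one 32-bit value and unpacks them again. The function names are illustrative stand-ins rather than DXC or HLSL identifiers, and placing lane 0 in the lowest byte is an assumption of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the pack/unpack semantics; not compiler code.
// Assumption: lane 0 occupies the lowest byte of the packed value.

// Truncating pack: keep only the low 8 bits of each lane.
inline uint32_t PackU8(const std::array<uint32_t, 4> &v) {
  uint32_t packed = 0;
  for (int i = 0; i < 4; ++i)
    packed |= (v[i] & 0xFFu) << (8 * i);
  return packed;
}

// Clamping pack: saturate each signed lane to [0, 255] before packing.
inline uint32_t PackClampU8(const std::array<int32_t, 4> &v) {
  uint32_t packed = 0;
  for (int i = 0; i < 4; ++i) {
    const int32_t clamped =
        std::min<int32_t>(std::max<int32_t>(v[i], 0), 255);
    packed |= static_cast<uint32_t>(clamped) << (8 * i);
  }
  return packed;
}

// Unpack: zero-extend each byte back into a 32-bit lane.
inline std::array<uint32_t, 4> UnpackU8U32(uint32_t packed) {
  std::array<uint32_t, 4> v{};
  for (int i = 0; i < 4; ++i)
    v[i] = (packed >> (8 * i)) & 0xFFu;
  return v;
}

// Usage: values that already fit in 8 bits survive a round trip.
inline void DemoPackRoundTrip() {
  const std::array<uint32_t, 4> lanes = {{1u, 2u, 3u, 4u}};
  assert(UnpackU8U32(PackU8(lanes)) == lanes);
}
```

Because the packed value is just 32 bits, converting it to and from a plain uint needs no extra work in this model.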
WaveSize introduces a shader attribute that indicates what size the
shader depends on the wave being. If the runtime has a different wave
size, trying to create a pipeline with this size will fail.
The VK namespace change introduced some variables and defines that are
used in areas not protected by the same ifdefs that protect the
declarations. As a result, where this wasn't defined, builds and hearts
were broken. </3
A customer provided a shader that uses many language features that the value-to-declare pass couldn't handle:
- member functions
- resource variables
- arguments passed by pointer
- struct inheritance
Note the highly "creative" method used to determine if a type is a resource: by name string comparison. Unfortunately there isn't a cleaner way to do this based on type alone.
These changes aren't the end of this road: GS output payloads aren't covered. Resource types are merely worked-around, not turned into debuggable variables that PIX can display.
Also, these changes disturbed the DIA debug lines reporting, which isn't used by PIX anymore so I just deleted that test case rather than fix it. (A future checkin will delete the DIA implementation properly.)
* [spirv] Introduce the implicit 'vk' namespace
If any of these are used for DXIL code generation, the compiler will
report an error about the unknown "vk" namespace.
* [spirv] Introduce vk::ReadClock intrinsic.
The following Vulkan specific intrinsic functions are added:
```hlsl
uint64_t vk::ReadClock(in uint32 scope);
```
Also the following Vulkan-specific implicit constants are added:
```
vk::CrossDeviceScope; // defined as uint 0
vk::DeviceScope // defined as uint 1
vk::WorkgroupScope // defined as uint 2
vk::SubgroupScope // defined as uint 3
vk::InvocationScope // defined as uint 4
vk::QueueFamilyScope // defined as uint 5
```
Sample usage looks as follows:
```hlsl
uint64_t clock = vk::ReadClock(vk::WorkgroupScope);
```
If any of these are used for DXIL code generation, the compiler will
report an error about the unknown "vk" namespace.
* [spirv] Add documentation.
* Address code review comments.
* Test: Validate vk namespace is not allowed for dxil.
* Fix usage of DXASSERT.
* Move ValidateVkNamespaceNotAllowed test to HlslFileCheck.
* [SPIRV] support hlsl export function attribute
DXIL can export functions marked with the export attribute, while the SPIR-V
backend ignored such functions. GLSL-flavored SPIR-V currently has no linkage
export decoration, so an exported function is treated as a normal
function and a dummy entry point is added to satisfy SPIR-V validation.
* Remove use of dummy entrypoint function.
* Add Linkage capability if needed.
* Add SPV_ENV_UNIVERSAL_1_5 as a possible target env.
* Get target environment for tests from the test file.
* Transition last test to using target env in test file.
* Remove unused variable.
* Address code review.
Co-authored-by: Ehsan Nasiri <ehsannas@gmail.com>
The fallback layer for GCC currently uses pointer comparison to match
IIDs as uuid tagging for types is not supported.
Thus, when an external program dlopens libdxcompiler and calls
with a proper REFIID these will not match internal pointers and result
in E_NOINTERFACE.
Note that the reverted patches before this commit alleviated the
situation somewhat by replacing the pointer with a hash of the interface
name. While this again grants a stable value to be used when called from
external binaries these (usually FFI wrappers) now need to understand
whether the library has been compiled with full IID support (as a binary
compiled with Clang has no such limitations), and then pass the hash
anywhere they'd otherwise pass an IID.
WIP: Not all IIDs are added yet, which requires the extra "empty-GUID"
check in IsEqualIID.
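The difference between the two matching strategies can be sketched as follows. `Guid` is a hypothetical stand-in for the real GUID type and the byte values are made up; the point is only that a caller's own copy of an IID never shares an address with the library's copy, while the bytes still agree.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a GUID; not the WinAdapter definition.
struct Guid {
  uint32_t Data1;
  uint16_t Data2;
  uint16_t Data3;
  uint8_t Data4[8];
};

// Pointer identity: only matches when both sides refer to the same object,
// which cannot hold for an external binary that brings its own IID.
inline bool IsEqualByPointer(const Guid *a, const Guid *b) { return a == b; }

// Value comparison: matches whenever the bytes agree, across binaries.
inline bool IsEqualByValue(const Guid &a, const Guid &b) {
  return a.Data1 == b.Data1 && a.Data2 == b.Data2 && a.Data3 == b.Data3 &&
         std::memcmp(a.Data4, b.Data4, sizeof(a.Data4)) == 0;
}

// Usage: an external copy of the same IID (illustrative values).
inline void DemoIidComparison() {
  const Guid library = {0x12345678u, 0x9abc, 0xdef0, {1, 2, 3, 4, 5, 6, 7, 8}};
  const Guid external = library;
  assert(!IsEqualByPointer(&library, &external));
  assert(IsEqualByValue(library, external));
}
```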
For a function `ReturnType foo( .. ) { .. return value; }`, if the type
of value has a memory layout, `value` cannot be used for the operand of
OpReturnValue because of the type mis-match. Note that the `ReturnType`
for the return value must have a void memory layout (i.e., unknown).
This CL converts the type of the return value if it has an explicit
physical memory layout.
When running valgrind over a program using `DxcIntelliSense` (to
validate our own deallocations [1]) a bunch of mismatching `new` with
`free()` show up:
Mismatched free() / delete / delete []
at 0x483B9AB: free (vg_replace_malloc.c:538)
by 0x52B06A8: IMalloc::Free(void*) (WinAdapter.cpp:34)
by 0x6DC7D19: DxcTranslationUnit::Release() (dxcisenseimpl.h:308)
by 0x14278E: com_rs::unknown::IUnknown::release (unknown.rs:55)
[...]
Address 0x4c3c930 is 0 bytes inside a block of size 40 alloc'd
at 0x483B07F: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:385)
by 0x6DC06EB: DxcIndex::ParseTranslationUnit(char const*, char const* const*, int, IDxcUnsavedFile**, unsigned int, DxcTranslationUnitFlags, IDxcTranslationUnit**) (dxcisenseimpl.cpp:1192)
by 0x13020A: hassle_rs::intellisense::ffi::IDxcIndex::parse_translation_unit (macros.rs:108)
by 0x119B74: hassle_rs::intellisense::wrapper::DxcIndex::parse_translation_unit (wrapper.rs:101)
[...]
And so on for the other intellisense classes. All these classes have
`DXC_MICROCOM_TM_ADDREF_RELEASE_IMPL` which deallocates `this` on the
associated `m_pMalloc` with `free()`:
The "TM" version keep an IMalloc field that, if not null, indicate
ownership of 'this' and of any allocations used during release.
Yet they are allocated using `new`, resulting in this mismatch. The solution
is to follow a similar approach as the introduction of `IMalloc` to
`DxcIntelliSense` in d5bb308 by rewriting all classes to take an
`IMalloc *` in the constructor and invoking it either through `::Alloc`
from `DXC_MICROCOM_TM_ALLOC` or `CreateOnMalloc`.
[1]: https://github.com/microsoft/DirectXShaderCompiler/pull/3250#issuecomment-726216850
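A minimal sketch of the matched pattern, assuming a simplified allocator in place of `IMalloc`: the object is constructed with placement new on memory obtained from the allocator, and `Release()` runs the destructor explicitly before handing the memory back to the same allocator. `Allocator` and `Unit` are hypothetical names, not the DXC classes or macros.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical allocator standing in for IMalloc; counts live blocks.
struct Allocator {
  int live = 0;
  void *Alloc(std::size_t n) {
    void *p = std::malloc(n);
    if (p)
      ++live;
    return p;
  }
  void Free(void *p) {
    if (p)
      --live;
    std::free(p);
  }
};

// Hypothetical ref-counted object that owns its memory via the allocator.
class Unit {
  Allocator *m_alloc;
  int m_refs = 1;
  explicit Unit(Allocator *a) : m_alloc(a) {}
  ~Unit() = default;

public:
  // Allocate from the allocator, then construct in place (no plain `new`).
  static Unit *Create(Allocator *a) {
    void *mem = a->Alloc(sizeof(Unit));
    return mem ? new (mem) Unit(a) : nullptr;
  }
  void AddRef() { ++m_refs; }
  int Release() {
    const int left = --m_refs;
    if (left == 0) {
      Allocator *a = m_alloc; // copy before the object is destroyed
      this->~Unit();          // destroy explicitly
      a->Free(this);          // return memory to the allocator it came from
    }
    return left;
  }
};

// Usage: allocation and deallocation go through the same allocator.
inline void DemoMatchedAllocation() {
  Allocator alloc;
  Unit *u = Unit::Create(&alloc);
  assert(u != nullptr && alloc.live == 1);
  u->AddRef();
  assert(u->Release() == 1);
  assert(u->Release() == 0);
  assert(alloc.live == 0);
}
```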
* Enable generation of llvm.lifetime.start/.end intrinsics.
- Remove HLSL change from CGDecl.cpp::EmitLifetimeStart() that disabled
generation of lifetime markers in the front end.
- Enable generation of lifetime intrinsics when inlining functions
(PassManagerBuilder.cpp).
- Both of these cover a different set of situations that can lead to
inefficient code without lifetime intrinsics (see examples below):
- Assume a struct is created inside a loop but some or all of its
fields are only initialized conditionally before the struct is being
used. If the alloca of that struct does not receive lifetime intrinsics
before being lowered to SSA its definition will effectively be hoisted
out of the loop, which changes the original semantics: Since the
initialization is conditional, the correct SSA form for this code
requires a phi node in the loop header that persists the value of the
struct field throughout different iterations because the compiler
doesn't know anymore that the field can't be initialized in a different
iteration than when it is used.
- If the lifetime of an alloca in a function is the entire function it
doesn't need lifetime intrinsics. However, when inlining that function,
the alloca's lifetime will then suddenly span the entire caller, causing
similar issues as described above.
- For backwards compatibility, replace lifetime.start/.end intrinsics
with a store of undef in DxilPreparePasses.cpp, or, for validator
version < 1.6, with a store of 0 (undef store is disallowed). This is
slightly inconvenient but achieves the same goal as the lifetime
intrinsics. The zero initialization is actually the current manual
workaround for developers that hit one of the above issues.
- Allow lifetime intrinsics to pass DXIL validation.
- Allow undef stores to pass DXIL validation.
- Allow bitcast to i8* to pass DXIL validation.
- Make various places in the code aware of lifetime intrinsics and their
related bitcasts to i8*.
- Adjust ScalarReplAggregatesHLSL so it generates new intrinsics for
each element once a structure is broken up. Also make sure that lifetime
intrinsics are removed when replacing one pointer by another upon seeing
a memcpy. This is required to prevent a pointer accidentally
"inheriting" wrong lifetimes.
- Adjust PromoteMemoryToRegister to treat an existing lifetime.start
intrinsic as a definition.
- Since lifetime intrinsics require a cleanup, the logic in
CGStmt.cpp:EmitBreakStmt() had to be changed: EmitHLSLCondBreak() now
returns the generated BranchInst. That branch is then passed into
EmitBranchThroughCleanup(), which uses it instead of creating a new one.
This way, the cleanup is generated correctly and the wave handling also
still works as intended.
- Adjust a number of tests that now behave slightly differently.
memcpy_preuser.hlsl was actually exhibiting exactly the situation
explained above and relied on the struct definition of "oStruct" to be
hoisted out to produce the desired IR. And entry_memcpy() in
cbuf_memcpy_replace.hlsl required an explicit initialization: With
lifetime intrinsics, the original code correctly collapsed to returning
undef. Without lifetime intrinsics, the compiler could not prove this.
With proper initialization, the test now has the intended effect, even
though the collapsing to undef could be a desirable test for lifetime
intrinsics.
Example 1:
Original code:
for( ;; ) {
    func();
    MyStruct s;
    if( c ) {
        s.x = ...;
        ... = s.x;
    }
    ... = s.x;
}
Without lifetime intrinsics, this is equivalent to:
MyStruct s;
for( ;; ) {
    func();
    if( c ) {
        s.x = ...;
        ... = s.x;
    }
    ... = s.x;
}
After SROA, we now have a value live across the function call, which will cause a spill:
for( ;; ) {
    x_p = phi( undef, x_p2 );
    func();
    if( c ) {
        x1 = ...;
        ... = x1;
    }
    x_p2 = phi( x_p, x1 );
    ... = x_p2;
}
Example 2:
void consume(in Data data);
void expensiveComputation();
bool produce(out Data data) {
    if (condition) {
        data = ...; // <-- conditional assignment of out-qualified parameter
        return true;
    }
    return false; // <-- out-qualified parameter left uninitialized
}
void foo(int N) {
    for (int i=0; i<N; ++i) {
        Data data;
        bool valid = produce(data); // <-- generates a phi to prior iteration's value when inlined. There should be none
        if (valid)
            consume(data);
        expensiveComputation(); // <-- said phi is alive here, inflating register pressure
    }
}
* Implement lifetime intrinsic execution test.
- Test SM 6.0, 6.3, and 6.5. The 6.5 test behaves exactly the same way
as 6.3; it is meant as a placeholder for 6.6.
- Test validator versions 1.5 and 1.6.
- Abstract a few things in the ExecutionTest infrastructure to enable
better code sharing, e.g. for lib compilation.
* Make memcpy replacement conservative by removing lifetimes of both src and dst. Add regression test for this case.
* Allow to force replacing lifetime intrinsics by zeroinitializer stores via compile option.
* Fix regression where lifetimes caused code that was not cleaned up properly.
- Add SROA and Jump Threading passes as early in the optimization
pipeline as possible without interfering with lowering. These two are
required to fully remove redundant code due to insertion of cleanup blocks
for lifetimes. Previously, SROA ran much too late, and Jump Threading
was disabled everywhere.
- A side effect of this is that we can now have unstructured control
flow more often. This also breaks one test that was originally written
when a part of SimplifyCFG that could also create unstructured control
flow was disabled. That part is still disabled, but jump threading has
the same effect. I don't know why unstructured control flow is a
problem for the optimization pipeline.
- Add a regression test that requires the two phases to be cleaned up properly.
- Disable the simplifycfg test which now fails even though simplifycfg
still does what it should.
* Disable lifetime intrinsics for SM < 6.6, add flag to enable explicitly.
- Add missing default value for unrelated option StructurizeLoopExitsForUnroll.
- Re-enable simplify cfg test disabled in a previous commit.
* [spirv] `-fspv-flatten-resource-arrays` is not needed for flattening global struct of resources.
* [spirv] Move descriptor scalar replacement to legalization stage.
Previously, trying to pass a scalar to a vector out parameter would
cause an assert and no truncation warning. This scales back the assert
and adds the missing warning.
Trying to pass a scalar to an inout parameter would cause a crash. This
allows for the necessary splat and avoids the erroneous attempts to
create a cast that leads to the crash.
Finally, as an incidental, this adds output parameter information to an
error that ostensibly required it, but never had it.
For some cases, spirv-opt removes all line information because of the
optimization. Emitting at least a single OpLine helps debuggers know it
is the end of the shader execution.
This commit adds the SourceLocation to SpirvDebugDeclare class that will
result in emitting the OpLine for the declaration (where the declaration
happens). It will help the spirv-opt to preserve the value assignment
location. In particular, it will be useful for simple shaders where
spirv-opt strips out almost all the instructions and keeps only
DebugValue instructions. We can recognize the location of value
assignments based on OpLine.
The current DXC emits `OpLine` for the first instruction in the location
and does not emit the same `OpLine` for the following instructions.
However, it does not specify the end of the effectiveness of the
`OpLine`, which is technically wrong based on the spec of OpLine and
OpNoLine. We have to specify the `OpLine` is not applied to the
following instructions when we meet an instruction without the location
information.
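The emission rule described above can be sketched on a simplified instruction stream. This is an illustration only: `Inst` and `EmitWithLineInfo` are hypothetical names, a line number of 0 stands for "no location information", and the real `OpLine` also carries file and column operands that are left out here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified instruction; line == 0 means "no source location".
struct Inst {
  std::string name;
  unsigned line;
};

// Emit OpLine when the location changes, skip it for repeated locations,
// and emit OpNoLine when an instruction without a location follows one
// that had an OpLine in effect.
inline std::vector<std::string>
EmitWithLineInfo(const std::vector<Inst> &insts) {
  std::vector<std::string> out;
  unsigned current = 0; // line of the OpLine in effect, 0 if none
  for (const Inst &inst : insts) {
    if (inst.line != 0 && inst.line != current) {
      out.push_back("OpLine " + std::to_string(inst.line));
      current = inst.line;
    } else if (inst.line == 0 && current != 0) {
      out.push_back("OpNoLine"); // end the effect of the previous OpLine
      current = 0;
    }
    out.push_back(inst.name);
  }
  return out;
}

// Usage: A and B share line 10, C has no location, D is on line 12.
inline void DemoLineInfo() {
  const std::vector<Inst> insts = {{"A", 10}, {"B", 10}, {"C", 0}, {"D", 12}};
  const std::vector<std::string> expected = {
      "OpLine 10", "A", "B", "OpNoLine", "C", "OpLine 12", "D"};
  assert(EmitWithLineInfo(insts) == expected);
}
```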
I hadn't noticed that the last fix, #3215, only fixed the error checking. It was still generating an OpTypeStruct without any members.
This commit makes the push constant use the templated type of the constant buffer to create the struct.
propagate through:
- non-FP paths that might have been dependent on FP paths
- intrinsics that aren't simple FP math intrinsics (such as Sample)
- pointers to matching GEPs
- conditions for control dependent branches
This commit fixes a regression introduced by #3147: when `[[vk::push_constant]]` is applied to a ConstantBuffer, an error reports that `'push_constant' attribute only applies to global variables of struct type`.
Skip extra arg copies in codegen, but copy when matrix orientation change
must be captured for call.
Copy what's necessary for translation around external calls, like cbuffer,
in HLLegalizeParameter.
This fixes cbuffer RayDesc case as well.
Also fixes issue with -file-check-dump.
As SPIRV-Tools supports the debug info preservation for the full
optimization, we want to allow DXC users to use the full optimization
for the debug info generation.
This change updates SPIRV-Tools and SPIRV-Headers.
- Fixed value tracking for dxil intrinsics
- Fixed some selects holding on to invalid resource uses
- Fixed some cases where unused globals hold on to invalid resource uses
- Fixed some cases where stores of undefs stick around
This PR will add `DontInline` function control flag to the OpFunction in generated SPIR-V if the function has `[noinline]` in HLSL shader.
This is the first step of work for #3158.
After the `DontInline` flag is added, spirv-opt needs an option to control whether it should honor the flag or not (it ignores the flag currently), and the option should be passed down by DXC.
* Cleanup and minor changes to improve integration with internal projects
Adds -Debug option to hcttest.cmd
* Case insensitive compare for -Debug and -Release flags in hctbuild and hcttest
Before/after handling a compound statement i.e., `{ .. }`, we have to
add DebugScope for the lexical scope updates. We added it before
handling a compound statement, but we did not add it after doing it.
This commit adds a DebugScope after handling a compound statement.
When reporting an error message associated with a global variable without -Zi,
an attempt is made to report the associated function, but there was no
check to see if that function existed, so it crashed.
Fixes #2989
Instead of skipping past allocas whenever inserting a new instruction,
which ate up a lot of compilation time, they are inserted at the default
insertion point.
The result is that allocas that would have coalesced just after the
global load and input loads are dispersed throughout the commands. So as
part of dxil finalization, the allocas are moved to the beginning of the
entry block of each function. This results in some minor changes to a
couple tests due to the allocas preceding the loads.
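The finalization step amounts to a stable partition of the entry block: allocas move to the front and everything else keeps its relative order. The sketch below shows that idea on a toy instruction list; `Instr` and `HoistAllocas` are hypothetical names and this is not the LLVM-based code in the pass itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction; the real pass works on llvm::Instruction.
struct Instr {
  std::string text;
  bool isAlloca;
};

// Move all allocas to the beginning of the block, preserving the relative
// order of the allocas and of the remaining instructions.
inline void HoistAllocas(std::vector<Instr> &entryBlock) {
  std::stable_partition(entryBlock.begin(), entryBlock.end(),
                        [](const Instr &i) { return i.isAlloca; });
}

// Usage: allocas that were dispersed now precede the loads.
inline void DemoHoistAllocas() {
  std::vector<Instr> entry = {{"load input", false},
                              {"alloca a", true},
                              {"add", false},
                              {"alloca b", true}};
  HoistAllocas(entry);
  assert(entry[0].text == "alloca a" && entry[1].text == "alloca b");
  assert(entry[2].text == "load input" && entry[3].text == "add");
}
```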
OpenCL.DebugInfo.100
is a SPIR-V extended instruction set that provides DWARF style debug information.
This PR allows DXC SPIR-V backend to generate the rich HLSL debug information
using OpenCL.DebugInfo.100 instructions.
* Call ctor function for static globals in patch constant function.
* Also copy static globals that have constant initializers at the beginning of the patch constant function.
Fix compatibility with released DXIL.dll binaries:
- 1.4: 18362 SDK
- 1.5: GitHub release v1.5.2003
There are expected test failures when run against 1.5 19041 SDK DXIL.dll,
due to later fixes in the GitHub v1.5.2003 release. Tests that fail by
design on this DLL have a comment added.
Also moved readme text to more discoverable readme.
CBV array index uses special HL CreateHandle intrinsic for indexing
instead of GEP. When TranslateNonUniformResourceIndex is marking GEPs
with metadata, there was no GEP to mark for CBV array. This change
also marks HL CreateHandle calls when direct users of this intrinsic.
Later, in DxilGenerationPass, this HL CreateHandle call with the index
is translated to a GEP + CreateHandleForLib call. This now adds the
non-uniform metadata to the GEP if the indexed HL CreateHandle call
being translated had it.
Prior to a more robust mechanism of handling types determined by
explicit templates in MatchArguments, a special case was made for
ByteAddressBuffer Loads that explicitly assigned the return type. When
the more robust mechanism was added as part of support for
ByteAddressBuffer Store, not only was this explicit assignment left in,
but it was also applied to the new Store method. The result was an
assert later on a store operation that is expected to return void.
By removing this now unneeded assignment, the assert no longer fires and
nothing else is changed since the return value was never read apart from
that anyway.
Fixes #2438
* Modify the extension mechanism to handle custom lowering for resource methods
The goal is to allow resource extension intrinsics to be handled the same way
as the core hlsl resource intrinsics.
Specifically, we want to support:
1. Multiple hlsl overloads map to a single dxil intrinsic
2. The hlsl overloads can take different parameters for a given resource type
3. The hlsl overloads are not consistent across different resource types
To achieve these goals we need a more complex mechanism for describing how
to translate the high-level arguments to arguments for a dxil function.
This commit implements custom lowering by allowing the mapping from high-level
args to dxil args to be specified as extra information along with the
lowering strategy.
The custom lowering info describes this lowering using the following format.
* Add missing virtual destructors
refactor usage of resourceparams to just use RP
create common way of extracting res properties
make getoverloadtype static
so it can be used to determine version
requirements
Disallow literals where not included
Previously, whether an intrinsic parameter type included an entry for
literals or not, a literal was permitted to be cast to it.
By moving up the explicit per-param check to before this cast takes
place, these implicit casts won't be added
The existing code uses the Function storage class for the output patch
of a hull shader that will be used by the patch constant function.
DirectX automatically transmits the output patch to all threads,
so it is identical between threads. Even though Vulkan does not have
the patch constant function, we can simulate it using the execution
barrier and the variable with WorkGroup storage class.
Matrix lowering was not handling subscript operations when the matrix
was a shader input. Copying the matrix to a temporary explicitly or
implicitly as part of a copy in function call worked around the problem.
When the argument copy was eliminated, this problem was exposed.
By adding a load for the indicated matrix for a subscript of a shader
input matrix, the lowering can proceed using the lowered matrix from the
load.
Additionally, hull shaders with their uninlineable functions were
failing to detect the constanthull function as a graphics function so
the subscript lowering was being treated as if it were in a library.
Even when it got to signature lowering, there was no support for
lowering matrix loads.
The test for shaderism now tests the function attributes of the module's
entry function. The signature lowering for matrix loads is moved into a
function that is called by input lowering for both the constant hull and
entry functions.
Added general tests for matrix subscripts from different memory types as
well as specific tests for the argument passing problem in pixel and
domain shaders.
A more correct way to identify functions that should delay the lowering
of their matrix parameters to signature lowering time is to query
whether the function has a signature. This covers entry functions for
graphics shaders and also constant patch functions.
Fixes #2958
For structurize-returns, the codegenopts assigned were a copy of those
the compiler possesses. This is by design in clang as explicitly stated
in comments. As a result, when the compiler codegenopts are updated, the
codegenmodule's are not.
By adding the ability to query the optimization enables from the hlsl
extension helper, these options can be reflected.
Additionally, the structurize-returns flag would have required a define
with a '-' in it, which isn't possible. So now it converts between '_'
for defines and '-' for options, which is consistent with how other
flags are interpreted.
Adds a unit test similar to that of disable gvn that makes a simple
check to verify that the return structurization is enabled.
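The name conversion between defines and options is a plain character substitution. A minimal sketch, with hypothetical helper names that are not DXC functions:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helpers; the names are illustrative only.

// A define cannot contain '-', so it spells the flag with '_'.
inline std::string DefineToOptionName(std::string name) {
  std::replace(name.begin(), name.end(), '_', '-');
  return name;
}

// The reverse direction, from option spelling to define spelling.
inline std::string OptionToDefineName(std::string name) {
  std::replace(name.begin(), name.end(), '-', '_');
  return name;
}

// Usage with the flag name from this change.
inline void DemoFlagNameConversion() {
  assert(DefineToOptionName("structurize_returns") == "structurize-returns");
  assert(OptionToDefineName("structurize-returns") == "structurize_returns");
}
```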
* Add astcontext const decls to the astDecl table if not inserted before
Fix [SPIRV] Inline ray tracing doesn't compile #3047
1) Add astcontext decls to the astDecls table if the decl was not
inserted before
2) Add the unit test of previously failed cs shader.
* [spirv] Create implicit constant VarDecls lazily.
Co-authored-by: Ehsan Nasiri <ehsannas@gmail.com>
Because the defined types of Interlocked* Parameters do not restrict
type casting, they can result in late errors that cause crashes due to
reference params with casts that aren't actually respected by dxil
generation.
By constraining the parameters as they should be, instead of crashes, we
get errors.
Fixes #2077
Fixes #2483
In a couple of cases, intrinsics with parameters that weren't supposed
to allow casts were getting at least one parameter cast to a type not in
the accepted list.
Where an intrinsic referred to another parameter to determine
acceptable types, all parameters fed into a single parameter
representing the common type. If the first parameter was acceptable, but
the second wasn't, the second would get merged with the first regardless
of allowable casts. By checking the parameters before the merging is
performed, these failures are detected.
If an intrinsic call with valid arguments is followed by one
with invalid arguments, the second will be accepted and casts added to
convert to the existing intrinsic because it has already been added to
the list of possible overloads. By disallowing the function matching
code to match intrinsics, this can't happen. What's more, it prevents
confusing error messages when the overload is invalid but existing
overloads have already been seen. The failed overload is all that is
processed so the error is the same as if it were the first use of that
intrinsic.
Even when failures are correctly detected, the messages produced were
less than helpful. By fully defining HLSL intrinsic candidates even if
they fail and adding them to the candidateset, with an indication of
which parameter was faulty, the existing code will report about these
faulty candidates when no others are available.
Reshuffle some type lists to make literals not the default, which leads
to confusing error messages.
Add a bunch of tests for various ways intrinsics can fail due to
parameters.
Fixes #2693
Fixes #818
This conforms to the established -opt-enable -opt-disable flag standard
to enable this optimization as well as tying in with the semantic
defines mechanism.
* [spirv] Fix translation for WaveReadLaneAt
WaveReadLaneAt should be translated to OpGroupNonUniformShuffle, not OpGroupNonUniformBroadcast.
* Add comments.
A single character omission made WaveReadLaneFirst lack double support.
This corrects that omission, evading the unneeded validation error and
adds a test that covers various types for various wave and quad
intrinsics.
Fixes #2501
In SROA, we create a GEP and later expect it to be an Instruction.
However, if the indices are all constant, it will return a constant
expression, which crashes hard. By setting the flag added for just this
purpose, this is avoided.
This situation results when a second bitcast of a non POD type follows a
pair of memcpy replacements that propagate the global variable to the
bitcast operation. This test contrives a memcpy by assigning the global
to a local and then the bitcasts by passing the local into a function
call.
Adds the -opt-enable and -opt-select flags alongside the -opt-disable flag and
broadens the utility of all by storing the settings as strings instead
of booleans. -opt-select requires a parameter indicating the setting
name and another indicating the value.
Options are represented as lowercase until they are converted into
semantic defines in metadata. Presence of semantic define prefixes
is no longer assumed when trying to write semantic defines to
metadata.
Semantic defines with the user-specified prefix will be propagated to
the corresponding -opt-<enable|disable|select> CodeGen option.
This allows either way of specifying these options to have the
same effect.
Adds two tests to verify that semantic defines will be
detectable as codegen flags. Another verifies that correct errors are
produced for contradictory flags.