R8G8_B8G8_UNORM and G8R8_G8B8_UNORM are really YUV formats, even though
their names do not explicitly indicate it. We are leaving such formats out
of format casting for now.
* add metadata on numthreads for CS, AS, and MS
* add associated hlsl test file
* move test hlsl file to correct location
* update check values
* tested each shader type separately, added breaks in switch statement
* add some extra numthread checking to each separate type of shader in other existing tests
- Emit a diagnostic when input cannot be parsed.
- Add exception handling code, for exceptions that can originate from
DXC utility code.
This cannot be easily tested with FileCheck because we would have to
negate the dxil2spv return code.
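A minimal sketch of the catch-and-report pattern described above. `RunWithDiagnostics` is a hypothetical wrapper, not dxil2spv's actual code. Any exception escaping the wrapped step becomes a diagnostic string plus a nonzero return code, which is the return code a FileCheck test would have to negate. DXC's own exception types may need a dedicated catch clause if they do not derive from `std::exception`.

```cpp
#include <exception>
#include <functional>
#include <stdexcept> // std::runtime_error, a common way for callers to report failures
#include <string>

// Hypothetical helper: runs one conversion step and turns any exception that
// escapes it (e.g. from shared utility code) into a diagnostic plus exit code.
int RunWithDiagnostics(const std::function<void()> &step, std::string &diag) {
  try {
    step();
    return 0; // success: no diagnostic emitted
  } catch (const std::exception &e) {
    diag = std::string("error: ") + e.what();
  } catch (...) {
    diag = "error: unknown exception";
  }
  return 1; // nonzero so the caller (and any test) can detect the failure
}
```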
Just as allocas are skipped when they are popped off the worklist and
found to have no users, GVs with no users are now removed and no
further processing is performed. Previously, that processing flattened
the unused GV, creating new child GVs and associating the debug info from
the parent with the children. Since all uses of the unused GV had been
replaced with another GV, no debug info was found and the compiler crashed.
This also fails more gracefully if debug info isn't found for any reason.
Add a test that triggers that memcpy replacement order and verifies that
it doesn't crash and doesn't flatten the unused GV.
The order in which identically sized GVs are processed in SROA was changed
so that the flattened GVs are ordered consistently.
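The skip-unused rule can be modeled with a simple worklist. This is only a sketch: `GV`, `numUsers`, `numFields`, and `ProcessWorklist` are hypothetical stand-ins for the real SROA data structures, and flattening is reduced to creating one named child per field.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a global variable queued for SROA.
struct GV {
  std::string name;
  int numUsers;  // remaining uses after earlier replacements
  int numFields; // how many children flattening would create
};

// Pops GVs off the worklist. A GV with no users is dropped with no further
// processing, so it is never flattened and no debug info is looked up for it.
std::vector<std::string> ProcessWorklist(std::deque<GV> worklist) {
  std::vector<std::string> created;
  while (!worklist.empty()) {
    GV gv = worklist.front();
    worklist.pop_front();
    if (gv.numUsers == 0)
      continue; // unused: remove and move on
    for (int i = 0; i < gv.numFields; ++i)
      created.push_back(gv.name + "." + std::to_string(i));
  }
  return created;
}
```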
The previous approach assigned garbage to the final field of desc1,
because desc0 has alignment padding where that field would be.
This just duplicates the utility function's behavior to copy one desc
into the new one.
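To illustrate why copying between the two struct versions needs care, here is a sketch using hypothetical `Desc0`/`Desc1` types (not the real D3D12 structs). On typical 64-bit ABIs, the extra trailing field of `Desc1` occupies the bytes that are tail padding in `Desc0`, so reinterpreting a `Desc0` as a `Desc1` would read that padding. A field-wise copy gives every member a defined value.

```cpp
#include <cstdint>

// Hypothetical descriptors: Desc1 is Desc0 plus one trailing field.
struct Desc0 {
  std::uint64_t width;
  std::uint32_t height;
  std::uint16_t mipLevels;
  // Typical 64-bit ABIs add 2 bytes of tail padding here (sizeof == 16).
};

struct Desc1 {
  std::uint64_t width;
  std::uint32_t height;
  std::uint16_t mipLevels;
  std::uint16_t extra; // lands where Desc0 has tail padding on those ABIs
};

// Field-wise conversion: every Desc1 member receives a defined value,
// instead of whatever bytes happened to sit in Desc0's padding.
Desc1 ToDesc1(const Desc0 &d0, std::uint16_t extra = 0) {
  Desc1 d1{}; // value-initialize so nothing is left indeterminate
  d1.width = d0.width;
  d1.height = d0.height;
  d1.mipLevels = d0.mipLevels;
  d1.extra = extra;
  return d1;
}
```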
* Replaced the code to delete unread allocas in dxil-cond-mem2reg with a call to the helper in dxilutils.
* Fixed a crash in the helper caused by walking the entry block in forward order, which could invalidate the next iterator.
* Added handling of memcpys in the helper.
* Added tests that actually exercise the memcpy path.
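One common way to avoid the invalidated-iterator crash is to walk the block backwards, so that erasing an instruction together with its later users never touches the iterator you move to next. The sketch below models this with `std::list`. `Inst`, `deadAlloca`, `EraseWithUsers`, and `DeleteDeadAllocas` are hypothetical, and the actual helper in dxilutils may use a different strategy.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical instruction model. A dead alloca's users (e.g. stores or
// memcpys into it) always appear later in the block than the alloca.
struct Inst {
  std::string name;
  bool deadAlloca = false;
  std::vector<std::string> users;
};

using Block = std::list<Inst>;

// Erases `it` and every later instruction named in its user list.
void EraseWithUsers(Block &bb, Block::iterator it) {
  for (const std::string &u : it->users) {
    for (auto j = std::next(it); j != bb.end(); ++j) {
      if (j->name == u) {
        bb.erase(j);
        break;
      }
    }
  }
  bb.erase(it);
}

// Walks the block backwards. The previous iterator is captured before any
// erasure, and erasure only removes the current instruction and later ones,
// so that iterator stays valid. A forward walk that saved "next" could see
// it erased as one of the users.
void DeleteDeadAllocas(Block &bb) {
  if (bb.empty())
    return;
  auto it = std::prev(bb.end());
  while (true) {
    const bool atBegin = (it == bb.begin());
    auto prev = atBegin ? bb.end() : std::prev(it);
    if (it->deadAlloca)
      EraseWithUsers(bb, it);
    if (atBegin)
      break;
    it = prev;
  }
}
```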
The check for a const destination param for atomic functions mistakenly
encompassed the first param of atomic methods, which is an offset and
may legitimately be const. This changes the check to account for methods
when relevant.
Adds a test for this and a few other aspects of atomic methods
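A sketch of the distinction the fix draws, using a hypothetical `DestArgIndex` helper. For the free-function form `InterlockedAdd(dest, value, ...)`, argument 0 is the destination and must not be const. For the method form `buf.InterlockedAdd(offset, value, ...)`, the destination is the object itself and argument 0 is just an offset.

```cpp
#include <optional>

// Hypothetical helper: index of the explicit argument that is the atomic
// destination, or nullopt when the destination is the implicit object.
std::optional<unsigned> DestArgIndex(bool isMethod) {
  if (isMethod)
    return std::nullopt; // e.g. RWByteAddressBuffer::InterlockedAdd(offset, ...)
  return 0u;             // e.g. InterlockedAdd(dest, value, ...)
}

// Only the destination argument is subject to the non-const check.
bool ArgMustBeNonConst(bool isMethod, unsigned argIndex) {
  std::optional<unsigned> dest = DestArgIndex(isMethod);
  return dest.has_value() && *dest == argIndex;
}
```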
* Teach llvm-dis to read DXIL out of DXBC files
This change has no impact on the DXC codebase since we don't use
llvm-dis here, but in LLVM upstream I'm using llvm-dis from DXC to
verify bitcode compatibility.
The change here is to intercept the file passed into llvm-dis to detect
DXIL Container files, and read the bitcode out of the DXIL part.
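A sketch of the container-detection step, written against the DXIL container layout as we understand it from DXC's `DxilContainer.h`: a container header with `DXBC` magic and a part offset table, part headers, and a program header whose bitcode offset is relative to the bitcode header. `FindDxilBitcode` and its helpers are hypothetical names, not the actual llvm-dis change; verify the field offsets against `DxilContainer.h` before relying on them.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Reads a little-endian 32-bit value at `off`; fails if out of bounds.
static bool ReadU32(const std::vector<std::uint8_t> &buf, std::size_t off,
                    std::uint32_t &out) {
  if (off > buf.size() || buf.size() - off < 4)
    return false;
  out = std::uint32_t(buf[off]) | std::uint32_t(buf[off + 1]) << 8 |
        std::uint32_t(buf[off + 2]) << 16 | std::uint32_t(buf[off + 3]) << 24;
  return true;
}

// Packs four characters the way container FourCCs are stored (little-endian).
static constexpr std::uint32_t FourCC(char a, char b, char c, char d) {
  return std::uint32_t(std::uint8_t(a)) | std::uint32_t(std::uint8_t(b)) << 8 |
         std::uint32_t(std::uint8_t(c)) << 16 |
         std::uint32_t(std::uint8_t(d)) << 24;
}

// Returns {offset, size} of the LLVM bitcode in the DXIL part, or nullopt if
// the buffer is not a DXIL container (so the caller can treat it as plain bitcode).
std::optional<std::pair<std::size_t, std::size_t>>
FindDxilBitcode(const std::vector<std::uint8_t> &buf) {
  // Container header: 'DXBC' | 16-byte hash | u16 major, u16 minor |
  //                   u32 container size | u32 part count | u32 offsets[]
  std::uint32_t magic = 0, partCount = 0;
  if (!ReadU32(buf, 0, magic) || magic != FourCC('D', 'X', 'B', 'C'))
    return std::nullopt;
  if (!ReadU32(buf, 28, partCount))
    return std::nullopt;
  for (std::uint32_t i = 0; i < partCount; ++i) {
    std::uint32_t partOff = 0, fourCC = 0;
    if (!ReadU32(buf, 32 + 4 * std::size_t(i), partOff) ||
        !ReadU32(buf, partOff, fourCC))
      return std::nullopt;
    if (fourCC != FourCC('D', 'X', 'I', 'L'))
      continue;
    // Part header is u32 fourCC | u32 size; then the program header:
    // u32 version | u32 size in dwords | bitcode header.
    std::size_t bch = std::size_t(partOff) + 8 + 8;
    // Bitcode header: 'DXIL' | u32 DXIL version | u32 offset | u32 size.
    std::uint32_t dxilMagic = 0, bcOff = 0, bcSize = 0;
    if (!ReadU32(buf, bch, dxilMagic) || dxilMagic != FourCC('D', 'X', 'I', 'L') ||
        !ReadU32(buf, bch + 8, bcOff) || !ReadU32(buf, bch + 12, bcSize))
      return std::nullopt;
    std::size_t start = bch + bcOff; // offset is relative to the bitcode header
    if (start > buf.size() || buf.size() - start < bcSize)
      return std::nullopt;
    return std::make_pair(start, std::size_t(bcSize));
  }
  return std::nullopt;
}
```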
* Updating error handling code
Thanks for the review @tex3d!
The codepath that detects lifetime-marker-only uses (bitcast -> lifetime intrinsic) failed to check the instruction itself, which could be a user function call using the alloca ptr directly instead of the bitcast expected in the lifetime marker scenario.
If that user function call had no users (for instance, because it returns void), the code improperly assumed that the instruction was inconsequential: either used only by lifetime markers or unused, and thus irrelevant.
The fix is to only check onlyUsedByLifetimeMarkers if the instruction is a bitcast, since that's the pattern for lifetime marker use on the alloca. If the bitcast itself is unused, it is also safe to continue, since it does nothing on its own.
BreakUpArrayAllocas during DxilLoopUnroll had the same issue. Targeting and testing this uncovered several issues (crashes) around multi-dimensional arrays and library paths. For now, there is no simple way to pass multi-dimensional arrays directly to function calls across library boundaries without either relaxing validation or lowering function parameter types. However, it's legal to use the offline-linking target with these, and that scenario shouldn't crash or generate bad code.
Tests have been added for multi-dim resource array passing and passing of a sub-array type to a user function, which both used to crash or assert.
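The bitcast-only rule can be expressed as a small predicate. This sketch uses a hypothetical IR model (`Op`, `Inst`, `IsIgnorableAllocaUser`) rather than LLVM's classes. The key point is that the lifetime-marker check is applied only to bitcasts, so a void call with no users is no longer treated as ignorable.

```cpp
#include <vector>

// Hypothetical IR model; the real code works on llvm::Instruction.
enum class Op { BitCast, Call, LifetimeStart, LifetimeEnd, Store };

struct Inst {
  Op op;
  std::vector<const Inst *> users;
};

// True when every user is a lifetime marker (vacuously true with no users).
bool OnlyUsedByLifetimeMarkers(const Inst &I) {
  for (const Inst *U : I.users)
    if (U->op != Op::LifetimeStart && U->op != Op::LifetimeEnd)
      return false;
  return true;
}

// An alloca user may be ignored only if it is a bitcast feeding nothing but
// lifetime markers, or an unused bitcast. Any other instruction, such as a
// call that takes the alloca pointer directly, must be treated as a real use.
bool IsIgnorableAllocaUser(const Inst &I) {
  return I.op == Op::BitCast && OnlyUsedByLifetimeMarkers(I);
}
```

Without the `I.op == Op::BitCast` test, `OnlyUsedByLifetimeMarkers` alone would return true for a call with no users, which is exactly the case the fix addresses.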
Disable NRVO for entry functions and patch constant functions.
For entry functions (and patch constant functions), each write to an output argument creates a new call to dx.storeOutput (or dx.storePatchConstant). With NRVO enabled, every write to the return variable becomes an output instruction, which could be excessive. To avoid this, disable NRVO for all entry functions and patch constant functions.
Backport an LLVM cmake option to specify the C/C++ linker to use.
This implementation is simplified and, unlike the llvm-project one,
does not contain any linker-specific tweaks.
On my system, setting `-DLLVM_USE_LINKER=mold` reduces the full build
time from 81s to 43s.
This change does a few things.
First and foremost, it disallows a number of illegal destination
parameters to atomic operations. This includes SRVs and other const
values, non-groupshared and non-resource variables, members of
typed resources, and bitfield members of any resource.
In addition, this correctly propagates const information so that const
destinations can be properly rejected and the information is also reflected.
This involves the consolidation and earlier detection of a number of
these failures as well as an expansion of that detection.
Also adds validation checks that the targets are not const and are either
UAVs or in the groupshared address space.
The const errors are produced in clang and manifest as overload
mismatches. The errors for member access of typed buffers, bitfields,
or non-groupshared and non-UAV destinations are generated
as part of the lowering operation. This is where they were before,
and they are easier to replicate there, though they would be better placed in codegen.
In some cases, existing errors were moved earlier in the process
and the logic to generate them simplified.
Add compilation and verifier tests for all.
Also adds validation errors for const destination targets and other
invalid targets and tests for the same.
In order to make those tests, I had to give locations to the function
candidate notes for function builtins. This changed a single verifier
test that depended on the locationless notes.
These tests make use of const groupshared variables that are of
questionable utility, but are valid for now.
Incidentally enhances some existing atomics tests to use a more
complex sequence of references and subscripts.
Finally, this incidentally adds some foolproofing against mistakenly comparing
opcodes without taking into account the "table" or namespace that
they belong to, by changing some utility functions in Sema to require
the table as well as the opcode to produce results.
Fixes #4319
Fixes #4377
Fixes #4437
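The table-plus-opcode requirement in Sema can be sketched as follows. `OpTable`, `OpRef`, and `IsOp` are hypothetical and only illustrate why an opcode value alone is ambiguous when several tables reuse the same numbers.

```cpp
// Hypothetical opcode tables: the same numeric opcode can appear in several.
enum class OpTable { Intrinsic, Method };

struct OpRef {
  OpTable table;
  unsigned opcode;
};

// Requiring the table as well as the opcode rules out accidental matches
// between unrelated operations that happen to share a number.
constexpr bool IsOp(OpRef ref, OpTable table, unsigned opcode) {
  return ref.table == table && ref.opcode == opcode;
}
```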
Lifetime markers require a cleanup pass to eliminate some unnecessary
code. However, one of those passes was JumpThreading, which eliminated
only certain dead blocks. That partial elimination removed phi instructions
and left looped self-increments unconditionally incrementing themselves,
which led to infinite recursion when trying to process that instruction.
By running InstCombine first, all unreachable blocks are eliminated up
front, so the partial elimination doesn't leave such bad phi users. To
avoid losing convergence for DXIL-specific operations, we must first run
the convergent marker pass, so these passes are moved a little later
than before.
ReplaceUsesForLoweredUDT was calling use.set(NewV) or dropAllReferences() on constant users.
This modifies the constant's operands, which is illegal and led to corruption of constants.
The loop in ReplaceUsesForLoweredUDT relies on all uses being eliminated to terminate, so constant or potentially constant users are now handled with legal replacements, followed by removeDeadConstantUsers at the end as needed.
CodeGen for an unreachable break would normally be skipped, but the DxBreak
mechanism would still insert a conditional branch. An unreachable location
means the builder has no insertion point. This led to null preds for
blocks, and an AV in SimplifyCFG.
Checking CGF.HaveInsertPoint() is the mechanism used elsewhere to skip
CodeGen in unreachable locations, so this check is added to EmitHLSLCondBreak.
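A minimal model of the guard, assuming a hypothetical `Builder` whose null insert block stands for an unreachable location, and a hypothetical `EmitCondBreak` in place of EmitHLSLCondBreak:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for a basic block and an IR builder. A null insert
// block models an unreachable location, as after a `break`.
struct BasicBlock {
  std::vector<std::string> insts;
};

struct Builder {
  BasicBlock *insertBlock = nullptr;
  bool HaveInsertPoint() const { return insertBlock != nullptr; }
};

// Emits the conditional branch only when there is somewhere to put it;
// otherwise codegen for the unreachable break is skipped entirely.
void EmitCondBreak(Builder &B) {
  if (!B.HaveInsertPoint())
    return;
  B.insertBlock->insts.push_back("br i1 %cond, label %break, label %cont");
}
```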
* add non-uniform dynamic indexing option to dynamic resources uniform indexing test
* assign non-uniform results to different result indices, reduce thread count to 2, use NonUniformResourceIndex correctly
* add non-uniform indexing for samplers, add a definition option in the test, and update the fallback path.
* fixed global variable names, made some simplifications
* unified testing, hardcode 0 for fallback case
* checkpoint, some concerns were addressed
* change test name, make result arrays separate depending on non-uniformity, other fixes
* swapped loops, simplified verification function, fixed 2 fields that weren't reset, other stuff
* variable changes and condition clarity
Based on the Vulkan spec "15.1.5. Component Assignment":
"A Component decoration must not be specified for any type that is not a
scalar or vector."
we cannot apply the `Component` decoration to array/matrix stage variables,
even though `Component` is critical for signature packing. We perform scalar
replacement on the stage variables so that `Component` can be assigned to all
stage variables, which allows us to reduce the number of assigned `Location`s.
This commit uses the interface-variable-scalar-replacement spirv-opt pass for
the SROA.
Note that the interface-variable-scalar-replacement spirv-opt pass is
experimental. We want to avoid the side effect caused by the pass as
much as possible. Therefore, we enable the pass only when the option for
the signature packing is enabled.
This introduces the correct mechanism for creating resources that might
be cast to a different format of the same bit size when enhanced
variables are found to be available. It also limits the formats to those
that are natively castable unless the relaxed format casting feature is
available.
To allow this, the resource creation utility function can now provide
these potential casting targets when supported, new queries for these
features are added, and some of the formats were reordered to make it
easier to isolate the relevant ones.
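A sketch of how same-bit-size cast candidates might be gathered, assuming a hypothetical `FormatInfo` table rather than real `DXGI_FORMAT` metadata. It also skips formats flagged as YUV-like, in line with the earlier note about R8G8_B8G8_UNORM and G8R8_G8B8_UNORM. The real test code's rules for natively castable versus relaxed-castable formats are more detailed than this.

```cpp
#include <string>
#include <vector>

// Hypothetical format metadata; the real tests use DXGI_FORMAT values.
struct FormatInfo {
  std::string name;
  unsigned bitsPerElement;
  bool yuvLike; // packed/subsampled formats kept out of casting for now
};

// Collects candidate cast targets for `src`: other formats with the same
// element size, excluding YUV-like formats.
std::vector<std::string> CastTargets(const std::vector<FormatInfo> &table,
                                     const FormatInfo &src) {
  std::vector<std::string> out;
  for (const FormatInfo &f : table)
    if (f.name != src.name && f.bitsPerElement == src.bitsPerElement &&
        !f.yuvLike)
      out.push_back(f.name);
  return out;
}
```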
Also, retrieve the advanced texture ops caps bits from the proper options struct.
The previous implementation read them from options13, but the settings ended
up in options14.
Add FileCheck testing to dxil2spv, and a first static vertex shader test case to test #4426. Also includes documentation for how to add additional tests, and replaces the existing WholeFileTestFixture with regular FileCheck.
* enable optimization passes to throw hlsl exceptions to abort further compilation and capture relevant error data
* add HLSL change comments, and simplify Diagnostics data saving
* throw legalization exception, give more info when aborting
* change exception throw back to return, and remove error string
When we handle the `Load/Store` methods of ByteAddressBuffer, we first load
the ByteAddressBuffer object if it is a reference to an alias variable.
The `this` object is a reference to an alias variable if a
ByteAddressBuffer is a member of the class. The existing code checks the
members of the class to determine whether it has a ByteAddressBuffer,
but it does not check its parent. Since all members of a parent are added
to its child class, we have to check the members of the parent as well.
This commit fixes the issue and now `loadIfAliasVarRef(..)` properly
loads the access chain to ByteAddressBuffer.
For example,
```
// HLSL:
struct Base {
  ByteAddressBuffer buffer;
};
struct Child : Base {
  float load(in uint offset) {
    return asfloat(buffer.Load(offset));
  }
};
// Before:
%param_this = OpFunctionParameter %_ptr_Function_Child
%base = OpAccessChain %_ptr_Function_Base %param_this %uint_0
%ptr = OpAccessChain %_ptr_Function__ptr_Uniform_type_ByteAddressBuffer %base %int_0
OpAccessChain %_ptr_Uniform_uint %ptr %uint_0 ; <-- Error!!
// After:
%param_this = OpFunctionParameter %_ptr_Function_Child
%base = OpAccessChain %_ptr_Function_Base %param_this %uint_0
%ptr = OpAccessChain %_ptr_Function__ptr_Uniform_type_ByteAddressBuffer %base %int_0
%buffer = OpLoad %_ptr_Uniform_type_ByteAddressBuffer %ptr
OpAccessChain %_ptr_Uniform_uint %buffer %uint_0
```
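The fix amounts to a recursive member check. In this sketch, `ClassInfo` and `HasByteAddressBufferMember` are hypothetical, and a class is assumed to have at most one base for simplicity.

```cpp
#include <string>
#include <vector>

// Hypothetical class model: member type names plus an optional base class
// (a single base is assumed for simplicity).
struct ClassInfo {
  std::vector<std::string> memberTypes;
  const ClassInfo *base = nullptr;
};

// Inherited members are part of the child class too, so the search must
// continue into the base class when the direct members have no match.
bool HasByteAddressBufferMember(const ClassInfo &cls) {
  for (const std::string &t : cls.memberTypes)
    if (t == "ByteAddressBuffer")
      return true;
  return cls.base != nullptr && HasByteAddressBufferMember(*cls.base);
}
```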
SV_Position elements in DXIL input signature should be treated as SPIR-V
BuiltIn variables with the Position type rather than regular stage IO
variables with a Location.
The already broad i64 atomics testing was missing most coverage for
texture (image) types. This extends atomicop_i64 and the various
atomic*_i64_and_i32 tests to use int64 texture types and verify the
DXIL output.
The following subtests have updated expected results to account for
helper lanes being a part of wave operations in SM 6.7:
- WavePrefixSum
- WaveMultiPrefixSum
- QuadReadAcross for lane 0
- WaveActiveBitXor
The WaveReadLaneFirst waterfall loop test has also been updated to use a
new loop structure for SM 6.7 as suggested by Tex.
GetBlobAsWide was added as a new method to the existing interfaces,
but the existing GetBlobAsUtf16 should have simply been renamed to
GetBlobAsWide instead, since that's what GetBlobAsUtf16 actually did.
Otherwise interface compatibility is broken between builds/releases.
If we want to add a GetBlobAsUtf16 which guarantees UTF16 across platforms,
we can add that in a future interface revision.
A resource used as a UAV must have TransitionTo="UNORDERED_ACCESS" for the correct
runtime state.
These tests (by name in ShaderOpArith.xml) were missing this:
- Derivatives
- QuadRead
- ComputeSample
- ProgOffset
- SampleCmpLevel
- Saturate
- AtomicsRoot
- AtomicsHeap
- FloatAtomics
* [SM6.7][WritableMSAA] Front-end and test fixes
This change fixes DXC to properly pass the SampleCount attribute to DXIL
metadata. A few test issues are also fixed, namely:
- Do not create upload resource for MSAA textures, since this will cause
a sample count mismatch in the runtime and we do not need the upload
resource anyway.
- Fix inverted logic to detect support for ATO and WritableMSAA.
- Pass D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET resource flag when
creating MSAA resource, which is required by the runtime.
* Actually return after skipping ATOWriteMSAATest
* Fix props_ms.hlsl test case to encode sample count
Co-authored-by: Tex Riddell <texr@microsoft.com>
Writable MSAA ExecutionTest was setting TestResults::Skipped for devices
that have the necessary support and continuing to run the test.
Follow the style of the RawGather ExecutionTest.
* Made nologo option hidden
* working tests for compute and ps/vs shaders
* Fix test and add fallback mode for SM 6.0 to verify test logic
* Correct StructuredBuffer descriptors
* add separate result buffers for each shader, other changes from Tex
* fallback works, add TransitionTo field to UAVs
fallback works, add TransitionTo field to UAVs. 6_6 still returns all 0s, and there's a message about the highest debug layer model supported being 6_5
* fallback works, and on non-fallback, CS works.
* change hardcoded to true rval
* swap root signature and set descriptor heaps so that heaps come before root signature
* Remove unnecessary comments, fix indexing to be completely dynamic
* remove extraneous resources in root sig, and use Test_F macro in execution test
Co-authored-by: Tex Riddell <texr@microsoft.com>