Fix module loading and validation issues
There are a couple interacting issues:
1. When RefreshCache occurs, it doesn't fill in the m_pResRetType cache.
- This is because in GetOpFunc, if it finds a function matching the
expected name, it simply returns that, skipping function type
construction.
- This causes a validation error if CheckAccessFullyMapped is validated
before a function that calls it. This depends on the order of
function declarations/definitions in the module.
- The solution is to move the check after type construction, and verify
that the function type matches expectations as well.
2. When RefreshCache() and FixOverloadNames() was called from
DxilOperations
constructor, the m_LowPrecisionMode has not yet been set, leading to
incorrect overload types when allowing function type construction to
proceed in GetOpFunc before looking for a matching function by name.
- This leads to an incorrect type for 16-bit CBufRet type being used.
- The solution is to move RefreshCache() and FixOverloadNames() calls
to after we actually know the m_LowPrecisionMode, so that the correct
types may be constructed.
Rename hlsl::OP::SetMinPrecision to InitWithMinPrecision to make clear
what's being done.
Added validation tests to verify the fixes.
Fixed validation test that produced invalid DXIL op function signature.
Fix HLModule not setting min-precision mode when loading from metadata.
Fixes#5879
This commit fixes a crash in the compiler when lowering a groupshared
variable with a multi-dimensional array type. The root cause of the bug
was that we had a nested gep expression that could not be merged into a
single gep because of an intervening addrspacecast.
The `MultiDimArrayToOneDimArray` pass flattens the multi-dimension
global variables to a single dimension. It relies on the `MergeGepUse`
function to flatten any nested geps into a single gep that fully
dereferences a scalar element.
The fix is to modify the `MergeGepUse` function to look through
addrspacecast instructions when trying to merge geps. We can now merge
geps like
gep(addrspacecast(gep(p0, gep_args0)) to p1*, gep_args1)
into
addrspacecast(gep(p0, gep_args0+gep_args1) to p1*)
We also added a call to `removeDeadConstantUsers` before flattening
multi-dimension globals because we can have some dead constants hanging
around after merging geps and these constants should be ignored by the
flattening pass.
DXIL doesn't support the bswap intrinsic, normally a backend would lower
this intrinsic to something else, in this case we don't have a backend
so we just end up generating invalid DXIL.
The solution here is to disable bswap matching when targeting DXIL.
Fixes#5104
1. printf format mismatch.
2. avoid cast from CComPtr<IncludeHandlerVFSOverlayForTest> to
CComPtr<IDxcIncludeHandler>.
3. fix signed unsigned mismatch.
4. fix order of fields in constructor.
5. add override for override methods.
6. port
01f4209631
7. add copy assignment operator to avoid
-Wdeprecated-copy-with-user-provided-copy
8. disable -Wignored-qualifiers for taef header WexTestClass.h.
9. remove unused fields.
10. add -clang-cl for hctbuild.
Header changes to make Direct3D WSL build work:
- Move non-WIN32 IMalloc definition back to `WinAdapter.h` (it was there
until commit 44f8833938)
- Define DEFINE_ENUM_FLAG_OPERATORS only if it was not already defined
1. add noexcept for methods with __declspec(nothrow)
2. add override for override methods.
3. add const when cast const ptr for a const argument.
4. change 0 to 0u for immediate to avoid signed unsigned mismatch.
1. fix order of fields in constructor.
2. fix compare warning for type mismatch by cast to same type.
3. use {} instead of {0} when initialize to all zero.
4. add override for override methods.
1. signed unisgned compare warning.
Fixed by change 1 into 1u. Or cast directly.
2. writeable string.
Use const char/wchar_t * instead of char/wchar_t*.
3. lost const when cast.
Add const.
This change applies clang-format to all of the sources added in DXC that
were not part of the original LLVM/Clang 3.7 release when DXC forked.
This provides consistent code formatting across the codebase.
Fixes#5591.
This integrates formatting generated sources with clang-format with the
hctgen generation process.
This change makes a few small adjustments to how the build-time header
generation works. This change, disables automated build-time generation
for C++ sources if you don't have clang-format available on your system.
CMake can automatically detect clang-format installed as part of Visual
Studio, or based on your PATH. You can also explicitly set
`CLANG_FORMAT_EXE` when configuring to manually override. **Developers
on macOS** will need to install clang-format manually and place it on
their path to generate DXC's generated sources.
If clang-format cannot be found and `HLSL_COPY_GENERATED_SOURCES` is On
a fatal error will be reported at configuration time.
This change _does not_ make clang-format a requirement to build DXC, but
it does make it a requirement if you are modifying the generated
sources.
AllocVFSIncludeHandler was returning it's IUnknown object by Detach()ing
from a CComPtr, then assiging to CComPtr. This would result in the
object have a ref count of 2, and when the second CComPtr destructed, it
would go down to 1, and leak memory.
The fix is to return the object as a CComPtr; however, with Clang, this
failed to compile because it finds no conversion from a CComPtr of child
class type to a CComPtr of base class type. This compiles fine on MSVC,
so I debugged it, and realized that MSVC compiles this as first a
conversion to T*, then a constructor call. I suspect MSVC is wrong here.
In any case, this change adds a template conversion constructor to
CComPtr, very much like the existing template assignment operator.
`DxilRuntimeReflection.h` uses `size_t`, which is defined by
`<cstddef>`.
`dxcutil.h` uses `hlsl::SerializeDxilFlags` by value whcih is declared
in `DxilContainer.h`.
`D3DReflectionStrings.h` relies on windows tyoes like `LPCSTR`. It also
includes `D3DReflection.h`, which in turn includes `d3d12shader.h` (from
DirectXHeaders), which relies on basically the whole `WinAdapter.h`.
I've included `WInAdapter.h` in both `D3DReflectionStrings.h` and
`D3DReflection.h` to make both headers indipendently resilient.
Add -Wno-unused-parameter -Wno-unknown-pragams -Wno-switch which are
enabled for clang in linux build.
Fix
-Wimplicit-fallthrough
-Wconstant-logical-operand (Thanks patch from Chris)
-Wunused-function
-Wunused-variable
-Wtrigraphs
-Wnonportable-include-path
Still more warnings to be fixed.
We don't keep these annotations up to date or rely on them, so to
simplify our platform layering and ease code formatting this change
removes all the SAL annotations from internal APIs. This change also
replaces most `_Analysis_Assme_(...)` annotations with `assert(...)`.
One `_Analysis_Assume_` in ParseDecl.cpp needed to be updated because it
was incorrect. The code was `_Analysis_Assume_(assert(name.size() >
sizeof("space"));`. When converted to an `assert` it fired in our test
suite because the sizeof a string literal includes the null terminator,
but the size of a StringRef does not.
A few of the `_Analysis_Assume_` annotations were removed because they
didn't make sense (like the ones inside the `DXASSERT` implementation,
and a few others were removed to avoid introducing additional header or
linkage dependencies.
This change does not introduce any functional or behavioral changes.
Add dxcutil::ComputeSerializeDxilFlags to share code with linker
Use refactored ComputeSerializeDxilFlags in DxcCompiler::Compile()
---------
Co-authored-by: Tex Riddell <texr@microsoft.com>
including `WinIncludes.h` here pulls in a bunch of dependencies like ATL
even though we really don't need those in this header. Given that the
only things we're using that include for is to use LPCWSTR and LPCVOID
spellings in a way that works on both windows and unix, just expand
those macros to their definitions instead.
This change fixes a few of the most awful clang-format missteps in the
codebase. I'm not intending to fix all the things clang-format will do
something bad with, just a few of the worst ones.
Further formatting fixups can be done after the clang-format change goes
in.
One more batch of NFC changes to get DXC ready to clang-format. This
change disables clang-format of some include blocks and breaks other
include blocks into groups. With this change all of the sources in DXC
can be clang-formatted and will continue to build on all supported
platforms.
A downstream consumer requires the use of custom function lowering using
a json string as well as methods. For instance, a direct memory load not
based on a resource.
This change prepares DXC sources to be ready to clang-format. This
change adds annotations to skip formatting parts of the code that will
break if clang-format modifies them due to include ordering or generated
C++. It also adds missing include and forward declarations where
clang-formatting resulted in moving things around that caused code
breakages.
Refactored -opt-enable and -opt-disable.
- Added `hlsl::option::OptimizationToggles` to manage the toggles
instead of checking a `std::map`.
- All the options are moved into `hlsl::options::TOGGLE_*` constants,
where each constant contains both the option's name and whether it's
default on or off.
+ Previously, every check had to be either `toggles.count("my-flag") &&
toggles.find("my-flag")->second` for default off, and
`!toggles.count("my-flag") || toggles.map.find("my-flag")->second` for
default on.
+ Now, a check is simply `toggles.IsEnabled(TOGGLE_MY_FLAG)`.
Add any_sampler builtin parameter type. This will match any sampler type
as the select() built-in added with HLSL 2021 requires.
Adds a new built-in overload that takes the any_sampler type and ensures
that the last two arguments and return type match. Many of the changes
were required because we never had to ensure object parameter types
matched nor transmit that type to the return type before. Most of the
changes to MatchArguments in SemaHLSL are addressing new issues this
revealed. Non-basic types can pass the conversion check if they outright
match.
Adds testing to select() and the ternary operator that it supplements
for these usages
Fixes#4673
Merged from
0a92aff721
This patch removes all uses of `std::iterator`, which was deprecated in
C++17. While this isn't currently an issue while compiling LLVM, it's
useful for those using LLVM as a library.
For some reason there're a few places that were seemingly able to use
`std` functions unqualified, which no longer works after this patch.
I've updated those places, but I'm not really sure why it worked in the
first place.
Merge upstream change
91c8daf300
In converting this over to iterator_facade_base, some member
operators and methods are no longer needed since iterator_facade
implements them in the base class using CRTP.
This is part of enable C++17.
Allow new/delete operators to be called before DllMain and allocator
initialization in DxcInitThreadMalloc().
The operators can be called before DllMain from CRT libraries when
static linking is enabled. If that happens, the new/delete operators
will fallback to the standard allocator and use CoTaskMemAlloc/Free
directly instead of CoGetMalloc, Alloc/Free & Release for better perf.
Enables fixing of
[#5163](https://github.com/microsoft/DirectXShaderCompiler/issues/5163)
and
[#5178](https://github.com/microsoft/DirectXShaderCompiler/issues/5178).
Add another template typename for pointer <-> ilist_iterator comparison
operators. C++20 considers these ambiguous for the same reasons outlined
in https://github.com/godotengine/godot/issues/54058
The rest of the changes are general bug fixes which are now caught by
the compiler.
Last set of cherry picks for reverse iteration, this time enabling
utilities that the ReverseIterator test uses in upstream
---------
Co-authored-by: Tim Shen <timshen91@gmail.com>
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
Co-authored-by: Justin Lebar <jlebar@google.com>
Co-authored-by: Mehdi Amini <mehdi.amini@apple.com>
Add macro DXC_DISABLE_ALLOCATOR_OVERRIDES which, if defined, will use
the default CRT allocator for new/delete and malloc/realloc/free.
We need this on Chrome for our GN build of DXC, because our component
builds use libc++as a shared library, and we end up with mismatched
new/delete calls between libc++ and these custom overrides.
---------
Co-authored-by: Chris Bieneman <cbieneman@microsoft.com>
This is the first commit in a series that will enable array of
RW, consume, and append Structured buffers. This commit removes the
error when one is used on its own.
There will be a failure if a counter variable is needed.
This is the first PR is a series that will slowly enable different
features for arrays of RWStructuredBuffers. To see the general direction
the changes will go see
https://github.com/microsoft/DirectXShaderCompiler/pull/5407.
These are to prepare for a 'zip' change that makes writing iterator
tests a lot easier (and is used heavily in ReverseIterator changes still
to come).
---------
Co-authored-by: Tim Shen <timshen91@gmail.com>
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
Add `-fspv-preserve-interface` CLI option to prevent DCE optimization
pass from compiling out interface variables. It happens if these
variables are unused.
The option may be useful when decompiling SPIR-V back to HLSL.
Personally, I need it to convert DX12 shaders to DX11 ones using
SPIRV-Cross as a tool for converting SPIR-V, produced by DXC, to the old
shader model HLSL.
SPIR-V Tools now have a parameter in `RegisterPerformancePasses()` and
`RegisterLegalizationPasses()` for this. This PR creates a new command
line option in DXC and passes it to the `spvtools::Optimizer`.
Closes#4567
This PR is the first of 2 PRs to prevent libraries of different compiler
versions from linking together. This first PR implements adding the
compiler version information into the Dxil Container itself, to allow
for comparison later. It also adds some basic validation to the new part
in the Dxil Container.
Cherry-picks to get some good changes, specifically in STLExtras
---------
Co-authored-by: Pete Cooper <peter_cooper@apple.com>
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
The macro LLVM_ENABLE_ABI_BREAKING_CHECKS is moved to a new header
abi-breaking.h, from llvm-config.h. Only headers that are using the
macro are including this new header.
LLVM will define a symbol, either EnableABIBreakingChecks or
DisableABIBreakingChecks depending on the configuration setting for
LLVM_ABI_BREAKING_CHECKS.
The abi-breaking.h header will add weak references to these symbols in
every clients that includes this header. This should ensure that a
mismatch triggers a link failure (or a load time failure for DSO).
On MSVC, the pragma "detect_mismatch" is used instead.
Differential Revision: https://reviews.llvm.org/D26876
llvm-svn: 288082
Co-authored-by: Mehdi Amini <mehdi.amini@apple.com>
Continuing work from
https://github.com/microsoft/DirectXShaderCompiler/pull/5329
Another 2 cherry picks. Bit bigger this time, but no changes required
apart from the free->delete change as before
---------
Co-authored-by: Matthias Braun <matze@braunis.de>
Co-authored-by: Benjamin Kramer <benny.kra@googlemail.com>
There are a number of changes required to get ReverseIterator support
into DXC. Rather than attempting to do it all as one large PR, it makes
more sense to break it up into smaller PRs of 2-3 changes.
These first two are just some small refactoring in the SmallPtrSet
classes and should not change any behavior.
There was one small change in the DXC codebase that was necessary to
bring forward which deviates from the original cherry-pick slightly
(will mark it in a comment).
---------
Co-authored-by: Matthias Braun <matze@braunis.de>
For `-Od`/`-O0` compilations, the compiler generates nop instructions in
the form of:
```
@dx.nothing.a = internal constant [1 x i32] zeroinitializer
...
%0 = load i32, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @dx.nothing.a, i32 0, i32 0)
```
This instruction is intended to be a complete no-op, and backends that
recognizes this pattern can turn it into proper hardware specific no-op
instructions for debug purposes.
When linking DXIL libraries, the passes that run end up removing these
nops. This change adds a new pass `DxilReinsertNops` to run at the
beginning of linking to read the `LoadInst`s representing nops and turn
them back into calls of `dx.noop`, before turning back into loads again
at the end.
This PR adds a new PIX pass that instruments DXR 1.0 shaders
(intersection, any/closest hit and miss). The new pass emits each shader
invocation's [ray system
values](https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#system-value-intrinsics)
and DispatchRaysIndex into a UAV that can be read back later. This will
enable new DXR debugging features in PIX.
When using libc++, the build fails with errors like:
error: incomplete type 'clang::DeclContext::udir_iterator' used in type trait expression
: public integral_constant<bool, __is_base_of(_Bp, _Dp)> {};
^
note: while substituting deduced template arguments into function template 'iterator_adaptor_base' [with U = const llvm::iterator_adaptor_base<clang::DeclContext::udir_iterator, clang::DeclContextLookupResult::iterator, std::random_access_iterator_tag, clang::UsingDirectiveDecl *> &]
class iterator_adaptor_base
This was fixed in upstream LLVM in this CL:
e78e32a443
(also see original review: https://reviews.llvm.org/D22951#change-zKJSAlLXXy11)
Unfortunately, the CL does not explain why this change was made, so I
can only assume that it was failing a libc++ build as well.
I also added the static_assert that was later added in this CL:
0aecae3452
This restores the build failure that would occur if U is not a base of DerivedT.
Although seemingly unnecessary for CMake builds of DXC, for our
GN build, we needed to write a script to replicate the behaviour of
CMake's "configure_file". Having an variable expansion with no variable
("${}") made the script more complex. Besides, this looks like a bug.
- Added `-opt-enable partial-lifetime-markers` to only generate `@llvm.lifetime.start` and not `@llvm.lifetime.end`.
`@llvm.lifetime.start` can sometimes be helpful for codegen, but inserting `@llvm.lifetime.end` in the right places sometimes requires modifying cfg in ways that could negatively impact codegen quality. This change adds a way to generate only `@llvm.lifetime.start` and not `@llvm.lifetime.end` for all the benefits it provides without the problems of inserting `@llvm.lifetime.end`.
The added `-opt-enable partial-lifetime-markers` is off by default, and only kicks in when lifetime-markers are enabled.
In cases where generating `@llvm.lifetime.end` is easy (around function calls for arguments), they are generated as normal.
-----------------------------------------------------
In certain cases, lifetime markers can help reduce register pressure by reducing lifetime of values. In this example:
```
for (int i = 0; i < N; ++i) {
S state;
if (conds[i]) {
state.x0 = input[i+0];
}
if (conds2[i]) {
output[i+0] = state.x0;
}
}
```
`State state` is declared inside the loop, but the alloca is effectively "hoisted" to the entry block, and the compiler has no knowledge that no value of `state` flows between loop iterations. The codegen ends up looking like this:
```
loop:
%last_iteration_val = phi float [ %new_val, %if.end.b ], [ undef, <preheader> ] ; <--- Value from previous iteration.
br i1 %same_cond, %if.then.a, %if.end.a
if.then.a:
%new_val.then = ...
if.end.a:
%new_val = phi float [ %last_iteration_val, %loop ], [ %new_val.then, %if.then.0 ]
br i1 %same_cond, %if.then.b, %if.end.b
if.then.b:
store %new_val
br %if.end.b
if.end.b:
br i1, <loop-exit>, %loop
```
`mem2reg` has special logic for handling `lifetime.start`, where it basically assumes it as a def for the alloca. With lifetime-markers turned on, the IR ends up looking like this:
```
loop:
// Notice the phi is gone.
br i1, %if.then.a, %if.end.a
if.then.a:
%new_val.then = ...
if.end.a:
// The undef is now here!
%new_val = phi float [ undef, %loop ], [ %new_val.then, %if.then.0 ]
br i1 %same_cond, %if.then.b, %if.end.b
if.then.b:
store %new_val
br %if.end.b
if.end.b:
br i1 %same_cond, <loop-exit>, %loop
```
When `state` is a big struct, and when the loop has many temporary values, the phis for previous loop iterations can cause very large register pressure.
* [SPIR-V] Add fspv-preserve-bindings option to preserve decoration bindings even when unused
* Set preserve bindings option in SpirvEmitter::spirvToolsLegalize() also