GlobalOpt has an explicit test for a function called "main" (eww...) to enable an optimization. This updates the test to match the entry point, independent of its name.
Instead of hard-coding properties here, constructing every overload of
every resource method we know about, then iterating users, we:
- iterate through used intrinsic functions, collecting the ones that
have handle parameters (resource accesses). This will pick up all
used overloads.
- Instead of hard coding read/write access in a table here, we use the
function attribute which carries this same information for main resource
operations.
- Exceptions to matching function attribute:
GetDimensions: DescriptorRead, but commented to match prior behavior
BufferUpdateCounter: Counter
TraceRay[Inline]: Read, but commented to match prior behavior for now
- Then, this type of resource access is emitted for the first handle arg
- Subsequent handles are considered DescriptorRead operations,
- DescriptorRead currently just aliases to Read, but could be useful in
the future if we want to differentiate or support GetDimensions.
Move dxclib one level up / out of dxc directory
Remove unused member in DxilRuntimeData
Change DxilRuntimeReflection handling of strings to be platform independent
Fix opt tool (DxcSetThreadMallocOrDefault no longer exists)
- Array of [1] would crash in UpdateStructTypeForLegacyLayout
since IsResourceArray would be false.
- Unbounded array has weird quirk in D3DCompiler reflection:
all resources report BindCount == 0, except cbuffer reports UINT_MAX.
Added simple unbounded cases for other resource types to verify.
- Fxc has some auto-binding bugs.
Worked around in the test with explicit binding.
- Added suggested helper functions for strip/wrap array types.
Used in code that's part of this change, but didn't search for other
places to replace yet.
- base of struct should always be aligned - or internal bug
- offset for array member must always be aligned - (new) validation error
- alloc and verify struct layouts even when not array field
- out of bound check would have missed OOB on last array element
Copy of original annotation would copy original struct type,
instead of one for legacy layout. This leads to internal validation
using a different type when validating cbuffer layout than external
validation would use, since this member would be synthesized by the
key, rather than saved and restored directly.
This would break when official DXIL.dll was used, since it would exclude
the shader hash part that the validator wouldn't recognize, and is not
required for a valid shader. This prevented shader signing for the
output of newer compiler versions.
Add IDxcSourceLocation::GetPresumedLocation as a wrapper for clang_getPresumedLocation.
This is important for (e.g.) showing errors in the correct location in IDEs when the #line directive is used, primarily in generated code.
* Hide Unix symbols by default
Using the -fvisibility=hidden by default only exposes symbols that have
the __attribute__((visibility("default"))) attribute applied to them.
There are only a couple of functions that are meant to be exposed from
libdxcompiler. To expose the proper functions, the macros used to create
the UUIDs for class types are redefined with the attribute and
DXC_API_IMPORT is defined with the attribute above for non-windows
platforms.
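A minimal sketch of the export pattern described above, assuming a GCC/Clang-style compiler and a build that passes -fvisibility=hidden. EXAMPLE_API, ExampleCreateInstance and exampleInternalHelper are hypothetical names for illustration, not the macros or functions the change actually touches:

```cpp
// Hypothetical export macro: only symbols tagged with it stay visible when
// the library is built with -fvisibility=hidden.
#if defined(_WIN32)
#define EXAMPLE_API __declspec(dllexport)
#else
#define EXAMPLE_API __attribute__((visibility("default")))
#endif

// Internal helper: carries no attribute, so with -fvisibility=hidden it is
// not exported from the shared library.
int exampleInternalHelper(int x) { return x + 1; }

// Public entry point: explicitly marked, so it remains exported.
extern "C" EXAMPLE_API int ExampleCreateInstance(int id) {
  return exampleInternalHelper(id);
}
```

On Unix, inspecting the built shared library with a symbol-listing tool is one way to confirm that only the marked functions remain exported.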
Since we're hiding everything now, the portions that were explicitly
hidden to make MacOS work are removed.
This exposed a number of missing dependencies of libraries and unit
tests. Namely clang-hlsl-tests, clang-spirv-tests, and libclangCodeGen.
Resolves #2203
* Remove explicit marking of DxcThreadMalloc hidden
This was a workaround that is no longer necessary since all symbols are
hidden by default on Unix platforms now.
* Moves the implementation of DxcThreadMalloc to dxcmem.cpp; removes DxcSwapThreadMalloc
* Remove DxcSetThreadMalloc(OrDefault) -- they were always used for installing the default allocator
* Deletes copy and move ctors and assignment
* stores the allocator in the TLS slot.
* DxcSwapThreadMalloc should be able to install a null allocator
* Marks the DxcThreadMalloc members as hidden (linux only)
Adds support for templatized RWRawByteBuffer.Store<T>. To avoid SROA making us lose the original layout of any struct arguments, a new pass runs before SROA and breaks down such cases into per-element stores. So better be careful with the likes of buf.Store(0, (int[65536])0);...
-Qstrip_reflect would reserialize the root signature, leading to
validation failure #2162. Fixed by moving root sig to writer to clear
from module and prevent re-serialization to metadata.
Fixed -Qstrip_debug with -Zi and no output location still embedding
debug module.
- New -Qembed_debug is required to embed PDB in shader container
- -Zi used without -Qembed_debug will not embed debug info anymore,
and will issue a warning from CompileWithDebug().
- When compiling with Compile() and -Zi, -Qembed_debug is assumed
for compatibility reasons (lots of breaks without it)
- In dxc and CompileWithDebug() -Fd implies -Qstrip_debug
- Debug name is based on -Fd, unless path ends with '\', meaning you
want auto-naming and file written under the specified directory
- Debug name always embedded when debug info used, or -Fd used
- -Fd without -Zi just embeds debug name for CompileWithDebug(),
still error with dxc, since it can't write to your file.
- If not embedding debug info, it doesn't get written to the container,
only to be stripped out again.
- Fix padding for alignment in DebugName part.
- Default to DebugNameForBinary instead of DebugNameForSource if no
DebugInfo enabled
- Also fixed missing dependency on table gen options from libclang
DxcCreateBlobOnHeapCopy allocated using the COM allocator and freed using the default thread allocator, leading to crashes.
But more generally, some DxcCreateBlob functions would take ownership of the passed-in buffer without also having the caller explicitly specify what IMalloc to use for deallocation, which is an error-prone pattern, so I reworked these methods a bit.
This also led me to find some memory leaks and general memory mismanagement in the rewriter.
* Allow clip/cull elements to be declared as array [2]
- This approach fixes validation and packing to handle this case.
- There could be implications to runtime ViewID validation
- fix some issues found in packing related to rowsUsed result from Pack
functions. Make these return 0 on failure, instead of startRow.
- Split PackNext into FindNext and PackNext that uses it for greater
flexibility.
After loading a vector from a cbuffer, we attempt moving the debug info upstream, to the individually loaded elements, assuming that they got converted into a vector by a sequence of insertelement instructions. This is not the case for bool vectors, because the insertelement sequence will be followed by a mem-to-reg conversion. Made the code more resilient to such unexpected patterns (better to lose debug info than to crash compilation).
Some large shaders exhibit a behavior where phi nodes are created for resources, in which one of the possible incoming values is undef. This gets cleaned up later such that there are no undef resources left. However the fail-undef-resources pass has already failed compilation by that point. The fix is to change that pass to invalidate-undef-resources and replace the undefs with a special invalid value, such that we can produce an error later if still necessary, when optimization passes have been run such that temporary undef resources have been eliminated.
These new DXIL instructions are added to SM 6.5. The valid operations
for <Op> are:
- BitAnd
- BitOr
- BitXor
- CountBits
- Product
- Sum
In HLSL, these are exposed as:
uint4 WaveMatch(<type> val)
<type> WaveMultiPrefixBitAnd(<type> val, uint4 mask)
<type> WaveMultiPrefixBitOr(<type> val, uint4 mask)
<type> WaveMultiPrefixBitXor(<type> val, uint4 mask)
uint WaveMultiPrefixCountBits(bool val, uint4 mask)
<type> WaveMultiPrefixProduct(<type> val, uint4 mask)
<type> WaveMultiPrefixSum(<type> val, uint4 mask)
In DXIL, these are exposed as:
[BitAnd,BitOr,BitXor,Product,Sum]
%dx.types.fouri32 @dx.op.waveMatch.T(i32 %opc, T %val)
T @dx.op.waveMultiPrefixOp.T(i32 %opc, T %val, i32 %mask_x,
i32 %mask_y, i32 %mask_z, i32 %mask_w,
i8 %operation, i8 %signed)
[CountBits]
i32 @dx.op.waveMultiPrefixBitCount(i32 %opc, i1 %val, i32 %mask_x,
i32 %mask_y, i32 %mask_z,
i32 %mask_w)
Scalarization of vector types occurs as per the existing wave intrinsics.
For WaveMatch, the match is performed on each scalar and the results
are combined with bitwise AND. For WaveMultiPrefix, the operation is
performed on each scalar and combined into an aggregate.
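As a rough illustration of the scalar semantics, the following CPU model treats a wave as a vector with one entry per lane. It assumes all lanes are active, at most 32 lanes (so only the x component of the uint4 mask is modelled), and that the prefix is exclusive, i.e. a lane only sees lanes with a smaller index. The function names are hypothetical and this is a sketch, not the compiler's lowering:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// For each lane, build a bitmask of all lanes holding the same key.
// Bit j of masks[i] is set when lane j matches lane i.
std::vector<uint32_t> waveMatch(const std::vector<int> &keys) {
  std::vector<uint32_t> masks(keys.size(), 0u);
  for (std::size_t i = 0; i < keys.size(); ++i)
    for (std::size_t j = 0; j < keys.size(); ++j)
      if (keys[i] == keys[j])
        masks[i] |= (1u << j);
  return masks;
}

// For each lane, sum the values of lower-indexed lanes that are part of the
// lane's mask (assumed exclusive prefix within the partition).
std::vector<int> waveMultiPrefixSum(const std::vector<int> &vals,
                                    const std::vector<uint32_t> &masks) {
  std::vector<int> out(vals.size(), 0);
  for (std::size_t i = 0; i < vals.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
      if (masks[i] & (1u << j))
        out[i] += vals[j];
  return out;
}
```

Feeding the masks from waveMatch into waveMultiPrefixSum gives a per-partition prefix sum, which is the typical pairing of the two intrinsics.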
- add addrspace stress test and addrspace inst test
- add ll test for cleanup pass targeting paths that may be difficult to
hit from HLSL
- fix handling of base-class bitcast cast for structured buffer
- handle addrspacecast in SimplifyBitCast during CodeGen
- set address space for groupshared QualType and fix downstream effects
Fix:
- double LValue expression emit for in aggregate arguments
- in agg param modifying caller's value instead of copy
- groupshared matrix support in HLMatrixLower
- groupshared base class member access
- groupshared matrix member casting in class method
Still in need of more fixes and tests:
- incomplete array and auto dimensions from initializer
Argument handling still needs overhaul. This fix retains old behavior, even
when not quite correct to avoid worse regressions. Objects are not copied
in, aggregate LValueToRValue cast expr is emitted as LValues instead of
RValues because RValue emit path doesn't have HLSL changes necessary to handle
certain cases, such as derived-to-base cast.
Updates the HLMatrixLower pass logic to eliminate the nondeterminism and randomly bogus codegen that came from iterating over a pointer-keyed map and modification of that map during the iteration. The new approach does a single pass on every instruction consuming or producing matrices, replacing it by its vector equivalent. Any consumed matrix is replaced by a temporary mat-to-vec translation stub, and any formerly produced matrix is emitted as a temporary vec-to-mat translation stub. Stubs get cleared as both ends of a consumer-producer dependency get lowered.
Lots of different code in CGHLSLMS.cpp did numerical conversions in slightly different ways, most of them forgetting to take into account bools. This change introduces a single ConvertScalarOrVector function which handles both and updates code to use it. This implied a partial rewrite of const init list handling code because they lost the QualType information necessary to properly cast. They were also not handling register/memory representations properly so that got fixed at the same time.
llvm::Twines are dangerous. When concatenated, they store a pointer to their operands to defer the actual string concatenation as long as possible. Hence every Twine being concatenated must be kept alive. FormatMessageWithoutLocation took a twine by value, concatenated, and returned the result. This means that the concatenation pointed to the by-value argument, which gets destroyed when the function returns.
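The lifetime hazard can be illustrated without LLVM. In the sketch below, LazyConcat is a hypothetical stand-in for llvm::Twine that only stores pointers to its operands, and formatMessage is a hypothetical function showing one safe shape: operands are taken by reference and the string is materialized before returning, so nothing points at a destroyed by-value argument:

```cpp
#include <string>

// Hypothetical stand-in for llvm::Twine: it holds pointers to its operands
// and defers the actual concatenation until str() is called.
struct LazyConcat {
  const std::string *lhs;
  const std::string *rhs;
  std::string str() const { return *lhs + *rhs; }
};

// Safe pattern: the operands are references owned by the caller, and the
// deferred concatenation is turned into a real std::string before the
// function returns. Returning the LazyConcat itself after taking an operand
// by value would leave it pointing at a destroyed object.
std::string formatMessage(const std::string &prefix, const std::string &msg) {
  LazyConcat deferred{&prefix, &msg};
  return deferred.str();
}
```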
The parameter SROA's LowerMemcpy function deletes the memcpy and bitcast instructions after it has processed them. However, higher up the stack we created an IRBuilder pointing to the first non-alloca instruction, which happens to be the bitcast being deleted. Hence any further use of that IRBuilder will crash.
The solution is to delay creating the IRBuilder until after LowerMemcpy, since the rest of the code is well-behaved with respect to instruction deletion (i.e., it defers them to the end).
This change fixes an inconsistency between scalar and vector bools, where the former were stored as i32s in memory (as opposed to i1s in registers) while the latter remained <n x i1>s. This led to a number of crashes in more complex bool scenarios where scalar and vector element types were expected to match.
The solution is to extend Clang's strategy for scalar bools to vector of bools. Clang relies on three key functions to deal with differences between register and memory type representations which are special-cased for bools and now also for bool vectors. Several HLSL code additions did not properly leverage those functions and this was addressed wherever possible, in some cases removing the need for special cases for bools.
To deal with matrices, a similar concept of register/memory representation was introduced in the matrix lowering code and the lowering passes were updated accordingly.
Added front-end `-flegacy-resource-reservation` to control behavior (added automatically for SM <= 5.0)
Added an error for unbounded resources used with `-flegacy-resource-reservation`
Added a mechanism for `DxilModule` to preserve intermediate options during its passes but not in the final DXIL
Changed `HLModule` to not remove unreferenced `DxilResources` but rather make their symbol UndefValue
Fixed assumptions of `DxilResources` having a valid `GlobalVariable`
Added reserved resource range gather pass before elimination of unused resources (if SM <= 5.0)
Changed resource allocation to account for reserved ranges
* Fix subobject raw bytes storage; don't point into RDAT for reflection
DxilSubobject:
- combine string and raw bytes storage in m_BytesStorage
- use StringRef as key, since memcmp is used with explicit length
DxilRuntimeReflection:
- add m_BytesMap for local unique copy of raw bytes in RDAT
to prevent pointing into RDAT blob for root sig
- clean up some insertion patterns
* Rename Get[SubobjectString/RawBytes] to Intern[String/RawBytes]
- re-serialize module when changed by removing either subobjects or
root signature metadata
- improve error detection/handling for loading/creating subobjects
- load subobjects from RDAT if not in DxilModule for RDAT comparison
- fix subobjects missing from desc in DxilRuntimeReflection
- default ReducibilityAnalysis to pass so it doesn't fail when no
functions are present
- report error on subobject if validator (< 1.4) will not accept
them, so they will not be captured
- container validation for required/disallowed parts for libraries
- Replace HitGroup subobject by TriangleHitGroup and ProcedurePrimitiveHitGroup
- Add internal HitGroupType enum that maps to D3D12_HIT_GROUP_TYPE
- Add corresponding HitGroupType field to metadata, RDAT and reflection
* Add root signature flags
- Add LOCAL_ROOT_SIGNATURE flag in RootFlags
- Add DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS flag in DescriptorTable
- New DxilRootSignatureCompilationFlags argument on Compile/ParseRootSignature
to specify whether the root signature is local or global.
* Fix root signature flag issues, add cmd test for root sig from define
- Add HLSL subobject classes to clang and enable their initialization
via initialization lists.
- Translate declared subobjects and add to DXIL module.
- Fix bugs in subobject metadata serialization and disasm.
- Preserve original input string in root signature subobject.
- Remove unused IsHlslObjectType method from CGHLSLRuntime.
- Remove unnecessary collection of global strings in CGHLSLRuntime
- Change type printer to report 'literal string' for constant char arrays.
Fix crash caused by NonUniformSet being invalidated before use
Instead of using NonUniformSet, defer lowering of
NonUniformResourceIndex until last, then mark all GEPs that use
the incoming value.
This will mark all uses of the value used in NonUniformResourceIndex
as non-uniform, including indexing that was not marked non-uniform,
if it ends up being the same index. This matches the prior behavior,
and avoids loss of non-uniform metadata when merging GEPs, but could
be considered not quite correct, although the difference in indexing
would probably not have been intentional.
* Fix string interning for Subobjects
* DxilRuntimeReflection: Remove dependency on llvm/ADT/STLExtras.h
- This file is meant to be included in other projects that do not include
llvm headers or libs, so it must avoid those dependencies.
* Avoid unnecessary strlen operations in GetSubobjectString
* StringStorage/RawBytesStorage: use pair of unique_ptr and size
* Support obsolete semantics with -Gec flag
Shaders with obsoleted semantics now compile with -Gec flag. Each
semantic will trigger a warning and be remapped to the current name.
The obsolete semantics & remappings introduced in this change are:
POSITION -> SV_Position (VS output)
VPOS -> SV_Position (PS input)
COLOR -> SV_Target (PS output)
DEPTH -> SV_Depth (PS output)
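The remapping table above could be expressed roughly as follows. This is a sketch only: SigPoint and remapObsoleteSemantic are hypothetical names, the comparison is case-insensitive as the follow-up bullet describes, and semantic indices (such as a trailing digit) are not handled here:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical signature points covering the cases listed above.
enum class SigPoint { VSOut, PSIn, PSOut };

// Upper-case copy of a semantic name for case-insensitive comparison.
static std::string toUpperCopy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return s;
}

// Returns the current semantic name for an obsolete one, or an empty string
// when no remapping applies at this signature point.
std::string remapObsoleteSemantic(const std::string &name, SigPoint sp) {
  const std::string n = toUpperCopy(name);
  if (n == "POSITION" && sp == SigPoint::VSOut) return "SV_Position";
  if (n == "VPOS" && sp == SigPoint::PSIn) return "SV_Position";
  if (n == "COLOR" && sp == SigPoint::PSOut) return "SV_Target";
  if (n == "DEPTH" && sp == SigPoint::PSOut) return "SV_Depth";
  return "";
}
```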
* Switch to case insensitive compare; add tests
* Address code review feedback
* IDxcRewriter2 interface; change RewriterTest to report errors
Added IDxcRewriter2 interface with method RewriteWithOptions that allows
setting compiler flags for the rewrite. The current IDxcRewriter methods
use HLSL Version 2015 only. RewriteWithOptions allows changing that
and other compiler options.
Modified RewriterTest to report compilation errors. Changed data of existing
tests that started failing because of this change.
Fixed DeadLoopFail test.
Fixes #1525
This CL changes dxcapi.use.h back to what it was before the Linux-port work. It does, however, add a snippet into it to redirect FreeLibrary to dlclose, LoadLibraryW to dlopen, and GetProcAddress to dlsym for non-Windows platforms.
Some HLSL features do not have a direct mapping to SPIR-V; instead
we emulate them using lots of SPIR-V instructions. Because they
can cause a performance hit, the compiler warns by default. This
commit adds an option, -Wno-vk-emulated-features, to let developers
turn off such warnings.
Added a new command-line option: -fspv-debug=<category>, where
category can be file, source, line, and tool, to give developers
fine-grained control of what debug information they want in
the generated SPIR-V code.
format: -fvk-bind-register <type-number> <space> <binding> <set>
Also created a short alias for it: -vkbr.
This option gives the ultimate manual control of descriptor
assignment. It requires:
* All resources are annotated with :register() in the source code
* -fvk-bind-register is specified for every resource
It overrules all other mechanisms.
It cannot be used together with -fvk-{u|b|s|t}-shift.
CreateBlobWithEncodingFromPinned is unlike any of the other
CreateBlob* functions in that its pText argument is a non-const
byte pointer. All the others use const void pointer for this
same parameter. Because the parameter is not const, in many cases
const pointers need to have their const qualifier cast off to be
used for this parameter in spite of the fact that the next level
call passes it to a const pointer. This produces a bunch of
warnings in places where it's used.
By changing the variable to const void, there is less unnecessary
casting, fewer warnings, no change in behavior, and a clearer
external indication of the way it is used by the function from the
prototype.
Windows char conversion functions differ from the std methods in
a few ways. Windows ignores null terminators when lengths are
provided, length limits represent character lengths not byte sizes,
the current linux implementation ignores codepages, but the files
used in many HLSL tests are in iso 8859-1 not utf8. So some
differentiation is needed.
This solves all but one of the above problems. To limit conversion
by source characters instead of solely the size of the output
buffer, where necessary, a temporary buffer is created to hold the
source string that corresponds to the provided character length and
a null terminator is appended to make the std:: conversion stop there.
If there is a null terminator there already, this is skipped for
performance purposes. Such is often the case.
To respect the codepage parameter, it is converted to the corresponding
locale string and setlocale is used to temporarily set the locale
accordingly and set it back afterwards.
The aspect that is missed is that the Windows conversion functions
will happily blow past a null terminating character if the length
specified encompasses it. This is rare enough in general, and unlikely
in the way these functions are used in the source base.
Corrects one errant non-breaking space character that found its way
into an HLSL file that the std:: conversion choked on. It didn't seem
to serve any purpose anyway.
Enable the UUID for IDxcLangExtensions so its interface can be
properly queried. Sundry const, semicolon, and override warning
eliminating changes that might have been included in the larger
corresponding changes had this test not been an afterthought.
Validate raytracing shader properties and RDAT blob part
- no signatures for ray tracing shader functions
- payload/params/attribute sizes are >= argument type allocation sizes
- RDAT is bit-identical to RDAT generated
- Minor fix for CS: should not have input signature elements either.
- update val version comment
- Prevent strange behavior with library target and entry point not empty.
- remove DXASSERT in LoadDxilMetadata (validation should catch this case)
Mostly required converting simple member function definitions
to include the TEST_F macro, which works for either gtest or taef,
but is optional for the latter.
Fixed an additional bug with the Linux implementation of CComPtr
which contained a single virtual function, which causes the compiler
to add a vtable pointer to each object created. This is fine most of
the time, but when it is cast to another kind of pointer and attempts
to increment it as that type instead of CComPtr, it doesn't account
for the bytes that the vtable pointer takes up.
By simply removing the sole virtual qualifier, which was on the
destructor, the vtable pointer is omitted and the
cast and increment works. The CComPtrBase destructor prototype is
shown to not be virtual in Windows documentation.
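The size effect is easy to demonstrate in isolation. PlainPtr and VirtualPtr below are hypothetical stand-ins for the two CComPtr variants, not the actual adapter classes; on common ABIs the virtual destructor adds a hidden vtable pointer, so an array of wrappers can no longer be walked as an array of raw pointers:

```cpp
#include <cstddef>

// Wrapper with a non-virtual destructor: its only data is the raw pointer.
template <typename T> struct PlainPtr {
  T *p = nullptr;
  ~PlainPtr() {}
};

// Wrapper with a virtual destructor: the compiler adds a vtable pointer.
template <typename T> struct VirtualPtr {
  T *p = nullptr;
  virtual ~VirtualPtr() {}
};

// Stride used when stepping through an array of each wrapper type.
constexpr std::size_t plainStride() { return sizeof(PlainPtr<int>); }
constexpr std::size_t virtualStride() { return sizeof(VirtualPtr<int>); }

// Code that reinterprets a wrapper array as T** relies on this equality.
static_assert(sizeof(PlainPtr<int>) == sizeof(int *),
              "wrapper must be layout-compatible with a raw pointer");
```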
Add lib_6_x for offline linking only and -export-shaders-only
- Add lib_6_x target for offline linking use only
- Implement -export-shaders-only to limit function exports to shaders on lib targets
- Fix disable validation case downgrading Dxil as if valver is 1.0
This option reciprocates (multiplicatively inverts) SV_Position.w
after reading it from stage input in PS. This is used to accommodate
the difference between Vulkan and DirectX.
Heap allocation functions were previously partially implemented via
a few macros that provided some of the functionality.
This replaces those macros with a more expansive implementation of
the interface as defined in Microsoft documents. Simple structs are
used to store pointer and size implementation, enabling queries of
allocation sizes from pointers, putting an upper limit on the total
amount of memory allocated by the given heap, retrieving a
default heap for the process, and freeing of all heap memory at the
time of heap destruction.
Most flags are unsupported save ZERO_MEMORY. Heap creation includes
an initial size that is meant to preallocate memory and make later
allocations more efficient. This does not support that. Still for
the usages in the code base, this should appear externally to be
a fully functional implementation.
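A minimal sketch of the idea, assuming a simple pointer-to-size map. SimpleHeap and its member names are hypothetical and do not mirror the actual adapter code; kZeroMemory uses the same value as the Windows HEAP_ZERO_MEMORY flag, and like the change described above it ignores any initial preallocation size:

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

// Same value as the Windows HEAP_ZERO_MEMORY flag; other flags are ignored.
constexpr unsigned kZeroMemory = 0x00000008;

// Hypothetical heap object: tracks each allocation's size so it can answer
// size queries, enforce a maximum, and free everything on destruction.
struct SimpleHeap {
  std::map<void *, std::size_t> blocks; // pointer -> allocation size
  std::size_t maxSize;                  // 0 means "no upper limit"
  std::size_t used = 0;

  explicit SimpleHeap(std::size_t max = 0) : maxSize(max) {}
  SimpleHeap(const SimpleHeap &) = delete;
  SimpleHeap &operator=(const SimpleHeap &) = delete;
  ~SimpleHeap() {
    for (auto &b : blocks)
      std::free(b.first); // release whatever the user never freed
  }

  void *alloc(unsigned flags, std::size_t size) {
    if (maxSize != 0 && used + size > maxSize)
      return nullptr;
    void *p = (flags & kZeroMemory) ? std::calloc(1, size) : std::malloc(size);
    if (p == nullptr)
      return nullptr;
    blocks[p] = size;
    used += size;
    return p;
  }

  // Returns the allocation size, or (size_t)-1 for an unknown pointer.
  std::size_t sizeOf(void *p) const {
    auto it = blocks.find(p);
    return it == blocks.end() ? static_cast<std::size_t>(-1) : it->second;
  }

  bool release(void *p) {
    auto it = blocks.find(p);
    if (it == blocks.end())
      return false;
    used -= it->second;
    std::free(p);
    blocks.erase(it);
    return true;
  }
};
```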
Includes a macro for CaptureStackBackTrace as well for CompilerTest
There are various points where system functions are called via
OS-specific wrappers beginning with llvm::sys::fs. As part of the
development of the HLSL shader compiler, some of the Microsoft
versions were altered to call into a further abstraction via the
MSFileSystem interface. In certain cases, this allows non-existent
"included" files to be accessed with the outward appearance of being
real files. However, many of these changes were missed by the Unix
abstraction. This adds them in, facilitated by adding a few
functions to the MSFileSystem abstraction that better match Unix
system calls.
Needed for various tests that provide fake source files to compile
and include fake headers from within HLSL code. Otherwise, these
files are not found because they are not on the filesystem.
The Unix-like methods are added because in cases where the current
filesystem abstraction just calls the system calls directly, the
windows-style accesses were cumbersome and incomplete. Tablegen
and some tests make use of these simpler implementations.
Several HLSL tests, notably VerifierTest, require intellisense classes.
To get these working on Unix platforms, we need to add support.
This involves mostly exposing the instance creation, the UUID, and
adding the right headers.
Additionally adds definitions of a couple UUIDs that were declared
but previously not defined. These are also useful for HLSL tests.
* [linux-port] Win adapter changes for HLSL tests
A number of Windows-specific features are present in the HLSL tests
as well as driver features that had to be enabled to get those tests
running, notably intellisense classes.
This change adds to the existing WinAdapter and WinFunctions files
to include the features needed for these additions to driver and tests.
They cover a lot of ground, but mostly consist of memory allocation
and string changes, because you can never have enough ways to
allocate memory and manipulate strings.
An incidental change is to the existing CreateFileW implementation
to retry a file if the error indicated it was interrupted. This was
taken from the LLVM Path.inc implementation for Unix.
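The retry pattern looks roughly like this on POSIX systems. openRetrying is a hypothetical helper shown for illustration and is not the CreateFileW implementation itself; it only covers flags that need no mode argument:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: keeps retrying open() while it fails only because a
// signal interrupted the call (errno == EINTR). Any other failure, or
// success, ends the loop. Flags such as O_CREAT that require a mode argument
// are out of scope for this sketch.
int openRetrying(const char *path, int flags) {
  int fd;
  do {
    fd = ::open(path, flags);
  } while (fd < 0 && errno == EINTR);
  return fd;
}
```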
Eliminate phi on resources/handles for library
- ensure all handles use only one global resource
- replace resource path with indices into global resource
- replace resource alloca with index allocas
- AddCreateHandleForPhiNodeAndSelect now unnecessary, deleted
- error on resources in library function params or return
- mark entry clones as internal linkage since they could contain params
that are not allowed for libraries.
- fix MarkUavUpdateCounter - handle nested GEP
- disallow static resource use from exported library functions
- add/update tests
- detect undef resource paths that would otherwise be lost (FailUndefResource)
- set internal linkage for HL intrinsic function bodies so they aren't processed with user functions
- Rename legalize [static] resource use passes to DxilPromote(Local|Static)Resources
- Remove various dead code paths for resources, such as:
GenerateParamDxilResourceHandles, RemoveLocalDxilResourceAllocas,
lower handle cast, non-promotable error (may translate to index allocas)
- force resource GVs to external constant with no initializer
MultiByteToWideChar and WideCharToMultiByte were returning -1 on
failure, which caused failures to be ignored. When successful, they
were returning the number of characters converted excluding the
terminating null character because that's what the std:: functions
used for the implementation do. Both are inconsistent with the
original Windows implementations.
Simply adding one to the return value works in all cases save those
where the number of characters converted was limited by the parameter.
In this case, the number of characters converted doesn't include
the terminating null, so no addition is warranted.
Added additional errors in keeping with the function descriptions.
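The return-value rules can be sketched on top of std::mbstowcs as follows. convertToWide is a hypothetical function, not the adapter's MultiByteToWideChar; it assumes the limiting parameter is a source length (with -1 meaning "null-terminated"), treats that length as bytes (which only equals characters for single-byte input), and ignores code pages:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns 0 on failure (not -1). On success returns the number of wide
// characters produced, counting the terminating null only when it was
// actually converted.
int convertToWide(const char *src, int srcLen, wchar_t *dst, int dstLen) {
  if (src == nullptr || dst == nullptr || dstLen <= 0 || srcLen == 0 ||
      srcLen < -1)
    return 0;

  // When a source length is given, copy that prefix so the std:: conversion
  // stops at the added null terminator.
  std::string bounded;
  if (srcLen > 0) {
    bounded.assign(src, static_cast<std::size_t>(srcLen));
    src = bounded.c_str();
  }

  std::size_t n = std::mbstowcs(dst, src, static_cast<std::size_t>(dstLen));
  if (n == static_cast<std::size_t>(-1))
    return 0; // invalid multibyte sequence

  bool limitedBySrc = srcLen > 0 && n == static_cast<std::size_t>(srcLen);
  bool limitedByDst = n == static_cast<std::size_t>(dstLen);
  if (limitedBySrc || limitedByDst)
    return static_cast<int>(n); // terminating null is not part of the count
  return static_cast<int>(n) + 1; // mbstowcs excludes the null, so add it
}
```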
The CW2A conversion class left out the terminating null character
when calculating the max size of the converted string.
This adds an isolated component *IDxcDxrFallbackCompiler* that is used by the D3D12 Raytracing Fallback Compiler. The component is responsible for linking together DXR shaders into a state machine capable of emulating function invocations including recursion, patching HLSL intrinsics, export renaming, and emulating Shader Record support. This component is compiled to a DxrFallbackCompiler.dll that gets consumed by D3D12 Raytracing Fallback Compiler apps.
For deeper implementation details, I will be fleshing out lib\DxrFallback\readme.md
Some leftover work still required:
1. Currently the Fallback Layer tests compile to an exe that must be manually run. This should be refactored to conform with existing DXIL unittests.
The REGEX_H define prevented the inclusion of an Apple system header.
Not coincidentally, the same header regex_impl.h was derived from.
Prevent Apple warning by splitting macro args with an #else. Add HLSL
change comments too which were missing from the start.
Search for libdxcompiler.dylib instead of .so on Apple.
MacOS resolves symbols differently from Linux. Linux takes the first
valid resolution and then uses it wherever it is referenced. MacOS,
more similar to Windows, favors symbols that are local to the
compilation unit.
This comes into play where functions set static globals such as
the malloc tls and the IMalloc contained there. The dxc binary
initializes these. For Linux, that's enough because all calls
resolve to the ones from this binary because they are encountered
first. MacOS calls that originate from within the dynamic library
resolve to the local functions, which don't have their globals
initialized. To initialize them, we need to do what is done in
DLLMain in DXCompiler.cpp.
Performing these calls results in duplicate initializations on Linux
unless the calls in question are given visibility hidden. This is
the default on Windows, but not on Unix OSes.
LLVM includes a DynamicLibrary class that works on multiple platforms.
Rather than ifdefing between dlopen for non-Windows and LoadLibraryW
on Windows, using this class allows us to abstract platform differences
while using the same code.
Changed DxilLibInitialize to attempt to initialize dxil.dll so the
smart mutex inside the dynamic library object uses the same IMalloc
as the other ManagedStatic objects so they can all be freed in order.
Some interactions with the FileSystem are not used by any API and are
therefore not implemented for Unix at the moment. We have put assertions in
place to make it easy to spot if any of those methods are ever used.
Some code originally in llvm/clang was commented out, and was replaced
with Windows-specific implementation. In this change we bring back some
of those.
Also disabled some code that is not relevant any more.
And a few minor improvements towards cross-platform compilation.
This change completes the necessary adapter code needed to enable
cross-platform compilation.
Discussion on the design/implementation of these are in issue #1341 and
issue #1342.
In addition to the adapter classes/structs, there exist usages of
Windows-specific functions in the codebase. In some cases, we won't need
to compile/run some of those methods on non-Windows platforms. For the
few cases where we do, we have implemented equivalent functionality in
WindowsFunctions.cpp.
Add shader mask and min SM to RDAT
- add OP::GetMinShaderModelAndMask to compute mask and min SM
- account for ops that can be translated during linking
- update PS ops that could be allowed in library functions
- update min SM with newer optional feature flags
Fix AV in TranslateHLSubscript for typed buffer with struct element
A number of warnings unique to release builds persist. Some of these
same errors were fixed earlier for debug builds.
DXASSERT_LOCALVAR still took a variadic macro on release builds.
Fixes 85 clang warnings
Expressions used only as conditionals for asserts were not properly
dummied out with (void) cast.
Fixes 7 clang warnings
Variables used only in asserts dummied out.
Fixes 27 clang warnings
Potential uninitialized uses of variables assigned in switches/ifs.
fixes 5 clang warnings
put braces around empty else statements
11 gcc warnings
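The assert-only variable case can be illustrated with a small example. checkedHalf is a hypothetical function; the point is the (void) cast, which keeps release builds (where assert compiles to nothing) from warning about an unused variable:

```cpp
#include <cassert>

// In release builds NDEBUG removes the assert, leaving `isEven` otherwise
// unused. The (void) cast marks it as intentionally unused so the compiler
// does not emit an unused-variable warning.
int checkedHalf(int x) {
  bool isEven = (x % 2 == 0);
  assert(isEven && "checkedHalf expects an even value");
  (void)isEven;
  return x / 2;
}
```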
This will give users a way to know the commit from which a DXC
library/binary was built, so that they can compare different
libraries/binaries and report bugs easily.
dxc.exe now also prints the commit info using this new interface
method. It uses the following format:
(dev;<commit-count>-<commit-hash>)
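For illustration, a string in that format could be assembled as below. formatCommitInfo is a hypothetical helper, not the interface method added by this change, and the inputs in the usage are made-up values:

```cpp
#include <string>

// Hypothetical helper that builds "(dev;<commit-count>-<commit-hash>)".
std::string formatCommitInfo(unsigned commitCount, const std::string &hash) {
  return "(dev;" + std::to_string(commitCount) + "-" + hash + ")";
}
```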
In this CL, we have introduced the "WinAdapter.h" header file.
This file includes declarations and definitions that will enable compilation of the code base on non-Windows platforms. These include:
* Disabling SAL annotations for non-Windows platforms.
* Defining some Windows-specific macros that are not defined on other platforms.
* Defining Windows-specific types that are used throughout the codebase for non-Windows platforms.
This file is currently not complete. As we go forward, we will add more classes/structs/type definitions to this file for specific purposes. The contents of this file are completely disabled (#ifdef'd away) on Windows and do not affect Windows compilation in any way. For the most part, this is our way of enabling compilation of the codebase on non-Windows platforms without making invasive changes to the existing cpp files, and without hindering any future development. Moreover, I have added the #include of this file at all the locations that will need it for successful Linux compilation.
- Make sure entry/signature data is available once SetShaderModel() is
called for shader targets (not for library).
- Add bUseMinPrecision flag to SetShaderModel() since the global switch
is needed at that point.
- Remove SetUseMinPrecision() since it should be set in SetShaderModel()
and should not be changed.
- Modify DxilModule loading to initialize UseMinPrecision flag correctly
for entry signatures, and fix signature copy constructor.
- Remove DxilModule level entry property duplicates, use the
DxilFunctionProps instead, except where useful (GS stream mask),
and make sure these are always available/in-sync.
- Fix various code accessing properties that don't match target
- Fix various ordering issues causing bugs in properties
- Don't duplicate functions to strip them of parameters when they have
already been stripped (linker hits this).
Finalize OpCode changes for Dxil 1.3 / SM 6.3
- Rename CreateHandleFromResourceStructForLib to CreateHandleForLib
- Add PrimitiveIndex
- Add final NumOp[Codes|Classes]_Dxil_1_3 values
- Fix legal shader stage set for PrimitiveID
Regardless of the order of variables placed in a constructor
member initializer list, the variables are initialized in the
order they are listed in the class or struct. This can cause
issues where the assignment of one variable depends on the other.
For this reason, a warning is produced if the member variables
are listed in a different order. This change moves the member
initializer lists into the order they were being initialized.
Fixes 23 clang and 24 gcc warnings.
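As a sketch of why the order matters, consider the hypothetical struct
below (Range is not a type from the codebase). The initializer list is
written in declaration order, so the dependent member is computed from
members that are already initialized:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type for illustration only.
struct Range {
  std::size_t begin; // declared first, so initialized first
  std::size_t size;  // declared second
  std::size_t end;   // declared last; may safely depend on the two above

  // The list matches declaration order, so the reorder warning stays quiet
  // and 'end' reads members that have already been set.
  Range(std::size_t b, std::size_t s) : begin(b), size(s), end(begin + size) {
    assert(end >= begin);
  }
};
```

Had 'end' been declared before 'begin' and 'size', the same initializer
list would read uninitialized members regardless of how it was written.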
Initialization of members of a larger aggregate type may benefit
from braces to clarify which variables are expected to go where.
At the same time, scalars should not be so segregated. This places
or removes such braces wherever appropriate.
Fixes 144 clang warnings
pParameters and pStaticSamplers were both marked const in spite of
the qualifier being cast off at nearly every usage. So they lose
their const. If you're the target of memcpy or memset, you are
not const.
Fixes 12 gcc and 12 clang warnings
Integration fixes.
- Remove unused constructor from FunctionTableReader
- Remove hlsl::Exception dependency from DxilRuntimeReflection.inl
- Fix field size sizeof(RuntimeDataFunctionInfo) in legacy loader
- Add __cdecl to raw_ostream::operator<<(...) for ios_base change
- Move string constant definitions from CGHLSLMS to DxilModule, since
they are required by HLSL.lib, and that shouldn't depend on clang CodeGen.
- Revise DxilRuntimeData for versioning support and robustness.
- Rename DxilRuntimeReflection desc structures.
- Add size checking to DxilRuntimeData/Reflection
- CheckedReader/CheckedWriter to catch potential problems
- Backward compatible reader for prerelease RDAT format
- Rewrite IndexArraysPart and avoid duplicate index arrays
- Always start string buffer with null for empty string at offset 0
The various DXASSERT macros produced a number of different warnings.
The most obvious was providing an empty string for a macro expecting
a message. Substituting in the _NOMSG macro fixes this.
Fixes 13 clang warnings.
A trickier problem for GCC and Clang is a variadic macro that is
provided no variadic arguments. Both Clang and GCC will avoid
errors by using the ##__VA_ARGS__ non-standard behavior, but it's
not an ideal solution. It turns out there aren't that many DXASSERT
macros that actually take variadic arguments. So I added an _ARGS
variant and replaced those instances with that. An ancillary benefit
is that non variadic asserts can present a message with the assert.
Fixes 583 clang warnings and 582 gcc warnings.
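The split can be sketched roughly as follows. MY_ASSERT and
MY_ASSERT_ARGS are hypothetical stand-ins, not the actual DXASSERT
definitions; the point is only that the fixed-arity form never sees an
empty __VA_ARGS__:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for illustration only.
// Fixed-arity form: exactly one message, no variadic arguments involved.
#define MY_ASSERT(cond, msg) \
  do { \
    if (!(cond)) { \
      std::fprintf(stderr, "%s\n", msg); \
      std::abort(); \
    } \
  } while (0)

// Variadic form: callers always supply at least one format argument,
// so __VA_ARGS__ is never empty and no ## extension is needed.
#define MY_ASSERT_ARGS(cond, fmt, ...) \
  do { \
    if (!(cond)) { \
      std::fprintf(stderr, fmt "\n", __VA_ARGS__); \
      std::abort(); \
    } \
  } while (0)

// Hypothetical caller using both forms.
int clampIndex(int index, int count) {
  MY_ASSERT(count > 0, "count must be positive");
  MY_ASSERT_ARGS(index < count, "index %d out of range [0, %d)", index, count);
  return index < 0 ? 0 : index;
}
```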
In what I expect was an earlier attempt to overload the DXASSERT
macro, it was defined twice each time with a different number of args.
Sadly, macros can't be overloaded, so this just produced more errors.
Fixes 69 clang warnings and 69 gcc warnings.
Finally, I removed the user declaration of std::error_code because
it is defined in a standard header <system_error>. Because clang was
including that header some places, it saw it as a redefinition. But
the Linux version is a struct and windows is a class. So nothing we
put here can make everyone happy. It doesn't really fit with this
change except for in proximity but doesn't fit anywhere else either.
Fixes 52 clang warnings
A most pervasive warning from a minimal amount of code. The Wintypes
header included an anonymous struct in the form of LARGE_INTEGER
which is specified by Microsoft, but the actual MS header includes
another struct in the union for environments that don't want
anonymous structs. Given the minimal usage, I think this can
qualify. Microsoft's own headers will exclude the anonymous struct
in favor of the named one if NONAMELESSSTRUCT is defined before
windows.h. To prevent future uses of the anonymous part of LARGE_
INTEGER, this defines that in WinIncludes.h.
Fixes 148 clang and 148 gcc warnings.
There are a lot of cases where const is carelessly cast off. Some
of them lead to really dramatic changes in multiple interfaces to
fix properly. This is instead a subset of cases where const was
disrespected, often for no apparent reason at all, likely just a
mistake in type casting.
Fixes 88 gcc and 88 clang warnings.
Clang produces a warning when a method that overrides another is
not marked with the override keyword. Adding the keyword has no
effect on overridden functions, but it is good practice to mark
all methods intended to override as such because if they fail to
do so, the keyword will produce an error.
Fixes 78 Clang warnings.
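A small sketch of what the keyword buys, using hypothetical Pass and
CleanupPass types that are not taken from the codebase:

```cpp
#include <cassert>
#include <string>

// Hypothetical base class for illustration only.
struct Pass {
  virtual ~Pass() = default;
  virtual std::string name() const { return "pass"; }
};

struct CleanupPass : Pass {
  // 'override' makes the compiler check that a matching virtual exists in
  // the base class. Dropping 'const' here would then be a compile error
  // rather than a silent declaration of a new, unrelated function.
  std::string name() const override { return "cleanup"; }
};

// Calls through the base interface to show that dispatch is unchanged.
std::string describe(const Pass &p) { return p.name(); }
```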
There is the possibility of a few variables being used without
initialization, which causes some warnings.
Fixes 2 clang warnings
GCC worries that assigning a variable to the result of an
assignment of that same variable may be undefined. It's a silly
thing to do anyway. So it's removed.
Fixes 1 gcc warning.
GCC is quite sure that Interp is too small to hold values of up
to 9 with only 4 bits. Perhaps because the RHS is a variable of
8 bits.
Fixes 47 gcc warnings
Terminating classes may create a muscle memory for typing a
semicolon after a left-justified closed brace. Whatever the reason,
to GCC's chagrin there were many functions ended with unnecessary
semicolons. Each one caused a warning. This removes them.
Fixes 285 GCC warnings
GCC has what seems to me to be a bug where it disapproves of the members
of a member struct of class OP having the same name as a type imported
by "using". It's the only compiler with the issue and I don't see any real
conflict, but it's easier to placate it than to fix GCC. GCC
claims it's ambiguous. Making the member variables lowercase fixes
this neatly.
Removed some PointerStatus:: prefixes from references within the
class itself which upsets GCC.
HLSL changes excluded Transforms/Instrumentation directory entirely
including BoundsChecking. However it didn't exclude one reference
to a function declared there in BackendUtil that fails to link on GCC.
The function that references the missing BoundsChecking function
is static and not used anywhere with current settings. So it's not
considered a symbol to be linked by MSVC or clang, but GCC insists.
InterlockedIncrement and InterlockedDecrement are atomic operations to
increment or decrement the value pointed to by the pointer parameter and
return the same.
There are already platform abstractions available to LLVM in Atomics.h.
This change leverages them to implement the same functionality in a
platform-agnostic way. Mostly it's just a matter of swapping
InterlockedIncrement for AtomicIncrement.
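The semantics being preserved can be sketched with std::atomic as a
stand-in for the LLVM helpers. PortableIncrement and PortableDecrement
are hypothetical names, not the functions used in this change:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical portable replacements built on std::atomic.
// Like the Win32 functions, each returns the value after the update.
inline long PortableIncrement(std::atomic<long> &value) {
  return value.fetch_add(1) + 1; // fetch_add returns the previous value
}

inline long PortableDecrement(std::atomic<long> &value) {
  return value.fetch_sub(1) - 1; // fetch_sub returns the previous value
}
```

Returning the updated value matters for reference counting, where the
caller typically destroys the object when the decrement yields zero.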
GCC5 has a few limitations that later versions do not. The
most pervasive issue is the confusion of scoped operators on enum
names when the name of the enum is shared with a local variable.
The only other issue concerns initialization of a union containing
structs and how many curly braces are enough.
Add -exports export1[[,export2,...]=internal][;...] for lib target
- Remove IDxcLinker::LinkWithExports, use -exports option instead
- Added renaming/cloning support to linker
- Added validation and tests
- Fix AV in DxilPreparePasses when patch constant function is shared
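As a rough sketch of how the -exports value could be split apart,
the hypothetical parseExports below returns (export name, internal name)
pairs. It is not the parser used by dxc, and it assumes that an entry
without "=internal" exports a function under its own name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser for "export1[[,export2,...]=internal][;...]".
// Assumption: with no "=internal" part, the internal name equals the export.
std::vector<std::pair<std::string, std::string>>
parseExports(const std::string &spec) {
  std::vector<std::pair<std::string, std::string>> result;
  std::size_t pos = 0;
  while (pos <= spec.size()) {
    // Entries are separated by ';'.
    std::size_t semi = spec.find(';', pos);
    if (semi == std::string::npos) semi = spec.size();
    const std::string entry = spec.substr(pos, semi - pos);
    pos = semi + 1;
    if (entry.empty()) continue;

    // Everything after '=' is the internal name shared by the entry.
    const std::size_t eq = entry.find('=');
    const std::string names = entry.substr(0, eq);
    const std::string internal =
        eq == std::string::npos ? std::string() : entry.substr(eq + 1);

    // Export names before '=' are separated by ','.
    std::size_t start = 0;
    while (start <= names.size()) {
      std::size_t comma = names.find(',', start);
      if (comma == std::string::npos) comma = names.size();
      const std::string name = names.substr(start, comma - start);
      start = comma + 1;
      if (name.empty()) continue;
      result.emplace_back(name, internal.empty() ? name : internal);
    }
  }
  return result;
}
```

Under these assumptions, "a,b=impl;c" yields three exports: a and b
both map to impl (a clone), and c maps to itself.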
ID3D12LibraryReflection:
- Fix shader input type for typed buffer and dimension for tbuffer
- Deterministic function order by name
Just a few warning conditions in a handful of common headers
were producing pages and pages of warning messages. By fixing them,
the output spew is reduced significantly.
Static variables in a header produce a separate variable for every
file that includes it. In many of these, these variables were unused.
Instead, variables used in multiple files should be defined only
once in a source file and declared as extern in the header.
Many unsigned variables were checked to be greater than zero and less
than 3 in HlslTypes.h. If they were somehow assigned a negative
value, it would appear as much greater than 3 in unsigned form.
So we can just check that they are less than or equal to 3.
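A minimal sketch of the simplified check, using a hypothetical
isValidComponent helper rather than the actual code in HlslTypes.h:

```cpp
#include <cassert>

// For an unsigned value a lower-bound test against zero is always true,
// which is what the compilers warn about. A single upper-bound comparison
// is enough: a negative number converted to unsigned wraps around to a
// very large value and fails the test.
constexpr bool isValidComponent(unsigned index) { return index <= 3; }
```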
In Type.h, parammods_begin() was casting its return value to the
proper pointer, but without const, producing an error. The best
part is that the function returned a const pointer. The cast was
just wrong. Adding const silences the warning.
Added -Wno-vk-ignored-features to suppress warnings on features
that are ignored because of no Vulkan support. Examples include
cbuffer member initializer.
1. Change llvm.dbg.contents into dx.source.contents.
2. Fix crash when SemanticIndexVec is empty.
3. Fix the case where Utf8StringVector is reallocated in MainArgs::operator=(const MainArgs &other).
- removes stl dependency from header
- change namespace from hlsl::DXIL::RDAT to hlsl::RDAT
- use enum class for RuntimeDataPartType to avoid name collisions
- bit of cleanup
MSVC is very forgiving of crimes committed in templated functions
and methods of templated classes that never actually get instantiated.
Consequently, the fact that the public variant of Find in SpanAllocator
referenced an unknown variable, seemingly as a copy paste error
was left unnoted. Other compilers are not so magnanimous.
The call to the private version of Find() takes an iterator. The
public Find() searches through the known spans to find the one just
past the range requested by the current pos and size. If it reaches
the end, then the span can be placed at the end safely. If it
fits before some other span, overlap needs to be checked. If the
new span would overlap its neighbor, another location is needed.
It makes sense to start looking for that location after the end
of the neighbor whose span you just treaded on. So passing the
next iterator into the private Find() makes sense.
Replaces -i64 suffixes with -LL which are said to be equivalent by
MSDN. Both should result in 64 bit behavior.
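A short sketch of the portable spelling. bigValue is a hypothetical
function used only to show the suffix in context:

```cpp
#include <cassert>
#include <limits>

// long long is guaranteed to have at least 63 value bits plus a sign bit.
static_assert(std::numeric_limits<long long>::digits >= 63,
              "long long must be at least 64 bits wide");

// The LL suffix gives the literal type long long, so the shift is done in
// 64-bit arithmetic. A plain 1 would be an int, typically only 32 bits.
inline long long bigValue() { return 1LL << 40; }
```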
Additionally, I dropped a superfluous repetition of the class name
on the CheckVectorConditional method of HLSLExternalSource. By all
rights, it should be its own change, but it gets lonely so it came
along for the ride. The repeated class name made some compilers
unhappy and losing it doesn't bother anyone.
The DxilSigPoint.inl file depends on an esoteric distinction in how
different preprocessors expand macros included as parameters to
other macros.
When a macro receives another macro as its argument, different
preprocessors expand these parameter macros differently. MSVC
defers expanding the macro until the macros taking parameters
have been completely expanded. GCC and others expand the parameter
macro just as soon as it can.
At issue is the expansion of the three _## macros: _41, _50, and
_61 which expand to ,#,# respectively. These macros can be passed
into an initial macro that plugs them directly into a function that
takes multiple parameters without issue with any preprocessor.
Regardless of when the _## macro is expanded, it will be interpreted
correctly as passing multiple arguments to the function.
If the _## macro is passed into a parameterized macro that passes
it to another such parameterized macro that takes a single argument,
preprocessor inconsistencies come into play. If the preprocessor
expands the _## macro immediately, instead of passing the parameter
macro directly into the second parameterized macro, the expansion
of the parameter macro will be passed in, which will be
interpreted as passing multiple arguments to a single-parameter
macro and an error will result. If the _## macro is not expanded
until the end, then it is expanded only for the final multi-
parameter function call as the author expected.
The simplest example of this is something like this:
#define _AB 1,2
#define SEC(args) (args)
#define FIRST(args) SEC(args)
FIRST(_AB);
A preprocessor that delays expansion of _AB will expand thus:
FIRST(_AB) => SEC(_AB) => (_AB) => (1,2)
whereas an early expanding preprocessor will expand thus:
FIRST(_AB) => SEC(_AB) => SEC(1,2) => ERROR! SEC only takes 1 arg!
Some of the details of the example might be slightly off, but the
important thing is that the parameter gets expanded at different
times.
The solution to this is to complicate this already long macro.
The middle macro that introduces the opportunity for the _##
macros to be expanded early is DO_ROW, which doesn't do much
except allow for the larger data entry macro, DO_INTERPRETATION_TABLE
to be a bit shorter and simpler. By eliminating DO_ROW and
shortening the DO macro to D, the large table specifying behaviors
is only a little wider and harder to read and it no longer depends
on this inconsistent element of preprocessor behavior.
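The portable shape, with no intermediate macro between the comma-bearing
argument and the final call, can be sketched like this. PAIR, sum2 and
CALL_SUM are hypothetical names; the real table macros are far larger:

```cpp
#include <cassert>

// Hypothetical stand-in for the comma-bearing _## macros.
#define PAIR 1, 2

inline int sum2(int a, int b) { return a + b; }

// Only one level of macro: the argument is pasted straight into a function
// call, so it does not matter whether PAIR is expanded before or after it
// is substituted. Either way the call ends up as sum2(1, 2).
#define CALL_SUM(args) sum2(args)

inline int callSumOfPair() { return CALL_SUM(PAIR); }
```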
(I really hope that made sense. I wish I could come up with a
simpler explanation or solution. If any have suggestions for
either, I would be happy to adopt either. Feel free to email me.)
No functional change. Inching further toward cross-platform.
- Add all resources up front
- Add resource map to look up DXIL_RESOURCE ptr from Class&ID
- Add resource pointers looked up to function's resource ptr list
- Use pointer to function's resource ptr list, rather than the original
array used for global storage.
CDxcTMHeapPtr templated with 2 classes does not exist.
A code block has also been moved so that the caller function knows about
the existence of the callee function. Previous code order is not allowed
by some compilers.
Add -auto-binding-space option enabling auto-binding for lib targets
- Added DxilAllocateResourcesForLib pass to allow auto-binding any unbound
resources in a DxcOptimizer pass.
Avoid inserting other insts before allocas in several passes.
- Improve code that creates allocas/non-allocas to keep allocas first
- Add helper functions to dxilutil
- Use name AllocaBuilder for CreateAlloca for clarity
GLSL as a shading language does not define the layout rules for
std140/std430; the OpenGL graphics environment defines that.
This is also more consistent with -fvk-use-dx-layout.
DxilRuntimeReflection uses RDATTableReader to obtain information about the library. InitFromRDAT call will allocate all memory required to get DXIL_LIBRARY_DESC.
* [spirv] Add -fspv-target-env command line option.
The valid values for this option currently are:
vulkan1.0
vulkan1.1
If no target environment is specified, vulkan1.0 is used as default.
Added FeatureManager to record all extensions specified from
the command-line and emit error if trying to use one not permitted.
Added command-line option -fspv-extension= to specify
whitelisted extensions.
This commit uses the HlslCounterBufferGOOGLE decoration to link
the main RW/Append/Consume StructuredBuffer with its associated
counter buffer. It also uses HLSLSemanticGOOGLE to decorate
stage IO variables with their semantic strings from the source code.
Based on GLSL std140/std430 layout rules, relaxed layout allows
using vector's element type's alignment as the vector type's
alignment, so that we can pack a float value and a float3 value
tightly. This is the default right now.
Also add an option, -fvk-use-glsl-layout, to turn off the relaxed
layout for vectors and use conventional GLSL std140/std430 layout
rules.
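The difference for the float plus float3 case can be sketched with a
little offset arithmetic. alignUp and float3OffsetAfterFloat are
hypothetical helpers, not the layout code in the compiler, and the
sketch assumes 4-byte floats:

```cpp
#include <cassert>
#include <cstddef>

// Round offset up to the next multiple of alignment (alignment > 0).
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) / alignment * alignment;
}

// Offset of a float3 that follows a single float placed at offset 0.
// Conventional std140/std430: a three-component vector is aligned to
// four times its element size, i.e. 16 bytes for float3.
// Relaxed: the vector uses its element alignment, i.e. 4 bytes.
inline std::size_t float3OffsetAfterFloat(bool relaxed) {
  const std::size_t floatSize = 4;
  const std::size_t vectorAlignment = relaxed ? floatSize : 4 * floatSize;
  return alignUp(floatSize, vectorAlignment);
}
```

With relaxed layout the pair occupies bytes 0 to 15; with the
conventional rules the float3 starts at byte 16 and 12 bytes of padding
follow the float.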
The fxc compiler evaluates both operands before performing the
token-pasting operation. But the default Clang way is to follow
C standard, which token-pastes operands as-is, without any
pre-expanding.
This commit adds support for the fxc behavior via the command
line option: -flegacy-macro-expansion.
Fixes https://github.com/Microsoft/DirectXShaderCompiler/issues/1005
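The standard behavior that differs from fxc can be sketched in ordinary
C++. INDEX, PASTE and PASTE_EXPANDED are hypothetical macros; per the
description above, fxc would treat the single-level form like the
two-level one:

```cpp
#include <cassert>

#define INDEX 1
// Standard rule: operands of ## are pasted exactly as written.
#define PASTE(a, b) a##b
// An extra macro level forces a and b to be expanded before pasting.
#define PASTE_EXPANDED(a, b) PASTE(a, b)

inline int pasteDemo() {
  int valueINDEX = 10; // PASTE(value, INDEX) names this variable
  int value1 = 20;     // PASTE_EXPANDED(value, INDEX) names this one
  return PASTE(value, INDEX) * 100 + PASTE_EXPANDED(value, INDEX);
}
```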
2. Not SROA on resource.
3. avoid unpack for resource.
4. Use value type for createHandleForLib.
5. Move all scalar global resource load to the beginning of function.
This stops other passes from optimizing the loads of scalar global resources.
6. Not remove local resource for lib.
- Keep track of patch constant functions for later identification
- functions that require input/output signature processing identified
with IsEntryThatUsesSignatures
- update lib_rt.hlsl intrinsics and naming