Fix compiling issues in x86 configuration when using a target other than x64
The script didn't set the -A parameter that vs2019 depends on in any but the x64 case
Additional simplifications and documentation to configandbuild function
Add new options to hctbuild that enable passing in additional arguments to cmake invocation
and also cmake customization hooks that will be included at the beginning and end of DXC's CMakeLists.txt.
Rework library search in FindD3D12.cmake and add one more library to look for.
Fix case in FindDiaSDK.cmake - fixes a cmake warning.
* Use Attribute to designate wave-sensitive intrinsics
This adds an intrinsic attribute to indicate wave-sensitivity that can
be indicated in gen_intrin_main.txt. This and other attributes are
passed along through function representations and lowerings. The
wave-sensitivity needs to be maintained specifically through SROA passes
since it is used by the CleanupDxbreak pass that comes after them.
Specifically this is done to allow extension intrinsics to indicate
wave-sensitivity, but the same mechanism is now used for builtin
intrinsics.
Most intrinsics get a mostly nameless function to represent them between
Codegen and dxilgen. This allows any that have the same prototype to
share the same function. For wave-sensitive intrinsics, they need a
different function or else the attribute would be similarly shared with
intrinsics matching the prototype. So a minor change is made to their
function names to prevent this.
Adds testing for all these ops and a dummy extension one.
* Skip copy-in/copy-out for constant global variables.
* Enable copy for noinline.
TODO: analysis for global variable alias with parameter.
* Use SetVector and skip resources when copying.
* Disable mayAliasWithGlobal because the optimization already covers this case by not replacing when there is an alias.
When the normal value that replaces a const global has a store, mark it non-constant.
* Gather should not be considered a gradient operation.
* Mark WriteSamplerFeedback[Bias] as gradient
* Remove SampleCmpLevelZero from DxilConvergent
* Add WriteSamplerFeedback[Bias] to DxilConvergent
* Conditionalize breaks to keep them in loops
This introduces dx.break, a temporary builtin that is applied as a
condition to any unconditional break in order to keep the basic block
inside the loop. Because it remains in the loop, operations that depend
on wave operations inside the loop will be able to get the right values.
Such builtins have to be added at finishcodegen time or else clang
throws an error for an undefined function. Consequently, the creation of
these is split in two. First the branch is created with just a
constant true conditional. Then at finishcodegen, it is converted to the
result of the dx.break() builtin.
By using the result of a temporary builtin function, the optimization
passes don't touch the false conditional like they might if we started
with the global constant.
Normal break blocks don't need this conditional, but we don't know that
at code generation. So a later pass identifies break blocks with wave
sensitive operations that depend on wave ops that are inside the loop they
are breaking out of and preserves those conditionals while removing all
the rest.
As part of dxil finalization, the dx.break() function is removed and all
branches that depended on it are made to depend on function-local
loads and compares of a global variable dx.break.cond.
The DxBreak Fixup pass depends on dxil-mem2reg. It is placed immediately
after to allow as many optimizations to go as they would without this
change in shaders that don't have any wave ops.
* Adds the DxcPixDxilDebugInfo interface and friends.
* Modifies the entrypoints to require InParam/OutParam for pointers, as well as CheckNotNull them
* Removes S_FALSE for happier Jeff
* Fixes broken test
* returns E_POINTER for nullptrs
* Returns S_FALSE from UnAlias for non-aliasing types.
* fails GetName for arrays
* Addresses CR comments
- Fix output naming for StringRef
- Support /Fc when /Fh is provided
- Fix root sig strip flag
- Create root sig stream output
- Validate -Frs root sig output for signing with DXIL.dll
- Move outputs under valHR success branch
- Add test for -Fre, -Fsh, -Frs, and -Fc with -Fh
- Rework hcttestcmds.cmd completely: checks a lot more stuff and is way more robust,
while being much easier to read, add to, and maintain, I hope.
This is a pass for PIX that adds instructions to write mesh shader output (vertices and indices) to a UAV for later ingestion by PIX in order to present a view of that output.
When run on a fast system with a lot of threading, the custom cmake
commands to generate version include information ran into a problem
because the two include files both depended on the same intermediate
file for some builds. As a result, when the first thread started writing
to it, the second thread couldn't access it if it started to try before
the first finished. The result was an empty header and no defined
version macros.
By eliminating the common intermediate file and giving each output file
its own intermediate file, the generation should proceed no matter how
many threads and unnecessary recompilation won't result when
regeneration results in no changes.
To facilitate this, I've added a cmake macro that handles both cases.
The peculiarities of how they are generated are all handled outside the
macro.
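The per-output intermediate file and the "no rewrite when nothing changed" behaviour described above can be sketched in Python. This is only an illustration of the idea, not the cmake macro itself; the function name `generate_if_changed`, the `.gen` suffix, and the header file name are assumptions made for the example.

```python
import filecmp
import os
import tempfile


def generate_if_changed(output_path, content):
    """Write content via a private intermediate file, then publish it.

    Returns True when output_path was created or updated, and False when
    the existing file already had identical content and was left alone.
    """
    # Each output has its own intermediate (hypothetical ".gen" suffix), so
    # two generators running in parallel never write to the same path.
    intermediate = output_path + ".gen"
    with open(intermediate, "w", encoding="utf-8") as f:
        f.write(content)

    if os.path.exists(output_path) and filecmp.cmp(
        intermediate, output_path, shallow=False
    ):
        os.remove(intermediate)
        return False  # unchanged: timestamp is kept, nothing recompiles

    os.replace(intermediate, output_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "example_version.inc")  # hypothetical name
    first = generate_if_changed(header, "#define MAJOR 1\n")
    second = generate_if_changed(header, "#define MAJOR 1\n")
    third = generate_if_changed(header, "#define MAJOR 2\n")
```

Only the first and third calls touch the published header, which is what keeps dependent targets from rebuilding after a no-op regeneration.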
The original generation of HLSL macros containing version information
used the same generated header file as the RC file. This was fine except
when building in xbtools where the version information in the RC is
overridden to match the GDK.
To maintain the DXC macros as communicating the version of the DXC
component to better enable workarounds based on that information, the
same generator script is used, but the header file is separate and
doesn't abide by the fixed version location provided via cmake defines.
This way it will always contain DXC information while the RC file can
continue to contain the version information as it did before.
To allow for the case where the headers are not the same, a preliminary
version.gen file is created according to the dxc version requirements
and copied to the dxc header. In most cases, it is copied to the RC
header as well. When an explicit path is specified for the fixed
version, the DXC version uses the "official" generated header and the
RC file gets the version specified by the file in the given location.
Reverts the changes to the target_version_vertex.hlsl test since it
should always succeed now.
Add GetResourceFromHeap for hlsl.
For Dxil, add CreateHandleFromHeap and AnnotateHandle.
All handles (createHandle, createHandleForLib, createHandleFromHeap) must be annotated with AnnotateHandle before use.
TODO: add AnnotateHandle for pix passes.
Clean up resource-related code.
* HLSL test infrastructure and other refactoring
Refactor common test infrastructure code into HLSLTestLib
Enable invocation of fxc and other executables via // RUN: commands in test files
Add latest d3dx12.h to include/dxc/Support and remove two other outdated copies
Improve DXIL container header validation on load
New helper classes DxilContainerReader and FixedSizeMemoryStream
Move LoadSubobjectsFromRDAT to DxilSubobjects.cpp
Co-authored-by: Greg Roth <grroth@microsoft.com>
* Add HLSL macros with version information
Adds three sets of version information as defined macros available
to the shader.
The DXC version is included as __DXC_VERSION_{MAJOR|MINOR|RELEASE|COMMITS}
The values of these are set at compile time and derived from version.inc.
RELEASE and COMMITS information varies according to the criteria
described in gen_version.py.
The HLSL Version is included as __HLSL_VERSION. It may be taken from
the -HV argument with which dxc was invoked, derived from other
arguments, or the default.
The shader target stage and version information is taken from the
-T argument with which dxc was invoked. For the stage, defined
constants representing each stage are defined at compile time that
correspond to the enum values used internally. The shader stage
specified by -T is determined at runtime and set to the corresponding
enum value. The major and minor versions are similarly derived from
the -T argument at runtime.
The runtime defines are set in InitializeStandardPredefinedMacros().
This allows them to be set just once and the same way for any way
that dxc is invoked including testing.
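As a rough illustration of deriving the stage and the major and minor versions from the -T argument at runtime, here is a Python sketch. The function name and the returned tuple are assumptions for the example, and it only handles profiles of the form `<stage>_<major>_<minor>` with numeric versions; it does not reproduce the compiler's own parsing or its macro names.

```python
import re


def parse_target_profile(profile):
    """Split a -T profile such as "ps_6_5" into (stage, major, minor)."""
    match = re.fullmatch(r"([a-z]+)_(\d+)_(\d+)", profile)
    if match is None:
        # Profiles with a non-numeric version part are rejected here.
        raise ValueError(f"unrecognized target profile: {profile!r}")
    stage, major, minor = match.groups()
    return stage, int(major), int(minor)


stage, major, minor = parse_target_profile("ps_6_5")
print(stage, major, minor)
```

A shader could then branch on the resulting values, which is the use case the predefined macros are meant to serve.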
This makes some changes to how the version.inc file is generated and
found in cmake files in order to generate it regardless of the need
for a .rc resource file as it was before.
A series of hlsl tests are added that test for the defines presence
and values in order to add versus subtract with checks for the adds.
Since many of the defines are determined by arguments at runtime, a
few different tests were needed, but as much consolidation as possible
was done to reduce the number of individual test runs.
According to the description at the top of gen_version.py and the
commit that introduced it, the "Official build" versioning is meant
to add 10000 to the number of commits since the last official release
when the build is in the master branch. However, because of the use
of "is" instead of "==", the comparison between the variable and
the string literal in the python script was never succeeding. The
"is" operator only succeeds when both operands refer to the same
object, for example when one variable has been assigned to the
other. For comparisons of contents, "==" should be used.
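The difference between identity and equality can be shown with a small Python sketch. The function name `official_commit_count` is made up for illustration and is not taken from gen_version.py; the branch string is built at runtime to mimic a value read from a command's output.

```python
def official_commit_count(branch, commits_since_release):
    """Add 10000 on master, comparing string contents with ==."""
    if branch == "master":  # "is" here would compare object identity
        return commits_since_release + 10000
    return commits_since_release


# A branch name obtained at runtime is its own string object, even when its
# contents equal a literal that appears in the script.
branch = b"master\n".decode("utf-8").strip()
expected = "master"

print(branch == expected)  # compares contents
print(branch is expected)  # compares identity; typically False in CPython,
                           # but this is implementation-dependent
```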
After initial generation, subsequent invocations of cmake will
only alter the defines that are specified by that command. Any
defines that were set previously retain their earlier settings
by being stored in the CMakeCache.txt in the build directory.
hctbuild.cmd allows specifying certain defines when arguments are
passed to it, but many of these do not allow resetting the defines
they set. So if you define an official versioned build, every
subsequent build will use the official version unless the cmake
cache is edited or the whole build directory is wiped clean. This
is true of several other settings.
This change sets the defines hctbuild cares about every time it is
invoked regardless of arguments. If the official build is specified
the official define is set to ON. If it isn't, it is set to OFF.
This way these flags can easily be switched on the same build dir.
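The "always pass every define" approach can be sketched as follows. This is a Python illustration rather than the batch script, and the define names are placeholders, not the names hctbuild actually uses.

```python
def cmake_defines(options):
    """Turn every known option into an explicit -D<NAME>=ON/OFF argument.

    Emitting OFF explicitly is what overrides a value that an earlier run
    left behind in CMakeCache.txt.
    """
    return [
        f"-D{name}={'ON' if enabled else 'OFF'}"
        for name, enabled in options.items()
    ]


# Hypothetical define names, for illustration only.
args = cmake_defines({"EXAMPLE_OFFICIAL_BUILD": False,
                      "EXAMPLE_OTHER_SETTING": True})
print(args)
```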
New Rules:
outer type may be: [ptr to][1 dim array of]( UDT struct | scalar )
inner type (UDT struct member) may be: [N dim array of]( UDT struct | scalar )
scalar type may be: ( float(16|32|64) | int(16|32|64) )
- Disallow pointers to pointers, and pointers in structs
- Disallow multi-dim arrays at top-level, but allow within struct
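The rules above can be read as a small recursive check. The Python sketch below encodes them over a toy type representation; the class and function names are invented for the example and do not correspond to compiler data structures. An N-dimensional array is modelled as nested `Array` values.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scalar:
    kind: str  # "float" or "int"
    bits: int  # 16, 32 or 64


@dataclass(frozen=True)
class Array:
    elem: "Type"
    count: int


@dataclass(frozen=True)
class Pointer:
    pointee: "Type"


@dataclass(frozen=True)
class Struct:
    members: Tuple["Type", ...]


Type = Union[Scalar, Array, Pointer, Struct]


def valid_scalar(t):
    return (isinstance(t, Scalar) and t.kind in ("float", "int")
            and t.bits in (16, 32, 64))


def valid_inner(t):
    """Struct member: [N dim array of](UDT struct | scalar), no pointers."""
    while isinstance(t, Array):
        t = t.elem
    if isinstance(t, Struct):
        return all(valid_inner(m) for m in t.members)
    return valid_scalar(t)


def valid_outer(t):
    """Top level: [ptr to][1 dim array of](UDT struct | scalar)."""
    if isinstance(t, Pointer):
        t = t.pointee  # at most one pointer level
    if isinstance(t, Array):
        t = t.elem     # at most one array dimension
    if isinstance(t, Struct):
        return all(valid_inner(m) for m in t.members)
    return valid_scalar(t)
```

Stripping at most one pointer and one array level at the top is what rejects pointers to pointers and top-level multi-dimensional arrays, while `valid_inner` still accepts nested arrays inside a struct.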
* Erase dead region
* Pass dependencies
* Simpler heuristic, only checking that Begin dominates End and End post dominates Begin
* Small cleanups. No longer iterating whole block to find PHIs
* A few optimizations. Fixed infinite loops caused by self-loops
Fixes problems like:
- extra AllocateRayQuery calls, or improper location (for lifetime)
- proper array support
- static global RayQuery
This RayQuery allocation changes:
- Add a constructor to RayQuery
- Set init sequence to use constructor in InitializeInitSequenceForHLSL, just for RayQuery
- For array: modify EmitCXXAggrConstructorCall to
- loop over index instead of pointer to allow SROA of RayQuery struct
- mark the loop as HlslForceUnroll
- Add hidden flag for HL intrinsics to allow internal intrinsic not produced
by HLSL directly - mangle name so it can't be matched during parse.
- Add hidden HL AllocateRayQuery intrinsic
- Translate constructor call on ptr to HL AllocateRayQuery intrinsic call producing handle i32 during FinishCodeGen
- Translate RayQuery ptr to load i32 handle value for intrinsic methods during SROA_HLSL
- Flatten RayDesc for TraceRayInline
(otherwise /Od fails validation since RayDesc type may still be present)
- No longer skip RayQuery for SROA_HLSL
- Update lowering for AllocateRayQuery, i32 handle, and flattened RayDesc
- Remove ReadNone attribute from AllocateRayQuery to prevent incorrect CSE optimizations
- Manually cleanup unused RayQuery allocations
* Adding test fixes and more support for SM6.5 WaveMultiPrefix functions
1) Ensure wave lane-id’s are sorted when accumulating result
2) Fix made for HLSL in ShaderOpArithTable.xml to ensure the input aligns with what is being tested
3) Added support for the “UBit” version for all of these tests.
Two test options, -Qstrip_reflect_from_dxil and -Qkeep_reflect_in_dxil
for making tests work with reflection removed, since many tests are relying
on main module disassembly-reassembly between test phases and reflection
metadata will no longer be present there. The strip option is for the
few cases where tests don't want the reflection kept in DXIL by default.
Validator no longer requires function annotations for no reason.
Fix places where remove global hook was not being called when functions
were removed manually from the list.
StripReflection now deletes function annotations, unless targeting lib or
old validator that required them. Preserve global constructor list and
add annotation for 1.4 validator. The global hook fixes were required
here, otherwise annotations would refer to dead functions during linking.
Struct annotations may not be removed in library case when they still need
translation to legacy types.
Allow missing struct annotation when not necessary to upgrade the layout.
Preserve usage in reflection by upgrading the module, emitting metadata,
cloning for reflection, then restoring validator version and re-emit
metadata.
Fix size for 16-bit type for usage and reflected size.
Make various batch reflection tests require validator 1.5, since these
tests rely on module disassembly->assembly, which will not preserve extra
usage metadata for reflection in 1.4.
Include reflection part in IDxcAssembler, but don't strip from module,
since there are no options to prevent this from breaking a lot of tests.
Don't strip reflection from offline lib target.
- Align coord dimensions with Sample for future flexibility and alignment
- Fix ddx and ddy arguments to support the correct number of dimensions
- Rewrite lowering, using SampleHelper
- Clean up SampleHelper a bit, adding additional asserts/checks
- Set components to zero for default offset, not undef
- Compute mesh payload size before final object serialization
- During CodeGen for MS based on payload parameter
- During CollectShaderFlagsForModule for AS based on DispatchMesh call
- Store payload sizes in corresponding function properties, serializing
these properly for HL and Dxil Modules
- Use payload sizes from function props for PSV0 data during serialization
- Validate measured and declared payload sizes, don't just fill in
properties during validation
- Fix Wave/Quad allowed shader stages, enabling Quad* with CS-like models
- rename payloadByteSize members to payloadSizeInBytes
- Add GetMinShaderModelAndMask overload taking CallInst for additional
detail required to produce correct SM mask for Barrier operations
- Update the HLSL syntax from FeedbackTexture2DMinLod to FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP>
- Update DXIL to only have two UAV types for FeedbackTexture2D[Array] and use an extra metadata field to distinguish between the sampler feedback type.
The tests were manually added to ShaderOpArithTable.xml in PR #1867,
but ShaderOpArithTable.xml should be generated by hctdb_test.py script.
No actual changes to the test data, just the order of the tests is different
because of rearranging done by the script XML processing.
Added a check for shader model 6.5 so the tests will be skipped on lower
shader models.
- base of struct should always be aligned - or internal bug
- offset for array member must always be aligned - (new) validation error
- alloc and verify struct layouts even when not array field
- out of bound check would have missed OOB on last array element
Added generating of new version for each DX Compiler build.
There are 3 kinds of version:
1. **Official build**
Built by using `hctbuild -official`. The version is based on the current DXIL version, the latest official release, and the number of commits since then. The format is `dxil_major.dxil_minor.release_no.commit_count`. For example, a current official version would be something like `1.5.1905.42`. The latest release information is read from `utils\version\latest-release.json`. The `1905` corresponds to the `dxil-2019-05-16` release branch and `42` is the number of commits since that release branch was created. For the master branch, the `commit_count` will be incremented by 10000 to distinguish it from stabilized official release branch builds. So the current official version of master would be something like `1.5.1905.10042`.
2. **Dev build**
Built by using `hctbuild` with no other version-related option. The format is `dxil_major.dxil_minor.0.commit_count` where commit_count is the total number of commits since the beginning of the project.
3. **Fixed version build**
Built by using `hctbuild -fv`. Enables overriding of the version information. The fixed version is read from `utils\version\version.inc`. The location of the version file can be overridden with the `-fvloc` option on `hctbuild`.
In addition to the numbered version the product version string on the binaries will also include branch name and last commit sha - `"1.5.1905.10042 (master, 47e31c8a)"`. This product version string is included in `dxc -?` output.
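The version formats described above can be illustrated with a short Python sketch. The function names are made up for the example and this is not the code in the generator script; the sample values are the ones quoted in the text.

```python
def official_version(dxil_major, dxil_minor, release_no, commit_count,
                     on_master):
    """dxil_major.dxil_minor.release_no.commit_count, +10000 on master."""
    if on_master:
        commit_count += 10000
    return f"{dxil_major}.{dxil_minor}.{release_no}.{commit_count}"


def dev_version(dxil_major, dxil_minor, total_commits):
    """dxil_major.dxil_minor.0.commit_count for dev builds."""
    return f"{dxil_major}.{dxil_minor}.0.{total_commits}"


def product_version(version, branch, sha):
    """Numbered version followed by branch name and last commit sha."""
    return f"{version} ({branch}, {sha})"


release = official_version(1, 5, 1905, 42, on_master=False)
master = official_version(1, 5, 1905, 42, on_master=True)
print(release)
print(master)
print(product_version(master, "master", "47e31c8a"))
```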
Adds support for templatized RWRawByteBuffer.Store<T>. To avoid SROA making us lose the original layout of any struct arguments, a new pass runs before SROA and breaks down such cases into per-element stores. So better be careful with the likes of buf.Store(0, (int[65536])0);...
-Qstrip_reflect would reserialize the root signature, leading to
validation failure #2162. Fixed by moving root sig to writer to clear
from module and prevent re-serialization to metadata.
Fixed -Qstrip_debug with -Zi and no output location still embedding
debug module.
- New -Qembed_debug is required to embed PDB in shader container
- -Zi used without -Qembed_debug will not embed debug info anymore,
and will issue a warning from CompileWithDebug().
- When compiling with Compile() and -Zi, -Qembed_debug is assumed
for compatibility reasons (lots of breaks without it)
- In dxc and CompileWithDebug() -Fd implies -Qstrip_debug
- Debug name is based on -Fd, unless path ends with '\', meaning you
want auto-naming and file written under the specified directory
- Debug name always embedded when debug info used, or -Fd used
- -Fd without -Zi just embeds debug name for CompileWithDebug(),
still error with dxc, since it can't write to your file.
- If not embedding debug info, it doesn't get written to the container,
only to be stripped out again.
- Fix padding for alignment in DebugName part.
- Default to DebugNameForBinary instead of DebugNameForSource if no
DebugInfo enabled
- Also fixed missing dependency on table gen options from libclang
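A few of the rules above can be condensed into a decision sketch. The Python below is only one reading of the list: the function names, the boolean parameters, and the `auto_name` argument are assumptions for illustration, and interactions between options that the list does not spell out are left out.

```python
def debug_info_plan(zi, qembed_debug, legacy_compile=False):
    """Return (embed_debug_info, warn) for a given option combination.

    legacy_compile=True models Compile(), where -Zi implies -Qembed_debug.
    legacy_compile=False models CompileWithDebug().
    """
    if not zi:
        return (False, False)
    if legacy_compile or qembed_debug:
        return (True, False)
    # -Zi without -Qembed_debug: nothing is embedded, and a warning is issued.
    return (False, True)


def debug_output_path(fd_path, auto_name):
    """A trailing backslash on -Fd means: auto-name under that directory."""
    if fd_path.endswith("\\"):
        return fd_path + auto_name
    return fd_path


print(debug_info_plan(zi=True, qembed_debug=False))
print(debug_output_path("pdbs\\", "example.pdb"))
```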
* Allow clip/cull elements to be declared as array [2]
- This approach fixes validation and packing to handle this case.
- There could be implications to runtime ViewID validation
- fix some issues found in packing related to rowsUsed result from Pack
functions. Make these return 0 on failure, instead of startRow.
- Split PackNext into FindNext and PackNext that uses it for greater
flexibility.
Some large shaders exhibit a behavior where phi nodes are created for resources, in which one of the possible incoming values is undef. This gets cleaned up later such that there are no undef resources left. However the fail-undef-resources pass has already failed compilation by that point. The fix is to change that pass to invalidate-undef-resources and replace the undefs with a special invalid value, such that we can produce an error later if still necessary, when optimization passes have been run such that temporary undef resources have been eliminated.
Those tests reach into HLSL, CodeGenHLSL, Samples and even quick-test directories to find the files they need. With this change, I move or copy files around to make sure that these test files are attributable to their consumer. I copied rather than moved files only in the case where a test in code explicitly reached into quick-test or Samples, because that essentially means that the file serves two different tests since they are also run as batch.
These new DXIL instructions are added to SM 6.5. The valid operations
for <Op> are:
- BitAnd
- BitOr
- BitXor
- CountBits
- Product
- Sum
In HLSL, these are exposed as:
uint4 WaveMatch(<type> val)
<type> WaveMultiPrefixBitAnd(<type> val, uint4 mask)
<type> WaveMultiPrefixBitOr(<type> val, uint4 mask)
<type> WaveMultiPrefixBitXor(<type> val, uint4 mask)
uint WaveMultiPrefixCountBits(bool val, uint4 mask)
<type> WaveMultiPrefixProduct(<type> val, uint4 mask)
<type> WaveMultiPrefixSum(<type> val, uint4 mask)
In DXIL, these are exposed as:
[BitAnd,BitOr,BitXor,Product,Sum]
%dx.types.fouri32 @dx.op.waveMatch.T(i32 %opc, T %val)
T @dx.op.waveMultiPrefixOp.T(i32 %opc, T %val, i32 %mask_x,
i32 %mask_y, i32 %mask_z, i32 %mask_w,
i8 %operation, i8 %signed)
[CountBits]
i32 @dx.op.waveMultiPrefixBitCount(i32 %opc, i1 %val, i32 %mask_x,
i32 %mask_y, i32 %mask_z,
i32 %mask_w)
Scalarization of vector types occurs as per the existing wave intrinsics.
For WaveMatch, the match is performed on each scalar and the results
are combined with bitwise AND. For WaveMultiPrefix, the operation is
performed on each scalar and combined into an aggregate.
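To make the semantics concrete, here is a CPU-side Python model of a wave as a list of per-lane values. It is a sketch under several assumptions: all lanes are active, the uint4 mask is represented as a single Python integer, and the prefix is taken to be exclusive (only lanes with a lower index contribute). The function names are invented for the example.

```python
def wave_match(values):
    """Per lane, a bitmask of the lanes that hold an equal value."""
    return [
        sum(1 << j for j, other in enumerate(values) if other == mine)
        for mine in values
    ]


def wave_match_vector(vectors):
    """Match each component separately, then AND the per-component masks."""
    masks = [(1 << len(vectors)) - 1] * len(vectors)
    for component in zip(*vectors):
        masks = [m & c for m, c in zip(masks, wave_match(component))]
    return masks


def wave_multi_prefix_sum(values, masks):
    """Per lane, sum the values of lower-indexed lanes selected by its mask."""
    return [
        sum(values[j] for j in range(lane) if (masks[lane] >> j) & 1)
        for lane in range(len(values))
    ]


values = [1, 2, 1, 2]
masks = wave_match(values)
print(masks)
print(wave_multi_prefix_sum(values, masks))
```

Feeding the result of the match back in as the mask partitions the wave by value, so each lane accumulates only over earlier lanes in its own group.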
- add addrspace stress test and addrspace inst test
- add ll test for cleanup pass targeting paths that may be difficult to
hit from HLSL
- fix handling of base-class bitcast cast for structured buffer
- handle addrspacecast in SimplifyBitCast during CodeGen
The old test values and precision were the same as the 32-bit variants of the tests. This does not work for 16-bit floating point numbers and especially for trigonometric operations.
The ShaderCompatSuite and Unroll tests randomly supported being used to batch run file-check on directories. This change rather updates hcttest file-check to support directories as well as individual files.
* Add root signature flags
- Add LOCAL_ROOT_SIGNATURE flag in RootFlags
- Add DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS flag in DescriptorTable
- New DxilRootSignatureCompilationFlags argument on Compile/ParseRootSignature
to specify whether the root signature is local or global.
* Fix root signature flag issues, add cmd test for root sig from define
- Fix improper bool early promotion to int in CheckBinOpForHLSL
- Modify VerifierTest.cpp to re-enable parsing by VerifierHelper.py
- Update/fix VerifierHelper.py
- Update/fix various tests, including adding expected fxc comments
- Fix source locations for ValidateCast uses
- ValidateCast will use expression location if no loc provided
- TODO: Fix error on explicit truncation cast that should be ok, as long
as it's not an l-value "cannot truncate lvalue vector/matrix"
See binop-dims.hlsl
- TODO: Fix L-Value casts in the future
Remove bad optimization for reusing loads in LegalizeResourceUse
- also update non-uniform detection to catch when the handle is marked,
not just the load.
Validate raytracing shader properties and RDAT blob part
- no signatures for ray tracing shader functions
- payload/params/attribute sizes are >= argument type allocation sizes
- RDAT is bit-identical to RDAT generated
- Minor fix for CS: should not have input signature elements either.
- update val version comment
- Prevent strange behavior with library target and entry point not empty.
- remove DXASSERT in LoadDxilMetadata (validation should catch this case)
Disallow ignore/accept hit intrinsics in lib export functions
- still need to add general validator part to restrict intrinsics for
library functions and shaders.
Eliminate phi on resources/handles for library
- ensure all handles use only one global resource
- replace resource path with indices into global resource
- replace resource alloca with index allocas
- AddCreateHandleForPhiNodeAndSelect now unnecessary, deleted
- error on resources in library function params or return
- mark entry clones as internal linkage since they could contain params
that are not allowed for libraries.
- fix MarkUavUpdateCounter - handle nested GEP
- disallow static resource use from exported library functions
- add/update tests
- detect undef resource paths that would otherwise be lost (FailUndefResource)
- set internal linkage for HL intrinsic function bodies so they aren't processed with user functions
- Rename legalize [static] resource use passes to DxilPromote(Local|Static)Resources
- Remove various dead code paths for resources, such as:
GenerateParamDxilResourceHandles, RemoveLocalDxilResourceAllocas,
lower handle cast, non-promotable error (may translate to index allocas)
- force resource GVs to external constant with no initializer
This change disables the build targets that are not supported on Unix
platforms when compiling on those platforms.
We were also ignoring DXCompiler.cpp during compilation in the past. We
can now bring it back.
We have also added cmake-predefined-config-params which contains all the
cmake configurations needed to build on Unix platforms. It will make it
easy for developers to build.
Add shader mask and min SM to RDAT
- add OP::GetMinShaderModelAndMask to compute mask and min SM
- account for ops that can be translated during linking
- update PS ops that could be allowed in library functions
- update min SM with newer optional feature flags
Fix AV in TranslateHLSubscript for typed buffer with struct element
This will give users a way to know the commit from which a DXC
library/binary was built, so that they can compare different
libraries/binaries and report bugs easily.
dxc.exe now also prints the commit info using this new interface
method. It uses the following format:
(dev;<commit-count>-<commit-hash>)
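As a small illustration of that format, the Python sketch below builds and parses such a string. The function names are made up, and the assumption that the hash is hexadecimal is mine, not stated in the text.

```python
import re


def format_commit_info(commit_count, commit_hash):
    """Build the "(dev;<commit-count>-<commit-hash>)" string."""
    return f"(dev;{commit_count}-{commit_hash})"


def parse_commit_info(text):
    """Recover (commit_count, commit_hash); assumes a hexadecimal hash."""
    match = re.fullmatch(r"\(dev;(\d+)-([0-9a-f]+)\)", text)
    if match is None:
        raise ValueError(f"unexpected commit info: {text!r}")
    return int(match.group(1)), match.group(2)


info = format_commit_info(1234, "47e31c8a")
print(info)
print(parse_commit_info(info))
```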
In this CL, we have introduced the "WinAdapter.h" header file.
This file includes declarations and definitions that will enable compilation of the code base on non-Windows platforms. These include:
* Disabling SAL annotations for non-Windows platforms.
* Defining some Windows-specific macros that are not defined on other platforms.
* Defining Windows-specific types that are used throughout the codebase for non-Windows platforms.
This file is currently not complete. As we go forward, we will add more classes/structs/type definitions to this file for specific purposes. The contents of this file are completely disabled (#ifdef'd away) for Windows and do not affect Windows compilation in any way. For the most part, this is our way of enabling compilation of the codebase on non-Windows platforms without making invasive changes to the existing cpp files, and without hindering any future development. Moreover, I have added the #include of this file for all the locations that will need it for successful Linux compilation.
Finalize OpCode changes for Dxil 1.3 / SM 6.3
- Rename CreateHandleFromResourceStructForLib to CreateHandleForLib
- Add PrimitiveIndex
- Add final NumOp[Codes|Classes]_Dxil_1_3 values
- Fix legal shader stage set for PrimitiveID
hctstart.cmd and hctbuild.cmd now support ARM64 builds.
Note that building for ARM64 on an x86/x64 machine requires the location
of the x86/x64 version of the TableGen tools (clang-tblgen and llvm-tblgen).
Before starting an ARM64 build, set BUILD_TBLGEN_PATH to point to
the TableGen binaries or use the -tblgen option on hctbuild.cmd.
Also had to rearrange control flow around the cmake --build call
in hctbuild.cmd due to a flaky batch behavior around brackets.
Primarily if not exclusively due to the massive carveouts of the
original LLVM source base as part of the HLSL adaptation, many
variables and functions are left unused. In keeping with the
practice of commenting or ifdef-ing out unused portions of this
code and marking every such exclusion as an HLSL change, this adds
a few comments and moves a lot of preprocessor conditionals around to
encompass the portions left unused as a consequence of the earlier
exclusions.
Fixes 450 clang and 442 gcc warnings.
Several #included files were capitalized incorrectly in the case of
files pertaining to the project, or inconsistently in the case of
Windows headers which are typically only relevant where capitalization
doesn't matter. In spite of that, consistent capitalization can be
useful for cross platform stubbing purposes.
This corrects capitalization of a few Dxil* headers and forces
capitalization of Windows-specific headers to be consistent with
the overwhelming majority of other cases.
No functional change, just cross-platform facilitation
Add -auto-binding-space option enabling auto-binding for lib targets
- Added DxilAllocateResourcesForLib pass to allow auto-binding any unbound
resources in a DxcOptimizer pass.
- Added AutoCleanupPerThreadFileSystem for calling cleanup on scope exit,
useful for command line users.
- Use global singleton and ensure cleanup on exit if not called
- Put setup/cleanup in VerifierTest.cpp where the TestModule[Setup|Cleanup]
methods live.
- Remove GlobalPerThreadSys from FileCheckForTest, since this has to happen
at the module level instead.
- Added Setup/Cleanup to all the appropriate places
- Fixing device support for > SM 6.0
- updating dxexp
- Also fixing the following tests:
ExecutionTest::DenormBinaryFloatOpTest#FDivDenormFTZ
ExecutionTest::DenormBinaryFloatOpTest#FAddDenormAny
ExecutionTest::DenormBinaryFloatOpTest#FMulDenormAny
ExecutionTest::DenormBinaryFloatOpTest#FDivDenormAny
ExecutionTest::DenormBinaryFloatOpTest#FMulDenormFTZ
ExecutionTest::DenormBinaryFloatOpTest#FAddDenormFTZ
ExecutionTest::DenormBinaryFloatOpTest#FSubDenormFTZ
ExecutionTest::DenormBinaryFloatOpTest#FSubDenormAny
ExecutionTest::DenormTertiaryFloatOpTest#FMadDenormAny
ExecutionTest::DenormTertiaryFloatOpTest#FMadDenormFTZ
2. Do not run SROA on resources.
3. Avoid unpacking resources.
4. Use value type for createHandleForLib.
5. Move all scalar global resource loads to the beginning of the function.
This stops other passes from optimizing the loads of scalar global resources.
6. Do not remove local resources for lib.
Fixing a bunch of SM 6.2 Tests
- BinaryHalfOp tests reading input2 instead of input1
- Fixed converting signed zero from float to float16
- Fixed compiler options for float16 min,max test
- Fixed expected results for following float16 operations
- cos(0), sin(314), sin(-314)
- frc(-7.39)
- rsqrt(65504)
- int16 subtraction
- int16 unsigned multiplication
- sqrt, rsqrt, round_pi, round_ni, fmad treat denorm as it is
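Two of the float16 facts behind these fixes can be checked on the CPU with Python's `struct` module, whose `"e"` format is IEEE 754 binary16. This is only a sketch for sanity-checking expected values; the helper names are invented and it is not part of the execution tests.

```python
import math
import struct


def float_to_half_bits(value):
    """Convert a float to float16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_round_trip(value):
    """Round a float to the nearest float16 and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Signed zero must keep its sign bit when converted to float16.
print(hex(float_to_half_bits(-0.0)))
print(math.copysign(1.0, half_round_trip(-0.0)))

# 65504 is the largest finite float16 value.
print(hex(float_to_half_bits(65504.0)))
```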
Two new Vulkan specific intrinsic resource types, SubpassInput and
SubpassInputMS, are introduced to provide a way to use subpass inputs
in Vulkan. Their signatures:
template<typename T>
class SubpassInput {
T SubpassLoad();
};
template<typename T>
class SubpassInputMS {
T SubpassLoad(int sample);
};
T can be a scalar or vector type.
If compiled without ENABLE_SPIRV_CODEGEN, the compiler will report
unknown type name for SubpassInput(MS). If compiled with SPIR-V
CodeGen and trying to use SubpassInput(MS) in DXIL CodeGen, the
compiler will report an error saying the type is Vulkan-specific.
A new Vulkan-specific attribute, vk::input_attachment_index is
introduced to provide a way to set the input attachment index for
SubpassInput(MS) objects.
- TAEF is unable to recognize negative values in hex representation in x64 release builds, so use decimal values instead.
- Add 'Shl', 'LShr', 'And', 'Or', 'Xor' execution tests
- Update Warp Version for test cases known to fail
- Cleaning up execution test
- Generating xml test file from the script and hctdb
- Fixing hctdb categories for unsigned int
- Adding more execution tests (Fadd, FSub, FMul, FDiv, Add, Sub, SDiv)
This change removes the new Load variations (LoadHalf, LoadFloat, etc.) from the RWByteAddressBuffer/ByteAddressBuffer methods and adds templated Load intrinsics. The templated Load only works on scalar or vector types; the existing variations (Load2, Load3, Load4, etc.) do not accept templates and work as they did before (uints only). For the Store operation, you can store any scalar or vector type of up to 16 bytes, whereas prior to HLSL 2018 Store only supported storing a uint scalar.
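As a rough CPU-side analogy only (this is C++, not HLSL, and `ByteBuffer` and `Float4` are hypothetical names invented for illustration), a templated load reinterprets `sizeof(T)` bytes at a byte offset, and a templated store can enforce the 16-byte limit at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical CPU-side stand-in for a byte address buffer.
struct ByteBuffer {
  std::vector<std::uint8_t> bytes;

  // Templated load: copy sizeof(T) bytes starting at `offset` into a T.
  template <typename T>
  T Load(std::size_t offset) const {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be a plain scalar or vector type");
    assert(offset + sizeof(T) <= bytes.size());
    T value;
    std::memcpy(&value, bytes.data() + offset, sizeof(T));
    return value;
  }

  // Templated store: limited to 16 bytes, mirroring the limit described above.
  template <typename T>
  void Store(std::size_t offset, const T &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be a plain scalar or vector type");
    static_assert(sizeof(T) <= 16, "Store supports at most 16 bytes");
    assert(offset + sizeof(T) <= bytes.size());
    std::memcpy(bytes.data() + offset, &value, sizeof(T));
  }
};

// Hypothetical four-component vector type (16 bytes).
struct Float4 {
  float x, y, z, w;
};
```

Storing a `Float4` at offset 16 and loading it back with `Load<Float4>(16)` round-trips the value; a type larger than 16 bytes is rejected by the `static_assert` in `Store`.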
This change is to add fixed width scalar types in HLSL 2018.
These types are: int16_t, uint16_t, int32_t, uint32_t, float16_t, float32_t, and float64_t.
Prior HLSL versions do not support these fixed width types.
This change also renames the /no-min-precision command-line option to /enable-16bit-types. int16_t and uint16_t are only allowed in this mode.
This change introduces native precision types for shader model 6.2. Along with native half (float16) types, we will introduce native int16/uint16 types under the same command-line option. This means that one option will disable all min precision types. For now, int16/uint16 types will be exposed at the HLSL level as int16_t and uint16_t.
This change also includes some constant buffer fixes for 16 bit types, as well as int64 types.
This change extends float16 support. We are adding LoadHalf, LoadFloat, and LoadDouble methods to byte address buffers so that users can access their data as these types. Also, starting with shader model 6.2, we map byte address buffer and structured buffer load/store operations to RawBufferLoad/Store to differentiate raw buffer loads from typed buffer loads. Unlike BufferLoad for typed buffers, RawBufferLoad for min precision types does not return the min precision type but the actual scalar size in the buffer (i.e., rawBufferLoad.i32 for min16int and rawBufferLoad.f32 for min16float). RawBufferLoad/Store take additional parameters: a mask, which is required for correct status behavior with CheckAccessFullyMapped, and an alignment, which is a relative alignment that may benefit backends in the future.
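A minimal sketch of the two ideas in this description, under stated assumptions: the helper names are hypothetical, and the bit layout of the mask (x in bit 0 through w in bit 3) is an assumption made for illustration rather than something this message specifies. Only the two overload suffixes named above are covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 4-bit component mask with the low
// `numComponents` bits set. Assumed layout: x = bit 0, y = bit 1,
// z = bit 2, w = bit 3.
inline std::uint8_t componentMask(unsigned numComponents) {
  assert(numComponents >= 1 && numComponents <= 4);
  return static_cast<std::uint8_t>((1u << numComponents) - 1u);
}

// The two min precision cases named in the description above.
enum class MinPrecisionType { Min16Int, Min16Float };

// Min precision types use their 32-bit storage size in a raw buffer,
// so the overload suffix is i32 or f32 rather than a 16-bit one.
inline const char *rawBufferLoadOverload(MinPrecisionType type) {
  return type == MinPrecisionType::Min16Int ? "i32" : "f32";
}
```

Under these assumptions, a three-component load would carry the mask `0x7`, and a load of a `min16float` value would use the `f32` overload.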
* Adds support for pause/resume in pipeline.
* Adds 0.0 as the validator version to indicate no validation should occur.
* Moves module parsing from dxcompiler down to HLSL library.
* Adds support for optimizer to process a DXIL part.
* Adds a form with designer support for the interactive optimizer.
See the comment in DxilDebugInstrumentation.cpp for a summary of how this all works. Basically: instrument all instructions to output "trace" info to a UAV so that a debugger can reconstruct the execution of the shader and offer a debugger-like experience.
* Add SV_Position index parameter, various minor tweaks to UAV setup
* Convert to raw buffer
* fix up unit tests, add a few names to variables
* CR feedback: check for pre-existing type; typo
This change enforces the new constraint on signature packing: pack signature elements by data width. Before the fp16 type was introduced, every element was assumed to reserve 32 bits. Since we are introducing a new 16-bit data type, we need a new way to enforce signature rules.
After discussions we decided that it would be nice to pack elements based on data width. However, we still enforce the rule that each row contains up to 4 elements, regardless of their size. This way, depending on hardware support, drivers can optimize signature packing, while at the DXIL level we maintain the assumption that there are 4 elements per row. We also still constrain the total number of rows to 32 for now. This can be changed in the future if people find this limit to be an issue.
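The rules above can be sketched as a small greedy packer. This is an illustration, not the compiler's actual packing code: the type and function names are hypothetical, a row's "elements" are read as scalar components, and the sketch assumes that packing by data width means components of different widths never share a row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one signature element.
struct SigElement {
  unsigned components; // 1..4 scalar components
  unsigned bitWidth;   // 16 or 32
};

struct Row {
  unsigned bitWidth; // data width shared by everything in this row
  unsigned used;     // components already placed in this row
};

constexpr unsigned kMaxComponentsPerRow = 4; // regardless of data width
constexpr std::size_t kMaxRows = 32;

// Greedy first-fit packing. Writes the row index chosen for each element
// into `rowOf` and returns false if more than kMaxRows rows are needed.
bool packSignature(const std::vector<SigElement> &elems,
                   std::vector<int> &rowOf) {
  std::vector<Row> rows;
  rowOf.clear();
  for (const SigElement &e : elems) {
    assert(e.components >= 1 && e.components <= kMaxComponentsPerRow);
    int chosen = -1;
    for (std::size_t r = 0; r < rows.size(); ++r) {
      // Assumption: only same-width data may share a row.
      if (rows[r].bitWidth == e.bitWidth &&
          rows[r].used + e.components <= kMaxComponentsPerRow) {
        chosen = static_cast<int>(r);
        break;
      }
    }
    if (chosen < 0) {
      if (rows.size() == kMaxRows)
        return false; // would exceed the 32-row limit
      rows.push_back(Row{e.bitWidth, 0u});
      chosen = static_cast<int>(rows.size()) - 1;
    }
    rows[static_cast<std::size_t>(chosen)].used += e.components;
    rowOf.push_back(chosen);
  }
  return true;
}
```

Note that a 16-bit row still holds at most 4 components in this sketch, which reflects the decision to keep 4 elements per row at the DXIL level and leave denser packing to drivers.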
* Fix ViewID state for flow control and consistent ViewID detection
1. Different ways of determining ViewID usage led to assert or crash.
Use DxilModule flag to guarantee consistency. But this required
splitting DxilEmitMetadata pass into two passes, one that finalizes
the module and computes flags run before ComputeViewIdState, and one
that simply writes the metadata.
2. storeOutput in flow control with values not dependent on that flow,
such as literals, would not pick up control flow dependence.
Fix by picking up control flow dependence on storeOutput instructions,
not just the value being written.
* Fix control dependence computation
There was a bug in the implementation of the control dependence algorithm.
We were incorrectly iterating over all descendants in the postdom tree where
the algorithm should only iterate over immediate children.
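For reference, the textbook formulation computes control dependence as the post-dominance frontier, built bottom-up over the post-dominator tree. The sketch below is not the code from this change; `Graph` and the function names are hypothetical, and the post-dominator tree is assumed to be given. It shows why the "up" step visits only immediate children: each child's frontier already summarizes its own subtree.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical input: blocks are numbered 0..n-1.
struct Graph {
  std::vector<std::vector<int>> cfgPreds;        // CFG predecessors per block
  std::vector<int> ipdom;                        // immediate post-dominator, -1 for the root
  std::vector<std::vector<int>> postDomChildren; // immediate children in the postdom tree
};

// Fills pdf[x] with the blocks that x is control dependent on.
static void computeFrontier(const Graph &g, int x,
                            std::vector<std::set<int>> &pdf) {
  // Bottom-up: finish every immediate child before handling x.
  for (int child : g.postDomChildren[x])
    computeFrontier(g, child, pdf);

  // Local step: CFG predecessors that x does not immediately post-dominate.
  for (int pred : g.cfgPreds[x])
    if (g.ipdom[pred] != x)
      pdf[x].insert(pred);

  // Up step: immediate children only. Their frontiers already account
  // for deeper descendants, so walking all descendants here is wrong.
  for (int child : g.postDomChildren[x])
    for (int y : pdf[child])
      if (g.ipdom[y] != x)
        pdf[x].insert(y);
}

std::vector<std::set<int>> controlDependence(const Graph &g, int root) {
  assert(g.cfgPreds.size() == g.ipdom.size());
  assert(g.postDomChildren.size() == g.ipdom.size());
  std::vector<std::set<int>> pdf(g.ipdom.size());
  computeFrontier(g, root, pdf);
  return pdf;
}
```

For a diamond CFG (block 0 branches to blocks 1 and 2, which both join at block 3), blocks 1 and 2 come out control dependent on block 0, while blocks 0 and 3 depend on nothing.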
This change adds support for denorm-preserving operations. We do this by adding dxc options and propagating them all the way to DXIL metadata so that drivers can see how they should handle floats.
Currently we support three modes: any (default), preserve, and ftz.
The denorm option is a per-function flag. We store this information in the function annotation metadata. This means that DXIL 1.2 has a new structure for function annotations. The change in structure is documented in DXIL.rst.
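To make the three modes concrete, here is a small CPU-side sketch. The names are hypothetical, and the reading of "any" as "either the preserved or the flushed result is acceptable" is an assumption made for illustration; the authoritative semantics are in DXIL.rst.

```cpp
#include <cassert>
#include <cmath>

// The three modes named above.
enum class DenormMode { Any, Preserve, FTZ };

// Under FTZ, a denormal value is flushed to zero with its sign kept.
// Otherwise this sketch returns the value unchanged.
inline float applyDenormMode(float value, DenormMode mode) {
  if (mode == DenormMode::FTZ && std::fpclassify(value) == FP_SUBNORMAL)
    return std::copysign(0.0f, value);
  return value;
}

// Hypothetical checker of the kind a test might use: given the exact
// (denorm-preserving) result, decide whether `actual` is acceptable.
inline bool isAcceptableResult(float actual, float exact, DenormMode mode) {
  const float flushed = applyDenormMode(exact, DenormMode::FTZ);
  switch (mode) {
  case DenormMode::Preserve:
    return actual == exact;
  case DenormMode::FTZ:
    return actual == flushed;
  case DenormMode::Any:
    // Assumption: "any" lets the implementation pick either behavior.
    return actual == exact || actual == flushed;
  }
  return false;
}
```

With this sketch, `applyDenormMode(1e-40f, DenormMode::FTZ)` yields zero, while the preserve mode returns the denormal input unchanged.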
The pass implements functionality for PIX for the "pixel cost", "depth complexity" and "overdraw" visualizers. You can probably infer what the pass does from the names "overdraw" and "depth complexity": For each pixel rendered it increments a corresponding counter in a UAV of a buffer that is the same size as the render target. The "pixel cost" pass does the same thing, only the increment is a weight value calculated from the total cost of the draw call, as derived from PIX's GPU-side profiling system.
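The counting scheme described above can be modeled on the CPU as follows. This is only an illustration of the idea, not the instrumentation the pass emits: `PixelCounters` and `recordPixel` are hypothetical names, and the row-major indexing of the counter buffer is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the counter buffer: one counter per
// render-target pixel, assumed to be laid out row-major.
struct PixelCounters {
  unsigned width;
  unsigned height;
  std::vector<std::uint32_t> counts;

  PixelCounters(unsigned w, unsigned h)
      : width(w), height(h), counts(static_cast<std::size_t>(w) * h, 0u) {}

  // Overdraw / depth complexity: weight is 1 per rendered pixel.
  // Pixel cost: weight is a value derived from the draw call's cost.
  void recordPixel(unsigned x, unsigned y, std::uint32_t weight = 1) {
    assert(x < width && y < height);
    counts[static_cast<std::size_t>(y) * width + x] += weight;
  }
};
```

Calling `recordPixel(1, 1)` twice leaves a count of 2 at that pixel, which is what an overdraw visualizer would then map to a color.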
This pass will be used in PIX for "Dr. PIX experiments", wherein a workload is re-rendered with MSAA turned off in order to get an approximate understanding of the performance impact of MSAA.
This change enables CodeGenSPIRV test flow.
New tests can be added by simply adding the test file to the
`tools/clang/test/CodeGenSPIRV/` directory and adding a test case such as:
```cpp
TEST_F(WholeFileTest, NewTest) {
  EXPECT_TRUE(runWholeFileTest("new-test-name"));
}
```
The input file with the format described in `WholeFileCheck.h` is read in;
the HLSL portion is passed to the compiler with SPIR-V codegen enabled.
The resulting SPIR-V binary is disassembled and compared to the expected
result in the input file.
Supporting a custom allocator for dxcompiler.
Adds recovery for exceptions and out-of-memory handling.
Add custom allocator support to linker.
Fix for release-only test failure.
Removes assertion about presence of command-line option registration
* Add DxilD3DCompile and DxilD3DCompile2 as examples of library use.
They compile the shader into a library first, then link to generate the final DXIL.
If includes are used, each included file is compiled into a separate lib and all libs are linked together.
* Add remove-discards pass
* explicit lfs
* Change pass name, remove copy-pasted member enum, reduced include set
* CR feedback: better way to delete instructions, stale comment, end iterator could go stale, CHECK-NOT is a good idea.