Fix compilation issues in the x86 configuration when using a target other than x64
The script set the -A parameter that VS 2019 depends on only in the x64 case
Simplify and document the configandbuild function
Add new options to hctbuild that enable passing additional arguments to the cmake invocation,
as well as cmake customization hooks that are included at the beginning and end of DXC's CMakeLists.txt.
Rework library search in FindD3D12.cmake and add one more library to look for.
Fix case in FindDiaSDK.cmake - fixes a cmake warning.
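The core of the -A fix is that the Visual Studio 2019 generator selects its target platform through -A, and when -A is omitted cmake uses the host platform. As a minimal sketch, here is the idea in Python (the real script is a batch file, and the helper name `vs_platform_args` is hypothetical): the flag is emitted for every architecture, not only x64.

```python
# Hypothetical helper: map a build architecture to the cmake arguments that
# the "Visual Studio 16 2019" generator needs. The -A values are the
# standard Visual Studio generator platform names.
VS_PLATFORMS = {
    "x86": "Win32",
    "x64": "x64",
    "arm": "ARM",
    "arm64": "ARM64",
}


def vs_platform_args(arch):
    """Return generator and platform arguments for the given architecture."""
    try:
        platform = VS_PLATFORMS[arch.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch}") from None
    # Always pass -A, not only for x64; without it VS 2019 targets the host.
    return ["-G", "Visual Studio 16 2019", "-A", platform]
```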
* Use Attribute to designate wave-sensitive intrinsics
This adds an intrinsic attribute indicating wave-sensitivity that can
be specified in gen_intrin_main.txt. This and other attributes are
passed along through function representations and lowerings. The
wave-sensitivity needs to be maintained specifically through the SROA passes,
since it is used by the CleanupDxbreak pass that runs after them.
This is done specifically to allow extension intrinsics to indicate
wave-sensitivity, but builtin intrinsics now use the same mechanism.
Most intrinsics get a mostly nameless function to represent them between
Codegen and dxilgen. This allows any intrinsics with the same prototype to
share the same function. Wave-sensitive intrinsics need a distinct
function; otherwise the attribute would also be shared with every
intrinsic matching the prototype. So their function names are changed
slightly to prevent this.
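The sharing problem can be shown with a toy model, under the assumption that an intrinsic's function is looked up by a name derived only from its prototype. The `dx.hl.op.` prefix and `.wave` suffix below are hypothetical, not DXC's actual naming; the point is only that a distinct name keeps the attribute off non-wave-sensitive intrinsics of the same prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    attributes: set = field(default_factory=set)


class Module:
    """Toy module: functions are created on demand and cached by name."""

    def __init__(self):
        self.functions = {}

    def get_or_insert(self, name):
        if name not in self.functions:
            self.functions[name] = Function(name)
        return self.functions[name]


def intrinsic_function(module, prototype, wave_sensitive):
    # Hypothetical naming: the base name encodes only the prototype, so all
    # intrinsics with this prototype share one function (and its attributes).
    name = "dx.hl.op." + prototype
    if wave_sensitive:
        # A distinct name keeps the wave-sensitive attribute from leaking
        # onto other intrinsics that happen to share the prototype.
        name += ".wave"
    fn = module.get_or_insert(name)
    if wave_sensitive:
        fn.attributes.add("wave-sensitive")
    return fn
```

Without the suffix, the second call below would return the same function as the first and tag it as wave-sensitive.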
Adds tests for all these ops and for a dummy extension intrinsic.
* Skip copy-in/copy-out for constant global variables.
* Enable copy for noinline.
TODO: analyze aliasing between global variables and parameters.
* Use SetVector and skip resources when copying.
* Disable mayAliasWithGlobal, because the optimization already avoids the replacement when an alias exists.
When replacing a constant global with a normal value that has a store, mark the global non-constant.
* Gather should not be considered a gradient operation.
* Mark WriteSamplerFeedback[Bias] as gradient
* Remove SampleCmpLevelZero from DxilConvergent
* Add WriteSamplerFeedback[Bias] to DxilConvergent
* Conditionalize breaks to keep them in loops
This introduces dx.break, a temporary builtin that is applied as a
condition to any unconditional break in order to keep the basic block
inside the loop. Because it remains in the loop, operations that depend
on wave operations inside the loop will be able to get the right values.
Such builtins have to be added at finishcodegen time or else clang
throws an error for an undefined function. Consequently, the creation of
these is split in two. First, the branch is created with just a
constant true condition. Then, at finishcodegen, that condition is replaced by the
result of the dx.break() builtin.
By using the result of a temporary builtin function, the optimization
passes leave the conditional branch and its false edge alone, which they
might not do if we started with the global constant.
Normal break blocks don't need this conditional, but we don't know that
at code generation. So a later pass identifies break blocks with wave-sensitive
operations that depend on wave ops inside the loop they are breaking out of,
and preserves those conditionals while removing all the rest.
As part of dxil finalization, the dx.break() function is removed and all
branches that depended on it are made to depend on function-local
loads and compares of a global variable dx.break.cond.
The DxBreak Fixup pass depends on dxil-mem2reg. It is placed immediately
after that pass so that, in shaders without any wave ops, as many
optimizations run as would without this change.
* Adds the DxcPixDxilDebugInfo interface and friends.
* Modifies the entrypoints to require InParam/OutParam for pointers, as well as CheckNotNull them
* Removes S_FALSE for happier Jeff
* Fixes broken test
* returns E_POINTER for nullptrs
* Returns S_FALSE from UnAlias for non-aliasing types.
* fails GetName for arrays
* Addresses CR comments
- Fix output naming for StringRef
- Support /Fc when /Fh is provided
- Fix root sig strip flag
- Create root sig stream output
- Validate -Frs root sig output for signing with DXIL.dll
- Move outputs under valHR success branch
- Add test for -Fre, -Fsh, -Frs, and -Fc with -Fh
- Rework hcttestcmds.cmd completely: it checks much more and is far more robust,
while (hopefully) being much easier to read, extend, and maintain.
This is a pass for PIX that adds instructions to write mesh shader output (vertices and indices) to a UAV for later ingestion by PIX in order to present a view of that output.
When run on a fast system with a lot of threading, the custom cmake
commands to generate version include information ran into a problem
because the two include files both depended on the same intermediate
file for some builds. As a result, when the first thread started writing
to it, the second thread couldn't access it if it started to try before
the first finished. The result was an empty header and no defined
version macros.
By eliminating the common intermediate file and giving each output file
its own intermediate file, generation succeeds no matter how many
threads run, and regeneration that produces no changes no longer
causes unnecessary recompilation.
To facilitate this, I've added a cmake macro that handles both cases.
The peculiarities of how they are generated are all handled outside the
macro.
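The cmake macro itself isn't shown here. As an illustration of the same idea in Python (the function name and the ".tmp" naming are assumptions), each output gets a private intermediate file, and the real output is replaced only when its contents change, much like `cmake -E copy_if_different`.

```python
import os


def write_if_different(output_path, content):
    """Write content to output_path only if it differs from what's there.

    Each output uses its own intermediate file (output_path + ".tmp"), so
    two generators never contend for one shared temporary file. Returns True
    if the output was (re)written, False if it was left untouched.
    """
    intermediate = output_path + ".tmp"
    with open(intermediate, "w", encoding="utf-8") as f:
        f.write(content)
    try:
        with open(output_path, "r", encoding="utf-8") as f:
            unchanged = f.read() == content
    except FileNotFoundError:
        unchanged = False
    if unchanged:
        # Leave the output's timestamp alone so nothing recompiles.
        os.remove(intermediate)
        return False
    # os.replace swaps the file in one step on the same filesystem.
    os.replace(intermediate, output_path)
    return True
```

Because an unchanged header keeps its timestamp, dependent sources are not rebuilt when regeneration produces identical output.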
The original generation of HLSL macros containing version information
used the same generated header file as the RC file. This was fine except
when building in xbtools where the version information in the RC is
overridden to match the GDK.
To keep the DXC macros reporting the version of the DXC component, which
better enables workarounds based on that information, the same generator
script is used, but the header file is separate and ignores the fixed
version location provided via cmake defines.
This way it will always contain DXC information while the RC file can
continue to contain the version information as it did before.
To allow for the case where the headers are not the same, a preliminary
version.gen file is created according to the dxc version requirements
and copied to the dxc header. In most cases, it is copied to the RC
header as well. When an explicit path is specified for the fixed
version, the DXC version uses the "official" generated header and the
RC file gets the version specified by the file at the given location.
Reverts the changes to the target_version_vertex.hlsl test since it
should always succeed now.
Add GetResourceFromHeap for hlsl.
For Dxil, add CreateHandleFromHeap and AnnotateHandle.
All handles (createHandle, createHandleForLib, createHandleFromHeap) must be annotated with AnnotateHandle before use.
TODO: add AnnotateHandle for PIX passes.
Clean up resource-related code.
* HLSL test infrastructure and other refactoring
Refactor common test infrastructure code into HLSLTestLib
Enable invocation of fxc and other executables via // RUN: commands in test files
Add latest d3dx12.h to include/dxc/Support and remove two other outdated copies
Improve DXIL container header validation on load
New helper classes DxilContainerReader and FixedSizeMemoryStream
Move LoadSubobjectsFromRDAT to DxilSubobjects.cpp
Co-authored-by: Greg Roth <grroth@microsoft.com>
* Add HLSL macros with version information
Adds three sets of version information as defined macros available
to the shader.
The DXC version is included as __DXC_VERSION_{MAJOR|MINOR|RELEASE|COMMITS}
The values of these are set at compile time and derived from version.inc
RELEASE and COMMITS information varies according to the criteria
described in gen_version.py.
The HLSL Version is included as __HLSL_VERSION. It may be taken from
the -HV argument with which dxc was invoked, derived from other
arguments, or the default.
The shader target stage and version information is taken from the
-T argument with which dxc was invoked. For the stage, constants
representing each stage are defined at compile time and correspond to
the enum values used internally. The shader stage specified by -T is
determined at runtime and set to the corresponding enum value. The
major and minor versions are similarly derived from the -T argument at
runtime.
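The derivation of stage and version from -T can be sketched as below. This is a simplified model, not DXC's code: the macro names `__SHADER_STAGE_*` and `__SHADER_TARGET_*`, the stage table, and the enum values are assumptions for illustration, and profiles such as lib_6_x are not handled.

```python
import re

# Hypothetical stage table: (macro suffix, illustrative enum value). The real
# names and values used by DXC are not listed in this commit message.
STAGES = {
    "ps": ("PIXEL", 0),
    "vs": ("VERTEX", 1),
    "gs": ("GEOMETRY", 2),
    "hs": ("HULL", 3),
    "ds": ("DOMAIN", 4),
    "cs": ("COMPUTE", 5),
    "lib": ("LIBRARY", 6),
}

_PROFILE = re.compile(r"^([a-z]+)_(\d+)_(\d+)$")


def target_defines(profile):
    """Derive stage and version defines from a -T profile like "ps_6_5"."""
    m = _PROFILE.match(profile)
    if not m or m.group(1) not in STAGES:
        raise ValueError(f"invalid target profile: {profile}")
    stage, major, minor = m.groups()
    defines = {}
    # Fixed constants: one per stage, each set to its enum value.
    for name, value in STAGES.values():
        defines[f"__SHADER_STAGE_{name}"] = value
    # Values that depend on the -T argument of this particular invocation.
    defines["__SHADER_TARGET_STAGE"] = STAGES[stage][1]
    defines["__SHADER_TARGET_MAJOR"] = int(major)
    defines["__SHADER_TARGET_MINOR"] = int(minor)
    return defines
```

With this layout a shader can compare the target-stage macro against one of the per-stage constants.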
The runtime defines are set in InitializeStandardPredefinedMacros().
This allows them to be set just once, and in the same way, however
dxc is invoked, including in testing.
This makes some changes to how the version.inc file is generated and
found in cmake files, so that it is generated regardless of whether a
.rc resource file is needed, rather than only in that case as before.
A series of HLSL tests is added that tests for the defines' presence
and values by choosing between an add and a subtract, with checks for the adds.
Since many of the defines are determined by arguments at runtime, a
few different tests were needed, but as much consolidation as possible
was done to reduce the number of individual test runs.
According to the description at the top of gen_version.py and the
commit that introduced it, the "Official build" versioning is meant
to add 10000 to the number of commits since the last official release
when the build is in the master branch. However, because of the use
of "is" instead of "==", the comparison between the variable and
the string literal in the Python script was never succeeding. The
"is" operator tests object identity: it succeeds only when both operands
refer to the same object, for example when one variable has been assigned
to the other. For comparisons of contents, "==" should be used.
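A minimal sketch of the corrected check, assuming the branch name is read from `git rev-parse --abbrev-ref HEAD`; the helper names are hypothetical and not taken from gen_version.py.

```python
def branch_from_git_output(output):
    """Parse the branch name printed by `git rev-parse --abbrev-ref HEAD`."""
    return output.strip()


def is_master_branch(branch_name):
    # Buggy form: `branch_name is "master"` compares object identity. A name
    # built at runtime (e.g. parsed from git output) is usually a different
    # str object than the literal, so that test fails even when the text
    # matches. CPython 3.8+ also warns about `is` with a literal.
    return branch_name == "master"  # compare contents instead
```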
After initial generation, subsequent invocations of cmake will
only alter the defines that are specified by that command. Any
defines that were set previously retain their earlier settings
by being stored in the CMakeCache.txt in the build directory.
hctbuild.cmd allows specifying certain defines when arguments are
passed to it, but many of these do not allow resetting the defines
they set. So if you define an official versioned build, every
subsequent build will use the official version unless the cmake
cache is edited or the whole build directory is wiped clean. The same
is true of several other settings.
This change sets the defines hctbuild cares about every time it is
invoked regardless of arguments. If the official build is specified
the official define is set to ON. If it isn't, it is set to OFF.
This way these flags can easily be switched on the same build dir.
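The "always set it" rule can be sketched as follows, in Python rather than batch for illustration. The define name `HLSL_OFFICIAL_BUILD` and the helper are assumptions; the `-D<name>:BOOL=<value>` syntax is standard cmake.

```python
def hctbuild_cmake_defines(official=False, extra=None):
    """Build the -D arguments an hctbuild-style script passes on every run.

    Every define the script cares about is emitted on each invocation, ON or
    OFF, so a value cached in CMakeCache.txt by an earlier run is always
    overwritten. The define names are hypothetical.
    """
    settings = {"HLSL_OFFICIAL_BUILD": official}
    settings.update(extra or {})
    return [f"-D{name}:BOOL={'ON' if value else 'OFF'}"
            for name, value in sorted(settings.items())]
```

Emitting the OFF case explicitly is what lets the same build directory switch back from an official build.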
New Rules:
outer type may be: [ptr to][1 dim array of]( UDT struct | scalar )
inner type (UDT struct member) may be: [N dim array of]( UDT struct | scalar )
scalar type may be: ( float(16|32|64) | int(16|32|64) )
- Disallow pointers to pointers, and pointers in structs
- Disallow multi-dim arrays at top-level, but allow within struct
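The rules above can be expressed as a small checker. This is a sketch over a hypothetical tuple encoding of types, not DXC's type representation: ("scalar", name), ("struct", [members]), ("array", element, dims) and ("ptr", pointee).

```python
# Allowed scalars per the rules: float/int in 16, 32 and 64 bits.
SCALARS = {f"{k}{b}" for k in ("float", "int") for b in (16, 32, 64)}


def _check_base(t):
    """A base type is a scalar or a UDT struct whose members are valid."""
    kind = t[0]
    if kind == "scalar":
        return t[1] in SCALARS
    if kind == "struct":
        return all(is_valid_member(m) for m in t[1])
    return False  # pointers (or nested arrays) are not valid base types


def is_valid_member(t):
    """Inner type: [N dim array of]( UDT struct | scalar )."""
    if t[0] == "array":
        return _check_base(t[1])  # any number of dimensions is allowed
    return _check_base(t)


def is_valid_outer(t):
    """Outer type: [ptr to][1 dim array of]( UDT struct | scalar )."""
    if t[0] == "ptr":
        t = t[1]
    if t[0] == "array":
        if len(t[2]) != 1:
            return False  # no multi-dim arrays at top level
        t = t[1]
    return _check_base(t)
```

Pointers to pointers and pointers inside structs are rejected because a pointer never qualifies as a base type.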
* Erase dead region
* Pass dependencies
* Simpler heuristic, only checking that Begin dominates End and End post-dominates Begin
* Small cleanups. No longer iterating whole block to find PHIs
* A few optimizations. Fixed infinite loops caused by self-loops
Fixes problems like:
- extra AllocateRayQuery calls, or improper location (for lifetime)
- missing array support
- static global RayQuery
RayQuery allocation changes in this commit:
- Add a constructor to RayQuery
- Set init sequence to use constructor in InitializeInitSequenceForHLSL, just for RayQuery
- For arrays, modify EmitCXXAggrConstructorCall to:
  - loop over the index instead of a pointer, to allow SROA of the RayQuery struct
  - mark the loop as HlslForceUnroll
- Add a hidden flag for HL intrinsics to allow internal intrinsics not produced
directly by HLSL; the name is mangled so it can't be matched during parsing.
- Add hidden HL AllocateRayQuery intrinsic
- Translate constructor call on ptr to HL AllocateRayQuery intrinsic call producing handle i32 during FinishCodeGen
- Translate RayQuery ptr to load i32 handle value for intrinsic methods during SROA_HLSL
- Flatten RayDesc for TraceRayInline
(otherwise /Od fails validation since RayDesc type may still be present)
- No longer skip RayQuery for SROA_HLSL
- Update lowering for AllocateRayQuery, i32 handle, and flattened RayDesc
- Remove ReadNone attribute from AllocateRayQuery to prevent incorrect CSE optimizations
- Manually clean up unused RayQuery allocations
* Adding test fixes and more support for SM6.5 WaveMultiPrefix functions
1) Ensure wave lane IDs are sorted when accumulating the result
2) Fix the HLSL in ShaderOpArithTable.xml so that the input aligns with what is being tested
3) Added support for the “UBit” version for all of these tests.
Add two test options, -Qstrip_reflect_from_dxil and -Qkeep_reflect_in_dxil,
to make tests work with reflection removed, since many tests rely
on main module disassembly-reassembly between test phases and reflection
metadata will no longer be present there. The strip option is for the
few cases where tests don't want the reflection kept in DXIL by default.
The validator no longer requires function annotations unnecessarily.
Fix places where the remove-global hook was not called when functions
were removed manually from the list.
StripReflection now deletes function annotations, unless targeting lib or
old validator that required them. Preserve global constructor list and
add annotation for 1.4 validator. The global hook fixes were required
here, otherwise annotations would refer to dead functions during linking.
Struct annotations may not be removed in library case when they still need
translation to legacy types.
Allow missing struct annotation when not necessary to upgrade the layout.
Preserve usage in reflection by upgrading the module, emitting metadata,
cloning for reflection, then restoring the validator version and re-emitting
metadata.
Fix the size of 16-bit types for usage and reflected size.
Make various batch reflection tests require validator 1.5, since these
tests rely on module disassembly->assembly, which will not preserve extra
usage metadata for reflection in 1.4.
Include reflection part in IDxcAssembler, but don't strip from module,
since there are no options to prevent this from breaking a lot of tests.
Don't strip reflection from offline lib target.
- Align coord dimensions with Sample for future flexibility and alignment
- Fix ddx and ddy arguments to support the correct number of dimensions
- Rewrite lowering, using SamplerHelper
- Clean up SampleHelper a bit, adding additional asserts/checks
- Set components to zero for default offset, not undef
- Compute mesh payload size before final object serialization
  - During CodeGen for MS, based on the payload parameter
  - During CollectShaderFlagsForModule for AS, based on the DispatchMesh call
- Store payload sizes in the corresponding function properties, serializing
these properly for HL and Dxil modules
- Use payload sizes from function props for PSV0 data during serialization
- Validate measured and declared payload sizes, instead of just filling in
properties during validation
- Fix Wave/Quad allowed shader stages, enabling Quad* with CS-like models
- rename payloadByteSize members to payloadSizeInBytes
- Add GetMinShaderModelAndMask overload taking CallInst for additional
detail required to produce correct SM mask for Barrier operations
- Update the HLSL syntax from FeedbackTexture2DMinLod to FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP>
- Update DXIL to only have two UAV types for FeedbackTexture2D[Array] and use an extra metadata field to distinguish between the sampler feedback type.
The tests were manually added to ShaderOpArithTable.xml in PR #1867,
but ShaderOpArithTable.xml should be generated by the hctdb_test.py script.
There are no actual changes to the test data; only the order of the tests differs,
because of reordering done by the script's XML processing.
Added a check for shader model 6.5 so the tests will be skipped on lower
shader models.
- base of struct should always be aligned - or internal bug
- offset for array member must always be aligned - (new) validation error
- allocate and verify struct layouts even when the field is not an array
- the out-of-bounds check would have missed OOB access on the last array element
Added generation of a new version for each DX Compiler build.
There are 3 kinds of versions:
1. **Official build**
Built by using `hctbuild -official`. The version is based on the current DXIL version, the latest official release, and the number of commits since then. The format is `dxil_major.dxil_minor.release_no.commit_count`. For example, a current official version would be something like `1.5.1905.42`. The latest release information is read from `utils\version\latest-release.json`. The `1905` corresponds to the `dxil-2019-05-16` release branch and `42` is the number of commits since that release branch was created. For the master branch, the `commit_count` is incremented by 10000 to distinguish it from builds of the stabilized official release branches. So the current official version of master would be something like `1.5.1905.10042`.
2. **Dev build**
Built by using `hctbuild` with no other version-related option. The format is `dxil_major.dxil_minor.0.commit_count`, where `commit_count` is the total number of commits since the beginning of the project.
3. **Fixed version build**
Built by using `hctbuild -fv`. Enables overriding the version information. The fixed version is read from `utils\version\version.inc`. The location of the version file can be overridden with the `-fvloc` option of `hctbuild`.
In addition to the numbered version, the product version string on the binaries also includes the branch name and last commit sha, e.g. `"1.5.1905.10042 (master, 47e31c8a)"`. This product version string is included in `dxc -?` output.
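The version formats above can be summarized in a few lines. This is a sketch of the formats as described, not the actual gen_version.py; the function names are hypothetical and the 8-character sha is taken from the example string.

```python
def official_version(dxil_major, dxil_minor, release_no, commits, on_master):
    """dxil_major.dxil_minor.release_no.commit_count, +10000 on master."""
    if on_master:
        commits += 10000
    return f"{dxil_major}.{dxil_minor}.{release_no}.{commits}"


def dev_version(dxil_major, dxil_minor, total_commits):
    """dxil_major.dxil_minor.0.commit_count (commits since project start)."""
    return f"{dxil_major}.{dxil_minor}.0.{total_commits}"


def product_version(version, branch, sha):
    """Numbered version plus branch name and short (8-char) commit sha."""
    return f"{version} ({branch}, {sha[:8]})"
```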
Adds support for templatized RWRawByteBuffer.Store<T>. To avoid SROA losing the original layout of any struct arguments, a new pass runs before SROA and breaks such cases down into per-element stores. So be careful with the likes of buf.Store(0, (int[65536])0);, which becomes one store per element.
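The per-element breakdown can be modeled as a recursive walk over the stored type. This is a simplified sketch with a hypothetical type encoding (scalar names, lists of (field, type) for structs, ("array", elem, count) for arrays) that assumes tightly packed members without padding, which is not necessarily DXC's layout.

```python
# Hypothetical byte sizes for scalar types in this toy model.
SCALAR_SIZES = {"int": 4, "uint": 4, "float": 4, "half": 2, "double": 8}


def size_of(t):
    if isinstance(t, str):
        return SCALAR_SIZES[t]
    if isinstance(t, tuple) and t[0] == "array":
        return size_of(t[1]) * t[2]
    return sum(size_of(ft) for _, ft in t)  # struct: list of (name, type)


def flatten_store(offset, t, path="val"):
    """Break a store of type t at a byte offset into per-scalar stores.

    Returns (byte_offset, scalar_type, source_path) tuples. Assumes tightly
    packed members, an illustrative simplification.
    """
    if isinstance(t, str):
        return [(offset, t, path)]
    stores = []
    if isinstance(t, tuple) and t[0] == "array":
        elem, count = t[1], t[2]
        for i in range(count):
            stores += flatten_store(offset + i * size_of(elem), elem,
                                    f"{path}[{i}]")
        return stores
    for name, ft in t:
        stores += flatten_store(offset, ft, f"{path}.{name}")
        offset += size_of(ft)
    return stores
```

An int[65536] argument produces 65536 separate stores in this model, which is why large aggregates deserve care.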
-Qstrip_reflect would reserialize the root signature, leading to
validation failure #2162. Fixed by moving the root signature to the writer,
clearing it from the module and preventing re-serialization to metadata.
Fixed -Qstrip_debug with -Zi and no output location still embedding
the debug module.
- New -Qembed_debug is required to embed PDB in shader container
- -Zi used without -Qembed_debug will not embed debug info anymore,
and will issue a warning from CompileWithDebug().
- When compiling with Compile() and -Zi, -Qembed_debug is assumed
for compatibility reasons (lots of breaks without it)
- In dxc and CompileWithDebug() -Fd implies -Qstrip_debug
- Debug name is based on -Fd, unless path ends with '\', meaning you
want auto-naming and file written under the specified directory
- Debug name always embedded when debug info used, or -Fd used
- -Fd without -Zi just embeds the debug name for CompileWithDebug(),
but is still an error with dxc, since it can't write to your file.
- If not embedding debug info, it doesn't get written to the container,
only to be stripped out again.
- Fix padding for alignment in DebugName part.
- Default to DebugNameForBinary instead of DebugNameForSource if no
DebugInfo enabled
- Also fixed missing dependency on table gen options from libclang
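The -Fd naming rule can be sketched as below. The helper is hypothetical, and so is the auto-naming scheme: the list above only says the name is generated automatically, so a shader-hash-based ".pdb" file name is an assumption here.

```python
import ntpath


def debug_name(fd_arg, shader_hash_hex):
    """Choose the PDB name from the -Fd argument.

    If -Fd ends with a backslash it names a directory: the file name is
    generated automatically (here, hypothetically, from the shader hash)
    and the file is written under that directory. Otherwise -Fd is the name.
    """
    if fd_arg.endswith("\\"):
        return ntpath.join(fd_arg, shader_hash_hex + ".pdb")
    return fd_arg
```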
* Allow clip/cull elements to be declared as array [2]
- This approach fixes validation and packing to handle this case.
- There could be implications for runtime ViewID validation
- fix some issues found in packing related to the rowsUsed result from Pack
functions. Make these return 0 on failure, instead of startRow.
- Split PackNext into FindNext and PackNext that uses it for greater
flexibility.
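The FindNext/PackNext split can be illustrated on a toy signature grid. This is a sketch, not DXC's packer: rows are sets of occupied columns out of four, and "rows used" is modeled as the end row of the placed element; the function names only mirror the description above.

```python
NUM_COLS = 4  # components per signature row (x, y, z, w)


def find_next(grid, start_row, rows, col, cols):
    """Return the first row >= start_row where a rows x cols block fits
    starting at column col, or None if it doesn't fit anywhere."""
    if col + cols > NUM_COLS:
        return None
    wanted = set(range(col, col + cols))
    for row in range(start_row, len(grid) - rows + 1):
        if all(not (grid[r] & wanted) for r in range(row, row + rows)):
            return row
    return None


def pack_next(grid, start_row, rows, col, cols):
    """Pack an element; return rows used (its end row), or 0 on failure."""
    row = find_next(grid, start_row, rows, col, cols)
    if row is None:
        return 0  # failure is reported as 0, never as start_row
    for r in range(row, row + rows):
        grid[r] |= set(range(col, col + cols))
    return row + rows
```

Keeping the search separate lets a caller probe for space without committing to a placement.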
Some large shaders exhibit a behavior where phi nodes are created for resources, in which one of the possible incoming values is undef. This gets cleaned up later such that there are no undef resources left. However the fail-undef-resources pass has already failed compilation by that point. The fix is to change that pass to invalidate-undef-resources and replace the undefs with a special invalid value, such that we can produce an error later if still necessary, when optimization passes have been run such that temporary undef resources have been eliminated.
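The deferred-error strategy can be shown with a toy model in which phi nodes are a dict from name to incoming values. The sentinel name and helper names are hypothetical; the point is that undef inputs are marked rather than rejected, and only markers that survive optimization are reported.

```python
UNDEF = object()  # stand-in for an LLVM undef value
INVALID_RESOURCE = "dx.invalid.resource"  # hypothetical sentinel name


def invalidate_undef_resources(phis):
    """Replace undef incoming resource values with a sentinel instead of
    failing immediately. phis maps a phi name to its incoming values."""
    for incoming in phis.values():
        for i, value in enumerate(incoming):
            if value is UNDEF:
                incoming[i] = INVALID_RESOURCE


def check_invalid_resources(phis):
    """Run after optimization: report only sentinels that survived."""
    return [name for name, incoming in phis.items()
            if INVALID_RESOURCE in incoming]
```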
Those tests reach into HLSL, CodeGenHLSL, Samples and even quick-test directories to find the files they need. With this change, I move or copy files around to make sure that these test files are attributable to their consumer. I copied rather than moved files only in the case where a test in code explicitly reached into quick-test or Samples, because that essentially means that the file serves two different tests since they are also run as batch.
These new DXIL instructions are added to SM 6.5. The valid operations
for <Op> are:
- BitAnd
- BitOr
- BitXor
- CountBits
- Product
- Sum
In HLSL, these are exposed as:
    uint4 WaveMatch(<type> val)
    <type> WaveMultiPrefixBitAnd(<type> val, uint4 mask)
    <type> WaveMultiPrefixBitOr(<type> val, uint4 mask)
    <type> WaveMultiPrefixBitXor(<type> val, uint4 mask)
    uint WaveMultiPrefixCountBits(bool val, uint4 mask)
    <type> WaveMultiPrefixProduct(<type> val, uint4 mask)
    <type> WaveMultiPrefixSum(<type> val, uint4 mask)
In DXIL, these are exposed as:
    [WaveMatch]
    %dx.types.fouri32 @dx.op.waveMatch.T(i32 %opc, T %val)
    [BitAnd,BitOr,BitXor,Product,Sum]
    T @dx.op.waveMultiPrefixOp.T(i32 %opc, T %val, i32 %mask_x,
                                 i32 %mask_y, i32 %mask_z, i32 %mask_w,
                                 i8 %operation, i8 %signed)
    [CountBits]
    i32 @dx.op.waveMultiPrefixBitCount(i32 %opc, i1 %val, i32 %mask_x,
                                       i32 %mask_y, i32 %mask_z,
                                       i32 %mask_w)
Scalarization of vector types occurs as with the existing wave intrinsics.
For WaveMatch, the match is performed on each scalar and the results
are combined with bitwise AND. For WaveMultiPrefix, the operation is
performed on each scalar and the results are combined into an aggregate.
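The semantics can be simulated for a small wave. This sketch makes several simplifying assumptions: masks are Python ints rather than uint4, all lanes are active, and WaveMultiPrefixSum is treated as an exclusive prefix over the lanes in the mask with a lower lane index, visited in ascending lane order. Vector values are tuples matched per component, with the per-component masks combined using AND as described above.

```python
import functools
import operator


def wave_match(values):
    """For each lane, a bitmask of the lanes holding an equal value."""

    def scalar_match(comp_values):
        return [sum(1 << j for j, w in enumerate(comp_values) if w == v)
                for v in comp_values]

    if values and isinstance(values[0], tuple):
        # Match each component separately, then AND the masks per lane.
        per_comp = [scalar_match([v[c] for v in values])
                    for c in range(len(values[0]))]
        return [functools.reduce(operator.and_, lane_masks)
                for lane_masks in zip(*per_comp)]
    return scalar_match(values)


def wave_multi_prefix_sum(values, masks):
    """Exclusive prefix sum over lanes in each lane's mask below that lane."""
    result = []
    for lane, mask in enumerate(masks):
        total = 0
        for other in range(lane):  # ascending lane ids, strictly below lane
            if mask >> other & 1:
                total += values[other]
        result.append(total)
    return result
```

Feeding WaveMatch's output in as the mask partitions the wave into groups of equal values, each with its own running sum.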