Fixes #1228. Fixes #234.
This uses imbue() to be locale independent. Notes:
- 'sstream >> double' is much slower than strtod()
  * this was measurable in the test suite as a whole, despite being
    a tiny fraction of what the test suite does
- so, this embeds a fast path that bypasses sstream most of the time
  => the test suite is faster than before
- sstream is probably slower because it does more accurate rounding than strtod()
- sstream does not create INFINITY by itself; infinity is instead inferred from a parse failure
- Adds a pragma to see binary output of double values (not portable)
- Print decimals that show more values, but in a portable way
  (lots of portability issues)
- Expand the tests to test more double values
Note: it is quite difficult to have 100% portable tests for floating point.
The current situation works by not printing full precision, and working around
several portability issues.
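The approach in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the function names, the 15-digit limit, and the exact shape of the fast path are assumptions made for the example. It also assumes the caller has already validated the token as a well-formed literal, so a stream failure can only mean the value was out of range.

```cpp
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical fast path (an assumption of this sketch): handles plain
// "digits[.digits]" literals with at most 15 digits. The mantissa is then
// below 2^53 and the power of ten is exactly representable, so a single
// division yields the correctly rounded result without touching sstream.
bool tryFastPath(const std::string& text, double& result)
{
    unsigned long long mantissa = 0;
    int digits = 0;
    int fractionDigits = 0;
    bool seenPoint = false;
    for (char c : text) {
        if (c == '.') {
            if (seenPoint)
                return false;
            seenPoint = true;
        } else if (c >= '0' && c <= '9') {
            if (++digits > 15)
                return false;  // too many digits to stay exact
            mantissa = mantissa * 10 + static_cast<unsigned long long>(c - '0');
            if (seenPoint)
                ++fractionDigits;
        } else {
            return false;  // sign, exponent, or suffix: leave to the slow path
        }
    }
    if (digits == 0)
        return false;
    double scale = 1.0;
    for (int i = 0; i < fractionDigits; ++i)
        scale *= 10.0;
    result = static_cast<double>(mantissa) / scale;
    return true;
}

// Hypothetical entry point. Assumes 'text' is a well-formed literal.
double parseDoubleLocaleIndependent(const std::string& text)
{
    double value = 0.0;
    if (tryFastPath(text, value))
        return value;

    // Slow path: imbue() the classic locale so '.' is always the decimal
    // point, regardless of the global locale.
    std::istringstream stream(text);
    stream.imbue(std::locale::classic());
    stream >> value;
    if (stream.fail()) {
        // The stream does not produce infinity itself; infer it from the
        // failure (valid only under the well-formed-input assumption).
        return std::numeric_limits<double>::infinity();
    }
    return value;
}
```

Short literals such as "1.5" never reach the stream, which is where the speedup would come from. Literals with exponents, or with too many digits, fall through to the imbued stream.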
Includes:
Fold divide and multiply by same value.
Allow ADCE to remove more instructions.
Remove dead Workgroup variables in ADCE.
Fold fclamp feeding compare.
Add pass to fold a load feeding an extract.
Remove redundant stores.
SROA: Only create symbols that are loaded.
Get ADCE to handle OpCopyMemory.
Remove the limit on struct size in SROA.
Previously, casting an object of a struct type to an identical type
would produce an error. This PR allows this case.
As a side-effect of the change, several self-type casts in existing
tests go away. For example:
0:10 Construct float ( temp float)
0:10 'f' ( in float)
becomes this (without the unneeded constructor op):
0:10 'f' ( in float)
For vector or array types this can result in somewhat less overall code.
Fixes: #1218
SPIR-V requires the coverage mask to be an array of integers, but HLSL
allows scalar integers. This adds the requisite type conversion and
wrapped entry point handling.
Fixes: #1202
This PR forces the external definition of SV_GroupID variables to 3-vectors.
The conversion process between the shader-declared type and the external type
happens in wrapped main IO variable conversion.
The same applies to SV_DispatchThreadID and SV_GroupThreadID.
Fixes: #1371
The Append() method is special: unlike most outputs, it does not copy
some temporary data to a symbol in the entry point epilogue, but
rather uses an emit builtin after each write to the output stream.
This had been handled by remembering the special output symbol for
the stream as it was declared in the shader entry point before
symbol sanitization. However, the prior code was too simple and
only handled cases where the Append() method happened after the
entry point, so that the output symbol had been seen.
This PR adds a patching step so that the Append()s may appear in
any order with respect to the entry point. They are patched in an
epilogue, at which point a well-formed shader is guaranteed to have
declared the appropriate output stream.
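The deferred patching idea can be illustrated with a small sketch. All type and member names here are hypothetical and are not taken from the compiler's source. The sketch only shows the order independence: Append() sites are recorded as they are parsed and are bound to the output stream symbol in a final pass.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one Append() call site awaiting its stream symbol.
struct AppendCall {
    std::string streamSymbol;  // empty until patched
};

// Hypothetical helper that defers binding until the whole shader is parsed.
class AppendPatcher {
public:
    // Called whenever an Append() is parsed, before or after the entry point.
    std::size_t recordAppend()
    {
        calls.push_back(AppendCall{});
        return calls.size() - 1;
    }

    // Called when the entry point declares its output stream.
    void declareOutputStream(const std::string& name) { outputSymbol = name; }

    // Epilogue: patch every recorded call. Returns false if calls exist
    // but no output stream was ever declared (an ill-formed shader).
    bool finalize()
    {
        if (outputSymbol.empty())
            return calls.empty();
        for (AppendCall& call : calls)
            call.streamSymbol = outputSymbol;
        return true;
    }

    const std::vector<AppendCall>& patched() const { return calls; }

private:
    std::vector<AppendCall> calls;
    std::string outputSymbol;
};
```

Because binding happens only in finalize(), a call recorded before declareOutputStream() is treated the same as one recorded after it.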
Fixes #1217.
When assigning a location to an interface whose stage automatically
converts the interfaces to an array, it now strips off the outermost
array from the type before calculating how many locations it consumes.
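As a rough illustration of the location calculation described above, the sketch below uses a hypothetical InterfaceType struct and countLocations function. Neither is part of the actual code base, and the per-element location count is simply taken as an input rather than derived from a real type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of an interface variable's type.
struct InterfaceType {
    int locationsPerElement;      // locations consumed by one non-array element
    std::vector<int> arraySizes;  // outermost dimension first; empty if not an array
};

// Counts consumed locations. When the stage implicitly arrays its interface
// (for example per-vertex inputs), the outermost dimension is skipped
// because it does not multiply the number of locations.
int countLocations(const InterfaceType& type, bool stageArraysInterface)
{
    std::size_t first = (stageArraysInterface && !type.arraySizes.empty()) ? 1 : 0;
    int count = type.locationsPerElement;
    for (std::size_t i = first; i < type.arraySizes.size(); ++i)
        count *= type.arraySizes[i];
    return count;
}
```

For a single-location element declared with one array dimension of 3, this gives 1 location in an implicitly arrayed stage and 3 locations otherwise.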
Notably, this includes the following:
- Fix exponential performance of DeadInsertElim
- Constant folding of OpVectorTimesScalar
- Add Vector DCE
- Add store-store elimination to LocalSingleBlockElim
- Improve folding of OpVectorTimesScalar and OpPhi
- Fixes handling of kill and unreachable ops in inlining