Make sure to test everything with scalar as well to catch any weird edge
cases.
Not all opcodes are covered here, just the arithmetic ones. FP64 packing
is also ignored.
This maps the SPIR-V memory qualifier decorations to their MSL equivalents.
I've mapped `Coherent` to
`volatile` since MSL doesn't have anything weaker than `volatile` but
stronger than nothing.
As part of this, I had to remove the implicit `volatile` added for
atomic operation casts. If the buffer is already `coherent` or
`volatile`, then we would add a second `volatile`, which would be
redundant. I think this is OK even when the buffer *doesn't* have
`coherent`: `T *` is implicitly convertible to `volatile T *`, but not
vice-versa. It seems to compile OK at any rate. (Note that the
non-`volatile` overloads of the atomic functions documented in the spec
aren't present in the MSL 2.2 stdlib headers.)
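The qualifier logic above can be sketched in C++, which MSL is based on. The two `static_assert`s check the conversion rule that makes dropping the extra `volatile` on atomic casts safe. `MemoryDecorations`, `memory_qualifier`, and `atomic_cast_type` are hypothetical names, not SPIRV-Cross APIs, and the exact spelling of the emitted cast is an assumption.

```cpp
#include <string>
#include <type_traits>

// Adding a cv-qualifier to the pointee is an implicit conversion;
// removing one is not.
static_assert(std::is_convertible<int *, volatile int *>::value,
              "T * converts implicitly to volatile T *");
static_assert(!std::is_convertible<volatile int *, int *>::value,
              "volatile T * does not convert implicitly to T *");

// Hypothetical summary of the decorations on a buffer.
struct MemoryDecorations
{
	bool coherent = false;
	bool is_volatile = false;
};

// Coherent has no closer MSL equivalent than volatile, so both map to it,
// and the qualifier is emitted at most once.
std::string memory_qualifier(const MemoryDecorations &d)
{
	return (d.coherent || d.is_volatile) ? "volatile " : "";
}

// Pointer type for an atomic cast. No implicit `volatile` is added here:
// the buffer's own qualifier is reused, and a non-volatile pointer still
// converts to a volatile parameter of the atomic functions.
std::string atomic_cast_type(const MemoryDecorations &d, const std::string &address_space)
{
	return memory_qualifier(d) + address_space + " atomic_uint *";
}
```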
`restrict` is tricky, because in MSL, as in C++, it needs to go *after*
the asterisk or ampersand for the pointer type it's modifying.
Another issue is that, in the `Simple`, `GLSL450`, and `Vulkan` memory
models, `Restrict` is the default (i.e. does not need to be specified);
but MSL likely follows the `OpenCL` model where `Aliased` is the
default. We probably need to implicitly set either `Restrict` or
`Aliased` depending on the module's declared memory model.
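A minimal sketch of both points, assuming a hypothetical `MemoryModel` enum whose values mirror the SPIR-V memory model names. It treats every model except `OpenCL` as restrict-by-default (the assumption stated above, not confirmed behavior), lets explicit decorations override that default, and appends `restrict` after the `*` or `&`. `should_emit_restrict` and `pointer_type` are hypothetical helpers.

```cpp
#include <string>

// Hypothetical enum mirroring the SPIR-V memory models.
enum class MemoryModel { Simple, GLSL450, OpenCL, Vulkan };

// Assumption: only OpenCL makes Aliased the default.
bool restrict_by_default(MemoryModel model)
{
	return model != MemoryModel::OpenCL;
}

// Explicit decorations win; otherwise fall back to the model's default.
bool should_emit_restrict(MemoryModel model, bool has_restrict, bool has_aliased)
{
	if (has_restrict)
		return true;
	if (has_aliased)
		return false;
	return restrict_by_default(model);
}

// `restrict` qualifies the pointer itself, so it must follow the '*' or '&'.
std::string pointer_type(const std::string &address_space, const std::string &pointee,
                         bool is_reference, bool emit_restrict)
{
	std::string t = address_space + " " + pointee + (is_reference ? "&" : "*");
	if (emit_restrict)
		t += " restrict";
	return t;
}
```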
This is quite complex since we cannot flush Phi inside the case labels;
we have to do it outside by emitting a lot of manual branches ourselves.
This should be extremely rare, but we need to handle this case.
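One way to picture the workaround, as a sketch only: for each case whose incoming edge needs Phi copies, emit an `if (selector == label)` block before the `switch` that performs the copies. `PhiCopy`, `CaseEdge`, and `emit_switch_with_phi` are hypothetical, and the `default` edge is not handled here.

```cpp
#include <string>
#include <vector>

// Hypothetical: one Phi copy required on the edge from the switch block to a case.
struct PhiCopy
{
	std::string dst, src;
};

struct CaseEdge
{
	int label;                 // case literal
	std::vector<PhiCopy> phis; // copies needed when this case is taken
};

// Emits the Phi flushes as explicit branches before the switch, because
// they cannot be placed inside the case labels themselves.
std::vector<std::string> emit_switch_with_phi(const std::string &selector,
                                              const std::vector<CaseEdge> &cases)
{
	std::vector<std::string> out;
	for (const auto &c : cases)
	{
		if (c.phis.empty())
			continue;
		out.push_back("if (" + selector + " == " + std::to_string(c.label) + ")");
		out.push_back("{");
		for (const auto &p : c.phis)
			out.push_back("    " + p.dst + " = " + p.src + ";");
		out.push_back("}");
	}
	out.push_back("switch (" + selector + ")");
	return out;
}
```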
We used to use the Binding decoration for this, but this method is
hopelessly broken. If no explicit MSL resource remapping exists, we
remap automatically in a manner which should always "just work".
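A sketch of what "just work" remapping could look like, assuming MSL keeps separate index spaces for buffers, textures, and samplers. `AutoRemapper` and `MSLBinding` are hypothetical: explicit remaps take priority, and every other resource gets the next free index of its kind, remembered so repeated queries return the same answer.

```cpp
#include <map>
#include <tuple>
#include <utility>

enum class ResourceKind { Buffer, Texture, Sampler };

// One MSL index per kind: [[buffer(n)]], [[texture(n)]], [[sampler(n)]].
struct MSLBinding
{
	unsigned buffer = 0, texture = 0, sampler = 0;
};

class AutoRemapper
{
public:
	// Explicit remapping supplied by the API user for a (set, binding) pair.
	void add_explicit(unsigned set, unsigned binding, const MSLBinding &msl)
	{
		explicit_remaps[std::make_pair(set, binding)] = msl;
	}

	// Returns the MSL index for a resource of the given kind.
	unsigned assign(unsigned set, unsigned binding, ResourceKind kind)
	{
		auto it = explicit_remaps.find(std::make_pair(set, binding));
		if (it != explicit_remaps.end())
			return index_of(it->second, kind);

		// No explicit remap: reuse an earlier automatic index or allocate one.
		auto key = std::make_tuple(set, binding, static_cast<int>(kind));
		auto a = automatic.find(key);
		if (a != automatic.end())
			return a->second;
		unsigned idx = next_index[static_cast<int>(kind)]++;
		automatic[key] = idx;
		return idx;
	}

private:
	static unsigned index_of(const MSLBinding &b, ResourceKind kind)
	{
		switch (kind)
		{
		case ResourceKind::Buffer:
			return b.buffer;
		case ResourceKind::Texture:
			return b.texture;
		default:
			return b.sampler;
		}
	}

	std::map<std::pair<unsigned, unsigned>, MSLBinding> explicit_remaps;
	std::map<std::tuple<unsigned, unsigned, int>, unsigned> automatic;
	unsigned next_index[3] = {0, 0, 0};
};
```

This sketch does not stop automatic indices from colliding with explicitly remapped ones; a real allocator would have to account for that.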
There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.
Fix this by analyzing whether a block starts off by writing to a variable.
In that case, there cannot be any preservation going on. If it does not,
pretend the loop header is reading the variable, which moves the variable
to an appropriate scope.
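The analysis can be sketched as follows, assuming each block is summarized as an ordered list of reads and writes. `Access`, `block_starts_with_write`, and `note_loop_access` are hypothetical names, and the caller is assumed to invoke `note_loop_access` only for blocks inside a loop that touch the variable.

```cpp
#include <set>
#include <vector>

enum class AccessKind { Read, Write };

struct Access
{
	unsigned var;
	AccessKind kind;
};

// True if the first access to `var` in the block is a write, so no value
// can be carried over from a previous loop iteration.
bool block_starts_with_write(const std::vector<Access> &ops, unsigned var)
{
	for (const auto &op : ops)
		if (op.var == var)
			return op.kind == AccessKind::Write;
	return false;
}

// If the block might observe a value from an earlier iteration, record the
// loop header as a reader so the variable is declared outside the loop.
void note_loop_access(const std::vector<Access> &ops, unsigned var, unsigned loop_header,
                      std::set<unsigned> &blocks_reading_var)
{
	if (!block_starts_with_write(ops, var))
		blocks_reading_var.insert(loop_header);
}
```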
In multiple-entry-point modules, we declared builtin inputs which were
not supposed to be used for that entry point.
Fix this by being stricter when checking which builtins to emit.
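One plausible form of the stricter check, as a sketch: emit only those builtin inputs that appear in the current entry point's interface list. `BuiltinInput` and `builtins_to_emit` are hypothetical, and the actual fix may use a different notion of which builtins an entry point uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical: a builtin input variable declared somewhere in the module.
struct BuiltinInput
{
	unsigned var_id;
	std::string name;
};

// Keep only builtins listed in this entry point's interface, so builtins
// belonging to other entry points are not declared.
std::vector<std::string> builtins_to_emit(const std::vector<BuiltinInput> &all,
                                          const std::vector<unsigned> &interface_ids)
{
	std::vector<std::string> out;
	for (const auto &b : all)
		if (std::find(interface_ids.begin(), interface_ids.end(), b.var_id) != interface_ids.end())
			out.push_back(b.name);
	return out;
}
```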
This gets rather complicated because MSL does not support OpArrayLength
natively. We need to pass down a buffer which contains buffer sizes, and
we compute the array length on demand.
Support both discrete descriptors as well as argument buffers.
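The on-demand computation is plain arithmetic: a SPIR-V runtime array must be the last member of its block, so its length is the bytes remaining after the member's offset divided by the array stride. The sketch assumes the extra buffer is an array of `uint32_t` byte sizes indexed by each buffer's MSL index; `array_length` and `array_length_for` are hypothetical helpers.

```cpp
#include <cstdint>

// Length of a trailing runtime array, given the byte size of the bound buffer.
uint32_t array_length(uint32_t buffer_size, uint32_t member_offset, uint32_t array_stride)
{
	if (array_stride == 0 || buffer_size <= member_offset)
		return 0;
	return (buffer_size - member_offset) / array_stride;
}

// Hypothetical lookup: buffer_sizes[i] is the byte size of the buffer bound
// at MSL buffer index i, uploaded by the host alongside the other resources.
uint32_t array_length_for(const uint32_t *buffer_sizes, uint32_t msl_index,
                          uint32_t member_offset, uint32_t array_stride)
{
	return array_length(buffer_sizes[msl_index], member_offset, array_stride);
}
```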
Change aux buffer to swizzle buffer.
There is no good reason to expand the aux buffer, so name it
appropriately.
Make the code cleaner by emitting a straight pointer to uint rather than
a dummy struct which only contains a single unsized array member anyway.
This will also end up being very similar to how we implement swizzle
buffers for argument buffers.
Do not use implied binding if it overflows int32_t.
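A minimal sketch of the guard, assuming the implied binding arrives as a `uint32_t`: values above `INT32_MAX` would turn negative when stored in an `int32_t`, so they are rejected and the caller falls back to some other assignment. `usable_implied_binding` is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns the implied binding only if it can be represented as int32_t.
std::optional<int32_t> usable_implied_binding(uint32_t implied)
{
	const auto max = static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
	if (implied > max)
		return std::nullopt; // would wrap to a negative value; do not use it
	return static_cast<int32_t>(implied);
}
```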
We made the mistake of registering a dependency on the atomic variable
even if the atomic result was forced to a temporary. There is no need to
register reads from atomic variables like this as we always force atomic
results to a temporary and argument reads/writes do not need to be
tracked.
If we generate an access chain in a loop body, and it is consumed in the
loop continue block, we have a problem because we cannot emit a
temporary here holding the access chain reference. Force a complex loop
body to work around this exceptionally rare case.
We cannot deduce early whether OpLoad needs ArrayCopy templates, since
it's heavily context-dependent, and we might only know on the third
iteration of the compile loop.
A -1 (0xffffffff) literal means the component should be undefined.
Since we cannot express undefined directly, just use a 0 literal in the
appropriate type.
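A sketch of the substitution for an `OpVectorShuffle`-style result, assuming the source components have already been flattened into a list of expressions and the caller supplies a zero literal of the right type, such as `0.0`, `0`, or `0u`. `emit_shuffle` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Builds a vector constructor expression from selected components.
// An index of 0xFFFFFFFF means "undefined" and is replaced by zero_literal.
std::string emit_shuffle(const std::string &result_type,
                         const std::vector<std::string> &source_components,
                         const std::vector<uint32_t> &indices,
                         const std::string &zero_literal)
{
	std::string s = result_type + "(";
	for (std::size_t i = 0; i < indices.size(); i++)
	{
		if (i != 0)
			s += ", ";
		s += indices[i] == 0xffffffffu ? zero_literal : source_components.at(indices[i]);
	}
	return s + ")";
}
```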
This is necessary to deal with indirect draws, where the draw parameters
are given in a buffer instead of passed by the CPU. For normal draws,
the draw parameters are set with Metal's `setVertexBytes:` method.
This undoes the change to add the vertex count to the aux buffer,
rendering that entire discussion largely moot. Oh well. It was a
discussion that needed to happen anyway.
In the past, SPIRV-Cross threw an error in this case because it couldn't
work out which swizzle from the auxiliary buffer needs to be passed.
Now, we pass the swizzle around with the texture object, like a combined
image-sampler and its associated sampler.
MSL does not support value semantics for arrays (sigh), so we need to
force constant references and deal with copies if we have a different
address space than what we end up guessing.
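Plain C++ arrays share this limitation, which makes it easy to illustrate: they cannot be assigned or returned by value, so they are passed by const reference and copied element by element. `array_copy` and `sum4` are hypothetical names; an MSL version would additionally need variants for different address spaces, which this C++ sketch cannot express.

```cpp
#include <cstddef>

// Arrays cannot be assigned with '=', so copy element by element.
template <typename T, std::size_t N>
void array_copy(T (&dst)[N], const T (&src)[N])
{
	for (std::size_t i = 0; i < N; i++)
		dst[i] = src[i];
}

// Arrays cannot be passed by value either, so take a const reference.
float sum4(const float (&v)[4])
{
	float s = 0.0f;
	for (float x : v)
		s += x;
	return s;
}
```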