Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary. As long as the receiver argument
can be shifted off, other arguments are passed through as-is.
This optimizes the following types of calls:
* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%
Note that empty argument splats do get slower with this approach,
by about 2-3%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.
The following types of calls are not optimized:
* symproc.(*args)
* symproc.(*args, **kw)
This is because the you cannot shift the receiver argument off
without first splatting the arg.