Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT when it is not necessary. As long as the method
argument (the method name passed as the first argument to send) can
be shifted off, the remaining arguments are passed through as-is.
This optimizes the following types of calls (approximate speedups):
* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%
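A minimal Ruby sketch of the calling semantics behind these forms, assuming a hypothetical `Target#m` that simply returns what it receives. In each optimized form the method name is a separate leading argument to send, so once it is shifted off, the remaining positional and keyword arguments match a direct call. This only illustrates observable behavior; it does not exercise or model the VM fast path itself.

```ruby
# Hypothetical receiver: m returns exactly the arguments it was given.
class Target
  def m(*args, **kw)
    [args, kw]
  end
end

t = Target.new
args = (1..200).to_a # mirrors the args.length == 200 cases above
kw = { a: 1, b: 2 }

# Each optimized form: the method name comes first, so after it is
# shifted off, the rest of the arguments line up with a direct call.
results = {
  "send(meth, arg)"         => [t.send(:m, 1),           t.m(1)],
  "send(meth, *args)"       => [t.send(:m, *args),       t.m(*args)],
  "send(meth, *args, **kw)" => [t.send(:m, *args, **kw), t.m(*args, **kw)],
  "send(meth, **kw)"        => [t.send(:m, **kw),        t.m(**kw)],
  "send(meth, kw: 1)"       => [t.send(:m, kw: 1),       t.m(kw: 1)],
}

results.each do |form, (via_send, direct)|
  puts "#{form}: #{via_send == direct ? 'same' : 'different'}"
end
```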
Note that empty argument splats do get slower with this approach,
by about 20%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than empty argument splats, all argument splats are faster, with
the speedup depending on the number of arguments.
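To check the empty-splat slowdown and the size-dependent speedup on a particular build, a rough comparison could be run with Ruby's stdlib `benchmark` library. The no-op `Target#m`, the splat sizes, and the iteration count are arbitrary assumptions for illustration, and no particular timings are implied.

```ruby
require "benchmark"

# Hypothetical no-op target: only the call overhead is of interest.
class Target
  def m(*args)
    nil
  end
end

t = Target.new
empty = []
small = [1, 2, 3]
large = Array.new(200, 0)
n = 50_000 # arbitrary; raise it for more stable numbers

# Benchmark.bm prints a table and returns one Benchmark::Tms per report.
reports = Benchmark.bm(20) do |x|
  x.report("send(:m, *empty)") { n.times { t.send(:m, *empty) } }
  x.report("send(:m, *small)") { n.times { t.send(:m, *small) } }
  x.report("send(:m, *large)") { n.times { t.send(:m, *large) } }
end
```

Comparing these rows on Ruby builds with and without this change is one way to see the trade-off described above.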
The following types of calls are not optimized:
* send(*args)
* send(*args, **kw)
This is because the method argument cannot be shifted off without
first splatting the array.
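A small Ruby sketch of why these forms differ, again assuming a hypothetical `Target#m` that echoes its arguments. Here the method name is the first element of the splatted array, so it only becomes available after the splat. Destructuring the array first yields a call where the name is a separate leading argument, which is one of the forms listed as optimized; whether rewriting real call sites this way pays off would need measuring.

```ruby
# Hypothetical receiver: m returns exactly the arguments it was given.
class Target
  def m(*args, **kw)
    [args, kw]
  end
end

t = Target.new
call = [:m, 1, 2, 3] # the method name travels inside the array
kw = { a: 1 }

# Not optimized: send only learns the method name after splatting.
via_splat    = t.send(*call)
via_splat_kw = t.send(*call, **kw)

# Same calls with the name pulled out first, so it can be shifted off.
name, *rest = call
leading    = t.send(name, *rest)
leading_kw = t.send(name, *rest, **kw)

puts via_splat == leading       # prints true
puts via_splat_kw == leading_kw # prints true
```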