This adds an SSE2 optimized path for UTF-16 strings that can be trivially
converted to UTF-8 (they contain no characters > 0X7F). It has a
speedup for the case where many of the leading characters are < 0X80 and
should have the same performance in the worst-case scenario where the first
character is > 0X7F.