This only makes sense for AVX2, because widening it from a 64-bit comparison
to a 128-bit comparison is hardly worth it, and there are gaps in the SSE2
instruction set (missing _mm_cmpeq_epi64, which is introduced in SSE4.1) that
would require us to compensate and probably take a sizeable perf hit.
Differential Revision: https://phabricator.services.mozilla.com/D152297