In most of case `sort_by` works on primitive type.
Using `qsort_r` with function pointer is much slower than compare data directly.
I implement an intro sort which compare primitive data directly for `sort_by`.
We can even afford an O(n) type check before primitive data sort.
It still go faster.