This avoids falling back to a slow path when doing x8888 -> 8888 scales. It is basically just cherrypicked from upstream.