Adjust the vectorized filter so that it handles tile widths
that are not a multiple of 4, so we no longer have to fall back
to the C version of the filter.
The speed impact is negligible for tiles whose widths are
multiples of 4, and speed improves greatly for tiles whose
widths are not.
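
For illustration only, here is a minimal sketch of one common way to let
a 4-wide kernel cover a width that is not a multiple of 4. It is not
taken from this change: kernel4, filter_row and the rounding average are
hypothetical stand-ins, and real code would use SIMD intrinsics instead
of the scalar loop. The leftover 1 to 3 pixels are copied into padded
4-wide scratch buffers, run through the same kernel, and only the valid
lanes are copied back.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 4-lane vector step. A real version would
 * use SIMD intrinsics; this one averages two rows with rounding. */
static void kernel4(const uint8_t *a, const uint8_t *b, uint8_t *dst) {
  for (int i = 0; i < 4; ++i) dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Filters `width` pixels. `width` need not be a multiple of 4. */
static void filter_row(const uint8_t *a, const uint8_t *b, uint8_t *dst,
                       int width) {
  int x = 0;

  /* Full 4-wide blocks go straight through the kernel. */
  for (; x + 4 <= width; x += 4) kernel4(a + x, b + x, dst + x);

  /* Tail of 1 to 3 pixels: pad into scratch buffers so the kernel never
   * reads or writes past the end of the row. */
  const int rem = width - x;
  if (rem > 0) {
    uint8_t ta[4] = { 0 }, tb[4] = { 0 }, td[4];
    memcpy(ta, a + x, (size_t)rem);
    memcpy(tb, b + x, (size_t)rem);
    kernel4(ta, tb, td);
    memcpy(dst + x, td, (size_t)rem); /* keep only the valid lanes */
  }
}
```

With this shape, widths that are multiples of 4 pay only for one extra
comparison per row, which fits the negligible impact noted above.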
Change-Id: Iae9d14f812c52c6f66910d27da1d8e98930df7ba