The current decode_tiles decodes the frame one tile by one tile
and then loopfilter the whole frame or use another worker thread to
do loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tile video has one row and four cols, decode_tiles
will decode the Tile1, then Tile2, then Tile3, then Tile4.
And during decode each tile, decode_tile will decode row by row in
each tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b