Граф коммитов

1281 Коммитов

Автор SHA1 Сообщение Дата
hkuang be6aeadaf4 Try again to merge branch 'frame-parallel' into master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.

Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.

There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

This reverts commit a18da9760a.

Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-30 21:00:13 -08:00
Jingning Han 9bdc0ae2b2 Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8
Add parentheses to bit operations.

Change-Id: I095d601f0631d055adc4b3a8fde70c9cbae9e749
2015-01-23 11:48:58 -08:00
Johann a18da9760a Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."
This reverts commit bde04ce503

Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-23 08:42:02 -08:00
hkuang bde04ce503 Merge branch 'frame-parallel' to enable frame parallel decode in master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""

 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-22 18:18:53 -08:00
Jingning Han 5c31fd5c6d Merge "Enable sub8x8 inter block search for RTC coding mode" 2015-01-02 10:00:35 -08:00
Jingning Han dad89d5ca1 Enable sub8x8 inter block search for RTC coding mode
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.

Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
2014-12-24 17:40:31 -08:00
Jim Bankoski b3c66f8a2f WIP: Remove giant value cost table
Change-Id: Iabe8a8868a747626c24bb13f1796f4c7827af367
2014-12-23 15:06:17 -08:00
Jim Bankoski 4b8c6d96ec Tokenization without huge tables.
Change-Id: Iff528c4b7528cc70320343b3a7ce07a92b024dfd
2014-12-22 08:42:52 -08:00
Paul Wilkins 2e39817f5e Merge "Improve motion detection for low complexity regions." 2014-12-18 08:38:21 -08:00
Jingning Han 01613aa753 Set second ref frame to be NONE in key frame coding
This commit explicitly set the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.

Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
2014-12-16 15:49:58 -08:00
Paul Wilkins b6c75c5a8d Improve motion detection for low complexity regions.
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.

Once it has been established though the field propagates well using
Nearest and Near MV.

This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.

This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.

Average results for test sets approximately neutral.

Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
2014-12-16 17:22:54 +00:00
Jingning Han eefe869291 Simplify rate-distortion modeling function
Use left shift to replace one multiplication. The computation
outcome remains identical.

Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1
2014-12-15 11:51:16 -08:00
James Zern 72ece1308b vp9: move encoder-only member from common
allow_comp_inter_inter VP9_COMMON -> VP9_COMP

Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6
2014-12-12 11:17:44 -08:00
hkuang 382f86f945 Improve the performance by caching the left_mi and right_mi in macroblockd.
This improve the deocde performance by ~2% on Nexus 7 2013.

Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
2014-12-05 16:25:42 -08:00
Jingning Han 74ded4863e Enable conditional skip path in rd_pick_intra_sby_mode
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us

Overall rtc set compression performance is down by -0.257%.

Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
2014-12-05 09:36:09 -08:00
Yunqing Wang edbd61e136 vp9_ethread: modify VP9_COMP structure
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.

In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.

Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
2014-11-24 17:57:38 -08:00
Yunqing Wang 379334c2d8 vp9_ethread: move filter_cache out of RD_OPT struct
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.

Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
2014-11-20 13:44:16 -08:00
Yunqing Wang b0efddd8e6 vp9_ethread: change mask_filter to a local variable
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.

Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
2014-11-20 09:41:49 -08:00
Yunqing Wang 70c9d2983b Revert "vp9_ethread: include a pointer to mb in VP9_COMP"
This reverts commit 6906d218dd.

Another way will be used to handle mb struct.

Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-20 08:31:12 -08:00
Yunqing Wang 6906d218dd vp9_ethread: include a pointer to mb in VP9_COMP
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.

Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
2014-11-14 12:31:06 -08:00
Jingning Han 61966b1d10 Merge "Refactor vp9_update_rd_thresh_fact" 2014-10-31 08:55:28 -07:00
Jingning Han f7b46d8c5e Refactor vp9_update_rd_thresh_fact
Reduce the scope of function parameters.

Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c
2014-10-30 11:09:40 -07:00
Hui Su 66906da066 Merge "Combine vp9_encode_block_intra and encode_block_intra" 2014-10-30 11:02:31 -07:00
Jingning Han 9349a28e80 Enable mode search threshold update in non-RD coding mode
Adaptively adjust the mode thresholds after each mode search round
to skip checking less likely selected modes. Local tests indicate
5% - 10% speed-up in speed -5 and -6. Average coding performance
loss is -1.055%.

speed -5
vidyo1 720p 1000 kbps
16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms

nik 720p 1000 kbps
33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms

speed -6
vidyo1 720p 1000 kbps
16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms

nik 720p 1000 kbps
33271 b/f, 38.433 dB,  7886 ms -> 33279 b/f, 38.416 dB, 7843 ms

Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
2014-10-29 10:55:34 -07:00
Hui Su 0928da3b6e Combine vp9_encode_block_intra and encode_block_intra
Change-Id: I79091fb677b64892ecca2fb466fde14602d8cdfc
2014-10-28 18:57:01 -07:00
Jingning Han d56b3eb0cf Refactor encoder tile data structure
Make the common tile info as one element in the encoder tile data
struct.

Change-Id: I8c474b4ba67ee3e2c86ab164f353ff71ea9992be
2014-10-27 19:37:13 -07:00
Jingning Han eee201c221 Tile based adaptive mode search in RD loop
Make the spatially adaptive mode search in rate-distortion
optimization loop inter tile independent. Experiments suggest that
this does not significantly change the coding staticstics.

Single tile, speed 3:
pedestrian_area 1080p 1500 kbps
59192 b/f, 40.611 dB, 101689 ms

blue_sky 1080p 1500 kbps
58505 b/f, 36.347 dB, 62458 ms

mobile_cal 720p 1000 kbps
13335 b/f, 35.646 dB, 45655 ms

as compared to 4 column tiles, speed 3:
pedestrian_area 1080p 1500 kbps
59329 b/f, 40.597 dB, 101917 ms

blue_sky 1080p 1500 kbps
58712 b/f, 36.320 dB, 62693 ms

mobile_cal 720p 1000 kbps
13191 b/f, 35.485 dB, 45319 ms

Change-Id: I35c6e1e0a859fece8f4145dec28623cbc6a12325
2014-10-24 10:00:27 -07:00
Yunqing Wang 330a6b2756 Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct" 2014-10-22 17:10:39 -07:00
Yunqing Wang 7c7e4d4eb8 vp9_ethread: allocate frame contexts outside VP9_COMMON struct
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.

Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
2014-10-22 15:03:12 -07:00
Hangyu Kuang 9ce3a7d76c Implement frame parallel decode for VP9.
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.

Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-22 10:50:58 -07:00
Jingning Han 94ecfa323f Reset rate cost value in rd mode search
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.

Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
2014-10-17 09:33:59 -07:00
Jingning Han ed100c0b00 Fix an ioc issue in super_block_uvrd
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.

Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
2014-10-16 11:07:39 -07:00
Jingning Han f3a5de816d Refactor super_block_uvrd function to remove goto statement
Use return value 0/1 as indicator of the validity of the rate-
distortion cost.

Change-Id: I6244126fbf03472cebcba4f177a6cd329fae4743
2014-10-14 09:58:11 -07:00
Jingning Han 69a09a70e9 Use speed feature variable in vp9_rd_pick_inter/intra_mode
Replace repeated fetch cpi->sf with a local sf pointer.

Change-Id: I5a55bba3e1c41fbdbc6ad5f078d2fa49dd95ee67
2014-10-13 16:15:00 -07:00
Jingning Han 3bdb6bfcee Fix vp9_rd_pick_inter/intra function types
The returned value is not used anywhere, hence changing the function
type into void.

Change-Id: I0ece49ed61e7aab6df01140135503ad41d4ef4a4
2014-10-13 16:00:46 -07:00
Jingning Han 811cef97c9 Refactor rate distortion cost structure
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.

Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
2014-10-13 14:27:16 -07:00
Jingning Han a62acf3c0a Fix ActiveMapTest valgrind warning
This fixes a valgrind warning in the ActiveMapTest unit test
reported in issue 870.

Change-Id: Idf172ab0244ebefe630c3577e649bc9ba7c43d10
2014-10-11 22:36:58 -07:00
Deb Mukherjee 9a29fdbae7 Merge "Rename highbitdepth functions to use highbd prefix" 2014-10-09 15:39:56 -07:00
Deb Mukherjee 1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
Deb Mukherjee 3117830af3 Merge "Subpel search cleanups and enhancements" 2014-10-09 11:14:51 -07:00
Alex Converse 9ffbc31367 Merge "Move the high freq coeff check outside store_coding_context" 2014-10-09 11:12:02 -07:00
Deb Mukherjee d78dbff09a Subpel search cleanups and enhancements
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
  pattern search and diamond search functions. Only
  vp9_pattern_search_sad function used in bigdia search
  uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.

Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.

Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
2014-10-08 23:59:43 -07:00
Yunqing Wang 189566db58 Merge "Allow mode search breakout at very low prediction errors" 2014-10-08 19:58:18 -07:00
Yunqing Wang e18edd5eb6 Allow mode search breakout at very low prediction errors
In model_rd_for_sb function, the spatial domain SSE and variance
are checked to see if transform coefficients are quantized to 0.
Besides that, this patch adds another set of thresholds that are
much more strict. These thresholds are used to conduct a partition
block level check to measure if all its TX blocks are skippable
for YUV planes. If it is true, x->skip is set for this partition
block, and thus its mode search is terminated.

This speeds up the encoding at very low prediction error case,
such as screen sharing application. This patch covers what
rd_encode_breakout_test() does, so that function is removed.

Borg test at speed 3 shows:
For stdhd set, psnr: +0.008%, ssim: +0.014%;
For derf set, psnr: +0.018%, ssim: +0.025%.
No noticeable speed change.

Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
2014-10-08 17:46:22 -07:00
Jingning Han 5fcbcf1b22 Move the high freq coeff check outside store_coding_context
This fixes valgrind message issue 870.

Change-Id: Ibbc2481923a2995029ab05de30c9e8a6e9f0f9a8
2014-10-08 16:10:32 -07:00
Jingning Han 41cea46154 Use local variable in vp9_rd_pick_inter_mode_sb
Change-Id: Ie35a965a6b8de536ccaf61ff61498620d22db205
2014-10-08 16:09:47 -07:00
Jingning Han 3bbec7b422 Merge "Replace mi_width_log2() with mi_width_log2_lookup table" 2014-10-07 15:33:52 -07:00
Jingning Han 27c9577f8e Merge "Take out repeated block width/height lookup functions" 2014-10-07 15:33:45 -07:00
Yunqing Wang 0cac69f594 Merge "Fix skip_txfm issue in rdopt code" 2014-10-07 15:13:44 -07:00
Yunqing Wang a4aa14020a Fix skip_txfm issue in rdopt code
Fixed an encoder crash. Set skip_txfm to 0 for cases that skip_txfm
isn't calculated. Put memcpy of skip_txfm at right place.

Change-Id: Ib3b6afc1b251a85b2a853c8138fb3393f48cfef6
2014-10-07 12:47:43 -07:00
Jingning Han 7ee58985bd Replace mi_width_log2() with mi_width_log2_lookup table
Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5
2014-10-07 12:45:25 -07:00
Jingning Han b66f7016c1 Take out repeated block width/height lookup functions
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.

Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
2014-10-07 12:33:07 -07:00
Deb Mukherjee cfc337aae8 Merge "Resolves some static analysis / undefined warnings" 2014-10-07 12:15:26 -07:00
Deb Mukherjee fced63ed30 Resolves some static analysis / undefined warnings
Also fixes a case of distortion becoming negative and messing
up the RDCOST computation.

Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53
2014-10-07 11:20:56 -07:00
Jingning Han a75551585b Fix eobs buffer pointer mis-use
This commit fixes a buffer pointer mis-use in store_coding_context.
The compression performance for stdhd set of speed 3 is improved by
0.097%. It fixes issue 869.

Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
2014-10-06 15:57:13 -07:00
Jingning Han 1b8c57e915 Merge "Fix an IOC issue in vp9_rd_pick_inter_mode_sb" 2014-10-06 09:29:29 -07:00
Jingning Han 085b97aa5c Fix an IOC issue in vp9_rd_pick_inter_mode_sb
It is possible that the GOLDEN reference frame is not avaiable, in
which setting the predicted mv will be associated with a residual
value of INT_MAX. This commit checks this condition before
left shift and comparison with that of ALTREF frame, to avoid
overflow issue.

Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
2014-10-05 12:05:14 -07:00
Jingning Han a1088e0b5f Merge "Rework partition search skip scheme" 2014-10-03 15:23:54 -07:00
Jingning Han bb260d9076 Rework partition search skip scheme
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf  -0.093%
stdhd -0.066%

Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)

old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)

pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB,  96549 ms (4% speed-up)

mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)

Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
2014-10-03 11:54:30 -07:00
Deb Mukherjee 431cdc33ee Prevent negative cost for highbitdepth
Adds proper scaling for highbitdepth in a rdopt cost.

Change-Id: I066694799a7f491b830945ef1c66eb202071c355
2014-10-03 10:22:21 -07:00
Deb Mukherjee 30fbf23fda Merge "High-bitdepth bugfixes" 2014-10-01 16:47:43 -07:00
Yunqing Wang e350e3fe68 Merge "Modify block transform skipping check" 2014-10-01 16:19:56 -07:00
Deb Mukherjee a160d72522 High-bitdepth bugfixes
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.

Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
2014-10-01 14:18:11 -07:00
Yunqing Wang e4aac6bb61 Modify block transform skipping check
Block transform skipping was implemented based on DCT's energy
conservation property. Modified the thresholds using zero bin
parameters. AC and DC coefficients were checked separately to
allow better identifying of skippable blocks.

Borg test at speed 3 showed:
stdhd set: psnr gain: 0.153%, ssim gain: 0.051%;
derf set: psnr gain: 0.023%, ssim gain: 0.036%

For most test clips, the encoding speedup is 1% - 2%.
parkrun(720p): 7.5% speedup, park_joy(1080p): 3.5% speedup.

Change-Id: If28eb81113a077414f5ca7b021c14f9069b373bb
2014-10-01 12:58:09 -07:00
Jingning Han 891793a540 Conditionally skip reference frame check
For regular inter frames, if the distance from GOLDEN_FRAME is larger
than 2 and if the predicted motion vector of LAST_FRAME gives lower
sse than that of GOLDEN_FRAME, skip the GOLDE_FRAME mode checking in
the rate-distortion optimization. It provides about 5% speed-up at
expense of -0.137% and -0.230% performance down for speed 3. Local
experiment results:

pedestrian 1080p 2000 kbps
66712 b/f, 40.908 dB, 113688 ms ->
66768 b/f, 40.911 dB, 108752 ms

blue_sky 1080p 2000 kbps
51054 b/f, 35.894 dB, 70406 ms ->
51051 b/f, 35.891 dB, 67236 ms

old_town_cross 720p 1500 kbps
14412 b/f, 36.252 dB, 60690 ms ->
14431 b/f, 36.249 dB, 57346 ms

Change-Id: Idfcafe7f63da7a4896602fc60bd7093f0f0d82ca
2014-10-01 08:32:15 -07:00
Jingning Han 8b4dd536a5 Merge "Skip certain ALTREF inter modes in ARF coding" 2014-09-29 10:43:45 -07:00
Jingning Han ccdb518ff8 Skip certain ALTREF inter modes in ARF coding
This commit enables the encoder to skip checking ALTREF inter modes
in ARF coding, if the predicted motion vectors suggest that the
GOLDEN_FRAME provides higher prediction accuracy than ALTREF_FRAME.

It improves the speed 3 encoding speed by about 5%, at the expense
of compression performance loss -0.041% and -0.225% for derf and
stdhd, respectively.

pedestrian_area 1080p 2000 kbps
66705 b/f, 40.909 dB, 118738 ms ->
66732 b/f, 40.908 dB, 113688 ms

old_town_cross 720p 1500 kbps
14427 b/f, 36.256 dB, 62746 ms ->
14412 b/f, 36.252 dB, 60690 ms

blue_sky 1080p 1500 kbps
51026 b/f, 35.897 dB, 73310 ms ->
50921 b/f, 35.893 dB, 70406 ms

bus CIF 1000 kbps
21301 b/f, 34.841 dB, 7326 ms ->
21248 b/f, 34.837 dB, 7196 ms

Change-Id: I76cf88b4d655e1ee3c0cb03c8a5745493040e8d2
2014-09-26 12:53:43 -07:00
Deb Mukherjee 993d10a217 Adds various high bit-depth encode functions
Change-Id: I6f67b171022bbc8199c6d674190b57f6bab1b62f
2014-09-25 01:50:36 -07:00
Jingning Han 6989e81d61 Remove unused variable in handle_inter_mode
Change-Id: Id757d2c940756ce1b0ead2ea24af9ac0a493de05
2014-09-24 18:27:44 -07:00
Yaowu Xu 4a101310e8 Adapt mode based rd_threshold for similar block size
The rd_thresholds are adaptively changed based on best mode tested.
It was only changed for the same block size, this commit makes the
adaptation for similar block sizes too. The commit also made minor
adjustment and code cleanups.

The impact on encoding time for _ped:
118089 ms -> 111927 ms

The impact on compression:
derf:  -0.339%
stdhd: -0.303%

Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
2014-09-23 16:10:59 -07:00
Yaowu Xu 56032b471d Fix an IOC
Change-Id: I0ca6746696d81657c035b0f6523c9af370da3c95
2014-09-23 16:07:22 -07:00
Yaowu Xu 7feede9869 Merge "Remove code duplication" 2014-09-22 17:13:59 -07:00
Yaowu Xu 052bc8ea6a Merge "Simplify rd_pick_intra_sby_mode()" 2014-09-22 17:13:55 -07:00
Yaowu Xu c7ab18fe56 Remove code duplication
Change-Id: I453b3e0d946951665d5919248445fc4f3222d2ad
2014-09-22 15:22:51 -07:00
Yaowu Xu f46326c7a2 Simplify rd_pick_intra_sby_mode()
Change-Id: Ifb0915c94c2db48827ddbd446314cb6e3155b99c
2014-09-22 14:58:51 -07:00
Jingning Han f7023ea014 Remove unnecessary local variable declaration
This commit removes a repetitive local variable declaration in
vp9_rd_pick_inter_mode_sb.

Change-Id: I1b0afa98ff1ecbfb46e17d3d1cee95d32c4309db
2014-09-22 09:29:28 -07:00
Jingning Han eee904c9b9 Adaptive mode search scheduling
This commit enables an adaptive mode search order scheduling scheme
in the rate-distortion optimization. It changes the compression
performance by -0.433% and -0.420% for derf and stdhd respectively.
It provides speed improvement for speed 3:

bus CIF 1000 kbps
24590 b/f, 35.513 dB, 7864 ms ->
24696 b/f, 35.491 dB, 7408 ms (6% speed-up)

stockholm 720p 1000 kbps
8983 b/f, 35.078 dB, 65698 ms ->
8962 b/f, 35.054 dB, 60298 ms (8%)

old_town_cross 720p 1000 kbps
11804 b/f, 35.666 dB, 62492 ms ->
11778 b/f, 35.609 dB, 56040 ms (10%)

blue_sky 1080p 1500 kbps
57173 b/f, 36.179 dB, 77879 ms ->
57199 b/f, 36.131 dB, 69821 ms (10%)

pedestrian_area 1080p 2000 kbps
74241 b/f, 41.105 dB, 144031 ms ->
74271 b/f, 41.091 dB, 133614 ms (8%)

Change-Id: Iaad28cbc99399030fc5f9951eb5aa7fa633f320e
2014-09-22 09:28:16 -07:00
hkuang c70cea97ac Remove mi_grid_* structures.
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.

This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.

Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
2014-09-19 21:27:11 -07:00
Deb Mukherjee 5cd0aab81a Adds high bitdepth quantization functions
Adds various high bitdepth quantization functions.

Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c
2014-09-16 14:55:37 -07:00
Jingning Han ffaebfc7b4 Merge "Add ARF validation for compound inter mode check" 2014-09-15 21:26:37 -07:00
Jingning Han c50256c157 Merge "Remove redundant reference frame check in sub8x8 RD search" 2014-09-15 21:26:11 -07:00
Jingning Han fe96932c69 Merge "Replace best_ref_index table fetch with best_mbmode" 2014-09-15 21:25:48 -07:00
Yunqing Wang 57eb2a4e83 Merge "Simplify the skip flag cost code" 2014-09-15 18:50:30 -07:00
Yunqing Wang c60ef810a1 Merge "Set the skip flag to 1 for skippable blocks" 2014-09-15 18:50:19 -07:00
Yunqing Wang 200ec69abb Simplify the skip flag cost code
Code refactoring.

Change-Id: Idad53cb80497d13551a142a642f7529fc305b0bc
2014-09-15 17:11:16 -07:00
Yunqing Wang 46aed7b8d0 Set the skip flag to 1 for skippable blocks
If the partition block is skippable, which means no coefficients
for Y, U, and V planes, its skip flag is set to 1. No quality
change (verified by borg tests), and no noticeable speed change.

Change-Id: I9231f720f8dd6364384cf05aa148ca24d75450f1
2014-09-15 16:50:19 -07:00
Jingning Han f897dd5f09 Merge "Fix format in vp9_rd_pick_inter_mode_sub8x8" 2014-09-15 15:34:22 -07:00
Jingning Han f1581b3b2e Add ARF validation for compound inter mode check
This commit enforces ARF validation check for compound inter modes.
It avoids potential access to ARF in the encoding process if it
is not allowed.

Change-Id: I055fec946b5d19d97937dc9001e1e564923e2439
2014-09-15 12:20:57 -07:00
Jingning Han 252822e81c Remove redundant reference frame check in sub8x8 RD search
The valid reference frame check in sub8x8 rate-distortion
optimization search has been included in the ref_frame_skip_mask
scheme. This commit removes the later further validation checks
that are not in effect.

Change-Id: I853b477c44037d3dc0afec6cbfce08a96c597a75
2014-09-15 12:20:04 -07:00
Jingning Han cc00eea676 Replace best_ref_index table fetch with best_mbmode
This commit replaces the best_ref_index table fetch with the use
of best_mbmode in vp9_rd_pick_inter_mode_sub8x8.

Change-Id: I882ee9ee6a8c0e61befcca1f4dba6d2ea8de8f13
2014-09-15 09:59:20 -07:00
Jingning Han 73805bfa70 Fix format in vp9_rd_pick_inter_mode_sub8x8
Change-Id: I9b6a74bdf003b39235f14f8b5b7f3b861f6bf131
2014-09-15 09:44:09 -07:00
Jingning Han 59dd83a3ea Merge "Refactor reference frame control in sub8x8 block RD search" 2014-09-13 10:43:36 -07:00
Jingning Han e6d927343e Merge "Format fixes in vp9_rd_pick_inter_mode_sb" 2014-09-13 10:43:24 -07:00
Jingning Han ad3c92b9b7 Merge "Remove unused best_inter_rd variable" 2014-09-13 10:43:14 -07:00
Jingning Han f02e0b6cf6 Merge "Remove unused speed feature" 2014-09-13 10:43:03 -07:00
Jingning Han adb20849b6 Refactor reference frame control in sub8x8 block RD search
This commit unifies the reference frame control in the rate-
distortion optimization search loop of sub8x8 block size to remove
the control dependency on mode search order.

Change-Id: I3a174099f71a7cc176ede9fd60e2374243ae9232
2014-09-12 11:03:03 -07:00
Jingning Han 7f77a1c3c9 Merge "Unify intra mode mask into mode_skip_mask scheme" 2014-09-12 09:06:35 -07:00
Deb Mukherjee 10783d4f3a Adds high bitdepth transform functions and tests
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.

Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-11 19:56:33 -07:00
Jingning Han 74ddde01c0 Format fixes in vp9_rd_pick_inter_mode_sb
Change-Id: Ie45687405dcaa34ba465dce2aa14f76017d3a794
2014-09-11 17:15:15 -07:00
Jingning Han 8e3f7a52a1 Remove unused best_inter_rd variable
The variable best_inter_rd is effectively not in use in the rate-
distortion mode search loops of both regular block sizes and sub8x8
block sizes.

Change-Id: I178f909f8c9629772e13adc6257908653b2adf31
2014-09-11 16:16:26 -07:00