CN110035285B - Depth prediction method based on motion vector sensitivity - Google Patents

Depth prediction method based on motion vector sensitivity

Info

Publication number: CN110035285B
Authority: CN (China)
Application number: CN201910313621.8A
Other languages: Chinese (zh)
Other versions: CN110035285A (en)
Prior art keywords: entering, mode, motion vector, judging whether, cost
Inventors: 张昊, 李�诚, 周搏, 王剑光, 牟凡, 马学睿, 杜忠泽, 江滔
Current Assignee: Central South University
Original Assignee: Central South University
Application filed by Central South University
Priority date / Filing date: 2019-04-18
Publication of CN110035285A: 2019-07-19
Publication of CN110035285B (grant): 2023-01-06
Legal status: Active

Classifications

    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/124 Quantisation
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96 Tree coding, e.g. quad-tree coding
    (all within H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a depth prediction method based on motion vector sensitivity. Compared with the original x265, the method reduces the coding time by 17.56% relative to the x265 algorithm, while BDBR increases by only about 1.75% and BDPSNR decreases by 0.05 dB; the method optimizes the prediction unit mode selection process without reducing the video quality and effectively improves the coding speed. The complexity of the coding algorithm is reduced, the video coding speed is greatly improved with only a small quality loss, and the method has good practicability in the field of video coding.

Description

Depth prediction method based on motion vector sensitivity
Technical Field
The invention belongs to the technical field of video coding and decoding, and particularly relates to a depth prediction method based on motion vector sensitivity.
Background
If the redundant information can be eliminated, the data volume of the video signal can be greatly reduced, thereby achieving compression of the video data. There is a correlation between the objects and the background of the same image, called spatial redundancy. Within the same scene, the texture and content of adjacent video images change little, and this similarity between images is called temporal redundancy. In information theory, a specific number of bits would be allocated according to the information entropy of a pixel, so as to represent the amount of information the pixel carries. In an actual image, however, the information entropy of each pixel point is difficult to obtain directly, so all pixel points are generally represented with an equal number of bits; for example, in an 8-bit image every pixel is represented with 8 bits, and for positions with small pixel values this representation contains information-entropy redundancy.
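As a toy illustration of the information-entropy redundancy described above (not part of the patent; the pixel values are arbitrary example data), the following C++ sketch computes the Shannon entropy of a small block of pixels and compares it with the fixed 8 bits per pixel of a flat 8-bit representation:

#include <cmath>
#include <cstdio>
#include <map>
#include <vector>

// A flat 8-bit representation spends 8 bits per pixel even when the actual
// entropy of the pixel values is far lower.
int main() {
    std::vector<int> pixels = {0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 3};

    std::map<int, int> hist;                     // histogram of pixel values
    for (int p : pixels) ++hist[p];

    double entropy = 0.0;                        // Shannon entropy in bits per pixel
    for (const auto& kv : hist) {
        double prob = double(kv.second) / pixels.size();
        entropy -= prob * std::log2(prob);
    }
    std::printf("entropy = %.2f bits/pixel, stored with 8 bits/pixel\n", entropy);
    return 0;
}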
Video coding converts a video signal file into another file format by some compression means, so that the bandwidth required during signal transmission is reduced and the video signal is transmitted efficiently. Since the amount of information carried by a raw video signal is large and places high demands on practical transmission and storage systems, the raw video signal must be compressed and processed before it can appear in people's everyday lives.
High Efficiency Video Coding (HEVC) is a newer video compression standard. HEVC outperforms H.264: at the same video quality, its compression ratio can reach twice that of H.264. After videos such as movies and animations are compressed with HEVC, mobile users not only consume far less data when watching online video, but also download faster, with essentially no loss of picture quality and smoother, less jerky playback. In the HEVC coding standard, to increase the compression ratio, the input image is first divided into image blocks of predefined size, called Coding Tree Units (CTUs). Each CTU may consist of several Coding Units (CUs); a CU has 8×8, 16×16, 32×32 or 64×64 luma samples and the corresponding chroma samples, color being represented by luma and chroma together. A CU may be further decomposed into smaller Prediction Units (PUs) and Transform Units (TUs) so that coding, prediction and transformation can be handled more effectively. HEVC mode selection accounts for 60-70% of the overall coding time. The intra prediction directions of HEVC are extended from the 9 modes of H.264/AVC to 35. HEVC also introduces Advanced Motion Vector Prediction (AMVP), the Merge mode, the Skip mode based on the Merge mode, and Sample Adaptive Offset (SAO). In the HEVC coding standard, a coding tree unit may be composed of several layers of coding units, and each layer of coding units may be divided into multiple coding sub-units, so up to 85 coding units need to be traversed during mode selection. Each coding unit has multiple inter and intra prediction modes, and each mode must be transformed and quantized and its rate-distortion cost calculated, which is computationally very intensive. The primary goal of video coding is to reduce the number of bits consumed per unit of video by removing redundant information. Currently, video coding researchers mainly use three classes of high-performance motion models: mesh-based warping methods, global motion estimation methods, and higher-order block matching methods.
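The figure of up to 85 coding units per CTU follows directly from the quadtree structure (1 + 4 + 16 + 64 units over depths 0 to 3). A minimal C++ check of this count, written for this description rather than taken from any encoder, is:

#include <cstdio>

// Count all coding units in a quadtree whose root CU is size x size
// and whose smallest CU is minSize x minSize.
static int countCUs(int size, int minSize) {
    if (size < minSize)
        return 0;
    int count = 1;                                   // the CU at this depth
    if (size > minSize)
        count += 4 * countCUs(size / 2, minSize);    // its four sub-CUs
    return count;
}

int main() {
    // 64x64 CTU with an 8x8 minimum CU: 1 + 4 + 16 + 64 = 85 coding units.
    std::printf("CUs per CTU: %d\n", countCUs(64, 8));
    return 0;
}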
The measure taken by the mesh-based warping method is mesh matching: it divides the predicted frame into sets of small grids and then warps the image by moving the grid control points. In this way higher matching accuracy can be achieved, and no blocking artifacts occur. Moreover, the motion vectors that such a mesh needs to transmit are relatively few. However, a control point does not act on a single mesh only: as soon as a control point changes, the adjacent meshes are affected as well. That is, the motion vectors between adjacent meshes are continuous, so several related meshes must be matched simultaneously in order to estimate the motion. Determining the optimal control points thus becomes a necessary step, which is complicated.
Global motion estimation methods usually take the entire image as their target; large regions are sometimes expected in some areas, but no region should become too small. A global motion estimation method applies a single top-level motion model to each frame, and its application is limited because good results are obtained only for video with slow motion.
The higher-order block matching method breaks through the previous limitation: it uses a higher-order motion model that imitates similarity transformations, so motion estimation no longer relies on purely translational blocks. Because the higher-order model describes motion in more detail, its matching accuracy is generally higher than that of ordinary block matching, and the residual it produces is likely to be small. However, when performing motion estimation, many control points are needed to transmit the motion vectors, the process is very complex, and the amount of computation generally grows exponentially; higher-order block matching, which is algorithmically expensive and extremely difficult, is therefore hardly ever considered in real-time video coding. This method does not achieve better results in practice.
Although the prior art can bring good improvements in compression ratio and video quality, it increases the complexity of an HEVC encoder and thus the difficulty of using it in practice.
Disclosure of Invention
In order to solve the problems in the prior art of HEVC coding, an embodiment of the present invention provides a depth prediction method based on motion vector sensitivity.
In order to achieve the above purpose, one of the embodiments of the present invention adopts the following technical solutions:
the depth prediction method based on motion vector sensitivity comprises the following steps (an illustrative code sketch of the flow is given after the list):
(1) Defining a variable skipModes, a 2×4 matrix variable mvSub, and variables mvVar, MvSenNum, MvTotalNum and bMVSensitive;
(2) Entering the Skip and Merge modes;
(3) Obtaining the BestMode value and the cbf value after transformation and quantization, and judging whether the best mode exists, the cbf coefficient is 0 and the early-skip flag bit is true:
if yes, assigning skipModes to true and then entering step (4); if not, directly entering step (4);
(4) Entering the 2N×2N mode and setting subPartIdx to 0;
(5) Judging whether subPartIdx is less than 4:
if yes, entering step (6); if not, entering step (8);
(6) Carrying out the optimal PU mode selection process of the sub-CU indicated by subPartIdx, obtaining the 1/2-pixel rate-distortion cost SubPel_cost, the optimal rate-distortion cost Best_cost and the MV of the optimal mode of the CU, and judging whether the ratio SubPel_cost/Best_cost is less than 0.8:
if yes, setting bMVSensitive to true and adding 2 to MvSenNum; if not, entering step (7);
(7) Calculating the average magnitudes of the forward MV and the backward MV over the pixels of the sub-CU, storing them into mvSub[0][subPartIdx] and mvSub[1][subPartIdx] respectively, adding 2 to MvTotalNum and 1 to subPartIdx, and returning to step (5);
(8) Calculating the standard deviation var1 of the forward MVs and the standard deviation var2 of the backward MVs from the 2×4 matrix mvSub, and judging whether mvVar > 1 and MvSenNum/MvTotalNum > 0.5 are both satisfied:
if yes, setting skipModes to true; if not, judging whether skipModes is true:
if yes, ending; if not, continuing with the symmetric and asymmetric partition modes and then ending.
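For readability, the following C++-style sketch restates the steps above in code form. It is an illustration only: the hooks trySkipMergeAndCheckEarlySkip, try2Nx2NForSubCU and trySymmetricAndAsymmetricPartitions are hypothetical placeholders for the corresponding encoder mode-decision routines (they are not x265 functions), and only the variables defined in step (1) follow the names used in the text.

#include <array>
#include <cmath>

struct SubCuResult {
    double subPelCost;   // SubPel_cost: 1/2-pel rate-distortion cost of the sub-CU
    double bestCost;     // Best_cost: optimal rate-distortion cost
    double fwdMvAvg;     // average forward MV magnitude per pixel
    double bwdMvAvg;     // average backward MV magnitude per pixel
};

// Hypothetical encoder hooks (placeholders, not an existing API):
bool trySkipMergeAndCheckEarlySkip();            // steps (2)-(3)
SubCuResult try2Nx2NForSubCU(int subPartIdx);    // step (6)
void trySymmetricAndAsymmetricPartitions();      // remaining Inter partitions

// Population standard deviation of four values.
static double stdDev(const std::array<double, 4>& v) {
    double mean = (v[0] + v[1] + v[2] + v[3]) / 4.0;
    double s = 0.0;
    for (double x : v) s += (x - mean) * (x - mean);
    return std::sqrt(s / 4.0);
}

void depthPredictionForCU() {
    // Step (1): variables with the stated initial values.
    bool skipModes = false;
    bool bMVSensitive = false;                    // per-CU sensitivity flag
    double mvSub[2][4] = {};                      // forward/backward MV magnitude per sub-CU
    int mvSenNum = 0, mvTotalNum = 0;

    // Steps (2)-(3): Skip/Merge first, then the early-skip check on BestMode and cbf.
    if (trySkipMergeAndCheckEarlySkip())
        skipModes = true;

    // Steps (4)-(7): 2Nx2N mode, loop over the four sub-CUs.
    for (int subPartIdx = 0; subPartIdx < 4; ++subPartIdx) {
        SubCuResult r = try2Nx2NForSubCU(subPartIdx);
        if (r.subPelCost / r.bestCost < 0.8) {    // sensitivity threshold of step (6)
            bMVSensitive = true;
            mvSenNum += 2;                        // one forward and one backward MV
        }
        mvSub[0][subPartIdx] = r.fwdMvAvg;        // step (7)
        mvSub[1][subPartIdx] = r.bwdMvAvg;
        mvTotalNum += 2;
    }

    // Step (8): MV standard deviations and the ratio of sensitive MVs.
    double var1 = stdDev({mvSub[0][0], mvSub[0][1], mvSub[0][2], mvSub[0][3]});
    double var2 = stdDev({mvSub[1][0], mvSub[1][1], mvSub[1][2], mvSub[1][3]});
    double mvVar = var1 + var2;
    if (mvVar > 1.0 && double(mvSenNum) / mvTotalNum > 0.5)
        skipModes = true;

    if (!skipModes)
        trySymmetricAndAsymmetricPartitions();    // otherwise these partitions are skipped
}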
Preferably, the initial value of skipModes is false.
Preferably, the mvSub is used for calculating the standard deviations of the MVs in the forward and backward directions, and its initial values are assigned to 0.
Preferably, the mvVar is the sum of the standard deviation var1 of the forward MV and the standard deviation var2 of the backward MV.
Preferably, the initial value of MvSenNum is assigned to 0.
Preferably, the MvTotalNum initial value is assigned to 0.
Preferably, the initial value of bMVSensitive is assigned to false.
Preferably, the subPartIdx is the sequence number of a sub-CU within the CU.
Preferably, the BestMode is a current CU best mode.
Preferably, the cbf is a current CU all-zero block flag value.
The MV is a motion vector.
In the CU depth distribution of a video sequence, depth 3 accounts for approximately 50% of the encoding process. Significant optimization of the encoder can therefore be achieved if it can be accurately predicted during encoding that some coding units do not need to continue dividing down to depth 3.
In the encoding process of an HEVC encoder, integer-pixel motion estimation is performed first and is followed by sub-pixel motion estimation, which is in turn divided into 1/2-pixel and 1/4-pixel motion estimation. During motion estimation the encoder needs to perform a rate-distortion cost calculation for each pixel position, and this calculation consumes a large amount of encoding time. The integer-pixel motion estimation process therefore affects the sub-pixel motion estimation process.
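As background, a generic integer-to-subpixel refinement loop has roughly the following shape. This is a simplified sketch under the assumption of an 8-neighbour square search first at half-pel and then at quarter-pel precision; rdCost is a hypothetical stand-in for the encoder's interpolation-and-cost routine, and the sketch does not reproduce the actual x265 subpel search.

#include <initializer_list>

struct Mv { int x, y; };                 // motion vector in quarter-pel units (assumption)

// Hypothetical hook: interpolate the reference at mv and return its RD cost.
double rdCost(const Mv& mv);

// Refine a best integer-pel MV (already expressed in quarter-pel units),
// first at half-pel (step 2) and then at quarter-pel (step 1) precision.
Mv refineSubPel(Mv best) {
    double bestCost = rdCost(best);
    for (int step : {2, 1}) {
        Mv localBest = best;
        for (int dy = -step; dy <= step; dy += step) {
            for (int dx = -step; dx <= step; dx += step) {
                if (dx == 0 && dy == 0) continue;
                Mv cand{best.x + dx, best.y + dy};
                double c = rdCost(cand);          // one RD evaluation per candidate
                if (c < bestCost) { bestCost = c; localBest = cand; }
            }
        }
        best = localBest;
    }
    return best;
}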
Aiming at the difficulty of selecting a threshold in motion-vector-based depth prediction methods, the method of one embodiment of the invention introduces a new motion vector sensitivity parameter, bMVSensitive, which mainly expresses whether a small change of a motion vector causes a large prediction error. Testing showed that this parameter can be assigned using the ratio of the 1/2-pixel prediction error to the optimal prediction error.
The purpose of introducing the motion vector sensitivity parameter bMVSensitive is to make a better choice among the 3 coding prediction modes and the corresponding depth partitions for each CU without affecting video coding quality. After bMVSensitive is introduced, the threshold setting in the motion-vector-based depth prediction method can be made more accurate: when the sensitivity of the motion vectors is high, even a very small difference between the motion vectors of the 4 CU sub-blocks means that the RDO-based optimal partition calculation may still choose to continue dividing, so the threshold can be adjusted accordingly. If the threshold condition based on motion vector sensitivity is satisfied, the subsequent computations are skipped after the quadtree has been computed, which saves a large amount of encoding time.
PU mode selection is made for all CUs at all depths. There are 3 PU modes for coding the motion information of HEVC inter coding, namely Skip, Merge and Inter. The Skip and Merge modes have only the 2N×2N partition, while the Inter mode includes the 2N×2N, 2N×N, N×2N, N×N, 2N×nU, 2N×nD, nL×2N and nR×2N partitions; the N×N partition exists only for 8×8 CUs. Skip requires the least information to be transmitted and has the highest coding efficiency, so when making the PU mode decision the encoder first considers whether the PU conforms to the Skip mode, and only then considers the Merge and Inter modes. In inter prediction the probability that the Merge/Skip mode is selected as the final mode is high, so inter prediction for a certain number of modes, such as the four asymmetric partitions and two symmetric partitions of a prediction unit, is redundant; this observation is the inventive concept of one embodiment of the present invention.
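For reference, the partition shapes just listed can be summarized as an enumeration; the names below are chosen for readability and are not claimed to match the identifiers of any particular encoder.

// Inter PU partition shapes in HEVC, as discussed above.
enum class PuPartition {
    Size2Nx2N,               // the only shape used by Skip and Merge
    Size2NxN, SizeNx2N,      // symmetric Inter partitions
    SizeNxN,                 // Inter only, and only for 8x8 CUs
    Size2NxnU, Size2NxnD,    // asymmetric Inter partitions (horizontal)
    SizenLx2N, SizenRx2N     // asymmetric Inter partitions (vertical)
};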
The method provided by the embodiment of the invention builds on the fact that HEVC offers a variety of CU block partitions, whose purpose is to express a video scene more accurately.
The algorithm starts from the texture complexity of the picture: for regions whose texture is simple, it directly skips the deeper levels of hierarchical CU division according to the characteristics of the picture, thereby reducing the complexity of coding block partitioning.
The CU depth division process and the motion estimation process take up a large proportion of the coding time; the idea of the method is based on this point, so that the depth division in the motion vector process is estimated accurately and the coding efficiency is effectively improved.
On the basis of research into inter prediction in the HEVC coding standard, this patent proposes a depth prediction optimization algorithm based on motion vector sensitivity, which reduces the amount of computation and the algorithmic complexity of inter prediction.
The embodiment of the invention has the following advantages:
Compared with the original x265, the method of the invention reduces the coding time by 17.56% relative to the x265 algorithm, while BDBR increases by only about 1.75% and BDPSNR decreases by 0.05 dB; the method optimizes the prediction unit mode selection process without reducing the video quality and effectively improves the coding speed.
The complexity of the coding algorithm is reduced, the video coding speed is greatly improved with only a small quality loss, and the method has good practicability in the field of video coding.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a rate-distortion curve of the BlowingBubbles sequence.
FIG. 3 is a rate-distortion curve of the PartyScene sequence.
FIG. 4 is a rate-distortion curve of the Soccer sequence.
FIG. 5 is a rate-distortion curve of the Johnny sequence.
Detailed Description
The following are specific examples of the present invention, and the technical solutions of the present invention will be further described with reference to the examples, but the present invention is not limited to the examples.
Example 1
The process of the depth prediction method based on motion vector sensitivity is shown in FIG. 1, and the steps include:
(1) Defining a variable skipModes, a 2×4 matrix variable mvSub, and variables mvVar, MvSenNum, MvTotalNum and bMVSensitive;
(2) Entering the Skip and Merge modes;
(3) Obtaining the BestMode value and the cbf value after transformation and quantization, and judging whether the best mode exists, the cbf coefficient is 0 and the early-skip flag bit is true:
if yes, assigning skipModes to true and then entering step (4); if not, directly entering step (4);
(4) Entering the 2N×2N mode and setting subPartIdx to 0;
(5) Judging whether subPartIdx is less than 4:
if yes, entering step (6); if not, entering step (8);
(6) Carrying out the optimal PU mode selection process of the sub-CU indicated by subPartIdx, obtaining the 1/2-pixel rate-distortion cost SubPel_cost, the optimal rate-distortion cost Best_cost and the MV of the optimal mode of the CU, and judging whether the ratio SubPel_cost/Best_cost is less than 0.8:
if yes, setting bMVSensitive to true and adding 2 to MvSenNum; if not, entering step (7);
(7) Calculating the average magnitudes of the forward MV and the backward MV over the pixels of the sub-CU, storing them into mvSub[0][subPartIdx] and mvSub[1][subPartIdx] respectively, adding 2 to MvTotalNum and 1 to subPartIdx, and returning to step (5);
(8) Calculating the standard deviation var1 of the forward MVs and the standard deviation var2 of the backward MVs from the 2×4 matrix mvSub, and judging whether mvVar > 1 and MvSenNum/MvTotalNum > 0.5 are both satisfied:
if yes, setting skipModes to true; if not, judging whether skipModes is true:
if yes, ending; if not, continuing with the symmetric and asymmetric partition modes and then ending.
Wherein the initial value of skipModes is false. The matrix mvSub is used for calculating the standard deviations of the MVs in the forward and backward directions, and its initial values are assigned to 0. mvVar is the sum of the standard deviation var1 of the forward MVs and the standard deviation var2 of the backward MVs. The initial value of MvSenNum is 0. The initial value of MvTotalNum is 0. The initial value of bMVSensitive is false. subPartIdx is the sequence number of a sub-CU within the CU. BestMode is the best mode of the current CU. cbf is the all-zero-block flag value of the current CU. MV denotes a motion vector.
In step (1), before mode selection, a motion vector sub-matrix mvSub of 2 rows and 4 columns is defined. This matrix stores the magnitudes of the forward and backward motion vectors (MVs) of the 4 sub-CUs of one CU; all of its initial values are assigned to 0, and it is used in the subsequent decision steps.
bMVSensitive is defined as the basis for deciding whether to continue the depth division of the CU, i.e. it is the flag bit corresponding to the motion vector sensitivity parameter, and its initial value is false.
skipModes is defined to indicate the Skip mode, and its initial value is false.
Before mode selection, the Skip and Merge modes are evaluated first, i.e. an initial mode is selected and the initial values of the relevant parameters are calculated, providing a basis for the subsequent mode selection and depth division.
In step (3), cbf equal to 0 together with the early-skip flag bit EnableEarlySkip being true means that the Skip mode should be selected, i.e. the pixel changes of the current image region are flat, so a larger CU and the simpler Skip mode can be used directly for predictive coding; skipModes is then set to true.
In step (5): in HEVC, the depth division of a CU follows a quadtree structure, which means that one CU can be divided into 4 sub-units (sub-CUs), and a sub-CU can be divided further. According to the division depth (i.e. block size) there are four sizes in total, 64×64, 32×32, 16×16 and 8×8; the smaller the sub-CU, the deeper the division. The 4 sub-CUs of each CU are numbered sequentially 1, 2, 3 and 4; this number is the sub-CU index subPartIdx. At this point it is judged whether subPartIdx is less than 4. If so, the flow proceeds to step (6). If not, the division of this CU at this layer is complete; the 2×4 matrix mvSub is then used to calculate the standard deviations var1 and var2 of the forward and backward motion vectors (MVs) respectively, and mvVar = var1 + var2 is computed. The magnitude of mvVar represents the difference between the 4 sub-CUs (i.e. the MV standard deviation is used to judge how much the sub-CUs differ from one another). The ratio MvSenNum/MvTotalNum is then calculated, i.e. the ratio of the number of MVs whose sensitivity parameter reaches the threshold to the total number of MVs. It is then judged whether mvVar greater than 1 and MvSenNum/MvTotalNum greater than 0.5 are both satisfied. mvVar greater than 1 means the standard deviation is large, i.e. the MVs of the 4 sub-CUs differ considerably and the four sub-CUs may not belong to the same object, so their motion states differ; it is therefore considered better to divide the sub-CUs directly without evaluating the remaining modes. A ratio greater than 0.5 means that the required proportion of MVs whose sensitivity parameter reaches the threshold is satisfied. If the two conditions are met simultaneously, it is judged whether skipModes is true; if so, the remaining modes need not be evaluated and the procedure ends directly. If skipModes is not true, the symmetric and asymmetric partition modes are evaluated, and the procedure then ends.
If subPartIdx is smaller than 4, the division of this CU at this layer is still in progress, and the optimal PU mode selection process of the sub-CU indicated by subPartIdx is carried out. The 1/2-pixel rate-distortion cost SubPel_cost, the optimal rate-distortion cost Best_cost and the MV of the optimal mode of the CU are obtained; subPartIdx is then incremented by 1 and the flow returns to the decision "subPartIdx < 4?". At the same time it is judged whether the ratio SubPel_cost/Best_cost is less than 0.8. If so, the rate-distortion cost of the sub-CU in this mode is low and the threshold of the motion vector sensitivity parameter is reached: bMVSensitive is set to true and MvSenNum is increased by 2, indicating that the number of MVs reaching the sensitivity threshold has increased by 2; the average magnitudes of the forward and backward MVs over the pixels of the sub-CU are then calculated and stored into mvSub[0][subPartIdx] and mvSub[1][subPartIdx] respectively, i.e. the forward and backward MVs for that sub-CU index. MvTotalNum is then increased by 2 (the total number of MVs increases by 2), the flow returns to the point where subPartIdx is incremented by 1, and it is judged again whether subPartIdx is smaller than 4. If the ratio SubPel_cost/Best_cost is greater than 0.8, the rate-distortion cost of the sub-CU in this mode is high and the threshold of the motion vector sensitivity parameter is not reached; the flow then goes directly to the step of storing the average forward and backward MV magnitudes of each pixel of the sub-CU into mvSub[0][subPartIdx] and mvSub[1][subPartIdx], increases MvTotalNum by 2, returns to the point where subPartIdx is incremented by 1, and again judges whether subPartIdx is smaller than 4.
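The quantities stored in mvSub[0][subPartIdx] and mvSub[1][subPartIdx] are average per-pixel MV magnitudes. A small sketch of that computation is given below; the per-pixel MV field of the sub-CU is simply assumed to be available as an input vector, which is an assumption of this sketch rather than something specified by the patent.

#include <cmath>
#include <vector>

struct Mv { double x, y; };

// Average per-pixel MV magnitude of one sub-CU (forward or backward direction),
// as used to fill one entry of mvSub.
double averageMvMagnitude(const std::vector<Mv>& mvField) {
    if (mvField.empty()) return 0.0;
    double sum = 0.0;
    for (const Mv& mv : mvField)
        sum += std::sqrt(mv.x * mv.x + mv.y * mv.y);
    return sum / mvField.size();
}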
Example 2
In this example, the method of embodiment 1 is adopted. The experiment uses x265 encoder version 2.2 and Visual Studio 2013, and the experimental data are tested on a DELL Vostro 3900 desktop configured with an Intel Core i7-7700 CPU, 8 GB of memory, a 64-bit operating system and a 1 TB hard disk. The coding parameters used in this example are consistent with the standard test parameters published for x265. Four QPs are set: 22, 27, 32 and 37. The test sequences used are standard test sequences, as shown in Table 1.
TABLE 1 test sequence schematic table
(Table 1 appears as an image in the original document.)
The results of the encoding performance of the method of embodiment 1 of the present invention and version 2.2 of x265 are shown in table 2.
Table 2 comparison of the coding performance of the method of example 1 with version 2.2 of x265
(Table 2 appears as an image in the original document.)
As can be seen from Table 2, compared with the original x265, the method of the present invention reduces the encoder's coding time by 17.56% relative to the x265 algorithm, while BDBR increases by only about 1.75% and BDPSNR decreases by 0.05 dB; the method therefore optimizes the prediction unit mode selection process and effectively improves the coding speed without reducing the video quality.
In order to verify the effectiveness of the method on objective data, the BlowingBubbles, PartyScene, Soccer and Johnny sequences are randomly selected and their rate-distortion curves are plotted as shown in FIG. 2 to FIG. 5, where the abscissa and the ordinate represent the bit rate and the PSNR before and after the fast algorithm is added, the curve with diamond data points is the rate-distortion curve processed by the method, and the curve with triangle data points is the rate-distortion curve not processed by the method. It can be seen that the rate-distortion curves of the HEVC standard coding algorithm and of the algorithm optimized by the method of the present invention are almost identical, which indicates that the influence of the method on the rate-distortion performance of the image is negligible.
The results show that after the motion vector sensitivity parameter is introduced, the threshold setting in the motion-vector-based depth prediction method becomes more accurate: when the motion vector sensitivity is high, even a very small difference between the motion vectors of the 4 CU sub-blocks means the Rate Distortion Optimization (RDO) based optimal partition calculation may still choose to continue dividing, so the threshold can be adjusted accordingly. If the threshold condition based on motion vector sensitivity is met, the subsequent computations are skipped after the quadtree has been computed. The overall performance tests show that the method reduces the algorithmic complexity of inter prediction and greatly improves the video coding speed with only a small quality loss.

Claims (1)

1. A depth prediction method based on motion vector sensitivity, characterized by comprising the following steps:
(1) Defining a variable skipModes, a 2×4 matrix variable mvSub, and variables mvVar, MvSenNum, MvTotalNum and bMVSensitive;
wherein skipModes represents the Skip mode, and its initial value is false;
mvSub is used for calculating the standard deviations of the MVs in the forward and backward directions, its initial values are assigned to 0, and MV denotes a motion vector;
mvVar is the sum of the standard deviation var1 of the forward MVs and the standard deviation var2 of the backward MVs;
MvSenNum represents the total number of MVs whose sensitivity parameter reaches the threshold, and its initial value is 0;
MvTotalNum represents the total number of all MVs, and its initial value is 0;
bMVSensitive is the basis for judging whether to continue the CU depth division, is the flag bit corresponding to the motion vector sensitivity parameter, and its initial value is false;
(2) Entering the Skip and Merge modes;
(3) Obtaining the BestMode value and the cbf value after transformation and quantization, and judging whether the best mode exists, the cbf coefficient is 0 and the early-skip flag bit is true:
if yes, assigning skipModes to true and then entering step (4); if not, directly entering step (4);
wherein BestMode is the optimal mode of the current CU;
cbf is the all-zero-block flag value of the current CU;
(4) Entering the 2N×2N mode and setting subPartIdx to 0, subPartIdx being the sequence number of a sub-CU within the CU;
(5) Judging whether subPartIdx is less than 4:
if yes, entering step (6); if not, entering step (8);
(6) Carrying out the optimal PU mode selection process of the sub-CU indicated by subPartIdx, obtaining the 1/2-pixel rate-distortion cost SubPel_cost, the optimal rate-distortion cost Best_cost and the MV of the optimal mode of the CU, and judging whether the ratio SubPel_cost/Best_cost is less than 0.8:
if yes, setting bMVSensitive to true and adding 2 to MvSenNum; if not, entering step (7);
(7) Calculating the average magnitudes of the forward MV and the backward MV over the pixels of the sub-CU, storing them into mvSub[0][subPartIdx] and mvSub[1][subPartIdx] respectively, adding 2 to MvTotalNum and 1 to subPartIdx, and entering step (5);
(8) Calculating the standard deviation var1 of the forward MVs and the standard deviation var2 of the backward MVs from the matrix mvSub, and judging whether mvVar > 1 and MvSenNum/MvTotalNum > 0.5 are both satisfied:
if yes, setting skipModes to true; if not, judging whether skipModes is true:
if yes, ending; if not, continuing with the symmetric and asymmetric partition modes and then ending.
CN201910313621.8A 2019-04-18 2019-04-18 Depth prediction method based on motion vector sensitivity Active CN110035285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910313621.8A CN110035285B (en) 2019-04-18 2019-04-18 Depth prediction method based on motion vector sensitivity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910313621.8A CN110035285B (en) 2019-04-18 2019-04-18 Depth prediction method based on motion vector sensitivity

Publications (2)

Publication Number / Publication Date
CN110035285A (en): 2019-07-19
CN110035285B (en): 2023-01-06

Family

ID=67239071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910313621.8A Active CN110035285B (en) 2019-04-18 2019-04-18 Depth prediction method based on motion vector sensitivity

Country Status (1)

Country Link
CN (1) CN110035285B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112261413B (en) * 2020-10-22 2023-10-31 北京奇艺世纪科技有限公司 Video encoding method, encoding device, electronic device, and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104539962B (en) * 2015-01-20 2017-12-01 北京工业大学 It is a kind of merge visually-perceptible feature can scalable video coding method
US9794588B2 (en) * 2015-09-30 2017-10-17 Sony Corporation Image processing system with optical flow recovery mechanism and method of operation thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857728A (en) * 2012-10-12 2013-01-02 重庆大学 Video data frame rate up conversion method based on system analysis and design (SAD) and video on demand (VOD) matching rules
CN102984521A (en) * 2012-12-12 2013-03-20 四川大学 High-efficiency video coding inter-frame mode judging method based on temporal relativity
CN103338371A (en) * 2013-06-07 2013-10-02 东华理工大学 Fast and efficient video coding intra mode determining method
CN104023233A (en) * 2014-06-24 2014-09-03 华侨大学 Fast inter-frame prediction method of HEVC (High Efficiency Video Coding)
CN104796693A (en) * 2015-04-01 2015-07-22 南京邮电大学 Rapid HEVC CU deep partition coding method
CN108347616A (en) * 2018-03-09 2018-07-31 中南大学 A kind of depth prediction approach and device based on optional time domain motion-vector prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Complexity Control Based on HEVC Intra-Frame Prediction; Li Linge (李林格); 《电视技术》 (Video Engineering); 2016-12-12; full text *

Also Published As

Publication number Publication date
CN110035285A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN103997646B (en) Fast intra-mode prediction mode selecting method in a kind of HD video coding
WO2018010492A1 (en) Rapid decision making method for intra-frame prediction mode in video coding
CN104811696B (en) A kind of coding method of video data and device
CN103118252A (en) Dynamic image encoding device and dynamic image decoding device
US8295623B2 (en) Encoding and decoding with elimination of one or more predetermined predictors
KR20230007313A (en) Parallelized Rate-Distortion Optimized Quantization Using Deep Learning
US20150208094A1 (en) Apparatus and method for determining dct size based on transform depth
US20120219057A1 (en) Video encoding apparatus and video encoding method
CN108989799B (en) Method and device for selecting reference frame of coding unit and electronic equipment
Mu et al. Fast coding unit depth decision for HEVC
CN114900691B (en) Encoding method, encoder, and computer-readable storage medium
CN116193126A (en) Video coding method and device
CN110035285B (en) Depth prediction method based on motion vector sensitivity
CN109688411B (en) Video coding rate distortion cost estimation method and device
CN112839224B (en) Prediction mode selection method and device, video coding equipment and storage medium
CN107820084B (en) Video perception coding method and device
KR20110070823A (en) Method for video codeing and decoding and apparatus for video coding and deconding by using the same
CN109547798B (en) Rapid HEVC inter-frame mode selection method
WO2019141007A1 (en) Method and device for selecting prediction direction in image encoding, and storage medium
CN114143537B (en) All-zero block prediction method based on possibility size
CN114143536B (en) Video coding method of SHVC (scalable video coding) spatial scalable frame
CN113992911A (en) Intra-frame prediction mode determination method and device for panoramic video H264 coding
Chen et al. CNN-based fast HEVC quantization parameter mode decision
CN111246218B (en) CU segmentation prediction and mode decision texture coding method based on JND model
CN110113601B (en) HEVC intra-frame rapid algorithm selection method based on video picture texture features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant