CN116193126A - Video coding method and device

Video coding method and device

Info

Publication number
CN116193126A
Authority
CN
China
Prior art keywords
motion vector
candidate
frame
reference frames
reference frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310168381.3A
Other languages
Chinese (zh)
Inventor
梁俊辉
苏文艺
叶天晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202310168381.3A
Publication of CN116193126A
Legal status: Pending

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/10 using adaptive coding
              • H04N19/169 characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                • H04N19/17 the unit being an image region, e.g. an object
                  • H04N19/172 the region being a picture, frame or field
              • H04N19/102 characterised by the element, parameter or selection affected or controlled by the adaptive coding
                • H04N19/103 Selection of coding mode or of prediction mode
                  • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
            • H04N19/50 using predictive coding
              • H04N19/503 involving temporal prediction
                • H04N19/51 Motion estimation or motion compensation
                  • H04N19/513 Processing of motion vectors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a video encoding method and apparatus. The video encoding method includes: selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, where the initial reference frames are the reference frames corresponding to a current coding frame in a video to be encoded; establishing a corresponding motion vector candidate list for each candidate reference frame; screening target reference frames from the candidate reference frames based on their motion vector candidate lists; and encoding the current coding frame according to the target reference frames. In this way, initial reference frames that are unlikely to be referenced are first pruned using simple features to obtain the candidate reference frames; a motion vector candidate list is then built for each candidate reference frame of the current coding frame to screen out the target reference frames that may actually be referenced during encoding, and the current coding frame is encoded accordingly. This reduces the number of candidate reference frames and greatly saves computing resources.

Description

Video coding method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video encoding method. The present application is also directed to a video encoding apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid development of computer and Internet technologies, video platforms often need to encode video during transmission. Video encoding is a technique that reduces the data volume of a video file by compressing the video; for example, inter-frame coding compresses the current video frame by exploiting the correlation between video image frames.
The AV1 (AOMedia Video 1, a video standard developed by the Alliance for Open Media) standard specifies that, for inter prediction, the reference frame list of the current frame may contain 7 reference frames, of which the current frame uses at most 2 as its final reference frames. Existing encoders generally determine the optimal reference frame combination for the current frame using rate-distortion theory: they traverse all possible reference frame combinations, calculate the rate-distortion cost of each combination, and select the combination with the smallest cost as the optimal reference frames for encoding the current frame. Traversing all possible reference frame combinations consumes a large amount of computation.
Disclosure of Invention
In view of this, embodiments of the present application provide a video encoding method. The application also relates to a video encoding apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical problem in the prior art that screening the optimal reference frames for the current frame consumes a large amount of computation during video encoding.
According to a first aspect of an embodiment of the present application, there is provided a video encoding method, including:
selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
for each candidate reference frame, establishing a corresponding motion vector candidate list, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block, and motion vectors of the current coding block in a pre-analysis process;
screening target reference frames from the candidate reference frames based on a motion vector candidate list of the candidate reference frames;
and encoding the current encoding frame according to the target reference frame.
According to a second aspect of embodiments of the present application, there is provided a video encoding apparatus, including:
the selecting module is configured to select candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
a building module configured to build, for each candidate reference frame, a corresponding motion vector candidate list, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block and motion vectors of the current coding block in a pre-analysis process;
a screening module configured to screen target reference frames from the candidate reference frames based on the motion vector candidate list of each candidate reference frame;
and the encoding module is configured to encode the current encoding frame according to the target reference frame.
According to a third aspect of embodiments of the present application, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the following method:
selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
for each candidate reference frame, establishing a corresponding motion vector candidate list, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block, and motion vectors of the current coding block in a pre-analysis process;
screening target reference frames from the candidate reference frames based on a motion vector candidate list of the candidate reference frames;
and encoding the current coding frame according to the target reference frames.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of any of the video encoding methods.
According to the video coding method provided by the embodiment of the application, candidate reference frames can be selected from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded; for each candidate reference frame, a corresponding motion vector candidate list is established, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block and motion vectors of the current coding block in a pre-analysis process; screening target reference frames from the candidate reference frames based on a motion vector candidate list of the candidate reference frames; and encoding the current encoding frame according to the target reference frame.
In this case, for the current coding frame of a video to be encoded, a subset of the plurality of initial reference frames specified by the video standard may be adaptively selected as candidate reference frames according to at least one reference dimension during encoding. A motion vector candidate list is then established for each candidate reference frame; besides the motion vector prediction list specified by the video standard, it also contains other motion vectors, such as the motion vectors of coded blocks around the current coding block and the motion vector of the current coding block obtained in the pre-analysis process. The target reference frames are screened from the candidate reference frames based on these motion vector candidate lists, and the current coding frame is encoded accordingly. In this way, initial reference frames that are unlikely to be referenced are first pruned using simple features to obtain the candidate reference frames; a motion vector candidate list is then built for each candidate reference frame of the current coding frame to screen out the target reference frames that may actually be referenced during encoding, and only these are traversed to encode the current coding frame. This reduces the number of candidate reference frames, reduces the computation required to traverse all possible reference frame combinations, and greatly saves computing resources.
Drawings
Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present application;
FIG. 2 is a flowchart of another video encoding method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a video encoding device according to an embodiment of the present application;
FIG. 4 is a block diagram of a computing device according to one embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present application will be explained.
AV1: AOMedia Video 1, a video standard developed by the Alliance for Open Media.
Reference frame: the image referenced by the coded block is coded in the video coding process.
MVP: motion Vector Prediction, a motion vector prediction list.
MVC: motion Vector Candidate, a motion vector candidate list.
The AV1 standard specifies that, for inter prediction, the reference frame list of the current frame may contain 7 reference frames, of which at most 2 are used as the final reference frames of the current frame. The encoder typically uses rate-distortion theory to determine the optimal reference frame combination for the current frame, i.e., it traverses all possible reference frame combinations, calculates the rate-distortion cost of each combination, and selects the combination with the smallest rate-distortion cost as the optimal reference frames for the current frame.
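For concreteness, this exhaustive baseline can be sketched as follows. This is an illustrative Python sketch, not code from any encoder; the `rd_cost` callback is an assumption standing in for the encoder's full rate-distortion evaluation of a reference frame combination.

```python
from itertools import combinations

def best_reference_combo(ref_frames, rd_cost):
    """Exhaustive baseline: evaluate every single reference frame and every
    pair of reference frames, and keep the combination with the smallest
    rate-distortion cost returned by the caller-supplied rd_cost()."""
    combos = [(f,) for f in ref_frames] + list(combinations(ref_frames, 2))
    return min(combos, key=rd_cost)

# With the 7 AV1 reference frames this traverses 7 + C(7,2) = 28 combinations
# per block; the dummy cost below only makes the sketch runnable.
if __name__ == "__main__":
    refs = list(range(7))
    print(best_reference_combo(refs, lambda combo: sum(combo) + len(combo)))
```

The pruning described in the rest of this application aims to shrink the set of frames entering this traversal before it happens.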
Because calculating the rate-distortion cost of each combination requires a large amount of computation, encoders typically use a series of methods to prune specific reference frames, thereby reducing the number of reference frame combinations that need to be traversed and increasing the encoding speed.
In one possible implementation, the AV1 open-source reference codec libaom uses the reference frame result of square-partitioned coding blocks to prune the reference frames of coding blocks with other partition shapes, saving encoding time by reducing the number of reference frames evaluated for non-square coding blocks.
In another possible implementation, the open-source encoder SVT-AV1 reduces the total computation by calculating, for the coding block, the rate-distortion cost of the reference block corresponding to each reference frame's MVP, and pruning the reference frames with larger rate-distortion cost.
However, determining the optimal reference frame combination for the current frame requires a significant amount of computation. Traversing all possible reference frame combinations yields better coding performance but is the most computationally expensive approach. Analysis of the above reference frame pruning method in libaom shows that it cannot reduce the computation required to determine the optimal reference frames for square coding blocks. The method in SVT-AV1 uses only the MVP to calculate the rate-distortion cost of each reference frame; when the MVP differs greatly from the true motion vector, this causes a large loss of coding performance. In summary, existing reference frame pruning methods have limitations and cannot guarantee a substantial speed-up while preserving coding performance.
Therefore, an embodiment of the present application provides a video encoding method that prunes reference frames and increases the encoding speed as much as possible without losing much coding performance. Specifically, reference frames that are unlikely to be referenced are first pruned using simple features; then the MVC of each remaining reference frame is established for the current coding block, and reference frames with higher cost are pruned by evaluating the rate-distortion cost of each MVC. With this scheme, the encoding speed can be increased by 13.2% while increasing the bitrate by only 0.25%.
In the present application, a video encoding method is provided, and the present application relates to a video encoding apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows a flowchart of a video encoding method according to an embodiment of the present application, which specifically includes the following steps:
step 102: and selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in the video to be coded.
It should be noted that different video standards allow different numbers of initial reference frames in the reference frame list of the current frame during inter prediction. For example, the AV1 standard allows 7 initial reference frames in the reference frame list of the current frame, while the H.264 standard allows 15. That is, the set number is the number of initial reference frames in the reference frame list of the current frame specified by the video standard, and the initial reference frames are the reference frames corresponding to the current coding frame in the video to be encoded under that standard; for example, under the AV1 standard the initial reference frames are the 7 specified reference frames.
In practical applications, no matter how many initial reference frames are in the reference frame list of the current frame, the current frame uses at most 2 of them as final reference frames, so all possible reference frame combinations must be traversed, the rate-distortion cost of each combination calculated, and the minimum-cost combination selected as the optimal reference frames of the current frame. In the embodiment of the present application, to save computation, candidate reference frames may first be selected from the set number of initial reference frames according to at least one reference dimension; target reference frames are then screened from the candidate reference frames, and only combinations of the selected target reference frames are traversed to pick the optimal reference frames for encoding the current coding frame. This greatly reduces the number of reference frame combinations to traverse, saves computation, and improves coding efficiency.
Here, the at least one reference dimension is a dimension along which reference frames that are unlikely to be referenced can be pruned using simple features, such as a distance dimension, a temporal dimension, or a quality dimension.
In an optional implementation of this embodiment, the reference dimension is at least one of a distance dimension, a temporal dimension, and a quality dimension. In this case, selecting candidate reference frames from the set number of initial reference frames according to the at least one reference dimension may specifically be implemented as follows:
selecting candidate reference frames from the set number of initial reference frames according to at least one of the distance between each initial reference frame and the current coding frame, the temporal level of each initial reference frame, and the coding quality of each initial reference frame.
Specifically, each video frame in the video to be encoded has a corresponding temporal level; a video frame at the second temporal layer may reference a video frame at the first temporal layer, and the lower a frame's temporal level, the more likely it is to be referenced.
It should be noted that the closer an initial reference frame is to the current coding frame, the more likely it is to be referenced; the lower its temporal level, the more likely it is to be referenced; and the better its coding quality, the more likely it is to be referenced. Candidate reference frames may therefore be selected from the set number of initial reference frames according to at least one of the distance of each initial reference frame from the current coding frame, the temporal level of each initial reference frame, and the coding quality of each initial reference frame.
In one possible implementation, taking the AV1 video standard as an example, the distances between the 7 initial reference frames and the current coding frame may be calculated in turn, and the 2 frames closest to the current coding frame selected as candidate reference frames. Then, from the remaining 5 initial reference frames, the 1 frame with the lowest temporal level is selected as a candidate reference frame. Next, for the remaining 4 initial reference frames, the average coding QP of each is determined in turn, and the 1 frame with the smallest average coding QP is selected as a candidate reference frame. In this way, 4 of the 7 initial reference frames are selected as candidate reference frames of the current coding frame, and the optimal reference frames of each coding block in the current coding frame are subsequently chosen from these 4 candidate reference frames.
QP stands for Quantization Parameter. In a specific implementation, a base QP is first specified externally; the encoder then judges the importance of each coding block in the current coding frame through a pre-analysis process and adjusts the base QP accordingly, lowering the QP for important coding blocks and raising it for less important ones. This yields the coding QP of each coding block, and the average of the QPs of all coding blocks in the current coding frame is the frame's average QP.
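As a rough illustration of this adjustment, the Python sketch below lowers the QP of more important blocks, raises it for less important ones, and averages the result. The linear model, the `strength` value, and importance scores in [0, 1] are assumptions made for the sketch, not taken from the application.

```python
def frame_average_qp(base_qp, block_importances, strength=6.0):
    """Adjust an externally supplied base QP per coding block according to a
    pre-analysis importance score (assumed to lie in [0, 1]), then average
    the per-block QPs to obtain the frame's average coding QP. The linear
    adjustment and strength=6.0 are illustrative assumptions."""
    block_qps = [base_qp - strength * (importance - 0.5)
                 for importance in block_importances]
    return sum(block_qps) / len(block_qps)

# Example: a frame whose blocks are mostly important ends up with an average QP
# of about 31.1, lower than the base QP of 32.
print(frame_average_qp(32, [0.9, 0.8, 0.7, 0.2]))
```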
In the embodiment of the present application, 4 candidate reference frames of the current coding frame are selected adaptively during encoding: the 2 of the 7 initial reference frames closest to the current coding frame, the 1 frame with the lowest temporal level, and the 1 frame with the best coding quality. Selecting these 4 better reference frames from the 7 initial reference frames reduces the reference frames from 7 to 4, greatly saving computation and improving coding efficiency.
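A minimal sketch of this three-step selection, assuming 7 AV1 initial reference frames; the `RefFrame` fields (`poc`, `temporal_level`, `avg_qp`) are illustrative names, not any encoder's API.

```python
from dataclasses import dataclass

@dataclass
class RefFrame:
    poc: int             # display-order index, used for the distance test
    temporal_level: int  # lower layers are more likely to be referenced
    avg_qp: float        # average coding QP; lower means better quality

def select_candidates(initial_refs, current_poc):
    """Pick 4 candidate reference frames out of the initial ones: the 2
    closest to the current frame, then the 1 remaining frame with the
    lowest temporal level, then the 1 remaining frame with the smallest
    average coding QP (best quality)."""
    remaining = sorted(initial_refs, key=lambda r: abs(r.poc - current_poc))
    candidates, remaining = remaining[:2], remaining[2:]

    lowest_layer = min(remaining, key=lambda r: r.temporal_level)
    candidates.append(lowest_layer)
    remaining.remove(lowest_layer)

    candidates.append(min(remaining, key=lambda r: r.avg_qp))
    return candidates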
Step 104: for each candidate reference frame, a corresponding motion vector candidate list is established, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block, and motion vectors of the current coding block in a pre-analysis process.
It should be noted that when a current coding frame is encoded, it is usually divided into a plurality of coding blocks, and motion estimation is performed for each coding block to determine an optimal motion vector (MV). In a specific implementation, a corresponding motion vector candidate list (MVC) may be established for each candidate reference frame; the MVC may at least include the motion vector prediction list (MVP) defined by the video standard, the optimal motion vectors of coded blocks around the current coding block, the motion vector of the current coding block obtained in the pre-analysis process, and so on. In this way an MVC is built for each reference frame of the current coding block, and motion vectors beyond the MVP are added to improve the screening accuracy.
In an alternative implementation of this embodiment, the current encoded frame is divided into a plurality of encoded blocks; for each candidate reference frame, a corresponding motion vector candidate list is established, and the specific implementation process may be as follows:
determining a motion vector prediction list of a current coding block in the current coding frame aiming at a first candidate reference frame;
determining a first motion vector of coded blocks surrounding the current coded block;
determining a pre-analysis motion vector of the current coding block in a pre-analysis stage;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, and the pre-analysis motion vector.
It should be noted that the AV1 standard specifies rules for establishing the MVP list, but for some coding blocks the MVP may differ greatly from the actual motion vector. Therefore, in the embodiment of the present application, an MVC list (i.e., a motion vector candidate list) is established to predict the motion information of the current coding block more accurately, and the candidate reference frames can then be pruned using this list to obtain the target reference frames.
In practical applications, the MVC list is built as follows: first, the MVP (motion vector prediction list) specified by the standard is added to the list; second, the optimal motion vectors of the coded blocks around the current coding block (i.e., the first motion vectors) are added; and then the pre-analysis motion vector of the current coding block, obtained during the pre-analysis process, is added. In this way the MVC list of each candidate reference frame corresponding to the current coding block is established.
It should be noted that motion estimation is performed for each coding block and yields an MV, i.e., a motion vector. Motion estimation means determining a starting point, searching a series of surrounding points, determining the optimal point, and obtaining the motion vector MV from the starting point to the optimal point. The motion vector MV of the current coding block can provide a reference for determining the search starting point of neighboring coding blocks, and each coding block has its own optimal motion vector MV.
The MVP follows a rule defined in advance by the AV1 standard for predicting the MV of the current coding block, i.e., it specifies how the prediction is made, and the MVP list can be derived from this rule. The MVs of surrounding coding blocks not used by the MVP can also be added to the MVC list, and the pre-analysis motion vector of the current coding block, obtained in the pre-analysis stage, can be added as well.
In the embodiment of the present application, an MVC is established for each reference frame of the current coding block, and motion vectors beyond the MVP are added, enriching the reference information and improving the screening accuracy.
In an optional implementation of this embodiment, a coding frame may be divided into a plurality of coding blocks, and a coding block may be further divided into a plurality of sub-coding blocks. In this case, establishing the motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, and the pre-analysis motion vector may specifically be implemented as follows:
determining whether a parent node exists in the current coding block;
if yes, determining a second motion vector of the father node;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, the pre-analysis motion vector and the second motion vector.
It should be noted that the parent node indicates whether the current coding block is a sub-block of another coding block. For example, a coding frame is divided into coding block 1, coding block 2, coding block 3, and coding block 4; coding block 1 is further divided into coding blocks 11, 12, 13, and 14, and similarly for coding blocks 2 to 4. For coding block 11, the parent node is coding block 1, so when determining the motion vector candidate list of coding block 11, the motion vector of coding block 1 may be added.
In the embodiment of the present application, an MVC is established for each reference frame of the current coding block, and motion vectors beyond the MVP (here including the parent block's motion vector) are added, enriching the reference information and improving the screening accuracy.
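The list construction described above can be sketched as follows; motion vectors are plain (x, y) tuples, the argument names are illustrative, and dropping duplicate vectors is an added assumption for compactness.

```python
def build_mvc(mvp_list, neighbor_mvs, preanalysis_mv, parent_mv=None):
    """Motion vector candidate (MVC) list for one coding block and one
    candidate reference frame: the standard MVP list first, then the best
    motion vectors of already-coded neighboring blocks, then the block's
    pre-analysis motion vector, and finally the parent block's motion
    vector when the block is a sub-block."""
    sources = list(mvp_list) + list(neighbor_mvs) + [preanalysis_mv]
    if parent_mv is not None:
        sources.append(parent_mv)
    mvc, seen = [], set()
    for mv in sources:
        if mv not in seen:
            seen.add(mv)
            mvc.append(mv)
    return mvc

# Example with made-up vectors: the duplicate (2, -1) is added only once.
mvc = build_mvc(mvp_list=[(0, 0), (2, -1)],
                neighbor_mvs=[(3, -1), (2, -1)],
                preanalysis_mv=(4, -2),
                parent_mv=(3, -2))
```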
Step 106: target reference frames are screened from the candidate reference frames based on the motion vector candidate list of each candidate reference frame.
In an optional implementation manner of this embodiment, based on the motion vector candidate list of each candidate reference frame, the target reference frame is screened from the candidate reference frames, and the specific implementation process may be as follows:
determining the rate distortion cost of each candidate reference frame based on the motion vector candidate list of each candidate reference frame;
and screening out the candidate reference frames whose rate-distortion cost is greater than a rate-distortion threshold, the remaining candidate reference frames being the target reference frames.
It should be noted that the rate-distortion cost of each candidate reference frame may be calculated based on its motion vector candidate list, and the candidate reference frames whose rate-distortion cost is greater than the rate-distortion threshold are then screened out to obtain the target reference frames. In this way, the candidate reference frames with higher rate-distortion cost are further pruned, and the resulting target reference frames are those with higher reference value, so the encoding speed can be increased while the coding quality is maintained.
In an optional implementation manner of this embodiment, based on the motion vector candidate list of each candidate reference frame, determining a rate distortion cost of each candidate reference frame may include the following specific implementation process:
calculating the rate distortion cost of each candidate motion vector in a first motion vector candidate list of a first candidate reference frame, wherein the first candidate reference frame is any one of the candidate reference frames;
and taking the smallest rate distortion cost in the rate distortion costs of the candidate motion vectors as the rate distortion cost of the first candidate reference frame.
It should be noted that, for the first candidate reference frame, the rate-distortion cost of each candidate motion vector in its first motion vector candidate list may be calculated, and the smallest of these costs is taken as the rate-distortion cost of the first candidate reference frame. By analogy, each candidate reference frame can in turn serve as the first candidate reference frame and its rate-distortion cost be calculated, so that the candidate reference frames with higher rate-distortion cost can be pruned.
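A sketch of this step; `mv_rd_cost` is a per-vector cost callback passed in as an assumption (one possible form, SAD + λ*R, is sketched after formula (2) below).

```python
def ref_frame_rd_cost(mvc_list, mv_rd_cost):
    """Rate-distortion cost of one candidate reference frame: the cheapest
    of the candidate motion vectors in its MVC list."""
    return min(mv_rd_cost(mv) for mv in mvc_list)

def all_ref_frame_costs(mvc_lists, mv_rd_cost):
    """One cost per candidate reference frame; mvc_lists holds one MVC list
    per candidate, in the same order as the candidates."""
    return [ref_frame_rd_cost(mvc, mv_rd_cost) for mvc in mvc_lists]
```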
In an optional implementation of this embodiment, the current coding frame is divided into a plurality of coding blocks, and calculating the rate-distortion cost of each candidate motion vector in the first motion vector candidate list of the first candidate reference frame may specifically be implemented as follows:
determining an absolute error sum between a current coding block and a reference block pointed by a first motion vector, wherein the first motion vector is any candidate motion vector in the first motion vector candidate list;
determining the number of bits occupied by transmitting the first motion vector;
determining coding parameters corresponding to the current coding block according to the quantization parameters of the first candidate reference frame;
and calculating the rate distortion cost of the first motion vector according to the absolute error sum, the number of bits, and the coding parameter.
It should be noted that the current coding block points to a reference block in the candidate reference frame through the first motion vector; that is, adding the first motion vector to the coordinates of the current coding block in the current coding frame gives the coordinates of the corresponding reference block in the candidate reference frame. The coding block and the reference block contain pixels in one-to-one correspondence: the difference between each pair of corresponding pixels is calculated, and the absolute values of these differences are summed to obtain the absolute error sum (i.e., the sum of absolute differences, SAD).
For example, if the current coding block is 8x8 and the reference block determined by the first motion vector is also 8x8, there are 64 pixel pairs in total; the difference between each pair of corresponding pixels in the coding block and the reference block is calculated, giving 64 differences, and the absolute values of these 64 differences are summed to obtain the absolute error sum.
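A direct sketch of this computation, with blocks given as equally sized 2-D lists of pixel values:

```python
def sad(block, ref_block):
    """Absolute error sum (sum of absolute differences) between a coding
    block and the reference block its motion vector points to."""
    return sum(abs(p - q)
               for row, ref_row in zip(block, ref_block)
               for p, q in zip(row, ref_row))

# Example with two 2x2 blocks: |10-12| + |20-19| + |30-30| + |40-35| = 8
print(sad([[10, 20], [30, 40]], [[12, 19], [30, 35]]))
```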
In addition, a motion vector has a sign: for example, a positive horizontal component represents movement to the right and a negative one movement to the left, and the value range of a motion vector is generally large. As an example, suppose the horizontal component of a motion vector is a value between -15 and 15, and the vertical component is also between -15 and 15. Representing a value between 0 and 15 in binary requires 4 bits; compared with the range 0 to 15, the range -15 to 15 needs one more bit for the sign, so a value between -15 and 15 requires 5 bits. The horizontal and vertical components together require 10 bits, i.e., the number of bits occupied by transmitting the first motion vector is 10 in this case.
In addition, in practical applications, instead of transmitting the first motion vector directly, the difference MVD between the first motion vector and the motion vector prediction is usually transmitted; in a specific implementation, the difference may be calculated against the first predicted motion vector in the motion vector prediction list MVP.
Besides, when coding the motion vector, a variable-length coding scheme such as Huffman coding or context-based adaptive binary arithmetic entropy coding can be used instead of simple fixed-length binary. Values with higher occurrence probability can then be represented with fewer bits; for example, many MVDs are close to 0, so smaller MVD values can be represented with fewer bits, reducing the overall number of transmitted bits and saving transmission resources.
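A crude bit-count estimate consistent with the examples above: the vector is signalled as an MVD against the first MVP entry, and each component costs one sign bit plus the binary length of its magnitude, so differences near zero are cheaper. This is only an assumption standing in for AV1's context-adaptive entropy coder, not the standard's actual coding.

```python
def mv_bits_estimate(mv, mvp0):
    """Rough bit cost R of signalling motion vector `mv` as a difference
    (MVD) against the first MVP entry `mvp0`; both are (x, y) tuples."""
    def component_bits(d):
        magnitude = abs(d)
        return 1 + max(magnitude.bit_length(), 1)  # sign bit + magnitude bits
    dx, dy = mv[0] - mvp0[0], mv[1] - mvp0[1]
    return component_bits(dx) + component_bits(dy)

# An MVD of (15, -15) costs 5 + 5 = 10 bits, matching the example above;
# an MVD of (0, 1) costs only 4.
print(mv_bits_estimate((15, -15), (0, 0)), mv_bits_estimate((2, -1), (2, -2)))
```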
Furthermore, the coding parameter is determined based on the quantization parameter of the first candidate reference frame. Specifically, the coding parameter may be determined from a set constant and a quantization step size, where the set constant is set by the encoder and the quantization step size is obtained by looking up the quantization parameter of the first candidate reference frame in a table.
In practical application, the rate-distortion cost of the first motion vector can be obtained through calculation according to the following formula (1):
SADCOST=SAD+λ*R (1)
where SADCOST is the rate-distortion cost of the first motion vector in the first motion vector candidate list of the first candidate reference frame; SAD is the absolute error sum between the current coding block and the reference block to which the first motion vector points; R is the number of bits occupied by transmitting the first motion vector; and λ is the Lagrangian multiplier used by the current coding block, i.e., the coding parameter determined from the quantization parameter of the first candidate reference frame.
Wherein the encoding parameters can be calculated by the following formula (2):
λ = c * qstep * qstep (2)
where c is a constant that the encoder can set itself, and qstep is the quantization step size; the standard specifies the correspondence between QP and qstep, so the qstep corresponding to a QP can be obtained directly by table lookup, and the corresponding coding parameter calculated.
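Putting formulas (1) and (2) together; c = 0.85 is a purely illustrative value for the encoder-chosen constant, and `qstep` is assumed to come from the standard's QP-to-qstep table.

```python
def lagrange_multiplier(qstep, c=0.85):
    """Formula (2): lambda = c * qstep^2 (c is encoder-chosen; 0.85 is only
    an illustrative value)."""
    return c * qstep * qstep

def sad_cost(sad_value, mv_bits, qstep, c=0.85):
    """Formula (1): SADCOST = SAD + lambda * R. `sad_value` and `mv_bits`
    can come from the sad() and mv_bits_estimate() sketches above."""
    return sad_value + lagrange_multiplier(qstep, c) * mv_bits

# Example: SAD of 1200, a 10-bit MV, qstep of 20 -> 1200 + 0.85*400*10 = 4600.
print(sad_cost(1200, 10, 20))
```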
In the embodiment of the present application, for each candidate reference frame obtained, every motion vector in its MVC list can be traversed and its rate-distortion cost calculated, and the reference frames are pruned accordingly to obtain the final target reference frames. This reduces the number of reference frames that need to be traversed, saves computation, and improves coding efficiency.
In an optional implementation of this embodiment, the rate-distortion threshold may be calculated in advance; that is, before the candidate reference frames whose rate-distortion cost is greater than the rate-distortion threshold are screened out to obtain the target reference frames, the method may further include:
determining the minimum target rate distortion cost in the rate distortion cost of each candidate reference frame;
and determining the rate distortion threshold according to the target rate distortion cost and an adjustment threshold.
After the rate-distortion cost of each candidate reference frame is calculated, the minimum cost can be taken as the target rate-distortion cost and multiplied by the adjustment threshold to obtain the rate-distortion threshold, which is used to prune the candidate reference frames with higher rate-distortion cost. The adjustment threshold is set empirically in advance; for example, it may be 1.2.
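A sketch of this pruning step, using the example adjustment threshold of 1.2:

```python
def prune_by_rd_threshold(candidates, costs, adjust=1.2):
    """Keep only the candidate reference frames whose rate-distortion cost
    does not exceed min(costs) * adjust; the survivors are the target
    reference frames. `candidates` and `costs` are parallel lists."""
    threshold = min(costs) * adjust
    return [ref for ref, cost in zip(candidates, costs) if cost <= threshold]

# Example: with costs [4600, 5400, 5600, 9000] the threshold is 4600*1.2 = 5520,
# so only the first two frames survive as target reference frames.
print(prune_by_rd_threshold(["ref_a", "ref_b", "ref_c", "ref_d"],
                            [4600, 5400, 5600, 9000]))
```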
In this way, the rate-distortion threshold is determined from the target rate-distortion cost and the adjustment threshold, so the threshold changes with the rate-distortion costs of the candidate reference frames. This adapts better to different coding scenarios and to the actual characteristics of the video to be encoded, improving the accuracy of target reference frame screening.
Step 108: the current encoded frame is encoded according to the target reference frame.
It should be noted that, after the candidate reference frames are screened from the initial reference frames and the target reference frames are screened from the candidate reference frames, the target reference frames can be combined as required by the actual coding standard; each combination is traversed to determine the optimal reference frames, and the current coding block of the current coding frame of the video to be encoded is encoded based on the optimal reference frames.
According to the video coding method provided by the embodiment of the application, candidate reference frames can be selected from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded; establishing a corresponding motion vector candidate list for each candidate reference frame; screening target reference frames from the candidate reference frames based on a motion vector candidate list of the candidate reference frames; and encoding the current encoding frame according to the target reference frame.
In this case, for the current coding frame of a video to be encoded, a subset of the plurality of initial reference frames may be adaptively selected as candidate reference frames according to at least one reference dimension during encoding. A motion vector candidate list is then established for each candidate reference frame, adding motion vectors beyond the motion vector prediction list; the target reference frames are screened from the candidate reference frames based on these motion vector candidate lists, and the current coding frame is encoded. In this way, initial reference frames that are unlikely to be referenced are first pruned using simple features to obtain the candidate reference frames; a motion vector candidate list is then built for each candidate reference frame of the current coding frame to screen out the target reference frames that may actually be referenced during encoding, and only these are traversed to encode the current coding frame. This reduces the number of candidate reference frames, reduces the computation required to traverse all possible reference frame combinations, and greatly saves computing resources.
Fig. 2 shows a flowchart of another video encoding method according to an embodiment of the present application, taking the AV1 video standard as an example, specifically including the following steps:
step 202: for a current coding block of a current coding frame, 7 initial reference frames specified by the AV1 video standard are determined.
Step 204: the 2 initial reference frames closest to the current coding frame, the 1 initial reference frame with the lowest temporal level, and the 1 initial reference frame with the smallest average coding QP are selected, and these 4 initial reference frames are taken as candidate reference frames.
Step 206: for each candidate reference frame, a corresponding MVC list is established, wherein the MVC list comprises an MVP list, a preanalyzed MV, a neighboring block MV and a parent block MV.
Step 208: with MVC, the rate distortion cost for each candidate reference frame is calculated.
Step 210: pruning candidate reference frames with higher rate distortion cost in each candidate reference frame to obtain a target reference frame.
Step 212: the target reference frames are combined based on the coding modes specified by the AV1 video standard, each combination is traversed to determine the optimal reference frames of the current coding block, and the current coding block is encoded based on the optimal reference frames.
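The per-block flow of steps 206-212 can be strung together as below. All three callbacks are assumptions: `mv_cost(ref, mv)` is the SAD + λ*R cost of one candidate motion vector against reference frame `ref`, `full_rd_cost(combo)` is the encoder's full rate-distortion evaluation of a reference frame combination, and `mvc_lists[i]` is the MVC list built for `candidates[i]` in step 206.

```python
from itertools import combinations

def choose_refs_for_block(candidates, mvc_lists, mv_cost, full_rd_cost, adjust=1.2):
    """Steps 208-212 for one coding block, after the 4 candidate reference
    frames have been selected in steps 202-204 and their MVC lists built in
    step 206."""
    # Step 208: per-frame cost is the cheapest candidate MV in its MVC list.
    costs = [min(mv_cost(ref, mv) for mv in mvc)
             for ref, mvc in zip(candidates, mvc_lists)]

    # Step 210: prune frames whose cost exceeds the rate-distortion threshold.
    threshold = min(costs) * adjust
    targets = [ref for ref, cost in zip(candidates, costs) if cost <= threshold]

    # Step 212: traverse single and paired combinations of the surviving
    # target reference frames and keep the cheapest one.
    combos = [(ref,) for ref in targets] + list(combinations(targets, 2))
    return min(combos, key=full_rd_cost)
```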
According to the video encoding method provided by this embodiment of the present application, for the current coding frame of the video to be encoded, a subset of the 7 initial reference frames specified by the AV1 video standard can be adaptively selected as candidate reference frames according to at least one reference dimension during encoding. An MVC list is then established for each candidate reference frame, adding motion vectors beyond the MVP list; the target reference frames are screened from the candidate reference frames based on their MVC lists, and the current coding frame is encoded. In this way, initial reference frames that are unlikely to be referenced are first pruned using simple features to obtain candidate reference frames; the MVC list of each candidate reference frame of the current coding frame is then built to screen out the target reference frames that may actually be referenced during encoding, and only these are traversed to encode the current coding frame. This reduces the number of candidate reference frames, reduces the computation required to traverse all possible reference frame combinations, and greatly saves computing resources.
Corresponding to the above method embodiment, the present application further provides an embodiment of a video encoding device, and fig. 3 shows a schematic structural diagram of a video encoding device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
the selecting module 302 is configured to select candidate reference frames from a set number of initial reference frames according to at least one reference dimension, where the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
a building module 304 configured to build, for each candidate reference frame, a corresponding motion vector candidate list, wherein the motion vector candidate list at least comprises: a motion vector prediction list, motion vectors of coded blocks around the current coding block and motion vectors of the current coding block in a pre-analysis process;
a screening module 306 configured to screen a target reference frame from each candidate reference frame based on a motion vector candidate list of the candidate reference frames;
an encoding module 308 configured to encode the current encoded frame according to the target reference frame.
Optionally, the screening module 306 is further configured to:
determining the rate distortion cost of each candidate reference frame based on the motion vector candidate list of each candidate reference frame;
and screening out the candidate reference frames whose rate-distortion cost is greater than a rate-distortion threshold, the remaining candidate reference frames being the target reference frames.
Optionally, the screening module 306 is further configured to:
calculating the rate distortion cost of each candidate motion vector in a first motion vector candidate list of a first candidate reference frame, wherein the first candidate reference frame is any one of the candidate reference frames;
and taking the smallest rate distortion cost in the rate distortion costs of the candidate motion vectors as the rate distortion cost of the first candidate reference frame.
Optionally, the current encoded frame is divided into a plurality of encoded blocks; the screening module 306 is further configured to:
determining an absolute error sum between a current coding block and a reference block pointed by a first motion vector, wherein the first motion vector is any candidate motion vector in the first motion vector candidate list;
determining the number of bits occupied by transmitting the first motion vector;
determining coding parameters corresponding to the current coding block according to the quantization parameters of the first candidate reference frame;
and calculating the rate distortion cost of the first motion vector according to the absolute error sum, the bit number and the coding parameter.
Optionally, the screening module 306 is further configured to:
determining the minimum target rate distortion cost in the rate distortion cost of each candidate reference frame;
and determining the rate distortion threshold according to the target rate distortion cost and an adjustment threshold.
Optionally, the selecting module 302 is further configured to:
and selecting candidate reference frames from the set number of initial reference frames according to at least one of the distance between each initial reference frame and the current coding frame, the temporal level of each initial reference frame, and the coding quality of each initial reference frame.
Optionally, the current encoded frame is divided into a plurality of encoded blocks; the setup module 304 is further configured to:
determining a motion vector prediction list of a current coding block in the current coding frame aiming at a first candidate reference frame;
determining a first motion vector of coded blocks surrounding the current coded block;
determining a pre-analysis motion vector of the current coding block in a pre-analysis stage;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector and the pre-analysis motion vector.
Optionally, the setup module 304 is further configured to:
Determining whether a parent node exists in the current coding block;
if yes, determining a second motion vector of the father node;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, the pre-analysis motion vector and the second motion vector.
According to the video encoding apparatus provided by the embodiment of the present application, for the current coding frame of a video to be encoded, a subset of the initial reference frames can be adaptively selected as candidate reference frames according to at least one reference dimension during encoding. A motion vector candidate list is then established for each candidate reference frame, adding motion vectors beyond the motion vector prediction list; the target reference frames are screened from the candidate reference frames based on these motion vector candidate lists, and the current coding frame is encoded. In this way, initial reference frames that are unlikely to be referenced are first pruned using simple features to obtain the candidate reference frames; a motion vector candidate list is then built for each candidate reference frame of the current coding frame to screen out the target reference frames that may actually be referenced during encoding, and only these are traversed to encode the current coding frame. This reduces the number of candidate reference frames, reduces the computation required to traverse all possible reference frame combinations, and greatly saves computing resources.
The above is a schematic solution of the video encoding apparatus of this embodiment. It should be noted that the technical solution of the video encoding apparatus and the technical solution of the video encoding method belong to the same concept; for details of the technical solution of the video encoding apparatus that are not described here, refer to the description of the technical solution of the video encoding method.
FIG. 4 illustrates a block diagram of a computing device provided in accordance with an embodiment of the present application. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. Processor 420 is coupled to memory 410 via bus 430 and database 450 is used to hold data.
Computing device 400 also includes an access device 440 that enables computing device 400 to communicate via one or more networks 460. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 440 may include one or more of any type of wired or wireless network interface, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 4 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.
The processor 420 is configured to execute computer-executable instructions to implement the following method:
selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
establishing a corresponding motion vector candidate list for each candidate reference frame;
screening target reference frames from the candidate reference frames based on a motion vector candidate list of the candidate reference frames;
and encoding the current encoding frame according to the target reference frame.
The foregoing is a schematic illustration of the computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the video coding method belong to the same concept; for details of the technical solution of the computing device that are not described here, refer to the description of the technical solution of the video coding method.
An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the video encoding methods described above.
The foregoing is a schematic description of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the video encoding method belong to the same concept; for details of the storage medium that are not described in full here, reference may be made to the description of the video encoding method above.
The foregoing describes specific embodiments of the present application; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combinations of actions. However, those skilled in the art should understand that the present application is not limited by the order of actions described, since some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily all required by the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are provided only to aid in explaining the present application. The alternative embodiments are not intended to be exhaustive or to limit the application to the precise forms disclosed; obviously, many modifications and variations are possible in light of the teachings of this application. The embodiments were chosen and described in order to best explain the principles of the application and its practical use, thereby enabling others skilled in the art to understand and utilize the application. The present application is to be limited only by the claims and their full scope and equivalents.

Claims (11)

1. A video encoding method, comprising:
selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
establishing, for each candidate reference frame, a corresponding motion vector candidate list, wherein the motion vector candidate list comprises at least: a motion vector prediction list, motion vectors of coded blocks surrounding the current coding block, and motion vectors of the current coding block from a pre-analysis process;
screening target reference frames from the candidate reference frames based on the motion vector candidate lists of the candidate reference frames;
and encoding the current coding frame according to the target reference frame.
2. The video encoding method according to claim 1, wherein the screening target reference frames from the candidate reference frames based on the motion vector candidate lists of the candidate reference frames comprises:
determining the rate distortion cost of each candidate reference frame based on the motion vector candidate list of each candidate reference frame;
and screening out, from the candidate reference frames, those candidate reference frames whose rate distortion cost is greater than a rate distortion threshold, to obtain the target reference frames.
3. The video encoding method according to claim 2, wherein the determining the rate distortion cost of each candidate reference frame based on the motion vector candidate list of each candidate reference frame comprises:
calculating the rate distortion cost of each candidate motion vector in a first motion vector candidate list of a first candidate reference frame, wherein the first candidate reference frame is any one of the candidate reference frames;
and taking the smallest of the rate distortion costs of the candidate motion vectors as the rate distortion cost of the first candidate reference frame.
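For illustration only, claim 3 reduces each candidate reference frame to a single number. A minimal sketch, assuming a per-motion-vector cost function such as the one of claim 4 is supplied as the hypothetical callable `mv_cost`:

```python
def frame_rd_cost(mv_candidate_list, mv_cost):
    # Claim 3: the rate distortion cost attributed to a candidate reference frame
    # is the smallest cost among its candidate motion vectors; `mv_cost` is any
    # callable implementing a per-motion-vector cost such as the one in claim 4.
    return min(mv_cost(mv) for mv in mv_candidate_list)

# Example: frame_rd_cost([(0, 0), (2, -1)], lambda mv: abs(mv[0]) + abs(mv[1])) -> 0
```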
4. The video encoding method according to claim 3, wherein the current coding frame is divided into a plurality of coding blocks, and the calculating the rate distortion cost of each candidate motion vector in the first motion vector candidate list of the first candidate reference frame comprises:
determining a sum of absolute differences between a current coding block and a reference block pointed to by a first motion vector, wherein the first motion vector is any candidate motion vector in the first motion vector candidate list;
determining the number of bits occupied by transmitting the first motion vector;
determining coding parameters corresponding to the current coding block according to the quantization parameters of the first candidate reference frame;
and calculating the rate distortion cost of the first motion vector according to the sum of absolute differences, the number of bits, and the coding parameter.
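As a non-authoritative illustration of claim 4, the per-motion-vector cost combines a distortion term, a rate term and a QP-dependent coding parameter. The sketch below shows one common way such a combination is realized (cost = SAD + lambda * bits); the lambda formula, the bit estimate and all names are assumptions introduced for illustration rather than the application's exact definitions.

```python
import numpy as np

def mv_bits(mv, pred_mv=(0, 0)):
    # Rough proxy for the number of bits occupied by transmitting the motion
    # vector (real encoders entropy-code the difference against a predictor).
    return sum(abs(c - p).bit_length() * 2 + 1 for c, p in zip(mv, pred_mv))

def mv_rd_cost(cur_block, block_pos, ref_frame, mv, qp):
    # Distortion: sum of absolute differences between the current coding block
    # and the reference block pointed to by the motion vector (integer-pel and
    # assumed to stay inside the reference frame for this sketch).
    h, w = cur_block.shape
    y, x = block_pos[0] + mv[0], block_pos[1] + mv[1]
    ref_block = ref_frame[y:y + h, x:x + w]
    sad = int(np.abs(cur_block.astype(int) - ref_block.astype(int)).sum())
    # Coding parameter derived from the quantization parameter of the reference
    # frame; 0.57 * 2^((QP - 12) / 3) is a common approximation, used here only
    # as a stand-in for the application's actual coding parameter.
    lam = 0.57 * 2 ** ((qp - 12) / 3)
    return sad + lam * mv_bits(mv)
```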
5. The video encoding method according to claim 2, wherein before the screening out, from the candidate reference frames, those candidate reference frames whose rate distortion cost is greater than the rate distortion threshold to obtain the target reference frames, the method further comprises:
determining a target rate distortion cost that is the minimum among the rate distortion costs of the candidate reference frames;
and determining the rate distortion threshold according to the target rate distortion cost and an adjustment threshold.
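Read together, claims 2 and 5 amount to keeping only the candidate reference frames whose cost stays within a margin of the best one. A minimal sketch follows; the multiplicative use of the adjustment threshold and the mapping layout are assumptions, since the application only states that the threshold is determined from the target rate distortion cost and an adjustment threshold.

```python
def screen_target_frames(frame_costs, adjustment=1.2):
    # frame_costs: hypothetical mapping {reference frame id: rate distortion cost}.
    target_cost = min(frame_costs.values())          # claim 5: minimum frame cost
    threshold = target_cost * adjustment             # assumed multiplicative margin
    # Claim 2: discard candidates whose cost exceeds the threshold; the rest are
    # the target reference frames that the encoder actually searches.
    return [ref for ref, cost in frame_costs.items() if cost <= threshold]

# Example: screen_target_frames({0: 1000.0, 1: 1100.0, 2: 2500.0}) -> [0, 1]
```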
6. The video encoding method according to any one of claims 1-5, wherein the selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension comprises:
selecting the candidate reference frames from the set number of initial reference frames according to at least one of a distance between each initial reference frame and the current coding frame, a temporal level of each initial reference frame, and a coding quality of each initial reference frame.
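One plausible way to combine the reference dimensions of claim 6 is to rank the initial reference frames and keep the best few. The field names, the composite ranking and the cut-off in the sketch below are illustrative assumptions only, not the application's selection rule.

```python
def select_candidate_frames(initial_refs, current_poc, max_candidates=4):
    # initial_refs: hypothetical list of dicts with fields 'poc' (picture order
    # count), 'temporal_level' and 'quality' (higher is better).
    def rank(ref):
        distance = abs(ref["poc"] - current_poc)     # nearer frames first
        return (distance, ref["temporal_level"], -ref["quality"])
    return sorted(initial_refs, key=rank)[:max_candidates]

# Example: with the current frame at POC 8, a reference at POC 7 outranks one at
# POC 4; among equally distant frames, the one on the lower temporal level wins.
```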
7. The video encoding method according to any one of claims 1-5, wherein the current coding frame is divided into a plurality of coding blocks, and the establishing a corresponding motion vector candidate list for each candidate reference frame comprises:
determining, for a first candidate reference frame, a motion vector prediction list of a current coding block in the current coding frame;
determining a first motion vector of coded blocks surrounding the current coding block;
determining a pre-analysis motion vector of the current coding block in a pre-analysis stage;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector and the pre-analysis motion vector.
8. The video encoding method according to claim 7, wherein the establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, and the pre-analysis motion vector comprises:
determining whether the current coding block has a parent node;
if so, determining a second motion vector of the parent node;
and establishing a motion vector candidate list corresponding to the first candidate reference frame according to the motion vector prediction list, the first motion vector, the pre-analysis motion vector and the second motion vector.
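Claims 7 and 8 describe pooling several motion vector sources into a single candidate list, optionally extended with the parent node's motion vector when the current block was produced by splitting a larger block. A minimal sketch follows; the de-duplication and the `None` convention for a missing parent are assumptions made for illustration.

```python
def build_mv_candidate_list(mvp_list, neighbour_mvs, preanalysis_mv, parent_mv=None):
    # Pool the motion vector prediction list, the motion vectors of coded blocks
    # surrounding the current block, the pre-analysis motion vector and, if the
    # current block has a parent node (claim 8), the parent's motion vector.
    candidates = []
    for mv in list(mvp_list) + list(neighbour_mvs) + [preanalysis_mv, parent_mv]:
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    return candidates

# Example: build_mv_candidate_list([(0, 0), (2, 1)], [(2, 1), (4, -3)], (1, 0), (2, 1))
# -> [(0, 0), (2, 1), (4, -3), (1, 0)]
```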
9. A video encoding apparatus, comprising:
a selecting module, configured to select candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
a building module, configured to establish, for each candidate reference frame, a corresponding motion vector candidate list, wherein the motion vector candidate list comprises at least: a motion vector prediction list, motion vectors of coded blocks surrounding the current coding block, and motion vectors of the current coding block from a pre-analysis process;
a screening module, configured to screen a target reference frame from the candidate reference frames based on the motion vector candidate lists of the candidate reference frames; and
an encoding module, configured to encode the current coding frame according to the target reference frame.
10. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the following method:
selecting candidate reference frames from a set number of initial reference frames according to at least one reference dimension, wherein the initial reference frames are reference frames corresponding to a current coding frame in a video to be coded;
establishing, for each candidate reference frame, a corresponding motion vector candidate list, wherein the motion vector candidate list comprises at least: a motion vector prediction list, motion vectors of coded blocks surrounding the current coding block, and motion vectors of the current coding block from a pre-analysis process;
screening target reference frames from the candidate reference frames based on the motion vector candidate lists of the candidate reference frames;
and encoding the current coding frame according to the target reference frame.
11. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the video encoding method of any one of claims 1 to 8.
CN202310168381.3A 2023-02-24 2023-02-24 Video coding method and device Pending CN116193126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310168381.3A CN116193126A (en) 2023-02-24 2023-02-24 Video coding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310168381.3A CN116193126A (en) 2023-02-24 2023-02-24 Video coding method and device

Publications (1)

Publication Number Publication Date
CN116193126A true CN116193126A (en) 2023-05-30

Family

ID=86434284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310168381.3A Pending CN116193126A (en) 2023-02-24 2023-02-24 Video coding method and device

Country Status (1)

Country Link
CN (1) CN116193126A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116760986A (en) * 2023-08-23 2023-09-15 腾讯科技(深圳)有限公司 Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium
CN116760986B (en) * 2023-08-23 2023-11-14 腾讯科技(深圳)有限公司 Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium
CN117615129A (en) * 2024-01-23 2024-02-27 腾讯科技(深圳)有限公司 Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium
CN117615129B (en) * 2024-01-23 2024-04-26 腾讯科技(深圳)有限公司 Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US11843795B2 (en) Method and device for determining reference unit
US10841583B2 (en) Coding unit depth determining method and apparatus
US9462272B2 (en) Intra prediction method and apparatus
CN116193126A (en) Video coding method and device
JP2015165694A (en) Method and apparatus for decoding image
KR20080067631A (en) Mode selection techniques for multimedia coding
US20200053361A1 (en) Method and apparatus for encoding and decoding motion information
CN101888546B (en) A kind of method of estimation and device
JP2006270435A (en) Dynamic image encoder
CN107277506B (en) Motion vector accuracy selection method and device based on adaptive motion vector precision
JP2008227670A (en) Image coding device
WO2020219956A1 (en) Global motion constrained motion vector in inter prediction
US20230209067A1 (en) Intra prediction method and apparatus
CN101331773A (en) Two pass rate control techniques for video coding using rate-distortion characteristics
US10484689B2 (en) Apparatus and method for performing rate-distortion optimization based on Hadamard-quantization cost
JPWO2015115645A1 (en) Moving picture coding apparatus and moving picture coding method
WO2020219952A1 (en) Candidates in frames with global motion
CN111193930A (en) Method and device for coding and decoding forward double-hypothesis coding image block
JP6339977B2 (en) Video encoding apparatus and video encoding program
US20220417550A1 (en) Method and apparatus for constructing motion information list in video encoding and decoding and device
CN102291577A (en) Method and device for calculating macroblock motion vector
Matsuda et al. Lossless video coding using variable block-size MC and 3D prediction optimized for each frame
JP2015111774A (en) Video coding device and video coding program
CN112291561A (en) HEVC maximum coding block motion vector calculation method, device, chip and storage medium
WO2020219961A1 (en) Global motion models for motion vector inter prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination