WO2021056215A1 - Motion estimation method, system and storage medium - Google Patents

Motion estimation method, system and storage medium

Info

Publication number
WO2021056215A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
accuracy
coding unit
motion
affine
Prior art date
Application number
PCT/CN2019/107601
Other languages
English (en)
French (fr)
Inventor
马思伟
孟学苇
郑萧桢
王苫社
Original Assignee
深圳市大疆创新科技有限公司
北京大学
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 and 北京大学
Priority to CN201980066902.6A (CN112868234A)
Priority to PCT/CN2019/107601
Publication of WO2021056215A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation

Definitions

  • the present invention relates to the technical field of video coding and decoding, in particular to a motion estimation method, system and storage medium.
  • the basic principle of video coding is to use spatial, temporal, and codeword correlations to remove redundancy as much as possible.
  • Current video coding schemes mainly include intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.
  • the inter-frame prediction technology uses the time-domain correlation between adjacent frames of the video, takes a previously encoded reconstructed frame as a reference frame, and performs motion estimation (ME) and motion compensation (MC) to predict the current frame (that is, the frame currently being encoded), thereby removing the temporal redundant information of the video.
  • the image can be divided into several coding units, the position of each coding unit can be searched in the adjacent frames, and the relative spatial offset between the two can be obtained. The obtained relative offset is usually referred to as a motion vector (MV), and the process of obtaining a motion vector is called motion estimation.
  • Motion compensation is the process of using MV and reference frames to obtain the predicted frame.
  • the predicted frame obtained by this process may be different from the original current frame. Therefore, the difference between the predicted frame and the current frame needs to be transformed and quantized.
  • the MV information is passed to the decoder, so that the decoder can reconstruct the current frame through the MV, the reference frame, and the difference between the predicted frame and the current frame.
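To make the reconstruction relationship above concrete, here is a minimal toy sketch (all names and the 1-D sample data are invented for illustration; real codecs operate on 2-D blocks with sub-pixel motion vectors):

```python
# Toy 1-D illustration of motion-compensated reconstruction at the decoder:
# current frame = motion-compensated prediction from the reference + residual.

def motion_compensate(reference, mv):
    """Shift the reference samples by the motion vector (integer-pel only here)."""
    n = len(reference)
    return [reference[min(max(i + mv, 0), n - 1)] for i in range(n)]

reference = [10, 12, 14, 16, 18, 20]   # previously decoded frame
mv = 1                                  # signalled motion vector
residual = [0, 1, -1, 0, 2, 0]          # decoded (inverse-transformed) difference

predicted = motion_compensate(reference, mv)
reconstructed = [p + r for p, r in zip(predicted, residual)]
print(reconstructed)   # → [12, 15, 15, 18, 22, 20]
```

The decoder needs only the MV, the reference frame, and the residual; the predicted frame itself is never transmitted.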
  • Motion estimation is an important step that affects the efficiency of video coding. Therefore, how to optimize the motion estimation method has always been a concern of those skilled in the art.
  • the first aspect of the embodiments of the present invention provides a motion estimation method, the method includes:
  • for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit;
  • the motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point.
  • the second aspect of the embodiments of the present invention provides another motion estimation method, and the method includes:
  • for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision;
  • the motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point.
  • a third aspect of the embodiments of the present invention provides a motion estimation system.
  • the system includes a storage device and a processor.
  • the storage device stores a computer program to be run by the processor.
  • when run by the processor, the computer program executes the above-mentioned motion estimation method.
  • a fourth aspect of the embodiments of the present invention provides a storage medium on which a computer program is stored, and the computer program executes the above-mentioned motion estimation method when running.
  • the motion estimation method, system and storage medium of the present invention unify the design of the motion vector accuracy in the affine mode with the motion vector accuracy in the conventional mode, and improve the coding performance.
  • Fig. 1 shows a flowchart of a motion estimation method according to an embodiment of the present invention
  • Fig. 2 shows a schematic diagram of a motion vector of a control point of an affine coding unit according to an embodiment of the present invention
  • Fig. 3 shows a schematic diagram of motion vectors of subunits of an affine coding unit according to an embodiment of the present invention
  • Fig. 4 shows a flowchart of a motion estimation method according to another embodiment of the present invention.
  • Fig. 5 shows a structural block diagram of a motion estimation system according to an embodiment of the present invention.
  • the motion estimation method of the embodiment of the present invention can be applied to the inter-frame prediction part of the video coding and decoding technology.
  • Video is generally composed of multiple frames of images in a certain order. There are often many identical or similar spatial structures within one frame of image, that is to say, there is a lot of spatial redundant information in the video file. In addition, since the sampling time interval between two adjacent frames of the video is extremely short, the two adjacent frames are usually highly similar, that is, there is a large amount of temporal redundant information in the video. Moreover, from the perspective of the visual sensitivity of the human eye, part of the video information can also be compressed, namely visually redundant information.
  • video image information also has a series of redundant information such as information entropy redundancy, structural redundancy, knowledge redundancy, importance redundancy and so on.
  • the purpose of video coding is to remove redundant information in a video sequence, so as to reduce storage space and save transmission bandwidth.
  • video coding mainly includes intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.
  • the embodiment of the present invention mainly aims at improving the inter-frame prediction part.
  • the inter-frame prediction technology uses the time-domain correlation between adjacent frames of the video, takes a previously encoded reconstructed frame as a reference frame, and predicts the current frame (the frame currently being encoded) through motion estimation and motion compensation, thereby removing the temporal redundant information of the video.
  • the motion estimation method, system, and storage medium described in the embodiments of the present invention use the HEVC standard or its extension.
  • the present invention is also applicable to other coding standards, such as the H.264 standard, the next generation video coding standard VVC, AVS3, or any other suitable coding standard.
  • Fig. 1 shows a flowchart of a motion estimation method 100 according to an embodiment of the present invention. As shown in FIG. 1, the method 100 includes the following steps:
  • step S110: for the affine coding unit in the current frame, one of at least four motion vector precisions is selected to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit.
  • the current frame is the video frame currently to be encoded.
  • the current frame can be a video frame collected in real time, or a video frame extracted from a storage medium.
  • the reference frame is the video frame to be referred to when encoding the current frame.
  • the reference frame may be a reconstructed video frame obtained by reconstructing the encoded data corresponding to the video frame that can be used as the reference frame.
  • the reference frame can be a forward reference frame, a backward reference frame, or a bidirectional reference frame.
  • inter-frame prediction techniques include forward prediction, backward prediction, bidirectional prediction, and so on.
  • Forward prediction uses the previous frame (historical frame) of the current frame as a reference frame to predict the current frame.
  • Backward prediction uses the frame after the current frame (future frame) as a reference frame to predict the current frame.
  • Bidirectional prediction uses not only historical frames but also future frames to predict the current frame.
  • a bidirectional prediction mode is adopted, that is, the reference frame includes both historical frames and future frames.
  • the affine coding unit in the current frame is a coding unit (CU) divided in the current frame based on the affine motion compensation prediction (Affine) technology.
  • the traditional motion model only includes translational motion, but in reality there are many other forms of motion, such as zooming, rotation, perspective motion and other irregular motions, which motivated the introduction of the Affine technology.
  • in the Affine technology, the processing unit is no longer the entire coding unit; instead, the entire coding unit is divided into multiple sub-units, and motion compensation is performed on a per-sub-unit basis.
  • the affine coding unit in the Affine mode no longer has only one motion vector, but each subunit in the affine coding unit has its own motion vector.
  • the motion vector of each subunit in the affine coding unit is derived from the motion vectors of either two control points of the affine coding unit (i.e., the four-parameter model, see the left part of Figure 2) or three control points (i.e., the six-parameter model, see the right part of Figure 2). Only the motion vector information of the control points needs to be written into the code stream, not the motion vector information of each subunit.
  • the motion vector of the control point is first determined.
  • the embodiment of the present invention adopts the adaptive motion vector resolution (AMVR) technology, in which the encoder side adaptively determines the precision of the motion vector.
  • the determination of the motion vector of the control point is based on the Inter mode (also known as the AMVP mode) of the Affine mode. In this mode, the motion vector precision is selected on the encoder side, and the MVD (motion vector difference) is calculated.
  • the selectable motion vector precision includes four kinds, and for each coding unit, one of the four kinds of motion vector precision is selected for motion estimation.
  • the at least four motion vector precisions include any four of 4-pixel, 2-pixel, integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel, and 1/16-pixel precision.
  • the four kinds of motion vector precisions may be integer pixel precision, 1/2 pixel precision, 1/4 pixel precision, and 1/16 pixel precision.
  • the conventional AMVP mode includes four AMVR precisions. Therefore, compared with the previous Affine mode with three precisions, the embodiment of the present invention adds one more precision, so that the number of motion vector precisions available for the affine coding unit is the same as the number available for the conventional coding unit. In this way, the design of adaptive motion vector precision in the Affine mode is unified with that in the conventional AMVP mode. In one embodiment, the newly added precision is 1/2-pixel precision.
  • note that the motion vector precision of the control point referred to in the Affine mode is not necessarily the precision actually used in the process of sub-unit motion compensation.
  • the method for determining the motion vector precision includes: selecting the motion vector precision according to the motion vector precision selected by neighboring coding units.
  • the method for determining the accuracy of a motion vector may further include: attempting to perform motion estimation based on at least two of the four kinds of motion vector accuracy, and selecting the motion vector accuracy based on the effect of the motion estimation.
  • two motion vector precisions can be selected from the four optional precisions, motion estimation is attempted with each, and the effects of the two motion estimations are compared. For example, 1/2-pixel precision and integer-pixel precision can be selected to perform motion estimation separately.
  • if the motion estimation effect with the lower motion vector precision is better, stop trying and directly use the lower precision as the selected motion vector precision. For example, if motion estimation with integer-pixel precision performs better than with 1/2-pixel precision, no further precisions are tried, and integer-pixel precision is selected directly. If the motion estimation effect with the higher motion vector precision is better, continue trying higher precisions until the best motion estimation effect is obtained. For example, if motion estimation with 1/2-pixel precision performs better than with integer-pixel precision, then 1/4-pixel precision is tried next.
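The coarse-to-fine trial procedure above can be sketched as follows (a hedged illustration: `estimate_cost` is a stand-in for a real rate-distortion motion search, and the cost numbers are invented):

```python
# Sketch of the precision-selection loop: try precisions from coarse to fine
# and stop as soon as a finer precision no longer improves the cost.

def estimate_cost(precision):
    # Hypothetical costs: pretend 1/4-pel gives the best trade-off.
    costs = {1.0: 100, 0.5: 90, 0.25: 85, 0.125: 88}
    return costs[precision]

def select_precision(precisions):
    """Coarse (large step) first; keep refining while the cost improves."""
    ordered = sorted(precisions, reverse=True)
    best = ordered[0]
    best_cost = estimate_cost(best)
    for p in ordered[1:]:
        cost = estimate_cost(p)
        if cost >= best_cost:          # finer precision did not help: stop
            break
        best, best_cost = p, cost
    return best

print(select_precision([1.0, 0.5, 0.25, 0.125]))   # → 0.25
```

With the invented costs above, 1/2-pel beats integer-pel, 1/4-pel beats 1/2-pel, but 1/8-pel is worse, so the search stops at 1/4-pel.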
  • determining the motion vector of the control point of the affine coding unit includes: first, obtaining the motion vectors of spatially or temporally adjacent coding units, and constructing a candidate list from combinations of the motion vectors of those spatially or temporally adjacent coding units.
  • the motion vector obtained in this process may be the motion vector of the control point of the coding unit in the Affine mode, or the motion vector of the conventional coding unit in the traditional mode.
  • the obtained motion vectors are combined to construct a candidate list of control point motion vectors, and the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
  • a motion vector predictor (MVP) is selected from the candidate list, and the corresponding reference block can be determined in the reference frame according to the predicted motion vector.
  • interpolation processing is performed on the reference block to generate fractional pixels, and then the actual motion vector is determined.
  • the encoding end can also calculate the difference MVD (Motion Vector Difference) between the actual motion vector and the predicted motion vector, encode the MVD, and send the encoded MVD and the index of the predicted motion vector in the candidate list to the decoding end.
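The MVP/MVD relationship can be sketched like this (the candidate list and the motion vector values are invented for illustration; real encoders pick the candidate by rate-distortion cost, not raw MVD magnitude):

```python
# Minimal sketch of MVP selection and MVD computation.
candidate_list = [(4, -2), (3, 0), (5, -1)]   # predicted MVs from neighbours
actual_mv = (5, -3)                            # MV found by motion estimation

def mvd(a, b):
    """Component-wise motion vector difference a - b."""
    return (a[0] - b[0], a[1] - b[1])

# Pick the candidate giving the smallest MVD magnitude, then transmit only
# the candidate index and the difference.
index = min(range(len(candidate_list)),
            key=lambda i: sum(abs(c) for c in mvd(actual_mv, candidate_list[i])))
difference = mvd(actual_mv, candidate_list[index])
print(index, difference)   # → 0 (1, -1)
```

The decoder reverses this: it reads the index, looks up the MVP in its own identically constructed candidate list, and adds the decoded MVD to recover the actual MV.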
  • the motion vector precision includes integer-pixel precision and fractional-pixel precision. Since pixels at fractional positions do not exist, the reference block must be interpolated to obtain pixels at sub-pixel positions. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference block becomes, and the more accurately displacements of fractional-pixel precision can be compensated. As the interpolation precision improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
  • the motion vector precision in the Affine mode can be an integer, that is, integer-pixel precision, such as integer-pixel or 2-pixel precision; it can also be non-integer, that is, sub-pixel precision, such as 1/2, 1/4, or 1/8 precision.
  • the pixel at 1/2 precision position needs to be obtained by interpolation of the pixel at the whole pixel position.
  • the pixel values of other precision positions need to be obtained by further interpolation using integer-pixel precision pixels or 1/2-precision pixels.
  • an interpolation filter can be selected according to the selected motion vector accuracy to perform interpolation processing on the reference block.
  • the same interpolation filter may be used for all motion vector precisions.
  • the existing six-tap interpolation filter is used by default.
  • the identification bit that characterizes the type of the interpolation filter may not be set in the code stream, thereby saving one bit of data.
  • different interpolation filters can be selected according to different motion vector accuracy.
  • 1/2 precision uses a 6-tap interpolation filter, and the other precisions all use an 8-tap interpolation filter. Therefore, in an embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected as the motion vector precision, a second interpolation filter is selected to interpolate the reference block, wherein the numbers of taps of the first and second interpolation filters are different.
  • the first interpolation filter may be a 6-tap interpolation filter
  • the second interpolation filter may be an 8-tap interpolation filter.
  • the filter type identification bit can be set in the code stream. For example, 1 can be used to indicate that a 6-tap interpolation filter is used; 0 can be used to indicate that a 6-tap interpolation filter is not used, that is, the default 8-tap interpolation filter is used.
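As an illustration of interpolation at a half-pel position, the sketch below applies a 6-tap filter (the tap values are a generic smoothing example chosen for illustration, not the coefficients of any particular standard):

```python
# Illustrative half-pel interpolation with a 6-tap filter.
# The taps below sum to 64 so the result can be normalized by a shift.

TAPS_6 = [3, 9, 20, 20, 9, 3]

def half_pel(samples, pos):
    """Interpolate the half-pel sample between samples[pos] and samples[pos+1]."""
    acc = 0
    for k, tap in enumerate(TAPS_6):
        idx = pos - 2 + k                         # taps centred on the half position
        idx = min(max(idx, 0), len(samples) - 1)  # edge padding
        acc += tap * samples[idx]
    return (acc + 32) // 64                       # round to nearest, normalize by 64

row = [10, 10, 10, 20, 20, 20]
print(half_pel(row, 2))   # → 15, midway across the edge
```

Pixels at finer fractional positions would be produced analogously, by further filtering of integer-pel and half-pel samples.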
  • when the motion estimation method 200 is applied to the decoding end, if different interpolation filters are selected for different motion vector precisions, then before selecting the interpolation filter according to the motion vector precision, the motion estimation method 200 further includes: acquiring the code stream, where the code stream is provided with an identification bit of the filter type corresponding to the motion vector.
  • motion estimation can include both Affine mode and regular AMVP mode.
  • motion estimation is performed with the entire coding unit as a unit.
  • when adaptive motion vector resolution (AMVR) is applied, each conventional coding unit also adaptively selects one of four motion vector precisions for motion estimation.
  • the four motion vector precisions of the conventional coding unit may be the same as or different from the four motion vector precisions of the affine coding unit.
  • the four kinds of motion vector precisions may include integer pixel, 4 pixel, 1/4 pixel and 1/2 pixel precision.
  • the accuracy of the motion vector is not limited to the above four types, for example, it may also include 1/8 pixel, 1/16 pixel, and so on.
  • the corresponding motion vector accuracy is adaptively decided at the coding end, and the result of the decision is written into the code stream and passed to the decoding end.
  • the identifier indicating the accuracy of the motion vector of the affine coding unit is consistent with the identifier indicating the accuracy of the motion vector of the conventional coding unit, so that the two modes are more unified.
  • when the motion estimation method 200 is applied to the decoding end, before selecting one of the at least four motion vector precisions to perform motion estimation in the reference frame, the method further includes: acquiring the code stream, where an identification bit of the code stream records the selected motion vector precision of the affine coding unit, and the identifier representing the motion vector precision of the affine coding unit is consistent with the identifier representing the motion vector precision of the conventional coding unit.
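The shared-identifier idea can be sketched as a single lookup table used for both coding-unit types (the index-to-precision mapping below is hypothetical, not the actual codec table):

```python
# Hypothetical shared AMVR table: the same signalled identifier maps to a
# motion vector precision for both conventional and affine coding units.
AMVR_TABLE = {0: 0.25, 1: 0.5, 2: 1.0, 3: 4.0}   # invented mapping

def parse_precision(amvr_index):
    """Decoder side: recover the precision from the signalled index."""
    return AMVR_TABLE[amvr_index]

def write_index(precision):
    """Encoder side: find the index to signal for a chosen precision."""
    return next(i for i, p in AMVR_TABLE.items() if p == precision)

print(parse_precision(1))   # → 0.5
```

Because both modes read the same table, the decoder parses the precision identifier the same way regardless of whether the coding unit is affine or conventional.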
  • step S120: the affine coding unit is divided into several subunits.
  • the size of the sub-units may be fixed, for example, each sub-unit is divided into a size of 4 ⁇ 4 pixels.
  • the size of the subunit may also be determined in other ways. For example, a subunit of an appropriate size may be selected to reduce the complexity of coding and decoding.
  • step S130: the motion vector of the subunit in the affine coding unit is calculated according to the motion vector of the control point.
  • the motion field of the Affine mode can be derived from the motion vectors of two control points (four parameters) or three control points (six parameters).
  • the motion vector (mvx, mvy) of the subunit located at the (x, y) position is calculated by the following formula (1):
  • mvx = ((mv1x - mv0x)/w)x - ((mv1y - mv0y)/w)y + mv0x; mvy = ((mv1y - mv0y)/w)x + ((mv1x - mv0x)/w)y + mv0y (1)
  • where (mv0x, mv0y) is the motion vector of the control point in the upper left corner, (mv1x, mv1y) is the motion vector of the control point in the upper right corner, x and y are the coordinates of the center point of the subunit, and w is the width of the affine coding unit.
  • in the six-parameter model, the motion vector (mvx, mvy) of the sub-unit at the position (x, y) is calculated by the following formula (2):
  • mvx = ((mv1x - mv0x)/w)x + ((mv2x - mv0x)/h)y + mv0x; mvy = ((mv1y - mv0y)/w)x + ((mv2y - mv0y)/h)y + mv0y (2)
  • where (mv0x, mv0y) is the motion vector of the control point in the upper left corner, (mv1x, mv1y) is the motion vector of the control point in the upper right corner, (mv2x, mv2y) is the motion vector of the control point in the lower left corner, and w and h are the width and height of the affine coding unit.
  • a schematic diagram of the motion vectors in an affine coding unit is shown in Fig. 3, where each square represents a 4×4 subunit. All motion vectors calculated by the above formulas are rounded to a 1/16-pixel precision representation.
  • the size of the subunits of the chrominance component and the luminance component are both 4 ⁇ 4, and the motion vector of the chrominance component 4 ⁇ 4 subunit can be obtained by averaging the motion vectors of the corresponding four 4 ⁇ 4 luminance components.
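The four-parameter derivation and the chroma averaging described above can be sketched as follows (a simplified floating-point illustration; the example control-point values are invented):

```python
# Sketch of four-parameter affine MV derivation for sub-unit centres, plus
# chroma MV averaging over four co-located 4x4 luma sub-units.

def affine_mv_4param(mv0, mv1, w, x, y):
    """MV at sub-unit centre (x, y) from top-left mv0 and top-right mv1."""
    dx = (mv1[0] - mv0[0]) / w
    dy = (mv1[1] - mv0[1]) / w
    return (dx * x - dy * y + mv0[0],
            dy * x + dx * y + mv0[1])

def chroma_mv(luma_mvs):
    """Average the MVs of the four co-located 4x4 luma sub-units."""
    n = len(luma_mvs)
    return (sum(mv[0] for mv in luma_mvs) / n,
            sum(mv[1] for mv in luma_mvs) / n)

# Example: mv1x - mv0x == w corresponds to a doubling zoom.
mv0, mv1, w = (0.0, 0.0), (16.0, 0.0), 16
print(affine_mv_4param(mv0, mv1, w, 8, 8))   # → (8.0, 8.0)
```

A real implementation would work in fixed point and round each result to the 1/16-pixel representation mentioned above; the six-parameter model adds a third control point and the block height h in the same way.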
  • the prediction block of each subunit in the reference frame can be obtained through a motion compensation process. After that, the prediction frame can be obtained by using the motion vector and the prediction block.
  • the encoding end transfers the difference between the predicted frame and the actual current frame to the decoding end after transformation, quantization, etc., and the decoding end can reconstruct the current frame using the motion vector, the reference frame, and the difference between the predicted frame and the current frame.
  • the motion estimation method unifies the design of the motion vector accuracy in the affine mode with the motion vector accuracy in the normal mode, and improves the coding performance.
  • Fig. 4 shows a flowchart of a motion estimation method 400 according to another embodiment of the present invention. As shown in FIG. 4, the method 400 includes the following steps:
  • step S410: for the affine coding unit in the current frame, select one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision;
  • step S420: the affine coding unit is divided into several subunits;
  • step S430: the motion vector of the subunit in the affine coding unit is calculated according to the motion vector of the control point.
  • the current frame is the video frame currently to be encoded.
  • the reference frame is the video frame to be referred to when encoding the current frame.
  • the reference frame in this embodiment includes both historical frames and future frames.
  • the affine coding unit in the current frame is a coding unit (CU) divided in the current frame based on the affine motion compensation prediction (Affine) technology.
  • in the Affine technology, the processing unit is no longer the entire coding unit; instead, the entire coding unit is divided into multiple sub-units, and motion compensation is performed on a per-sub-unit basis.
  • the affine coding unit in the Affine mode no longer has only one motion vector, but each subunit in the affine coding unit has its own motion vector.
  • the motion vector of each subunit in the affine coding unit is derived from the motion vectors of either two control points of the affine coding unit (i.e., the four-parameter model, see the left part of Figure 2) or three control points (i.e., the six-parameter model, see the right part of Figure 2). Only the motion vector information of the control points needs to be written into the code stream, not the motion vector information of each subunit.
  • the motion vector of the control point needs to be determined first.
  • the motion vector of an object between two adjacent frames may not be exactly an integer number of pixels. Therefore, the embodiment of the present invention adopts the adaptive motion vector resolution (AMVR) technology, in which the encoder side adaptively determines the precision of the motion vector.
  • the determination of the motion vector of the control point is based on the Inter mode (also known as the AMVP mode) of the Affine mode. In this mode, the motion vector precision is selected on the encoder side, and the MVD (motion vector difference) is calculated.
  • one is selected from multiple types of motion vector accuracy to perform motion estimation in the reference frame, where the multiple types of motion vector accuracy include 1/2 pixel accuracy.
  • the multiple motion vector precisions include any of 4-pixel, 2-pixel, integer-pixel, 1/4-pixel, 1/8-pixel, and 1/16-pixel precision.
  • one can be selected from integer pixel accuracy, 1/2 pixel accuracy, 1/4 pixel accuracy, and 1/16 pixel accuracy for motion estimation.
  • the conventional AMVP mode adds 1/2-pixel AMVR precision. Therefore, the embodiment of the present invention adds 1/2-pixel precision to the optional motion vector precisions, so that the design of the motion vector precision of the affine coding unit matches that of the conventional coding unit.
  • note that the motion vector precision of the control point referred to in the Affine mode is not necessarily the precision actually used in the process of sub-unit motion compensation.
  • the method for determining the motion vector precision includes: selecting the motion vector precision according to the motion vector precision selected by neighboring coding units.
  • the method for determining the accuracy of a motion vector may further include: attempting to perform motion estimation based on at least two of the four kinds of motion vector accuracy, and selecting the motion vector accuracy based on the effect of the motion estimation.
  • two motion vector precisions can be selected from the four optional precisions, motion estimation is attempted with each, and the effects of the two motion estimations are compared. If the motion estimation effect with the lower motion vector precision is better, stop trying and directly use the lower precision as the selected motion vector precision. If the motion estimation effect with the higher motion vector precision is better, continue trying higher precisions until the best motion estimation effect is obtained.
  • determining the motion vector of the control point of the affine coding unit includes: first, obtaining the motion vectors of spatially or temporally adjacent coding units. The obtained motion vectors are then combined to construct a candidate list of control point motion vectors, where the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
  • a motion vector predictor (MVP) is selected from the candidate list, and the corresponding reference block can be determined in the reference frame according to the predicted motion vector.
  • interpolation is performed on the reference block to generate fractional pixels, and then the actual motion vector is determined. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference frame becomes, and the more accurately displacements of fractional-pixel precision can be compensated. As the interpolation precision improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
  • the motion vector precision in the Affine mode can be an integer, that is, integer-pixel precision, such as integer-pixel or 2-pixel precision; it can also be non-integer, that is, sub-pixel precision, such as 1/2, 1/4, or 1/8 precision.
  • the pixel at 1/2 precision position needs to be obtained by interpolation of the pixel at the whole pixel position.
  • the pixel values of other precision positions need to be obtained by further interpolation using integer-pixel precision pixels or 1/2-precision pixels.
  • an interpolation filter can be selected according to the selected motion vector accuracy to perform interpolation processing on the reference block.
  • the same interpolation filter may be used for all motion vector precisions.
  • the existing six-tap interpolation filter is used by default.
  • the identification bit that characterizes the type of the interpolation filter may not be set in the code stream, thereby saving one bit of data.
  • different interpolation filters can be selected according to different motion vector accuracy.
  • 1/2 precision uses a 6-tap interpolation filter, and the other precisions all use an 8-tap interpolation filter. Therefore, in an embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected as the motion vector precision, a second interpolation filter is selected to interpolate the reference block, wherein the numbers of taps of the first and second interpolation filters are different.
  • the first interpolation filter may be a 6-tap interpolation filter
  • the second interpolation filter may be an 8-tap interpolation filter.
  • a filter-type flag bit can be set in the code stream. For example, 1 can indicate that the 6-tap interpolation filter is used, and 0 that it is not, i.e. the default 8-tap interpolation filter is used.
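The precision-to-filter mapping and its one-bit signalling can be sketched as below. The `HALF_PEL` encoding of precisions in 1/16-pel units and the function name are assumptions of the sketch, not the patent's bitstream syntax; only the 6-tap/8-tap split and the 1/0 flag follow the text.

```python
HALF_PEL = 8  # hypothetical encoding: precisions in 1/16-pel units,
              # so 8 units == 1/2 pel

def choose_interp_filter(mv_precision_sixteenths):
    """Return (num_taps, flag_bit) per the scheme above: the 6-tap
    filter only for 1/2-pel precision, the default 8-tap filter
    otherwise. The flag bit written to the bitstream is 1 iff the
    6-tap filter is used."""
    if mv_precision_sixteenths == HALF_PEL:
        return 6, 1
    return 8, 0
```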
  • when the motion estimation method 400 is applied at the decoding end, if different interpolation filters are selected for different motion vector precisions, then before the interpolation filter is selected according to the motion vector precision, the motion estimation method 400 further includes: obtaining a code stream in which a flag bit for the filter type corresponding to the motion vector is set.
  • motion estimation may include both Affine mode and regular AMVP mode.
  • motion estimation is performed with the entire coding unit as a unit.
  • for each conventional coding unit, when adaptive motion vector resolution (AMVR) is applied, one of multiple motion vector precisions is likewise adaptively selected for motion estimation, the multiple motion vector precisions including 1/2-pixel precision. Apart from 1/2-pixel precision, the optional motion vector precisions of the conventional coding unit may be the same as or different from those of the affine coding unit. In one embodiment, the conventional coding unit also includes four optional motion vector precisions.
  • the corresponding motion vector accuracy is adaptively decided at the coding end, and the result of the decision is written into the code stream and passed to the decoding end.
  • the identifier indicating the accuracy of the motion vector of the affine coding unit is consistent with the identifier indicating the accuracy of the motion vector of the conventional coding unit, so that the two modes are more unified.
  • when the motion estimation method 400 is applied at the decoding end, before one of the motion vector precisions is selected to perform motion estimation in the reference frame, the method further includes: obtaining a code stream whose flag bits record the selected motion vector precision of the affine coding unit, the identifier representing the motion vector precision of the affine coding unit being consistent with the identifier representing the motion vector precision of the conventional coding unit.
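The consistency requirement can be illustrated with a single precision table shared by both CU types. The index values and the precision set below are hypothetical; the point is only that the same identifier decodes to the same precision whether the coding unit is affine or regular.

```python
# Hypothetical shared AMVR table: the same index means the same
# precision for affine and regular coding units, as the text requires.
AMVR_PRECISIONS = {
    0: '1/4-pel',
    1: '1/2-pel',
    2: '1-pel',
    3: '4-pel',
}

def parse_amvr(index, is_affine):
    """Decode the AMVR identifier: one table for both CU types keeps
    the two modes unified (indices and precisions are illustrative)."""
    return {'cu_type': 'affine' if is_affine else 'regular',
            'precision': AMVR_PRECISIONS[index]}
```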
  • in step S420, the affine coding unit is divided into several sub-units, and in step S430 the motion vectors of the sub-units in the affine coding unit are calculated from the motion vectors of the control points.
  • for details of step S420 and step S430, reference may be made to the description of step S120 and step S130 of the method 100, which is not repeated here.
  • the motion estimation method adds 1/2-pixel precision to the optional motion vector precisions in affine mode, unifying the motion-vector-precision design in affine mode with that in normal mode and improving coding performance.
  • the following describes a motion estimation system 500 according to an embodiment of the present invention with reference to FIG. 5.
  • FIG. 5 is a schematic block diagram of a motion estimation system 500 according to an embodiment of the present invention.
  • the motion estimation system 500 shown in FIG. 5 includes a processor 510, a storage device 520, and a computer program stored on the storage device 520 and running on the processor 510.
  • when the program is executed, the processor implements the steps of the motion estimation method 100 shown in FIG. 1 or the motion estimation method 400 shown in FIG. 4.
  • the processor 510 may be a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the motion estimation system 500 to perform the desired functions.
  • the processor 510 can include one or more embedded processors, processor cores, microprocessors, logic circuits, hardware finite state machines (FSM), digital signal processors (DSP), or combinations thereof.
  • the storage device 520 includes one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 510 may run the program instructions to implement the motion estimation method of the embodiments of the present invention (implemented by the processor) and/or other desired functions.
  • Various application programs and various data, such as data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
  • the system 500 further includes an input device (not shown).
  • the input device may be a device used by the user to input instructions, and may include one or more of operation keys, a keyboard, a mouse, a microphone, a touch screen, and the like.
  • the input device may also be any interface for receiving information.
  • the system 500 further includes an output device that can output various information (such as images or sounds) to the outside (for example, to a user), and may include one or more of a display (e.g. for showing video images to the user), a speaker, and the like.
  • the output device may also be any other device with output function.
  • system 500 further includes a communication interface, which is used to communicate with other devices, including wired or wireless communication.
  • when executing the program, the processor implements the following steps: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
  • in another embodiment, when executing the program, the processor implements the following steps: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
  • the embodiment of the present invention also provides a storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the method shown in FIG. 1 or FIG. 4 can be implemented.
  • the storage medium is a computer-readable storage medium.
  • the computer-readable storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the above storage media.
  • the computer-readable storage medium may be any combination of one or more computer-readable storage media.
  • the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
  • in another embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
  • the motion estimation method, system and storage medium of the present invention unify the design of motion vector precision in affine mode with that in conventional mode, improve coding performance, can be used to improve the quality of compressed video and enhance the hardware friendliness of codecs, and are of great significance for the video compression processing of broadcast television, video conferencing, network video, and the like.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
  • the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the various component embodiments of the present invention may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules according to the embodiments of the present invention.
  • the present invention can also be implemented as a device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for realizing the present invention may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.


Abstract

A motion estimation method, system and storage medium. The method includes: for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame, thereby determining the motion vectors of the control points of the affine coding unit (S110); dividing the affine coding unit into several sub-units (S120); and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points (S130). The motion estimation method, system and storage medium unify the design of motion vector precision in affine mode with that in regular mode, improving coding performance.

Description

Motion estimation method, system and storage medium
Technical Field
The present invention relates to the technical field of video coding and decoding, and in particular to a motion estimation method, system and storage medium.
Background
The basic principle of video coding is to exploit the correlations in the spatial domain, the temporal domain and between codewords to remove as much redundancy as possible. Current video coding schemes mainly comprise intra prediction, inter prediction, transform, quantization, entropy coding and in-loop filtering.
Inter prediction exploits the temporal correlation between adjacent video frames, using previously coded reconstructed frames as reference frames and predicting the current frame (the frame currently being coded) through motion estimation (ME) and motion compensation (MC), thereby removing the temporal redundancy of the video. Because neighbouring frames in a video are correlated, an image can be divided into several coding units; for each coding unit, the corresponding position in a neighbouring frame is searched and the relative spatial offset between the two is derived. This offset is what is usually called the motion vector (MV), and the process of obtaining it is called motion estimation. Motion compensation is the process of obtaining a predicted frame from the MV and the reference frame. The predicted frame may differ from the original current frame, so the difference between the predicted frame and the current frame is transformed, quantized, etc., and transmitted to the decoding end together with the MV information; the decoding end can then reconstruct the current frame from the MV, the reference frame, and the difference between the predicted frame and the current frame.
Motion estimation is an important factor affecting video coding efficiency, so how to optimize motion estimation methods has long been a concern of those skilled in the art.
Summary of the Invention
This summary introduces a selection of concepts in simplified form that are described in further detail in the detailed description. It is not intended to identify key or essential features of the claimed technical solution, nor to delimit the scope of protection of the claimed technical solution.
In view of the shortcomings of the prior art, a first aspect of the embodiments of the present invention provides a motion estimation method, the method comprising:
for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame, thereby determining motion vectors of control points of the affine coding unit;
dividing the affine coding unit into several sub-units;
calculating motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
A second aspect of the embodiments of the present invention provides another motion estimation method, the method comprising:
for an affine coding unit in a current frame, selecting one of multiple motion vector precisions to perform motion estimation in a reference frame, thereby determining motion vectors of control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision;
dividing the affine coding unit into several sub-units;
calculating motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
A third aspect of the embodiments of the present invention provides a motion estimation system, the system comprising a storage device and a processor, the storage device storing a computer program to be run by the processor, and the computer program, when run by the processor, performing the motion estimation method described above.
A fourth aspect of the embodiments of the present invention provides a storage medium having stored thereon a computer program which, when run, performs the motion estimation method described above.
The motion estimation method, system and storage medium of the present invention unify the design of motion vector precision in affine mode with that in regular mode, improving coding performance.
Brief Description of the Drawings
The following drawings form part of the present invention and aid understanding of it. The drawings illustrate embodiments of the invention and, together with their description, explain the principles of the invention.
In the drawings:
Fig. 1 shows a flowchart of a motion estimation method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the control-point motion vectors of an affine coding unit according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the sub-unit motion vectors of an affine coding unit according to an embodiment of the present invention;
Fig. 4 shows a flowchart of a motion estimation method according to another embodiment of the present invention;
Fig. 5 shows a structural block diagram of a motion estimation system according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, example embodiments according to the present invention are described in detail below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention, and it should be understood that the invention is not limited by the example embodiments described here. All other embodiments obtained by those skilled in the art based on the embodiments described herein without inventive effort shall fall within the scope of protection of the present invention.
In the following description, numerous specific details are given to provide a more thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention can be practiced without one or more of these details. In other instances, some technical features well known in the art are not described in order to avoid obscuring the invention.
It should be understood that the present invention can be implemented in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "consisting of" and/or "comprising", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
For a thorough understanding of the present invention, detailed steps and structures are set forth in the following description in order to explain the technical solution proposed by the invention. Preferred embodiments of the invention are described in detail below; however, besides these detailed descriptions, the invention may have other implementations.
The motion estimation method of the embodiments of the present invention can be applied to the inter-prediction part of video coding and decoding technology. For a better understanding of the motion estimation method of the embodiments of the present invention, video coding and decoding are first introduced below.
A video generally consists of multiple frames of images in a certain order. Within one frame there are often many regions with identical or similar spatial structure; that is, a video file contains a large amount of spatial redundancy. In addition, because the sampling interval between two adjacent frames is extremely short, adjacent frames are usually highly similar, so the video also contains a large amount of temporal redundancy. Furthermore, from the perspective of the visual sensitivity of the human eye, video information also contains parts that can be used for compression, namely visual redundancy.
Besides the spatial, temporal and visual redundancy mentioned above, video image information also contains a series of other redundancies such as information-entropy redundancy, structural redundancy, knowledge redundancy and importance redundancy. The purpose of video coding is to remove redundant information from the video sequence so as to reduce storage space and save transmission bandwidth.
At present, video coding mainly comprises intra prediction, inter prediction, transform, quantization, entropy coding and in-loop filtering; the embodiments of the present invention mainly improve the inter-prediction part. Inter-prediction technology uses the temporal correlation between adjacent video frames, takes previously coded reconstructed frames as reference frames, and predicts the current frame (the frame currently being coded) through motion estimation and motion compensation, thereby removing the temporal redundancy of the video.
The motion estimation method, system and storage medium of the present application are described in detail below with reference to the drawings. The features of the following embodiments and implementations may be combined with each other where no conflict arises. The motion estimation method, system and storage medium described in the embodiments of the present invention use the HEVC standard or its extensions. However, the invention is also applicable to other coding standards, such as the H.264 standard, the next-generation video coding standard VVC, AVS3, or any other suitable coding standard.
Fig. 1 shows a flowchart of a motion estimation method 100 according to an embodiment of the present invention. As shown in Fig. 1, the method 100 includes the following steps:
In step S110, for an affine coding unit in the current frame, one of at least four motion vector precisions is selected to perform motion estimation in a reference frame, thereby determining the motion vectors of the control points of the affine coding unit.
Here, the current frame is the video frame currently to be coded. The current frame may be a video frame captured in real time, or a video frame extracted from a storage medium.
The reference frame is the video frame referred to when coding the current frame. The reference frame may be a reconstructed video frame obtained by reconstructing the coded data of a video frame that can serve as a reference frame. Depending on the inter-prediction type, the reference frame may be a forward, backward or bidirectional reference frame. Specifically, inter-prediction techniques include forward prediction, backward prediction, bidirectional prediction, etc. Forward prediction uses a frame preceding the current frame (a historical frame) as the reference frame; backward prediction uses a frame after the current frame (a future frame) as the reference frame; bidirectional prediction predicts the current frame using both historical and future frames. This embodiment uses the bidirectional prediction mode, i.e., the reference frames include both historical frames and future frames.
The affine coding unit in the current frame is a coding unit (CU) partitioned in the current frame based on affine motion compensation prediction (Affine) technology.
Specifically, the traditional motion model includes only translation, but many kinds of motion exist in reality, such as irregular motions like zooming, rotation and perspective motion; the Affine technique was introduced for this reason. In the Affine technique, the processing unit is no longer the entire coding unit: the coding unit is divided into multiple sub-units, and during motion compensation, compensation is performed with the sub-unit as the unit.
Compared with a regular coding unit, an affine coding unit in Affine mode no longer has a single motion vector; instead, every sub-unit of the affine coding unit has its own motion vector. After the control-point motion vectors of the affine coding unit are determined, the motion vector of each sub-unit is derived from the motion vectors of its two control points (the four-parameter model, see the left part of Fig. 2) or three control points (the six-parameter model, see the right part of Fig. 2); only the control-point motion vector information needs to be written into the code stream, not the motion vector of each sub-unit.
As described above, to determine the sub-unit motion vectors, the control-point motion vectors are determined first. During motion estimation, because natural object motion is continuous, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units; the embodiments of the present invention therefore use adaptive motion vector resolution (AMVR) technology to adaptively determine the motion vector precision at the encoding end. In the embodiments of the present invention, the control-point motion vectors are determined based on the Inter mode (also called AMVP mode) under Affine mode, in which the motion vector precision is selected and the MVD (Motion Vector Difference) is calculated at the encoding end.
In one embodiment, there are four optional motion vector precisions in total, and for each coding unit one of the four motion vector precisions is selected for motion estimation. The at least four motion vector precisions include any four of 4-pixel, 2-pixel, integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel and 1/16-pixel. For example, the four motion vector precisions may be integer-pixel, 1/2-pixel, 1/4-pixel and 1/16-pixel precision.
In the current video coding software VTM-6.0, the regular AMVP mode includes four AMVR precisions. Compared with the previous Affine mode with three precisions, the embodiment of the present invention therefore adds one precision, so that the number of motion vector precisions available to the affine coding unit equals the number available to the regular coding unit, thereby unifying the adaptive-motion-vector-precision design under Affine mode with that under the regular AMVP mode. In one embodiment, the newly added precision is 1/2-pixel precision.
It should be noted that the 1/16, 1/4, integer-pixel and other precisions involved in Affine mode refer to the precision of the control-point motion vectors, not the precision of the motion vectors actually used by the sub-units during motion compensation.
In one embodiment, the method of determining the motion vector precision comprises: selecting the motion vector precision according to the motion vector precision already selected for neighbouring coding units.
In another embodiment, the method of determining the motion vector precision may further comprise: attempting motion estimation based on at least two of the four motion vector precisions, and selecting the motion vector precision based on the results of the motion estimation.
Specifically, two of the four optional motion vector precisions can be selected, motion estimation attempted with each, and the results of the two motion estimations compared. For example, 1/2-pixel precision and integer-pixel precision can be selected and motion estimation performed with each.
Then the motion estimation results are compared. If motion estimation with the lower precision performs better, the attempts stop and the lower precision is directly taken as the selected motion vector precision. For example, if integer-pixel precision outperforms 1/2-pixel precision, no other precisions are tried and integer-pixel precision is selected. If the higher precision performs better, motion estimation continues to be attempted with ever higher precisions until the best motion estimation result is obtained. For example, if 1/2-pixel precision outperforms integer-pixel precision, 1/4-pixel precision is tried next. If motion estimation at 1/2-pixel precision outperforms that at 1/4-pixel precision, 1/2-pixel precision is selected as the motion vector precision. If motion estimation at 1/4-pixel precision outperforms that at 1/2-pixel precision, the result at 1/8-pixel precision can be compared as well.
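The early-terminating precision search described above can be sketched as follows; the precision ladder and the `cost` callback (a rate-distortion cost, lower is better) are illustrative assumptions of the sketch:

```python
def pick_precision(cost, ladder=('1-pel', '1/2-pel', '1/4-pel', '1/8-pel')):
    """Coarse-to-fine precision search: try the two coarsest precisions;
    if the coarser one wins, stop immediately; otherwise keep refining
    while each finer precision still lowers the cost."""
    best = ladder[0]
    best_cost = cost(best)
    for p in ladder[1:]:
        c = cost(p)
        if c >= best_cost:  # finer precision did not help: stop trying
            break
        best, best_cost = p, c
    return best
```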
In one embodiment, determining the control-point motion vectors of the affine coding unit comprises: first, obtaining the motion vectors of spatially or temporally neighbouring coding units, and constructing a candidate list from combinations of the motion vectors of the spatially or temporally neighbouring coding units.
The motion vectors obtained in this process may be control-point motion vectors of Affine-mode coding units or motion vectors of regular coding units in traditional mode. The obtained motion vectors are then combined to build a candidate list of control-point motion vectors; the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
Then a group of motion vectors is selected from the candidate list as the motion vector predictors (MVP) of the control points of the affine coding unit, and motion estimation is performed in the reference frame based on the predicted motion vectors to determine the actual control-point motion vectors of the affine coding unit. For example, the corresponding reference block can be determined in the reference frame from the predicted motion vectors; the reference block is then interpolated to generate fractional pixels, and the actual motion vectors are determined.
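Candidate-list construction from neighbouring MVs can be sketched as a simple combination step: one candidate MV list per control point (two lists for the four-parameter model, three for the six-parameter model), combined into control-point MVP groups. Real codecs also prune, deduplicate and order these candidates; this sketch shows only the combinatorial step, and the function name and cap are assumptions.

```python
from itertools import product

def build_cpmvp_candidates(neighbor_mvs_per_cp, max_candidates=5):
    """Combine neighbour MVs into control-point MVP groups.

    neighbor_mvs_per_cp: one list of candidate MVs per control point.
    Returns up to max_candidates tuples, one MV per control point."""
    combos = list(product(*neighbor_mvs_per_cp))
    return combos[:max_candidates]
```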
The encoding end may also calculate the difference (MVD, Motion Vector Difference) between the actual motion vector and the predicted motion vector, encode the MVD, and send the coded MVD and the index of the predicted motion vector in the candidate list to the decoding end.
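The MVD signalling described here amounts to transmitting the predictor index plus a per-component difference; a minimal round-trip sketch (function names are this sketch's own):

```python
def encode_mv(actual_mv, mvp_list, mvp_index):
    """Encoder side: signal the MVP index plus the MVD
    (actual minus predictor), per component."""
    px, py = mvp_list[mvp_index]
    ax, ay = actual_mv
    return mvp_index, (ax - px, ay - py)

def decode_mv(mvp_list, mvp_index, mvd):
    """Decoder side: rebuild the actual MV as predictor plus MVD."""
    px, py = mvp_list[mvp_index]
    dx, dy = mvd
    return (px + dx, py + dy)
```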
As mentioned above, motion vector precision includes integer-pixel precision and fractional-pixel precision. Since the pixels at fractional positions do not themselves exist, the pixels at sub-pixel positions need to be obtained by interpolating the reference block. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference block becomes, and the more precisely displacements of fractional-pixel precision can be compensated. As interpolation accuracy improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
Specifically, the motion vector precision in Affine mode can be an integer, i.e. integer-pixel precision such as 1 pixel or 2 pixels, or non-integer, i.e. sub-pixel precision such as 1/2, 1/4 or 1/8 pixel. As an example, a pixel at a 1/2-precision position needs to be obtained by interpolating the pixels at integer positions; pixel values at other precision positions need to be obtained by further interpolation using integer-precision pixels or 1/2-precision pixels.
Exemplarily, an interpolation filter can be selected according to the selected motion vector precision to perform interpolation processing on the reference block.
In one embodiment, the same interpolation filter may be used for all motion vector precisions. For example, the existing 6-tap interpolation filter is used by default for all motion vector precisions. In that case, no flag bit indicating the interpolation filter type needs to be set in the code stream, saving one data bit.
In another embodiment, different interpolation filters can be selected for different motion vector precisions. For example, in the regular AMVP mode, only 1/2 precision uses a 6-tap interpolation filter, and all other precisions use an 8-tap interpolation filter. Therefore, in one embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected as the motion vector precision, a second interpolation filter is selected to interpolate the reference block, the first and second interpolation filters having different numbers of taps. Further, the first interpolation filter may be a 6-tap interpolation filter, and the second interpolation filter an 8-tap interpolation filter. This makes the interpolation-filter design in Affine mode better match that of the traditional AMVP mode.
Further, if different interpolation filters are selected for different motion vector precisions, a filter-type flag bit can be set in the code stream. For example, 1 may indicate that the 6-tap interpolation filter is used, and 0 that it is not, i.e. the default 8-tap interpolation filter is used.
Thus, in one embodiment, when applied at the decoding end, if different interpolation filters are selected for different motion vector precisions, then before the interpolation filter is selected according to the motion vector precision, the motion estimation method 100 further comprises: obtaining a code stream in which a flag bit for the filter type corresponding to the motion vector is set.
As mentioned above, in the process of obtaining the motion vectors of neighbouring coding units to construct the candidate list, what is obtained may be control-point motion vectors of Affine-mode coding units or motion vectors of regular coding units in traditional mode. That is, motion estimation may include both Affine mode and the regular AMVP mode. Under regular AMVP, for each regular coding unit partitioned in the current frame, motion estimation is performed with the entire coding unit as the unit.
For each regular coding unit, when adaptive motion vector resolution (AMVR) is applied, one of four motion vector precisions is likewise adaptively selected for motion estimation. The four motion vector precisions of the regular coding unit may be the same as or different from those of the affine coding unit. For example, the four motion vector precisions may include integer-pixel, 4-pixel, 1/4-pixel and 1/2-pixel precision. Note, however, that the motion vector precisions are not limited to these four; they may also include 1/8-pixel, 1/16-pixel, etc.
For every regular coding unit using AMVR technology, the corresponding motion vector precision is adaptively decided at the encoding end, and the decision is written into the code stream and passed to the decoding end. In the embodiments of the present invention, the identifier representing the motion vector precision of the affine coding unit is consistent with the identifier representing the motion vector precision of the regular coding unit, making the two modes more unified.
Thus, in one embodiment, when the motion estimation method 100 is applied at the decoding end, before one of the at least four motion vector precisions is selected to perform motion estimation in the reference frame, the method further comprises: obtaining a code stream whose flag bits record the selected motion vector precision of the affine coding unit, the identifier representing the motion vector precision of the affine coding unit being consistent with the identifier representing the motion vector precision of the regular coding unit.
In step S120, the affine coding unit is divided into several sub-units.
The size of the sub-units may be fixed; for example, each sub-unit is 4×4 pixels. Alternatively, the sub-unit size may be determined in other ways; for example, a suitable sub-unit size may be chosen to reduce the complexity of coding and decoding.
Then, in step S130, the motion vectors of the sub-units in the affine coding unit are calculated from the motion vectors of the control points.
Exemplarily, the motion field in Affine mode can be derived from the motion vectors of two control points (four parameters) or three control points (six parameters). After the control-point motion vectors are determined, for a four-parameter (two-control-point) affine coding unit, the motion vector of the sub-unit at position (x, y) is calculated by the following formula (1):
$$\begin{cases} mv_x = \dfrac{mv_{1x}-mv_{0x}}{w}x - \dfrac{mv_{1y}-mv_{0y}}{w}y + mv_{0x} \\ mv_y = \dfrac{mv_{1y}-mv_{0y}}{w}x + \dfrac{mv_{1x}-mv_{0x}}{w}y + mv_{0y} \end{cases} \tag{1}$$
where (mv_{0x}, mv_{0y}) is the motion vector of the top-left control point, (mv_{1x}, mv_{1y}) is the motion vector of the top-right control point, x and y are the coordinates of the centre of the sub-unit, and w is the width of the affine coding unit.
For a six-parameter (three-control-point) affine coding unit, the motion vector of the sub-unit at position (x, y) is calculated by the following formula (2):
$$\begin{cases} mv_x = \dfrac{mv_{1x}-mv_{0x}}{w}x + \dfrac{mv_{2x}-mv_{0x}}{h}y + mv_{0x} \\ mv_y = \dfrac{mv_{1y}-mv_{0y}}{w}x + \dfrac{mv_{2y}-mv_{0y}}{h}y + mv_{0y} \end{cases} \tag{2}$$
(here h denotes the height of the affine coding unit)
where (mv_{0x}, mv_{0y}) is the motion vector of the top-left control point, (mv_{1x}, mv_{1y}) the motion vector of the top-right control point, (mv_{2x}, mv_{2y}) the motion vector of the bottom-left control point, and w is the width of the affine coding unit.
After the above formulas are evaluated, the motion vectors within one affine coding unit are as shown schematically in Fig. 3, where each square represents a 4×4 sub-unit. All motion vectors computed by the above formulas are rounded to a 1/16-pixel-precision representation. The sub-unit size of both the chroma and the luma components is 4×4; the motion vector of a 4×4 chroma sub-unit can be obtained by averaging the motion vectors of its four corresponding 4×4 luma sub-units. After the motion vector of every sub-unit has been calculated, the prediction block of each sub-unit in the reference frame is obtained through the motion compensation process. A predicted frame is then obtained from the motion vectors and prediction blocks; the encoding end transforms, quantizes, etc. the difference between the predicted frame and the actual current frame and transmits it to the decoding end, and the decoding end reconstructs the current frame from the motion vectors, the reference frame, and the difference between the predicted frame and the current frame.
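Formulas (1) and (2), together with the 1/16-pel rounding described above, can be sketched as a sub-block MV derivation. The 4×4 grid, the centre sampling and Python's `round()` behaviour are illustrative assumptions; real codecs use fixed shifts and specific rounding rules.

```python
def affine_subblock_mvs(cpmv, w, h, sub=4):
    """Derive one MV per sub x sub sub-block from 2 or 3 control-point MVs
    (four- or six-parameter model), following formulas (1)/(2).

    cpmv: [(mv0x, mv0y), (mv1x, mv1y)] or [..., (mv2x, mv2y)], in pels.
    Results are rounded to 1/16-pel integer units."""
    (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
    ax, ay = (mv1x - mv0x) / w, (mv1y - mv0y) / w
    if len(cpmv) == 2:                 # four-parameter: rotation/zoom
        bx, by = -ay, ax
    else:                              # six-parameter model
        mv2x, mv2y = cpmv[2]
        bx, by = (mv2x - mv0x) / h, (mv2y - mv0y) / h
    field = {}
    for y0 in range(0, h, sub):
        for x0 in range(0, w, sub):
            x, y = x0 + sub / 2, y0 + sub / 2   # sub-block centre
            mvx = ax * x + bx * y + mv0x
            mvy = ay * x + by * y + mv0y
            # round to 1/16-pel integer units
            field[(x0, y0)] = (round(mvx * 16), round(mvy * 16))
    return field
```

With identical control-point MVs the model degenerates to pure translation, so every sub-block gets the same vector.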
Based on the above description, the motion estimation method according to the embodiment of the present invention unifies the design of motion vector precision in affine mode with that in regular mode, improving coding performance.
Fig. 4 shows a flowchart of a motion estimation method 400 according to another embodiment of the present invention. As shown in Fig. 4, the method 400 includes the following steps:
In step S410, for an affine coding unit in the current frame, one of multiple motion vector precisions is selected to perform motion estimation in a reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision.
In step S420, the affine coding unit is divided into several sub-units.
In step S430, the motion vectors of the sub-units in the affine coding unit are calculated from the motion vectors of the control points.
In step S410, the current frame is the video frame currently to be coded, and the reference frame is the video frame referred to when coding the current frame. In this embodiment, the reference frames include both historical frames and future frames.
The affine coding unit in the current frame is a coding unit (CU) partitioned in the current frame based on affine motion compensation prediction (Affine) technology. In the Affine technique, the processing unit is no longer the entire coding unit: the coding unit is divided into multiple sub-units, and during motion compensation, compensation is performed with the sub-unit as the unit.
Compared with a regular coding unit, an affine coding unit in Affine mode no longer has a single motion vector; instead, every sub-unit of the affine coding unit has its own motion vector. After the control-point motion vectors of the affine coding unit are determined, the motion vector of each sub-unit is derived from the motion vectors of its two control points (the four-parameter model, see the left part of Fig. 2) or three control points (the six-parameter model, see the right part of Fig. 2); only the control-point motion vector information needs to be written into the code stream, not the motion vector of each sub-unit.
As described above, to determine the sub-unit motion vectors, the control-point motion vectors need to be determined first. During motion estimation, because natural object motion is continuous, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units; the embodiment of the present invention therefore uses adaptive motion vector resolution (AMVR) technology to adaptively determine the motion vector precision at the encoding end. In the embodiment of the present invention, the control-point motion vectors are determined based on the Inter mode (also called AMVP mode) under Affine mode, in which the motion vector precision is selected and the MVD (Motion Vector Difference) is calculated at the encoding end.
In this embodiment, one of multiple motion vector precisions is selected to perform motion estimation in the reference frame, the multiple motion vector precisions including 1/2-pixel precision. Exemplarily, for each coding unit, one of four motion vector precisions may be selected for motion estimation. Besides the fixed 1/2-pixel precision, the multiple motion vector precisions include any of 4-pixel, 2-pixel, integer-pixel, 1/4-pixel, 1/8-pixel and 1/16-pixel. For example, one of integer-pixel, 1/2-pixel, 1/4-pixel and 1/16-pixel precision may be selected for motion estimation.
In the current video coding software VTM-6.0, the regular AMVP mode has newly added a 1/2-pixel AMVR precision. The embodiment of the present invention therefore adds 1/2-pixel precision to the optional motion vector precisions, so that the motion-vector-precision design of the affine coding unit matches that of the regular coding unit.
It should be noted that the 1/16, 1/4, integer-pixel and other precisions involved in Affine mode refer to the precision of the control-point motion vectors, not the precision of the motion vectors actually used by the sub-units during motion compensation.
In one embodiment, the method of determining the motion vector precision comprises: selecting the motion vector precision according to the motion vector precision already selected for neighbouring coding units.
In another embodiment, the method of determining the motion vector precision may further comprise: attempting motion estimation based on at least two of the four motion vector precisions, and selecting the motion vector precision based on the results of the motion estimation.
Specifically, two of the four optional motion vector precisions can be selected, motion estimation attempted with each, and the results of the two motion estimations compared. If the motion estimation with the lower precision performs better, the attempts stop and the lower precision is directly taken as the selected motion vector precision. If the motion estimation with the higher precision performs better, motion estimation continues to be attempted with ever higher precisions until the best motion estimation result is obtained.
In one embodiment, determining the control-point motion vectors of the affine coding unit comprises: first, obtaining the motion vectors of spatially or temporally neighbouring coding units, and constructing a candidate list from combinations of those motion vectors. The obtained motion vectors are then combined to build a candidate list of control-point motion vectors; the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
Then a group of motion vectors is selected from the candidate list as the motion vector predictors (MVP) of the control points of the affine coding unit, and motion estimation is performed in the reference frame based on the predicted motion vectors to determine the actual control-point motion vectors of the affine coding unit. For example, the corresponding reference block can be determined in the reference frame from the predicted motion vectors; the reference block is then interpolated to generate fractional pixels, and the actual motion vectors are determined. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference frame becomes, and the more precisely displacements of fractional-pixel precision can be compensated. As interpolation accuracy improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
Specifically, the motion vector precision in Affine mode can be an integer, i.e. integer-pixel precision such as 1 pixel or 2 pixels, or non-integer, i.e. sub-pixel precision such as 1/2, 1/4 or 1/8 pixel. As an example, a pixel at a 1/2-precision position needs to be obtained by interpolating the pixels at integer positions; pixel values at other precision positions need to be obtained by further interpolation using integer-precision pixels or 1/2-precision pixels.
Exemplarily, an interpolation filter can be selected according to the selected motion vector precision to perform interpolation processing on the reference block.
In one embodiment, the same interpolation filter may be used for all motion vector precisions. For example, the existing 6-tap interpolation filter is used by default for all motion vector precisions. In that case, no flag bit indicating the interpolation filter type needs to be set in the code stream, saving one data bit.
In another embodiment, different interpolation filters can be selected for different motion vector precisions. For example, in the regular AMVP mode, only 1/2 precision uses a 6-tap interpolation filter, and all other precisions use an 8-tap interpolation filter. Therefore, in one embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected as the motion vector precision, a second interpolation filter is selected to interpolate the reference block, the first and second interpolation filters having different numbers of taps. Further, the first interpolation filter may be a 6-tap interpolation filter, and the second interpolation filter an 8-tap interpolation filter. This makes the interpolation-filter design in Affine mode better match that of the traditional AMVP mode.
Further, if different interpolation filters are selected for different motion vector precisions, a filter-type flag bit can be set in the code stream. For example, 1 may indicate that the 6-tap interpolation filter is used, and 0 that it is not, i.e. the default 8-tap interpolation filter is used.
Thus, in one embodiment, when applied at the decoding end, if different interpolation filters are selected for different motion vector precisions, then before the interpolation filter is selected according to the motion vector precision, the motion estimation method 400 further comprises: obtaining a code stream in which a flag bit for the filter type corresponding to the motion vector is set.
In one embodiment, motion estimation may include both Affine mode and the regular AMVP mode. Under regular AMVP, for each regular coding unit partitioned in the current frame, motion estimation is performed with the entire coding unit as the unit.
For each regular coding unit, when adaptive motion vector resolution (AMVR) is applied, one of multiple motion vector precisions is likewise adaptively selected for motion estimation, the multiple motion vector precisions including 1/2-pixel precision. Apart from 1/2-pixel precision, the optional motion vector precisions of the regular coding unit may be the same as or different from those of the affine coding unit. In one embodiment, the regular coding unit likewise has four optional motion vector precisions.
For every regular coding unit using AMVR technology, the corresponding motion vector precision is adaptively decided at the encoding end, and the decision is written into the code stream and passed to the decoding end. In the embodiment of the present invention, the identifier representing the motion vector precision of the affine coding unit is consistent with the identifier representing the motion vector precision of the regular coding unit, making the two modes more unified.
Thus, in one embodiment, when the motion estimation method 400 is applied at the decoding end, before one of the motion vector precisions is selected to perform motion estimation in the reference frame, the method further comprises: obtaining a code stream whose flag bits record the selected motion vector precision of the affine coding unit, the identifier representing the motion vector precision of the affine coding unit being consistent with the identifier representing the motion vector precision of the regular coding unit.
Then, in step S420, the affine coding unit is divided into several sub-units, and in step S430 the motion vectors of the sub-units in the affine coding unit are calculated from the motion vectors of the control points. For details of step S420 and step S430, reference may be made to the description of step S120 and step S130 of the method 100, which is not repeated here.
Based on the above description, the motion estimation method according to the embodiment of the present invention adds 1/2-pixel precision to the optional motion vector precisions in affine mode, unifying the motion-vector-precision design in affine mode with that in regular mode and improving coding performance.
A motion estimation system 500 according to an embodiment of the present invention is described below with reference to Fig. 5.
Fig. 5 is a schematic block diagram of a motion estimation system 500 according to an embodiment of the present invention. The motion estimation system 500 shown in Fig. 5 comprises a processor 510, a storage device 520, and a computer program stored on the storage device 520 and running on the processor 510; when executing the program, the processor implements the steps of the motion estimation method 100 shown in Fig. 1 or the motion estimation method 400 shown in Fig. 4.
The processor 510 may be a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the motion estimation system 500 to perform the desired functions. For example, the processor 510 can include one or more embedded processors, processor cores, microprocessors, logic circuits, hardware finite state machines (FSM), digital signal processors (DSP), or combinations thereof.
The storage device 520 comprises one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 510 may run the program instructions to implement the motion estimation method of the embodiments of the present invention (implemented by the processor) and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
In one implementation, the system 500 further comprises an input device (not shown), which may be a device used by the user to input instructions and may include one or more of operation keys, a keyboard, a mouse, a microphone, a touch screen, and the like. The input device may also be any interface for receiving information.
In one implementation, the system 500 further comprises an output device, which can output various information (such as images or sounds) to the outside (for example, to a user) and may include one or more of a display (e.g. for showing video images to the user), a speaker, and the like. The output device may also be any other device with an output function.
In one implementation, the system 500 further comprises a communication interface for communicating with other devices, by wire or wirelessly.
Specifically, in one embodiment, when executing the program the processor implements the following steps: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
In another embodiment, when executing the program the processor implements the following steps: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
In addition, an embodiment of the present invention also provides a storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the method shown in Fig. 1 or Fig. 4 can be implemented.
For example, the storage medium is a computer-readable storage medium. The computer-readable storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
In another embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the multiple motion vector precisions include 1/2-pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
In summary, the motion estimation method, system and storage medium of the present invention unify the design of motion vector precision in affine mode with that in regular mode, improve coding performance, can be used to improve the quality of compressed video and enhance the hardware friendliness of codecs, and are of great significance for the video compression processing of broadcast television, video conferencing, network video, and the like.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; e.g., the division of units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The above are only specific implementations of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution readily conceivable by those familiar with the technical field within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Although example embodiments have been described here with reference to the drawings, it should be understood that the above example embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Those of ordinary skill in the art can make various changes and modifications therein without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as claimed in the appended claims.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; e.g., the division of units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
The specification provided here sets forth numerous specific details. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive point lies in that the corresponding technical problem can be solved with fewer than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where features are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules according to the embodiments of the present invention. The invention may also be implemented as device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.
The above are only specific implementations of the present invention or descriptions thereof, and the scope of protection of the present invention is not limited thereto. Any change or substitution readily conceivable by those familiar with the technical field within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (36)

  1. A motion estimation method, characterized in that the method comprises:
    for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame, thereby determining motion vectors of control points of the affine coding unit;
    dividing the affine coding unit into several sub-units;
    calculating motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points.
  2. The method according to claim 1, characterized in that the four motion vector precisions comprise any four of 4-pixel, 2-pixel, integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel and 1/16-pixel.
  3. The method according to claim 1, characterized in that determining the motion vectors of the control points of the affine coding unit comprises:
    obtaining motion vectors of spatially neighbouring coding units or temporally neighbouring coding units, and constructing a candidate list from combinations of the motion vectors of the spatially or temporally neighbouring coding units;
    selecting a group of motion vectors from the candidate list as predicted motion vectors of the control points of the affine coding unit;
    performing motion estimation in the reference frame according to the predicted motion vectors, to determine actual motion vectors of the control points of the affine coding unit.
  4. The method according to claim 1, characterized by further comprising: for a regular coding unit in the current frame, performing motion estimation with the entire coding unit as the unit.
  5. The method according to claim 4, characterized in that the number of motion vector precisions available to the affine coding unit is the same as the number of motion vector precisions available to the regular coding unit.
  6. The method according to claim 5, characterized by further comprising: for each regular coding unit, adaptively selecting one of four motion vector precisions to perform motion estimation of the regular coding unit, the four motion vector precisions of the regular coding unit being the same as or different from the four motion vector precisions of the affine coding unit.
  7. The method according to claim 6, characterized by further comprising: recording the selected motion vector precision of the affine coding unit in flag bits of a code stream, the identifier representing the motion vector precision of the affine coding unit being consistent with the identifier representing the motion vector precision of the regular coding unit.
  8. The method according to claim 6, characterized in that before selecting one of the at least four motion vector precisions to perform motion estimation in the reference frame, the method further comprises:
    obtaining a code stream, flag bits of the code stream recording the selected motion vector precision of the affine coding unit, the identifier representing the motion vector precision of the affine coding unit being consistent with the identifier representing the motion vector precision of the regular coding unit.
  9. The method according to claim 1, characterized in that each affine coding unit comprises two control points or three control points.
  10. The method according to claim 1, characterized by further comprising: selecting an interpolation filter according to the motion vector precision, to perform interpolation processing on a reference block.
  11. The method according to claim 10, characterized in that selecting an interpolation filter according to the motion vector precision comprises:
    selecting different interpolation filters for different motion vector precisions; or
    using the same interpolation filter for all motion vector precisions.
  12. The method according to claim 11, characterized by further comprising: if different interpolation filters are selected for different motion vector precisions, setting a filter-type flag bit in the code stream.
  13. The method according to claim 11, characterized in that different interpolation filters are selected for different motion vector precisions;
    and before selecting the interpolation filter according to the motion vector precision, the method further comprises:
    obtaining a code stream in which a flag bit for the filter type corresponding to the motion vector is set.
  14. The method according to claim 11, characterized in that selecting different interpolation filters for different motion vector precisions comprises:
    when 1/2-pixel precision is selected as the motion vector precision, selecting a first interpolation filter to perform interpolation processing on a reference block;
    when a precision other than 1/2-pixel precision is selected as the motion vector precision, selecting a second interpolation filter to perform interpolation processing on the reference block, wherein
    the first interpolation filter and the second interpolation filter have different numbers of taps.
  15. The method according to claim 14, characterized in that the first interpolation filter is a 6-tap interpolation filter.
  16. The method according to claim 14, characterized in that the second interpolation filter is an 8-tap interpolation filter.
  17. 根据权利要求1所述的方法,其特征在于,所述从四种运动矢量精度中选择一种以在参考帧中进行运动估计,包括:
    根据临近编码单元已选定的运动矢量精度来选择所述运动矢量精度。
  18. 根据权利要求1所述的方法,其特征在于,所述从四种运动矢量精度中自适应地选择一种以在参考帧中进行运动估计,包括:
    基于所述四种运动矢量精度中的至少两种尝试进行运动估计,并基于所述运动估计的效果选择所述运动矢量精度。
  19. The method according to claim 18, characterized in that the attempting motion estimation based on at least two of the four motion vector accuracies and selecting the motion vector accuracy based on the effect of the motion estimation comprises:
    selecting two motion vector accuracies from the four motion vector accuracies, attempting motion estimation with each of them, and comparing the effects of the two motion estimations; and
    if the motion estimation using the lower motion vector accuracy performs better, stopping the attempts and directly taking the lower motion vector accuracy as the selected motion vector accuracy; or, if the motion estimation using the higher motion vector accuracy performs better, continuing to attempt motion estimation with an even higher motion vector accuracy until the best motion estimation effect is obtained.
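Not part of the claims, but the coarse-to-fine early-termination search of claim 19 can be sketched as a short loop. Here `estimate_cost` stands in for a real rate-distortion motion search; the toy cost table is an assumption for illustration.

```python
# Sketch of claim 19: try accuracies from coarse to fine; as soon as a
# finer accuracy fails to improve the estimation cost, stop and keep the
# best accuracy found so far.

def select_accuracy(accuracies, estimate_cost):
    """accuracies: candidate accuracies ordered coarse -> fine.
    estimate_cost: callable returning the motion-estimation cost
    (lower is better) for a given accuracy."""
    best_acc, best_cost = accuracies[0], estimate_cost(accuracies[0])
    for acc in accuracies[1:]:
        cost = estimate_cost(acc)
        if cost >= best_cost:          # coarser accuracy was better: stop early
            return best_acc
        best_acc, best_cost = acc, cost  # finer is better: keep refining
    return best_acc

# Assumed RD-cost table standing in for real motion-search results.
costs = {1.0: 120.0, 0.5: 100.0, 0.25: 105.0, 0.125: 104.0}
chosen = select_accuracy([1.0, 0.5, 0.25, 0.125], costs.get)
```

With the assumed costs the search stops at 1/2-pel: 1/4-pel is tried, found worse, and 1/8-pel is never evaluated, which is the bandwidth saving the claim is after.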
  20. The method according to claim 1, characterized in that the reference frame comprises a video frame before the current frame and a video frame after the current frame.
  21. A motion estimation method, characterized in that the method comprises:
    for an affine coding unit in a current frame, selecting one of a plurality of motion vector accuracies to perform motion estimation in a reference frame, so as to determine motion vectors of control points of the affine coding unit, wherein the plurality of motion vector accuracies comprise 1/2-pixel accuracy;
    dividing the affine coding unit into several sub-units; and
    calculating motion vectors of the sub-units in the affine coding unit according to the motion vectors of the control points.
  22. The method according to claim 21, characterized in that the plurality of motion vector accuracies further comprise at least one of 4-pixel accuracy, 2-pixel accuracy, integer-pixel accuracy, 1/4-pixel accuracy, 1/8-pixel accuracy and 1/16-pixel accuracy.
  23. The method according to claim 21, characterized in that the determining motion vectors of control points of the affine coding unit comprises:
    obtaining motion vectors of spatially neighboring coding units or temporally neighboring coding units, and constructing a candidate list from combinations of the motion vectors of the spatially neighboring coding units or temporally neighboring coding units;
    selecting a group of motion vectors from the candidate list as predicted motion vectors of the control points of the affine coding unit; and
    performing motion estimation in the reference frame according to the predicted motion vectors, to determine actual motion vectors of the control points of the affine coding unit.
  24. The method according to claim 21, characterized by further comprising: for a regular coding unit in the current frame, performing motion estimation with the whole coding unit as a unit.
  25. The method according to claim 24, characterized in that the number of motion vector accuracies available to the affine coding unit is the same as the number of motion vector accuracies available to the regular coding unit.
  26. The method according to claim 25, characterized by further comprising: recording the selected motion vector accuracy of the affine coding unit in a flag bit of a bitstream, wherein the flag indicating the motion vector accuracy of the affine coding unit is consistent with the flag indicating the motion vector accuracy of the regular coding unit.
  27. The method according to claim 26, characterized in that before the selecting one of at least four motion vector accuracies to perform motion estimation in the reference frame, the method further comprises:
    obtaining a bitstream, a flag bit of which records the selected motion vector accuracy of the affine coding unit, wherein the flag indicating the motion vector accuracy of the affine coding unit is consistent with the flag indicating the motion vector accuracy of the regular coding unit.
  28. The method according to claim 21, characterized by further comprising: selecting an interpolation filter according to the motion vector accuracy, to perform interpolation processing on a reference block.
  29. The method according to claim 28, characterized in that the selecting an interpolation filter according to the motion vector accuracy comprises:
    selecting different interpolation filters according to different motion vector accuracies; or
    using the same interpolation filter for all motion vector accuracies.
  30. The method according to claim 29, characterized in that the selecting different interpolation filters according to different motion vector accuracies comprises:
    when the 1/2-pixel accuracy is selected as the motion vector accuracy, selecting a first interpolation filter to perform interpolation processing on a reference block; and
    when an accuracy other than the 1/2-pixel accuracy is selected as the motion vector accuracy, selecting a second interpolation filter to perform interpolation processing on the reference block, wherein
    the first interpolation filter and the second interpolation filter have different numbers of taps.
  31. The method according to claim 30, characterized in that the first interpolation filter is a 6-tap interpolation filter.
  32. The method according to claim 30, characterized in that the second interpolation filter is an 8-tap interpolation filter.
  33. The method according to claim 29, characterized by further comprising: if different interpolation filters are selected according to different motion vector accuracies, setting a filter-type flag bit in the bitstream.
  34. The method according to claim 33, characterized in that different interpolation filters are selected for different motion vector accuracies; and
    before the selecting an interpolation filter according to the motion vector accuracy, the method further comprises:
    obtaining a bitstream in which a flag bit of a filter type corresponding to the motion vector is set.
  35. A motion estimation system, characterized in that the system comprises a storage device and a processor, the storage device storing a computer program to be run by the processor, wherein the computer program, when run by the processor, performs the motion estimation method according to any one of claims 1-34.
  36. A storage medium, characterized in that a computer program is stored on the storage medium, and the computer program, when run, performs the motion estimation method according to any one of claims 1-34.
PCT/CN2019/107601 (priority 2019-09-24, filed 2019-09-24): Motion estimation method, *** and storage medium, WO2021056215A1 (zh)

Priority Applications (2)

CN201980066902.6A (priority 2019-09-24, filed 2019-09-24): Motion estimation method, *** and storage medium
PCT/CN2019/107601, published as WO2021056215A1 (priority 2019-09-24, filed 2019-09-24): Motion estimation method, *** and storage medium


Publications (1)

WO2021056215A1, published 2021-04-01




Also Published As

CN112868234A (zh), published 2021-05-28


Legal Events

121 (EP): the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19946975; country of ref document: EP; kind code: A1)
NENP: non-entry into the national phase (ref country code: DE)
122 (EP): PCT application non-entry in European phase (ref document number: 19946975; country of ref document: EP; kind code: A1)