CN112514392A - Method and apparatus for video encoding

Info

Publication number
CN112514392A
Authority
CN
China
Prior art keywords
current block
block
motion vector
search
frame
Legal status
Pending
Application number
CN202080004137.8A
Other languages
Chinese (zh)
Inventor
王悦名
郑萧桢
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd

Classifications

    • H (Electricity) > H04 (Electric communication technique) > H04N (Pictorial communication, e.g. television) > H04N19/00 (Methods or arrangements for coding, decoding, compressing or decompressing digital video signals)
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation (characterised by incoming video signal characteristics or properties)
    • H04N19/503 Predictive coding involving temporal prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a video coding method and apparatus, which can reduce coding complexity, reduce the design difficulty of a hardware video encoder, reduce the power consumption of the hardware encoder, and improve the performance of the hardware encoder. The video coding method includes: determining a search center for a pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame; and performing the pixel search on the current block based on the search center to obtain the motion vector of the current block. According to this technical scheme, the center of the pixel search of the current block is determined directly from the motion vector of the corresponding block of the current block, so that no multi-P search is needed to determine the center of the pixel search. The at least one pipeline stage occupied by multi-P search in a hardware encoder can therefore be removed, which reduces the design difficulty of the hardware encoder, reduces the computational complexity and hardware power consumption of the encoding process, and improves the overall performance of the hardware encoder.

Description

Method and apparatus for video encoding
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
The present application relates to the field of digital video coding and decoding technology, and more particularly, to a method and apparatus for video coding.
Background
In currently common video encoding and decoding technology, the encoding and compression process of a video includes block division, prediction, transform, quantization and entropy coding, which together form the hybrid video coding framework. Prediction is divided into two modes: intra prediction (Intra Prediction) and inter prediction (Inter Prediction). Specifically, inter prediction uses the temporal correlation of the video to search a reference frame for the prediction block most similar to the current image block to be coded, obtains the residual between the pixel values of the current image block and the pixel values of the prediction block, and encodes based on that residual value, so that the video data can be compressed and the bandwidth occupied by video storage and transmission is reduced.
In a hardware video encoder, a multi-stage pipeline structure comprising multiple-time search (multi-P search), integer-pixel search, sub-pixel search and the like is often used for the inter prediction search of the video to be encoded. This multi-stage pipeline structure is complex and difficult to design, has high computational complexity and long search times, and affects the performance and power consumption of the hardware encoder.
Therefore, how to reduce the encoding complexity, reduce the design difficulty of the hardware video encoder, reduce the power consumption of the hardware encoder, and improve the performance of the hardware encoder is a technical problem to be solved urgently.
Disclosure of Invention
Compared with the prior art, the video coding method and apparatus provided by the application can reduce coding complexity, reduce the design difficulty of a hardware video encoder, reduce the power consumption of the hardware encoder, and improve the performance of the hardware encoder.
In a first aspect, a method for video coding is provided, including: determining a search center for a pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame; and performing the pixel search on the current block based on the search center to obtain the motion vector of the current block.
According to this technical scheme, the center of the pixel search of the current block is determined directly from the motion vector of the corresponding block of the current block, so that no multi-P search is needed to determine the pixel search center of the current block. At least one pipeline stage occupied by multi-P search in a hardware encoder can therefore be removed, which reduces the design difficulty of the hardware encoder, reduces the computing resources required by the search process, reduces the computational complexity and hardware power consumption of encoding, and improves the overall performance of the hardware encoder.
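As an illustration only, the following Python sketch (not part of the patent; all function and parameter names are hypothetical) shows the two steps of the first aspect with a SAD matching criterion and a fixed search window:

```python
import numpy as np

def sad(a, b):
    # Sum of Absolute Differences: a matching criterion that needs no
    # multiplications (see the distortion discussion later in the text).
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def find_motion_vector(cur_frame, ref_frame, block_xy, corresponding_mv,
                       block=16, search_range=8):
    """S210/S220 sketch: derive the search center from the corresponding
    block's motion vector, then run an integer full search around it."""
    bx, by = block_xy
    cur = cur_frame[by:by + block, bx:bx + block]
    # S210: search center = current block position shifted by the motion
    # vector of the corresponding block (no multi-P search needed).
    cx, cy = bx + corresponding_mv[0], by + corresponding_mv[1]
    h, w = ref_frame.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= w - block and 0 <= y <= h - block:
                cost = sad(cur, ref_frame[y:y + block, x:x + block])
                if cost < best_cost:
                    best_cost, best_mv = cost, (x - bx, y - by)
    return best_mv  # S220: the motion vector of the current block
```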
In a second aspect, an apparatus for video encoding is provided, including a processor configured to: determine a search center for a pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame; and perform the pixel search on the current block based on the search center to obtain the motion vector of the current block.
In a third aspect, an electronic device is provided, which includes the video encoding apparatus provided in the second aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a computer, causes the computer to perform the method provided by the first aspect.
In a fifth aspect, a computer program product is provided comprising instructions which, when executed by a computer, cause the computer to perform the method provided in the first aspect.
Drawings
Fig. 1 is an architecture diagram of a solution to which an embodiment of the present application is applied.
Fig. 2 is a schematic diagram of a video coding framework according to an embodiment of the present application.
Fig. 3 is a schematic diagram of video frame division according to an embodiment of the present application.
Fig. 4 shows four macroblock partitions according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a video decoding framework according to an embodiment of the present application.
Fig. 6 is a flowchart illustrating a hardware video encoder implementing video encoding according to an embodiment of the present application.
Fig. 7 is a schematic flow chart of a video encoding method according to an embodiment of the present application.
Fig. 8 is a schematic diagram illustrating a spatial position relationship between a corresponding block and a current block according to an embodiment of the present application.
Fig. 9 is a schematic diagram illustrating another spatial position relationship between a corresponding block and a current block according to an embodiment of the present application.
Fig. 10 is a schematic flow chart of another video encoding method according to an embodiment of the present application.
Fig. 11 is a schematic flow chart of another video encoding method according to an embodiment of the present application.
Fig. 12 is a schematic flow chart of another video encoding method according to an embodiment of the present application.
Fig. 13 is a schematic flow chart of another video encoding method according to an embodiment of the present application.
Fig. 14 is a schematic block diagram of a video encoding device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiments of the application can be applied to standard or non-standard image or video codecs. For example, they are applicable to any one of the international video coding standards H.264/MPEG-AVC and H.265/MPEG-HEVC, the national audio and video coding standard AVS2, the H.266/VVC international standard, the AVS3 national standard, and future evolved audio and video coding standards.
It should be understood that the specific examples are provided herein only to assist those skilled in the art in better understanding the embodiments of the present application and are not intended to limit the scope of the embodiments of the present application.
It should also be understood that the formula in the embodiment of the present application is only an example, and is not intended to limit the scope of the embodiment of the present application, and the formula may be modified, and the modifications should also fall within the scope of the protection of the present application.
It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that the various embodiments described in this specification can be implemented individually or in combination, and the examples in this application are not limited thereto.
Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Fig. 1 is an architecture diagram of a solution to which an embodiment of the present application is applied.
As shown in FIG. 1, the system 100 can receive data 102 to be processed, process it, and generate processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or it may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components in system 100 may be implemented by one or more processors, which may be processors in computing devices or in mobile devices (e.g., cameras, drones, unmanned vehicles, unmanned boats, etc.). The processor may be any kind of processor, which is not limited in this application. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. One or more memories may also be included in the system 100. The memory may be used to store instructions and data, for example computer-executable instructions that implement aspects of the embodiments of the present application, the data 102 to be processed, the processed data 108, and the like. The memory may be any kind of memory, which is not limited in the embodiments of the present application.
The data to be encoded may include text, images, graphical objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and so forth. In some cases, the data to be encoded may include information from the user, e.g., biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.
Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application. As shown in fig. 2, after the video to be encoded is received, each frame of the video is encoded in turn starting from the 0th frame. The current frame to be encoded mainly passes through Prediction, Transform, Quantization and Entropy Coding, and the code stream of the current frame is finally output. Correspondingly, the decoding process generally decodes the received code stream according to the inverse of the above process to recover the video frame information.
Specifically, as shown in fig. 2, the video coding framework 2 includes a coding control module 201 for performing decision control actions and parameter selection during the coding process. For example, the coding control module 201 controls the parameters used in transform, quantization, inverse quantization and inverse transform, controls the selection of intra or inter mode, and controls the parameters of motion estimation and filtering; the control parameters of the coding control module 201 are also input into the entropy coding module and encoded to form part of the coded stream.
When the encoding of the current frame starts, the frame is first divided (202), for example into slices (Slice) and then into blocks. Alternatively, block division may be performed directly on the frame. Optionally, in one example, as shown in (a) of fig. 3, the frame is divided into a plurality of non-overlapping Coding Tree Units (CTUs), whose size may range from 16x16 to 64x64. As shown in (b) of fig. 3, each CTU may be iteratively divided into a series of smaller Coding Units (CUs) according to rules such as quadtree, binary tree, ternary tree or octree partitioning; a sketch of such splitting is given below. In some examples, as shown in (c) of fig. 3, each CU may be further divided into one, two or four Prediction Units (PUs) for intra prediction, and into at least one Transform Unit (TU) for transformation, where a PU is the basic unit of prediction and a TU is the basic unit of transform and quantization. In some examples, a PU and a TU are each divided into one or more blocks on a CU basis, where a PU includes multiple Prediction Blocks (PBs) and associated syntax elements. In some examples, the PU and TU may be the same, or may be derived from the CU by different partitioning methods. In some examples, at least two of the CU, PU and TU are the same; for example, the CU, PU and TU are not distinguished, and prediction, quantization and transformation are all performed in units of CUs. It should be understood that block partitioning in the frame includes but is not limited to the above; block partitioning differs slightly between coding standards, and future evolving standards will provide more and more flexible partitioning.
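As a purely illustrative sketch of the iterative CU splitting just described, the Python below recursively splits a CTU quadtree-style; the variance threshold used as the split criterion is an assumption for demonstration, whereas real encoders decide splits by rate-distortion optimization:

```python
import numpy as np

def split_ctu(frame, x, y, size, min_cu=8, threshold=500.0):
    """Recursively split a CTU into CUs; returns a list of (x, y, size).
    The variance criterion is a stand-in for a real RD-based decision."""
    block = frame[y:y + size, x:x + size]
    if size > min_cu and float(np.var(block)) > threshold:
        half = size // 2
        cus = []
        for oy in (0, half):
            for ox in (0, half):
                cus += split_ctu(frame, x + ox, y + oy, half, min_cu, threshold)
        return cus
    return [(x, y, size)]
```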
Alternatively, in another example, the frame is divided into a plurality of non-overlapping macroblocks (Macro Blocks), each of which has a size of 16 × 16 pixels for the luminance component of the image and 8 × 8 pixels for the chrominance components. A macroblock of 16 × 16 pixels can be divided into sub-macroblocks in 4 ways. For example, as shown in fig. 4 (a), the first manner divides it as 2N × 2N, obtaining one 16 × 16 sub-macroblock; as shown in fig. 4 (b), the second manner divides it as 2N × N, obtaining two 16 × 8 sub-macroblocks; as shown in fig. 4 (c), the third manner divides it as N × 2N, obtaining two 8 × 16 sub-macroblocks; and as shown in fig. 4 (d), the fourth manner divides it as N × N, obtaining four 8 × 8 sub-macroblocks. Further, each sub-macroblock may be divided into a plurality of sub-blocks of 8 × 16, 16 × 8, 8 × 8, 4 × 8, 8 × 4 or 4 × 4 pixels. These partitions and sub-blocks make much better use of the correlation within macroblocks, enabling further improvement of the compression rate.
For convenience of description, a CTU, CU, macroblock, sub-block, or other formed unit of data is hereinafter referred to as an encoded block.
It should be understood that in the embodiments of the present application, the data unit for video coding may be a frame, a slice, a coding tree unit, a coding block or a group of any of the above. The size of the data units may vary in different embodiments.
Specifically, as shown in fig. 2, after the frame is divided into a plurality of coding blocks, a prediction process is performed to remove redundant information in the spatial and temporal domains of the current frame. The currently used predictive coding methods include intra prediction and inter prediction. Intra prediction uses only the reconstructed information within the current frame image to predict the current coding block, while inter prediction uses information in other, previously reconstructed frame images (also called reference frames) to predict the current coding block. Specifically, in the embodiment of the present application, the coding control module 201 is configured to decide whether to select intra prediction or inter prediction.
When the intra prediction mode is selected, intra prediction 203 includes: obtaining the reconstructed blocks of the coded neighboring blocks around the current coding block as reference blocks; calculating predicted values based on the pixel values of the reference blocks with a prediction-mode method to generate a prediction block; subtracting the corresponding pixel values of the current coding block and the prediction block to obtain the residual of the current coding block; and transforming 204, quantizing 205 and entropy coding 210 the residual to form the code stream of the current coding block. After all coding blocks of the current frame have gone through this coding process, they form part of the coded stream of the frame. In addition, the control and reference data generated in intra prediction 203 are also entropy encoded 210, forming part of the coded stream.
In particular, the transform 204 is used to remove the correlation of the residual of the image block in order to improve coding efficiency. Two-dimensional Discrete Cosine Transform (DCT) and two-dimensional Discrete Sine Transform (DST) are usually adopted for the transformation of the residual data of the current coding block; for example, at the encoding end, the residual information of the coding block is multiplied by an N × M transform matrix and its transpose, and the transform coefficients of the current coding block are obtained after the multiplication.
After the transform coefficients are generated, quantization 205 is used to further improve compression efficiency: the transform coefficients are quantized to obtain quantized coefficients, and entropy coding 210 is then performed on the quantized coefficients to obtain the residual code stream of the current coding block. The entropy coding methods include, but are not limited to, Context-based Adaptive Binary Arithmetic Coding (CABAC).
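The following minimal Python sketch ties the transform 204 and quantization 205 (and their inverses 206 and 207, described below) together, assuming a square residual block and a single scalar quantization step, which simplifies the QP-derived scaling used by real codecs:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal type-II DCT basis; C @ X @ C.T is the 2-D transform."""
    c = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n))
                   for j in range(n)] for i in range(n)])
    c *= np.sqrt(2.0 / n)
    c[0] *= np.sqrt(0.5)
    return c

def transform_quantize(residual, qstep=10.0):
    """Transform (204) then quantize (205) a square residual block."""
    c = dct_matrix(residual.shape[0])
    coeffs = c @ residual @ c.T
    return np.round(coeffs / qstep)

def dequantize_inverse_transform(levels, qstep=10.0):
    """Inverse quantization (206) then inverse transform (207)."""
    c = dct_matrix(levels.shape[0])
    return c.T @ (levels * qstep) @ c
```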
Specifically, the coded neighboring blocks used in the intra prediction 203 process are reconstructed as follows: before the current coding block is coded, the residual generated when a neighboring block was coded is transformed 204, quantized 205, inverse quantized 206 and inverse transformed 207, and then added to the prediction block of that neighboring block to obtain the reconstructed block. Correspondingly, inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to recover the residual data before quantization and transformation.
As shown in fig. 2, when the inter prediction mode is selected, the inter prediction process includes Motion Estimation (ME) 208 and Motion Compensation (MC) 209. Specifically, motion estimation 208 is performed on one or more reference frame images among the reconstructed video frames: a motion search is performed in the reference frame image(s) according to a certain matching criterion to obtain the image block most similar to the current coding block as the corresponding block of the current coding block, and the relative displacement between that corresponding block and the current coding block is the Motion Vector (MV) of the current coding block. The current coding block is then motion compensated 209 based on the motion vector and the reference frame to obtain the prediction block of the current coding block. The original pixel values of the coding block are subtracted from the corresponding pixel values of the prediction block to obtain the residual of the coding block. The residual of the current coding block is transformed 204, quantized 205 and entropy coded 210 to form part of the coded stream of the frame. In addition, the control and reference data generated in motion compensation 209 are also entropy coded 210, forming part of the coded stream.
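A minimal sketch of the motion compensation 209 step, under the assumption that the motion vector has already been found by motion estimation 208 (names are illustrative):

```python
import numpy as np

def motion_compensate(cur_frame, ref_frame, block_xy, mv, block=16):
    """Fetch the prediction block pointed to by the motion vector and form
    the residual that goes on to transform, quantization and entropy coding."""
    bx, by = block_xy
    px, py = bx + mv[0], by + mv[1]
    prediction = ref_frame[py:py + block, px:px + block]
    current = cur_frame[by:by + block, bx:bx + block]
    residual = current.astype(np.int16) - prediction.astype(np.int16)
    return prediction, residual
```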
As shown in fig. 2, the reconstructed video frame is a video frame obtained after being filtered 211. The filtering 211 is used to reduce compression distortion such as blocking effect and ringing effect generated in the encoding process, the reconstructed video frame is used to provide a reference frame for inter-frame prediction in the encoding process, and the reconstructed video frame is output as a final decoded video after post-processing in the decoding process.
Fig. 5 is a schematic diagram of a video decoding framework 3 according to an embodiment of the present application. As shown in fig. 5, video decoding performs the operation steps corresponding to video encoding. First, entropy decoding 301 is used to obtain one or more of the residual data, prediction syntax, intra prediction syntax, motion compensation syntax and filtering syntax from the coded stream. The residual data is subjected to inverse quantization 302 and inverse transform 303 to obtain the original residual data. Further, whether the currently decoded block uses intra prediction or inter prediction is determined according to the prediction syntax. If it is intra prediction 304, prediction information is constructed with an intra prediction method from the reconstructed image blocks in the current frame, according to the decoded intra prediction syntax. If it is inter prediction, a reference block is determined in the reconstructed image according to the decoded motion compensation syntax to obtain the prediction information. The prediction information and the residual information are then superimposed, and the reconstructed video frame is obtained through filtering 305; after post-processing 306, the decoded video is obtained.
At present, video coding and decoding systems are generally divided into software codec systems and hardware codec systems, whose coding modes and architectures differ to a certain extent. A hardware codec system does not depend on CPU computation but uses hardware such as a GPU, a dedicated DSP, an FPGA or an ASIC chip for encoding and decoding. Its codec capability for high-definition video exceeds that of a software codec system; it releases the CPU from heavy video decoding work and gives the computer the ability to play high-definition video smoothly.
The following describes a process for implementing video coding based on a hardware video encoder with reference to fig. 6.
As shown in fig. 6, the hardware video encoder includes multiple pipeline-stage modules, such as multi-P search, integer pixel search, fractional (sub-) pixel search, intra prediction, mode decision, and the subsequent entropy coding and filtering. Specifically, multi-P search, integer pixel search and fractional pixel search are all steps of the inter prediction process in video coding.
The multi-P search module performs the multi-P search of the Nth coding block of the frame to be coded from time t to time t+1, then performs the multi-P search of the (N+1)th coding block from time t+1 to time t+2, and in this manner performs multi-P search on all coding blocks of the frame in turn. Similarly, the integer pixel search module performs the integer pixel search of the Nth coding block from time t+1 to time t+2, and then continues in turn with the integer pixel search of the (N+1)th and (N+2)th coding blocks.
In current hardware video encoders, one or more of the multi-P search result of the current block to be coded, a search result obtained from the motion vector of a neighboring block of the current block, or a search result obtained from a global motion vector is often used as the search center. A full integer-pixel search is performed in a small range around the search center; after the integer pixel search result is obtained, that result is used as the search center and a sub-pixel search is further performed, yielding the prediction block of the current block to be coded and the motion vector corresponding to that prediction block.
Multi-P search refers to simultaneously downsampling the search area in the reference frame and the current block to be coded several times, and then performing a full search in the downsampled area. Common multi-P searches include 8P search, 4P search, 2P search and the like. For example, 8P search means downsampling both the search area and the current block by a factor of 8, and then searching the 8x-downsampled search area with the 8x-downsampled current block; the search result of the 8P search is the center of the block in the search area that is closest to the 8x-downsampled current block.
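As a hedged illustration of an 8P search, the Python below downsamples by simple block averaging (actual hardware may subsample differently) and then runs a full search in the reduced area; the returned position is the coarse search center, relative to the search-area origin, scaled back to full resolution:

```python
import numpy as np

def downsample(img, factor=8):
    """Downsample by block averaging (one possible subsampling choice)."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    return (img[:h, :w].astype(np.float64)
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))

def multi_p_search(cur_block, search_area, factor=8):
    """Full search in the downsampled area; returns the best position
    scaled back to full resolution (the coarse search center)."""
    cb, sa = downsample(cur_block, factor), downsample(search_area, factor)
    bh, bw = cb.shape
    best, best_cost = (0, 0), float("inf")
    for y in range(sa.shape[0] - bh + 1):
        for x in range(sa.shape[1] - bw + 1):
            cost = np.abs(sa[y:y + bh, x:x + bw] - cb).sum()
            if cost < best_cost:
                best_cost, best = cost, (x, y)
    return best[0] * factor, best[1] * factor
```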
In some embodiments, multiple P searches may be performed, in other words, a pipeline level structure of the hardware encoder may include multiple pipeline level modules for multiple P searches, for example, a pipeline level module for 8P search is followed by a pipeline level module for 4P search, and so on.
In other embodiments, the multi-P search and the integer pixel search may be performed in the same pipeline stage, that is, in the pipeline stage structure of the hardware encoder, the multi-P search and the integer pixel search are multiplexed into one pipeline stage module.
It should be understood that fig. 6 only shows a simple pipeline structure of a hardware video encoder, in practice, the hardware encoder further includes video encoding functional modules such as block division, transformation, quantization, etc., and the pipeline structure of the hardware encoder may also have a plurality of different division schemes.
In current hardware encoders, multi-P search occupies one or more pipeline stages of the hardware encoder, which increases its design difficulty; even in encoders where multi-P search and integer pixel search share the same pipeline stage, the computing resources of that stage are strained. In addition, the multi-P search process consumes considerable computational complexity and hardware power, reducing the overall performance of the hardware encoder.
To solve the above problems, the present application replaces the multi-P search result with other motion vectors, so that at least one pipeline stage occupied by multi-P search in a hardware encoder is removed. This reduces the design difficulty of the hardware encoder, reduces the computing resources required by the motion search process, reduces the computational complexity and hardware power consumption of motion search, and improves the overall performance of the hardware encoder.
The following describes the video encoding process in this application in detail with reference to fig. 7 to 13.
Fig. 7 shows a schematic flow diagram of a video encoding method 200. The video encoding method 200 includes an inter-frame prediction process, i.e., a process of performing a pixel search to obtain a motion vector.
As shown in fig. 7, the video encoding method 200 may include the following steps.
S210: Determine a search center for the pixel search of the current block according to the motion vector of the corresponding block of the current block in an encoded frame.
S220: Perform a pixel search on the current block based on the search center to obtain the motion vector of the current block.
Specifically, the current block is an image block being encoded in the current frame, and the current block may be a macro block, a sub-macro block, or a sub-block in the current frame, may also be an encoding tree unit or an encoding unit in the current frame, or may also be another type of image block, which is not specifically limited in the embodiment of the present application.
Optionally, in some embodiments, the current block may be a macroblock of 16 × 16 pixels, and of course, in other embodiments, the current block may also be an image block of other size, such as 4 × 4 pixels, 8 × 8 pixels, or 32 × 32 pixels, and the size of the current block is also not specifically limited in this embodiment of the application.
In the inter prediction process, motion search and motion estimation need to be performed on the current block. As described above, in a hardware encoder the motion search of the current block first performs a multi-P search to obtain at least one search center, and then performs a pixel search based on that search center to obtain the motion vector of the current block.
In the embodiment of the present application, instead of performing multiple P searches to obtain the search center, the search center of the pixel search of the current block is obtained directly according to the motion vector of the image block in the encoded frame.
Specifically, the encoded frame is an image frame that has been encoded, and image blocks in the encoded frame have undergone processes such as prediction, transformation, quantization, and entropy encoding, so that the image blocks in the encoded frame can obtain their corresponding motion vectors through inter-frame prediction. Specifically, the inter prediction process of the image block in the encoded frame may refer to an inter prediction process in the prior art, and may also refer to an inter prediction process performed by the current block in the following text.
In the embodiment of the present application, an image block corresponding to a current block in an encoded frame is also referred to as a corresponding block, and the size of the corresponding block may be the same as that of the current block, for example, if the current block is an image block of 16 × 16 pixels, the corresponding block may also be an image block of 16 × 16 pixels. In other implementations, the size of the corresponding block of the current block in the encoded frame may also be different from the size of the current block, which is not limited in this embodiment of the application.
It should be understood that in the embodiment of the present application, the types of the current block and the corresponding block may be the same, for example, if the corresponding block is a macroblock, the current block may also be a macroblock. The types of the current block and the corresponding block may also be different, for example, if the corresponding block is a macroblock, the current block may also be a sub-macroblock or a sub-block.
Alternatively, the corresponding block may be any image block in the encoded frame, for example, it may be the first image block in the upper left corner of the encoded frame, or may also be any other image block at any position.
In one possible implementation, the corresponding block is the co-located block of the current block, i.e. the spatial position of the corresponding block in the encoded frame is the same as the spatial position of the current block in the current image frame, as shown in fig. 8.
In another possible implementation, the corresponding block is an adjacent block to a co-located block of the current block, where a spatial position of the co-located block of the current block in the encoded frame is the same as a spatial position of the current block in the current frame, and the corresponding block is adjacent to the co-located block of the current block.
For example, as shown in fig. 9, the corresponding block is located at the lower right corner of the co-located block of the current block, i.e. the corresponding block is the lower-right block of the co-located block. Of course, the corresponding block may also be located at other positions adjacent to the co-located block, for example as its left, lower-left, upper, right or upper-right neighboring block, which is not limited in this embodiment of the application.
In a third possible implementation, a neighboring block of the co-located block of the current block in the encoded frame is first considered as the corresponding block: if that neighboring block exists, it is determined as the corresponding block; if it does not exist, the co-located block of the current block in the encoded frame is determined as the corresponding block.
For example, the lower-right block of the co-located block of the current block in the encoded frame is considered as the corresponding block; if the lower-right block exists it is used, and if it does not exist, the co-located block itself is used. If the current block is the last image block in the lower right corner of the current frame, its co-located block is also the last image block in the lower right corner of the encoded frame, and in this case the lower-right block of the co-located block does not exist.
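The third implementation's fallback logic can be sketched as follows (illustrative only; positions are top-left pixel coordinates and the lower-right neighbor is assumed to be one block away diagonally):

```python
def corresponding_block_position(block_xy, block, frame_w, frame_h):
    """Prefer the lower-right neighbor of the co-located block; fall back
    to the co-located block when that neighbor lies outside the frame."""
    x, y = block_xy                    # co-located block position
    nx, ny = x + block, y + block      # lower-right neighbor position
    if nx + block <= frame_w and ny + block <= frame_h:
        return nx, ny
    return x, y
```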
Alternatively, the motion vector of the corresponding block may be stored in a storage unit, which may be a Buffer (Buffer) unit, or any other hardware or software unit for storage.
Further, the motion vector of the corresponding block may be obtained from the buffer, a search center of a pixel search of the current block may be determined in at least one frame of reference frame, and the pixel search may be performed based on the search center to obtain the motion vector of the current block.
In the embodiment of the application, when the current block in the current frame is subjected to inter-frame predictive coding, the current block is not subjected to multi-P search, and pixel search is not performed according to the result of the multi-P search, but the motion vector of the corresponding block in the coded frame is directly obtained, and the search center is determined according to the motion vector to perform pixel search. Therefore, according to the technical scheme of the embodiment of the application, in the hardware video encoder, a multi-P search pipeline structure is not required to be designed, the design difficulty of the hardware encoder is reduced, meanwhile, the calculation resource of the inter-frame prediction encoding process can be saved by omitting the multi-P search process, and the calculation complexity and the hardware power consumption are reduced.
Specifically, after the motion vector of the corresponding block is acquired, a search center is determined in each of at least one reference frame according to that motion vector, and a pixel search is performed in a search area of each reference frame centered on the search center. Take the case where the current block has M reference frames as an example.
In some embodiments, the motion vector of the corresponding block of the current block is its motion vector with respect to one of the M reference frames, where M is a positive integer. For example, if the motion vector of the corresponding block is (0,0), the search center of the current block is the position of the co-located block of the current block in each reference frame; with M reference frames, there are M search centers in total. If the motion vector of the corresponding block is (u,v), the search center of the current block in each reference frame is the position of its co-located block shifted by (u,v).
In some embodiments, the motion vectors of the corresponding block of the current block are its motion vectors with respect to each of the M reference frames, M motion vectors in total, and the search centers of the current block are M search centers determined in the M reference frames respectively according to the M motion vectors. For example, the search center of the current block in the first of the M reference frames is determined according to the first of the M motion vectors of the corresponding block. Similarly, the search center of the current block in the i-th reference frame is determined according to the i-th motion vector, where 1 ≤ i ≤ M and i is a positive integer. In this way, the current block determines M search centers in total.
In other embodiments, the motion vectors of the corresponding block of the current block are its motion vectors with respect to each of the M reference frames, M motion vectors in total, and the search centers of the current block are M × M search centers determined in the M reference frames according to the M motion vectors. For example, the current block may determine M search centers in the first reference frame according to the M motion vectors; similarly, it may determine M search centers in each of the other reference frames, giving M × M search centers in total. Both variants are sketched after this paragraph.

Alternatively, the encoded frame may be the image frame preceding the current frame, in which case the corresponding block of the current block may be an image block in that preceding frame, for example its co-located block or a neighboring block of its co-located block. The current block then has one corresponding block whose motion vector can be acquired, at least one search center is determined according to that motion vector, and the pixel search is performed based on the search center(s). In addition, the current block may have a plurality of corresponding blocks in the preceding frame, for example the co-located block together with the neighboring blocks of the co-located block. The current block may then obtain the motion vectors of these corresponding blocks and determine its search centers according to them; or it may obtain the mean or median of those motion vectors and determine its search center according to that mean or median.
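The two center-derivation variants above can be sketched as follows (illustrative; a center is represented as a (reference index, x, y) tuple):

```python
def search_centers(block_xy, mvs, num_refs, cross=False):
    """Build search centers from the corresponding block's motion vectors.
    cross=False: the i-th MV gives the center in the i-th reference frame
    (M centers). cross=True: every MV is applied in every reference frame
    (M x M centers)."""
    x, y = block_xy
    if not cross:
        return [(i, x + mv[0], y + mv[1]) for i, mv in enumerate(mvs)]
    return [(r, x + mv[0], y + mv[1]) for r in range(num_refs) for mv in mvs]
```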
Optionally, the encoded frame may also include a previous multi-frame image of the current frame, in which case the corresponding block of the current block may include at least one image block in each frame image of the previous multi-frame image of the current frame, in other words, the current block also has a plurality of corresponding blocks.
For example, the encoded frames include the 1st to the Nth frames preceding the current frame, where N is a positive integer greater than 1. Each of these N frames contains one corresponding block of the current block, so the current block has N corresponding blocks. The N corresponding blocks may be the N co-located blocks of the current block in the preceding frames, or neighboring blocks of those N co-located blocks, for example their lower-right blocks.
In some implementations of this embodiment, each of the N corresponding blocks may have one motion vector. N search centers may be determined in each reference frame according to the N motion vectors of the N corresponding blocks; if the current block has M reference frames, M × N search centers are determined. Alternatively, one search center is determined in each of the M reference frames according to any one of the N motion vectors, giving M search centers in total; or one search center is determined in each reference frame according to the mean or median of the N motion vectors, likewise giving M search centers for the current block.
In other implementations of this embodiment, each of the N corresponding blocks may have M motion vectors, so the N corresponding blocks have N × M motion vectors. N × M search centers may be determined in each reference frame according to these N × M motion vectors; if the current block has M reference frames, the N × M search centers of the current block are determined accordingly.
In addition, each of the preceding frames may contain a plurality of corresponding blocks of the current block, and each corresponding block may have one or more motion vectors. If each corresponding block has one motion vector, N motion vectors can be determined from the N co-located blocks of the current block in the 1st to Nth preceding frames, and N search centers can be determined in each reference frame from those N motion vectors. Similarly, N motion vectors can be determined from the neighboring blocks of those N co-located blocks, and N search centers determined in each reference frame from them. If the current block has M reference frames and each of the N preceding frames contains X corresponding blocks, X × N × M search centers of the current block are determined in total, where X is a positive integer.
Alternatively, N motion vectors may be determined from the N co-located blocks of the current block in the 1st to Nth preceding frames, and one search center determined in each reference frame from the median or mean of those N motion vectors. Similarly, N motion vectors may be determined from the neighboring blocks of those N co-located blocks and one search center determined in each reference frame from them. If the current block has M reference frames and each of the N preceding frames contains X corresponding blocks, X × M search centers of the current block are determined.
It should be understood that the above description takes as an example the case where each frame contains a plurality of corresponding blocks of the current block and each corresponding block has one motion vector; for the case where each corresponding block has a plurality of motion vectors, reference may be made to the above description, and details are not repeated here.
It should also be understood that the ways of determining the search center of the current block in the embodiments of the present application include, but are not limited to, those listed above; any way of obtaining the search center of the current block from the motion vector of at least one corresponding block in an encoded frame may be used, and this is not specifically limited in the embodiments of the application. Optionally, in addition to determining a search center from the motion vector of a corresponding block in an encoded frame as in the above embodiments, at least one of the following motion vectors may also be used to determine further search centers of the current block: a Global Motion Vector (GMV) of the current block, the zero motion vector (0,0), and the motion vectors of neighboring blocks of the current block. A neighboring block of the current block is an image block adjacent to the current block in the current frame; it has already been predicted and has a corresponding motion vector.
With the method of the embodiments of the present application, using the motion vector of the corresponding block in one or more preceding frames of the current frame, supplemented by the global motion vector, the zero motion vector and the motion vectors of neighboring blocks of the current block, a plurality of search centers can be obtained from a plurality of motion vectors. This reduces the design difficulty of the hardware encoder, reduces the computing resources required by the motion search process, and reduces the computational complexity and hardware power consumption of motion search, without affecting coding performance; on some video sequences, coding performance can even improve.
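A sketch of assembling the enlarged candidate set just described, with duplicates removed; all names are illustrative and the zero motion vector and GMV are passed in by the caller:

```python
def candidate_centers(block_xy, corresponding_mvs, neighbor_mvs, gmv):
    """Merge corresponding-block MVs with the GMV, the zero MV and the
    neighbor-block MVs into a deduplicated list of search centers."""
    x, y = block_xy
    candidates = list(corresponding_mvs) + list(neighbor_mvs) + [gmv, (0, 0)]
    seen, centers = set(), []
    for mv in candidates:
        c = (x + mv[0], y + mv[1])
        if c not in seen:
            seen.add(c)
            centers.append(c)
    return centers
```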
The above describes a process of how to obtain at least one search center of the current block according to the corresponding block, and the following describes a process of how to perform pixel search according to the search center to obtain a motion vector of the current block, with reference to fig. 10 to 12.
Fig. 10 shows a schematic flow diagram of another video encoding method 200.
As shown in fig. 10, the above step S220 may include the following steps.
S221: Perform a pixel search on the current block based on the search center, and determine the motion vector of the current block under inter prediction.
S222: Take the motion vector of the current block under inter prediction as the motion vector of the current block.
In step S221, performing a pixel search on the current block is the process of finding, in at least one reference frame, the matching block most similar to the current block.
Optionally, at least one reference block of the current block is determined in the reference frame(s) based on the search center, and the Rate Distortion Cost (RD Cost) of coding the current block is calculated for those reference blocks; the RD Cost is minimal when the current block is coded based on the matching block most similar to the current block among them. Specifically, the RD Cost is calculated by the following formula:
RD Cost = D + λR
where D (Distortion) represents the degree of distortion between the current block and the reference block in the reference frame, generally expressed as the Sum of Squared Errors (SSE), the Sum of Absolute Differences (SAD) or the Mean Square Error (MSE), and R (Rate) represents the number of bits required to encode the current block. R reflects the degree of data compression: the lower R is, the higher the compression and the larger the distortion; the larger R is, the smaller the distortion, but more storage space is required and the pressure on network transmission increases. A balance point between R and D therefore has to be found to optimize the compression. The Lagrangian method is used to weigh the proportions of R and D in the RD Cost: R is multiplied by the Lagrangian multiplier λ, which weights the bit count in the RD Cost and represents the coding distortion incurred per unit of bit rate saved.
When the distortion D is calculated as the SAD between the reference block in the reference frame and the current block, no multiplication is required, so the method is simple and convenient to implement.
Alternatively, instead of determining the matching block of the current block by calculating the RD Cost, the matching block of the current block may be determined by other prior art calculation methods, for example, by only calculating SAD or other reference values of the current block and a plurality of reference blocks of at least one frame reference frame. The embodiment of the present application does not limit the specific calculation manner.
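For illustration, an RD Cost evaluation per the formula above, using SAD as the distortion D (so no multiplications are needed for D) and an externally supplied bit-count estimate for R; the λ value is an arbitrary placeholder:

```python
import numpy as np

def rd_cost(cur_block, ref_block, rate_bits, lam=4.0):
    """RD Cost = D + lambda * R, with D measured as SAD; lam is a
    placeholder Lagrangian multiplier, not a normative value."""
    d = int(np.abs(cur_block.astype(np.int32)
                   - ref_block.astype(np.int32)).sum())
    return d + lam * rate_bits
```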
After a matching block is determined in at least one frame reference frame, a motion vector of the current block under inter-frame prediction is determined according to the position vector of the matching block and the position vector of the current block.
In one embodiment, an integer pixel search is performed on a current block based on a search center, an integer pixel precision motion vector of the current block under inter prediction is determined, and the integer pixel precision motion vector is taken as the motion vector of the current block.
Specifically, after the search center of the current block is obtained, a search area is determined in at least one reference frame and an integer pixel search is performed directly in that area. In this case, the pixel values of the current block and of the search area are the actual pixel values of the corresponding image frames; no pixel interpolation is applied to either, and both are at integer pixel precision.
In the process of integer pixel search, a matching block with integer pixel precision is determined in a search area by adopting the calculation method of the RD Cost or other calculation methods, so that a motion vector with integer pixel precision of the current block under inter-frame prediction is determined, and the motion vector with integer pixel precision is taken as the motion vector of the current block.
In addition, after the motion vector and the matching block with the integer pixel precision are obtained, a residual value of the current block can be calculated based on pixel values of the matching block and the current block, and the residual value and the motion vector are subsequently encoded.
In another embodiment, after integer pixel search and sub-pixel search are performed on the current block based on the search center, a motion vector of the current block under inter prediction with sub-pixel precision is determined, and the motion vector with sub-pixel precision is used as the motion vector of the current block.
Specifically, after the search center of the current block is obtained, an integer pixel search is performed in a search area centered on the search center in at least one reference frame, and the integer-pixel matching block most similar to the current block is found according to the RD Cost or SAD calculation manner.
In the process of object motion, the motion is not necessarily of integer pixel precision: because natural object motion is continuous, the motion between two adjacent frames is not necessarily in units of whole pixels, but may be in units of 1/2 pixel, 1/4 pixel, or even 1/8 pixel. In this case, if only an integer pixel search is used, inaccurate matching occurs, which results in a large residual amplitude and affects the coding efficiency.
To solve the above problem, a sub-pixel search is further performed after the integer pixel search, that is, a further search is performed in a search area centered on the matching block obtained by the integer pixel search. Specifically, after sub-pixel interpolation is applied to the current block and the search area around the matching block, the pixel precision of the current block and the search area reaches 1/2, 1/4, 1/8, or even higher pixel precision. For example, if 1 sub-pixel point is inserted between two whole pixels, the pixel precision is 1/2 and the inserted point is a 1/2 pixel point; if 3 sub-pixel points are inserted between two whole pixels, the pixel precision is 1/4, and the 3 sub-pixel points can be called the 1/4, 1/2, and 3/4 pixel points, respectively. After the sub-pixel interpolation, the current block and the search area both have the higher pixel precision, and the search area is searched for the sub-pixel-precision matching block most similar to the current block.
At present, the coding efficiency at 1/4 pixel precision is obviously improved compared with 1/2 pixel precision; however, except at high bit rates, the coding efficiency at 1/8 pixel precision is not obviously improved compared with 1/4 pixel precision, while motion estimation at 1/8 pixel precision is more complicated. Therefore, the current standards H.264 and HEVC use 1/4 pixel precision for motion estimation.
Optionally, any sub-pixel interpolation method in the prior art may be adopted to perform sub-pixel interpolation on the current block and the search area. The pixel precision of the sub-pixel interpolation may be 1/2 pixel precision, 1/4 pixel precision, 1/8 pixel precision, or any other pixel precision, which is not limited in this embodiment of the present application.
For example, in some embodiments, 1/2 pixel points can be obtained by a 6-tap interpolation filter, and then 1/4 pixel points can be obtained by linear interpolation.
In other embodiments, 1/2 pixel points can be obtained through an 8-tap interpolation filter, and 1/4 pixel points can be obtained through a 7-tap interpolation filter.
It should be understood that the calculation process for obtaining the sub-pixel points through the above interpolation filters may follow calculation methods in the prior art, and the filter coefficients of the above 6-tap, 7-tap, and 8-tap interpolation filters may also follow filter coefficients in the prior art, which are not limited in this embodiment of the present application.
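As an illustration of such filters, the sketch below uses a 6-tap (1, −5, 20, 20, −5, 1)/32 half-pel kernel followed by linear averaging for quarter-pel samples; these particular coefficients are the well-known H.264 luma filter and are shown only as an example, not as coefficients mandated by this application:

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Horizontal half-pel sample between p[0] and p[1] using a 6-tap filter;
 * the caller must guarantee that p[-2]..p[3] are valid samples. */
static uint8_t half_pel(const uint8_t *p)
{
    int v = p[-2] - 5 * p[-1] + 20 * p[0] + 20 * p[1] - 5 * p[2] + p[3];
    return clip_u8((v + 16) >> 5);   /* normalize by 32 with rounding */
}

/* Quarter-pel sample by linear interpolation of two neighbouring
 * integer/half-pel samples, as described in the text above. */
static uint8_t quarter_pel(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}
```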
After the current block and the search area around the matching block obtained by the integer pixel search are sub-pixel interpolated in the above manner, a sub-pixel search is performed in the sub-pixel-precision search area to obtain the sub-pixel-precision matching block most similar to the sub-pixel-precision current block.
Likewise, the sub-pixel search process may apply the rate-distortion optimization technique or other search algorithms. Specifically, the RD Cost of encoding the current block is calculated for a plurality of image blocks in the search area, the image block with the minimum RD Cost is taken as the sub-pixel-precision matching block of the current block, and the motion vector of the current block under inter prediction with sub-pixel precision is then obtained based on the position vectors of the matching block and the current block.
After the sub-pixel-precision motion vector and matching block are obtained, a residual value of the current block can be calculated based on the sub-pixel values of the matching block and the current block, and the residual value and the motion vector are subsequently encoded.
In a third embodiment, based on the search center, no integer pixel search is performed on the current block; only a sub-pixel search is performed, the motion vector of the current block under inter prediction with sub-pixel precision is determined, and this sub-pixel-precision motion vector is taken as the motion vector of the current block.
Specifically, after the search center of the current block is obtained, a search area centered on the search center is determined in at least one reference frame, and sub-pixel interpolation is performed on the search area and the current block. After the interpolation, the pixel values in the current block and the search area are both of sub-pixel precision; a sub-pixel search is performed in the search area, and the matching block found is also of sub-pixel precision. The sub-pixel-precision motion vector of the current block under inter prediction is then determined according to the position vector of the matching block and the position vector of the current block.
In step S222, the motion vector of the current block under inter prediction is used as the motion vector of the current block, and the motion vector of the current block may be used to determine the search center of the pixel search of a block to be encoded in a frame to be encoded. The motion vector of the current block under inter prediction may be a motion vector of integer pixel precision or a motion vector of sub-pixel precision.
In other words, when inter prediction is performed on a block to be encoded in a frame to be encoded after the current frame, the motion vector of the current block may be obtained, a search center may be derived from it, and the pixel search may then be performed, thereby simplifying the inter prediction process of the block to be encoded. In the encoding process, the inter prediction of each image block can use the motion vector of an image block in an encoded frame for its pixel search, and the motion vector obtained by that pixel search can in turn be used to determine the search center for image blocks in subsequent frames to be encoded.
In the above embodiment of fig. 10, the pixel search is performed directly on the current block as a whole to obtain the motion vector of the current block. Alternatively, the current block may be divided into a plurality of sub-blocks, the pixel search may be performed in units of sub-blocks, and the motion vector of the current block may be obtained from the motion vectors of the plurality of sub-blocks. This method embodiment is described in detail below with reference to fig. 11.
Fig. 11 shows a schematic flow diagram of another video encoding method 200.
As shown in fig. 11, the above step S221 may include the following steps.
S2211: and performing block division on the current block by adopting a target block division mode to obtain at least one sub-block.
S2212: and performing pixel search on at least one sub-block in at least one reference frame based on the search center to obtain at least one sub-motion vector of the at least one sub-block.
S2213: and obtaining the motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector.
Optionally, the current block includes, but is not limited to, a 16 × 16 macroblock, and the current block may be divided in the several manners of fig. 4. For example, dividing the current block in the N × N manner of diagram (d) in fig. 4 yields four 8 × 8 sub-blocks, and each of the four 8 × 8 sub-blocks may be further divided in the several manners of fig. 4 to obtain sub-blocks of 4 × 8, 8 × 4, or 4 × 4 pixels.
There are various block division manners for dividing the current block into a plurality of sub-blocks, and the sub-block sizes include, but are not limited to, 16 × 16, 16 × 8, 8 × 16, 8 × 8, 4 × 8, 8 × 4, or 4 × 4 pixels.
The target block division mode is determined by traversing all block division modes and calculating the coding cost (rate-distortion cost) required to encode the sub-blocks under each mode. Specifically, for each block division mode, a pixel search is performed on all of its sub-blocks and the coding cost required to encode all of the sub-blocks is calculated, and the target block division mode is the one with the minimum coding cost.
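In C-like pseudocode, the selection of the target block division mode reduces to a minimum-cost scan; the enumeration of modes and the cost callback below are assumptions made for illustration:

```c
#include <stdint.h>

typedef enum { PART_2Nx2N, PART_2NxN, PART_Nx2N, PART_NxN,
               PART_COUNT } part_mode_t;

/* Assumed helper: pixel-search all sub-blocks under mode p and return
 * the total coding (rate-distortion) cost. */
extern uint32_t partition_cost(part_mode_t p);

static part_mode_t pick_target_partition(void)
{
    part_mode_t best = PART_2Nx2N;
    uint32_t best_cost = partition_cost(best);
    for (int p = PART_2NxN; p < PART_COUNT; p++) {
        uint32_t c = partition_cost((part_mode_t)p);
        if (c < best_cost) { best_cost = c; best = (part_mode_t)p; }
    }
    return best;
}
```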
In some embodiments, the target block division manner is the 2N × 2N manner of diagram (a) in fig. 4, that is, the sub-block size is the same as the size of the current block. In this case, under the target block division manner, a pixel search is performed on the current block to obtain its motion vector under inter prediction. After the target block division manner is selected, the process of obtaining the motion vector of the current block is the same as in the method embodiment of fig. 10 above and is not repeated here.
In other words, in the above method embodiment of fig. 10, the coding costs of the multiple block division modes are also calculated before step S221, the finally determined block division mode is the 2N × 2N mode, and the current block and the sub-block have the same size.
In other embodiments, the target block division manner is any one of diagrams (b) to (d) in fig. 4, that is, the current block is divided into a plurality of sub-blocks under the target block division manner. In this case, a pixel search is performed on each of the plurality of sub-blocks to obtain their motion vectors under inter prediction. The process of performing the pixel search on each sub-block is similar to that of performing the pixel search on the current block: specifically, in the search area of at least one reference frame, the sub-matching block most similar to the sub-block is searched for, and the search may also adopt the RD Cost calculation manner described above or another calculation manner.
After a plurality of sub-matching blocks corresponding to the sub-blocks are obtained through searching, a plurality of motion vectors of the sub-blocks under inter-frame prediction are obtained according to the position vectors of the sub-blocks and the sub-matching blocks.
Similarly, the pixel search of a sub-block may be an integer pixel search, a sub-pixel search, or both.
For example, if the pixel search of the sub-block is an integer pixel search, the sub-block is searched for directly in the search area without any pixel interpolation of the sub-block or the search area, and the motion vector of the sub-block obtained after the integer pixel search is of integer pixel precision.

Or, if the pixel search of the sub-block is a sub-pixel search, sub-pixel interpolation is first applied to the sub-block and the search area, the sub-pixel-precision sub-block is then matched in the sub-pixel-precision search area, and the motion vector of the sub-block obtained after the sub-pixel search is of sub-pixel precision.

Or, if the pixel search of the sub-block comprises both an integer pixel search and a sub-pixel search, an integer pixel search is first performed on the sub-block in the search area to obtain an integer-pixel sub-matching block; sub-pixel interpolation is then applied to the sub-block and the sub-search area around that sub-matching block, the sub-pixel-precision sub-block is matched in the sub-pixel-precision sub-search area, and the motion vector of the sub-block obtained after the sub-pixel search is of sub-pixel precision.
In the embodiment of the present application, for a technical solution related to integer pixel search and sub-pixel search of a sub-block, reference may be made to the above related description of integer pixel search and sub-pixel search of a current block in fig. 10, which is not described herein again.
Optionally, the motion vector of any one or more of the plurality of sub-blocks under inter prediction in the target block division mode is used as one or more motion vectors of the current block under inter prediction.

Optionally, the motion vectors of all of the plurality of sub-blocks under inter prediction in the target block division mode are used as the plurality of motion vectors of the current block under inter prediction.

Optionally, the mean or median of the motion vectors of the plurality of sub-blocks under inter prediction in the target block division mode is used as the motion vector of the current block under inter prediction.
It should be understood that at least one motion vector of the current block under inter prediction may be obtained from the motion vectors of the plurality of sub-blocks under inter prediction in the target block division manner. The at least one motion vector includes, but is not limited to, the motion vector of at least one of the sub-blocks and the mean or median of the sub-block motion vectors; other motion vector values may also be derived from the sub-block motion vectors, which is not limited in this embodiment of the present application.
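A small sketch of the mean and median reductions mentioned above, applied component-wise to the sub-block motion vectors (types and names are our assumptions):

```c
#include <stdlib.h>

typedef struct { int x, y; } smv_t;

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Component-wise median of n sub-block motion vectors (n small, e.g. 4);
 * the coordinate arrays are sorted in place by qsort. */
static smv_t median_mv(int *xs, int *ys, int n)
{
    qsort(xs, n, sizeof(int), cmp_int);
    qsort(ys, n, sizeof(int), cmp_int);
    return (smv_t){ xs[n / 2], ys[n / 2] };
}

/* Component-wise mean, with integer division truncating toward zero. */
static smv_t mean_mv(const int *xs, const int *ys, int n)
{
    int sx = 0, sy = 0;
    for (int i = 0; i < n; i++) { sx += xs[i]; sy += ys[i]; }
    return (smv_t){ sx / n, sy / n };
}
```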
In the above embodiments of fig. 10 and fig. 11, in the inter prediction stage, after the pixel search is performed on the current block based on the search center and the motion vector of the current block under inter prediction is determined, that motion vector is directly used as the motion vector of the current block.
In addition to taking the motion vector of the current block under inter prediction as the motion vector of the current block in the inter prediction stage as described above, the motion vector of the current block may also be determined in the flow that follows inter prediction.
Fig. 12 shows a schematic flow diagram of another video encoding method 200.
As shown in fig. 12, the above step S220 may include the following steps.
S221: and performing pixel search on the current block based on the search center, and determining the motion vector of the current block under the inter-frame prediction.
S223: intra prediction and mode decision are performed on the current block.
S224: and obtaining the motion vector of the current block according to the result of the prediction mode decision.
Specifically, the related art of step S221 may refer to the related description in fig. 10 and fig. 11, and is not described herein again.
After the motion vector of the current block is determined in the inter prediction process, intra prediction and mode decision continue to be performed on the current block. Of course, the intra prediction may also occur before or at the same time as the inter prediction, which is not limited in the embodiment of the present application.
Specifically, the process of intra-frame prediction for the current block in the embodiment of the present application is the same as the process of intra-frame prediction for the current block in the prior art, and related technical solutions may refer to the prior art, which is not described herein again.
After intra prediction and inter prediction are performed, a mode decision is made, that is, the coding costs of the current block under the two prediction modes are compared and the prediction mode with the smaller coding cost is selected; alternatively, a mode decision method in the prior art may be used to decide the prediction mode of the current block, which is not limited in the embodiment of the present application.
Specifically, in step S224, if the current block is encoded in the intra prediction mode after the mode decision, at least one of the global motion vector of the current block, the zero motion vector (0,0), or the motion vector of a neighboring block of the current block is used as the motion vector of the current block, or the motion vector of the current block is marked as unavailable. A neighboring block of the current block is an image block adjacent to the current block in the current frame; it has already been predicted and therefore has a corresponding motion vector.
If the current block is encoded in the inter prediction mode after the mode decision, the motion vector of the current block under inter prediction is taken as the motion vector of the current block, and a target motion vector corresponding to the current block may be obtained from this motion vector or from a temporal motion vector predictor.
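The effect of the mode decision on the motion vector attached to the current block can be sketched as follows; the struct layout and the fallback order are illustrative assumptions, since the text allows any of several fallbacks (neighbor MV, global MV, zero MV, or marking the MV unavailable):

```c
typedef struct { int x, y; int available; } blk_mv_t;

static blk_mv_t mv_after_mode_decision(int inter_chosen,
                                       blk_mv_t inter_mv,
                                       blk_mv_t neighbor_mv,
                                       blk_mv_t global_mv)
{
    if (inter_chosen)
        return inter_mv;              /* inter: keep the searched MV */
    if (neighbor_mv.available)        /* intra: one possible fallback */
        return neighbor_mv;
    if (global_mv.available)
        return global_mv;
    return (blk_mv_t){ 0, 0, 1 };     /* zero MV (0,0) */
}
```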
Fig. 13 shows a schematic flow diagram of another video encoding method 200.
As shown in fig. 13, the video encoding method 200 further includes:
s230: the motion vector of the current block is stored.
Specifically, the motion vector of the current block is stored in a storage unit, which may be a Buffer (Buffer) unit, or any other hardware or software unit for storage.
Alternatively, the storage unit storing the motion vector of the current block may be the same as the storage unit storing the motion vector of the corresponding block of the current block; in other words, the corresponding block in the encoded frame stores its motion vector into a target storage unit, and the current block in the current frame also stores its motion vector into that same target storage unit. Further, in the embodiment of the present application, the motion vector of each coding block in the video to be encoded may be stored in the same storage unit.
In addition, in the inter prediction process of a block to be encoded in a subsequent frame to be encoded, the motion vector of the current block may be read from the storage unit, the search center of the pixel search of the block to be encoded may be determined in at least one reference frame, and the pixel search may then be performed based on that search center to obtain the motion vector of the block to be encoded. Here the current block may be the corresponding block of the block to be encoded, and the encoding process of the block to be encoded may follow the encoding methods in the above embodiments.
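A minimal sketch of such a shared motion-vector store, assuming one vector per 16 × 16 macroblock and a frame laid out as a grid of macroblocks (all names and the layout are our assumptions):

```c
#include <stdint.h>

typedef struct { int16_t x, y; } buf_mv_t;

typedef struct {
    int blocks_w, blocks_h;  /* frame size in 16x16 macroblocks */
    buf_mv_t *mv;            /* one slot per macroblock, shared across frames */
} mv_buffer_t;

/* Called when the block at macroblock coordinates (bx, by) is encoded. */
static void store_mv(mv_buffer_t *buf, int bx, int by, buf_mv_t v)
{
    buf->mv[by * buf->blocks_w + bx] = v;
}

/* Called for the co-located block of the next frame to be encoded:
 * the stored MV of the corresponding block yields the search center. */
static buf_mv_t search_center_mv(const mv_buffer_t *buf, int bx, int by)
{
    return buf->mv[by * buf->blocks_w + bx];
}
```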
In the above-described embodiments of the application, the current block corresponds to one or more motion vectors.
In some embodiments, in step S230, one or more motion vectors of the current block are stored. In this embodiment, one or more motion vectors are stored for each coding block in an image frame.
If the current block only stores one motion vector in the storage unit, the block to be coded determines a search center in the reference frame according to the motion vector.
For example, if the current block is not divided into blocks, or if after the coding cost calculation the target block division mode with the smallest coding cost is the 2N × 2N mode, the current block and the sub-block have the same size; in this case the motion vector of the current block after the integer pixel search and/or the sub-pixel search is used as the motion vector of the current block. For one or more frames to be encoded after the current frame, a block to be encoded can select the motion vector of the current block and determine one search center in a reference frame.
For another example, if after the coding cost calculation the target block division mode with the minimum coding cost is a non-2N × 2N mode, the current block is divided into a plurality of sub-blocks; in this case, one of the motion vectors of the sub-blocks after the integer pixel search and/or the sub-pixel search, or the mean, the median, or the top-left sub-block's motion vector among them, is taken as the motion vector of the current block. For one or more frames to be encoded after the current frame, a block to be encoded can select the motion vector of the current block and determine one search center in a reference frame.
If the current block stores a plurality of motion vectors in the storage unit, the block to be coded determines a plurality of search centers in the reference frame according to the plurality of motion vectors.
For example, if after the coding cost calculation the target block division mode with the minimum coding cost divides the current block into 4 sub-blocks, the 4 sub-motion vectors of the 4 sub-blocks after the integer pixel search and/or the sub-pixel search are all used as the motion vectors of the current block. For one or more frames to be encoded after the current frame, a block to be encoded can select the 4 sub-motion vectors of the current block and determine 4 search centers in a reference frame.
Of course, 2 or 3 of the 4 sub-motion vectors may also be used as the motion vectors of the current block. For one or more frames to be encoded after the current frame, a block to be encoded may select 2 or 3 sub-motion vectors of the current block and determine 2 or 3 search centers in a reference frame.
In other embodiments, in step S230, any one of the motion vector of the current block and the motion vector of a neighboring block of the current block is stored as the motion vector of both the current block and the neighboring block; alternatively, the mean or median of the motion vector of the current block and the motion vector of the neighboring block is stored as the motion vector of both. A neighboring block of the current block is an image block adjacent to the current block in the current frame; it has already been predicted and therefore has a corresponding motion vector. In this embodiment, one motion vector is stored for each group of several coding blocks in an image frame.
Optionally, besides using any one of the motion vectors of the current block and the neighboring block, or the median or mean of several of their motion vectors, as the shared motion vector of the current block and the neighboring block, motion vectors obtained by other calculation methods may also be used, for example a weighted combination of several motion vectors, which is not specifically limited in this embodiment of the application.
It should be noted that, in the scheme of this application, the current frame in which the current block is located is not the starting frame of the video to be encoded; in this application the starting frame is also referred to as the 0th frame, in other words, the current frame is the 1st frame or a later image frame of the video to be encoded. The coding blocks in the 0th frame are encoded in the intra prediction mode, and the motion vectors of these coding blocks can be set to the global motion vector or the zero motion vector and stored in the storage unit. When inter prediction is performed on coding blocks in the 1st frame and subsequent image frames, the search center can be determined according to the global motion vector or the zero motion vector of the coding blocks in the 0th frame, and the multi-P search step is likewise not needed.
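Reusing the buffer types from the earlier storage sketch, seeding the store for frame 0 might look like this (illustrative only; the choice between the global MV and the zero MV follows the text above):

```c
/* Seed the motion-vector store for the intra-only 0th frame so that
 * blocks of the 1st frame can derive search centers immediately. */
static void seed_frame0(mv_buffer_t *buf, buf_mv_t global_mv)
{
    for (int i = 0; i < buf->blocks_w * buf->blocks_h; i++)
        buf->mv[i] = global_mv;   /* or the zero MV (0, 0) */
}
```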
The video coding method embodiments of the present application are described in detail above with reference to fig. 7 to 13, and the video coding apparatus embodiments of the present application are described in detail below with reference to fig. 14, it being understood that the apparatus embodiments correspond to the method embodiments and that similar descriptions may refer to the method embodiments.
Fig. 14 is a schematic block diagram of a video encoding device 20 according to an embodiment of the present application.
As shown in fig. 14, the video encoding apparatus 20 includes: a processor 21 and a memory 22;
the processor 21 is configured to: determining a search center of pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame;
and performing pixel search on the current block based on the search center to obtain the motion vector of the current block.
Optionally, the corresponding block is a co-located block of the current block in the encoded frame.
Optionally, the corresponding block is a neighboring block of a co-located block of the current block in the encoded frame.
Optionally, the corresponding block is a lower-right block of a co-located block of the current block in the encoded frame.
Optionally, if the lower-right block of the co-located block of the current block does not exist in the encoded frame, the processor 21 is further configured to: use the co-located block of the current block in the encoded frame as the corresponding block.
Optionally, the encoded frame is an image frame before the frame where the current block is located.
Optionally, the processor 21 is specifically configured to: determining at least one search center of pixel search of the current block according to at least one motion vector in motion vectors of a plurality of corresponding blocks in the previous multi-frame image; or, determining a search center of the pixel search of the current block according to the average value or the median value of the motion vectors of a plurality of corresponding blocks in the previous multi-frame image.
Optionally, the processor 21 is further configured to: determine at least one search center for the pixel search of the current block according to the motion vector of the corresponding block and at least one of the following motion vectors: the global motion vector of the current block, the zero motion vector, and the motion vector of a neighboring block of the current block.
Optionally, the processor 21 is specifically configured to: performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction; and taking the motion vector of the current block under inter prediction as the motion vector of the current block.
Optionally, the processor 21 is specifically configured to: performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction; performing intra-frame prediction and mode decision on the current block; and obtaining the motion vector of the current block according to the decision result of the prediction mode.
Optionally, if the result of the mode decision is inter prediction, the processor 21 is further configured to: taking the motion vector of the current block under the inter-frame prediction as the motion vector of the current block;
if the result of the mode decision is intra prediction, the processor 21 is further configured to: and taking at least one of the global motion vector of the current block, the zero motion vector and the motion vector of the adjacent block of the current block as the motion vector of the current block.
Optionally, the processor 21 is specifically configured to: based on the search center, carrying out pixel search on the current block in at least one frame reference frame to obtain a matching block corresponding to the current block; and obtaining a motion vector of the current block under the inter-frame prediction according to the position vectors of the matching block and the current block.
Optionally, the processor 21 is specifically configured to: carrying out block division on the current block by adopting a target block division mode to obtain at least one sub-block; performing pixel search on the at least one sub-block in at least one reference frame based on the search center to obtain at least one sub-motion vector of the at least one sub-block; and obtaining the motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector.
Optionally, the target block division mode is a block division mode with the smallest coding cost among a plurality of block division modes of the current block.
Optionally, the processor 21 is specifically configured to: obtain at least one motion vector of the current block under inter prediction according to the at least one sub-motion vector; or,

obtain a motion vector of the current block under inter prediction according to the average value or the median value of the at least one sub-motion vector; or,

obtain a motion vector of the current block under inter prediction according to any one of the at least one sub-motion vector.
Optionally, the memory 22 is configured to: and storing the motion vector of the current block, wherein the motion vector of the current block is used for determining a search center of pixel search of a block to be coded in the frame to be coded.
Optionally, the memory 22 is specifically configured to: storing any one of the motion vector of the current block and the motion vector of an adjacent block of the current block as the motion vectors of the current block and the adjacent block; or,
and storing the mean value or the median value of the motion vector of the current block and the motion vectors of the adjacent blocks of the current block as the motion vectors of the current block and the adjacent blocks.
Optionally, the pixel search comprises an integer pixel search and/or a fractional pixel search.
Optionally, the current block and the corresponding block are macroblocks of 16 × 16 pixels.
The embodiment of the present application further provides an electronic device, which may include the video encoding apparatus according to the various embodiments of the present application.
Optionally, the electronic device may include, but is not limited to, a cell phone, a drone, a camera, and the like.
The embodiment of the present application further provides a video encoding apparatus, which includes a processor and a memory, where the memory is used to store program instructions, and the processor is used to call the program instructions to execute the video encoding method according to the various embodiments of the present application.
It should be understood that the processor in the present application may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or a register. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It will also be appreciated that the memory herein can be volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memories of the apparatus and methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present application also provide a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer executes the method of the above method embodiments.
Embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer, cause the computer to perform the method of the above method embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (40)

1. A method of video encoding, comprising:
determining a search center of pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame;
and performing pixel search on the current block based on the search center to obtain the motion vector of the current block.
2. The method of claim 1, wherein the corresponding block is a co-located block of the current block in the encoded frame.
3. The method of claim 1, wherein the corresponding block is a neighboring block to a co-located block of the current block in the encoded frame.
4. The method of claim 3, wherein the corresponding block is a lower-right block of a co-located block of the current block in the encoded frame.
5. The method of claim 1, further comprising:
and if the lower-right block of the co-located block of the current block does not exist in the encoded frame, using the co-located block of the current block in the encoded frame as the corresponding block.
6. The method according to any of claims 1 to 5, wherein the encoded frame is an image frame preceding the frame in which the current block is located.
7. The method according to any one of claims 1 to 5, wherein the encoded frame includes a plurality of frames of images before the frame where the current block is located, and the determining, according to the motion vector of the corresponding block of the current block in the encoded frame, the search center of the pixel search of the current block includes:
determining at least one search center of pixel search of the current block according to at least one motion vector in motion vectors of a plurality of corresponding blocks in the previous multi-frame image; or,
and determining a searching center of the pixel search of the current block according to the average value or the median value of the motion vectors of a plurality of corresponding blocks in the previous multi-frame image.
8. The method of claim 7, wherein determining a search center for a pixel search of the current block based on a motion vector of a corresponding block of the current block in the encoded frame further comprises:
according to the motion vector of the corresponding block and at least one of the following motion vectors: the global motion vector of the current block, the zero motion vector, or the motion vector of a neighboring block of the current block, determining a plurality of search centers for the pixel search of the current block.
9. The method of any one of claims 1 to 8, wherein the performing a pixel search on the current block based on the search center to obtain the motion vector of the current block comprises:
performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction;
and taking the motion vector of the current block under the inter-frame prediction as the motion vector of the current block.
10. The method of any one of claims 1 to 8, wherein the performing a pixel search on the current block based on the search center to obtain the motion vector of the current block comprises:
performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction;
performing intra prediction and mode decision on the current block;
and obtaining the motion vector of the current block according to the result of the prediction mode decision.
11. The method of claim 10, wherein deriving the motion vector of the current block according to the result of the prediction mode decision comprises:
if the mode decision result is inter-frame prediction, taking the motion vector of the current block under the inter-frame prediction as the motion vector of the current block;
and if the mode decision result is intra-frame prediction, taking at least one motion vector of the global motion vector, the zero motion vector or the motion vector of the adjacent block of the current block as the motion vector of the current block.
12. The method of claim 9 or 10, wherein the performing a pixel search on the current block based on the search center to determine the motion vector of the current block under inter prediction comprises:
based on the search center, carrying out pixel search on the current block in at least one frame of reference frame to obtain a matching block corresponding to the current block;
and obtaining a motion vector of the current block under the inter-frame prediction according to the position vectors of the matching block and the current block.
13. The method of claim 9 or 10, wherein the performing a pixel search on the current block based on the search center to determine the motion vector of the current block under inter prediction comprises:
performing block division on the current block to obtain at least one sub-block;
performing pixel search on the at least one sub-block in at least one reference frame based on the search center to obtain at least one sub-motion vector of the at least one sub-block;
and obtaining the motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector.
14. The method of claim 13, wherein the block partitioning the current block into at least one sub-block comprises:
and performing block division on the current block by adopting a target block division mode to obtain at least one sub-block, wherein the target block division mode is the block division mode with the minimum coding cost in the multiple block division modes of the current block.
15. The method of claim 13 or 14, wherein deriving the motion vector of the current block under inter prediction according to the at least one sub motion vector comprises:
obtaining at least one motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector; or,
obtaining a motion vector of the current block under the inter-frame prediction according to the average value or the median value of the at least one sub-motion vector; or,
and obtaining a motion vector of the current block under the inter-frame prediction according to any one of the at least one sub-motion vector.
16. The method according to any one of claims 1 to 15, further comprising:
and storing the motion vector of the current block, wherein the motion vector of the current block is used for determining a search center of pixel search of a block to be coded in a frame to be coded.
17. The method of claim 16, wherein the storing the motion vector of the current block comprises:
storing the motion vectors of the current block and the adjacent block of the current block respectively; or,
storing any one of the motion vector of the current block and the motion vector of an adjacent block of the current block as the motion vectors of the current block and the adjacent block; or,
and storing the mean value or the median value of the motion vector of the current block and the motion vectors of the adjacent blocks of the current block as the motion vectors of the current block and the adjacent blocks.
18. The method according to any one of claims 1 to 17, wherein the pixel search comprises a whole pixel search and/or a fractional pixel search.
19. The method of any one of claims 1 to 18, wherein the current block and the corresponding block are macroblocks of 16x16 pixels.
20. An apparatus for video encoding, comprising: a processor, wherein
the processor is configured to perform: determining a search center of pixel search of a current block according to a motion vector of a corresponding block of the current block in an encoded frame;
and performing pixel search on the current block based on the search center to obtain the motion vector of the current block.
21. The apparatus of claim 20, wherein the corresponding block is a co-located block of the current block in the encoded frame.
22. The apparatus of claim 20, wherein the corresponding block is a neighboring block to a co-located block of the current block in the encoded frame.
23. The apparatus of claim 22, wherein the corresponding block is a bottom-right block of a co-located block of the current block in the encoded frame.
24. The apparatus of claim 20, wherein the processor is further configured to:
and if the lower-right block of the co-located block of the current block does not exist in the encoded frame, using the co-located block of the current block in the encoded frame as the corresponding block.
25. The apparatus according to any of claims 20-24, wherein the encoded frame is a frame image previous to a frame in which the current block is located.
26. The apparatus of any one of claims 20 to 24, wherein the processor is configured to:
determining at least one search center of pixel search of the current block according to at least one motion vector in motion vectors of a plurality of corresponding blocks in the previous multi-frame image; or,
and determining a searching center of the pixel search of the current block according to the average value or the median value of the motion vectors of a plurality of corresponding blocks in the previous multi-frame image.
27. The apparatus of claim 26, wherein the processor is further configured to:
according to the motion vector of the corresponding block and at least one of the following motion vectors: the global motion vector of the current block, the zero motion vector, and the motion vector of a neighboring block of the current block, determine at least one search center of a pixel search of the current block.
28. The apparatus according to any one of claims 20 to 27, wherein the processor is configured to:
performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction;
and taking the motion vector of the current block under the inter-frame prediction as the motion vector of the current block.
29. The apparatus according to any one of claims 20 to 27, wherein the processor is configured to:
performing pixel search on the current block based on the search center, and determining a motion vector of the current block under inter-frame prediction;
performing intra prediction and mode decision on the current block;
and obtaining the motion vector of the current block according to the result of the prediction mode decision.
30. The apparatus of claim 29, wherein the processor is configured to:
if the mode decision result is inter-frame prediction, taking the motion vector of the current block under the inter-frame prediction as the motion vector of the current block;
and if the mode decision result is intra-frame prediction, taking at least one motion vector of the global motion vector, the zero motion vector and the motion vector of the adjacent block of the current block as the motion vector of the current block.
31. The apparatus of claim 28 or 29, wherein the processor is configured to:
based on the search center, carrying out pixel search on the current block in at least one frame of reference frame to obtain a matching block corresponding to the current block;
and obtaining a motion vector of the current block under the inter-frame prediction according to the position vectors of the matching block and the current block.
32. The apparatus of claim 28 or 29, wherein the processor is configured to:
performing block division on the current block to obtain at least one sub-block;
performing pixel search on the at least one sub-block in at least one reference frame based on the search center to obtain at least one sub-motion vector of the at least one sub-block;
and obtaining the motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector.
33. The apparatus of claim 32, wherein the processor is configured to:
and performing block division on the current block by adopting a target block division mode to obtain at least one sub-block, wherein the target block division mode is the block division mode with the minimum coding cost in the multiple block division modes of the current block.
34. The apparatus of claim 32 or 33, wherein the processor is configured to:
obtaining at least one motion vector of the current block under the inter-frame prediction according to the at least one sub-motion vector; or,
obtaining a motion vector of the current block under the inter-frame prediction according to the average value or the median value of the at least one sub-motion vector; or,
and obtaining a motion vector of the current block under the inter-frame prediction according to any one of the at least one sub-motion vector.
35. The apparatus of any one of claims 20 to 34, further comprising: a memory, wherein
the memory is configured to: store the motion vector of the current block, wherein the motion vector of the current block is used for determining a search center of pixel search of a block to be coded in a frame to be coded.
36. The apparatus of claim 35, wherein the memory is configured to:
storing the motion vectors of the current block and the adjacent block of the current block respectively; or,
storing any one of the motion vector of the current block and the motion vector of the adjacent block of the current block as the motion vectors of the current block and the adjacent block; or,
and storing the mean value or the median value of the motion vector of the current block and the motion vectors of the adjacent blocks of the current block as the motion vectors of the current block and the adjacent blocks.
37. The apparatus according to any of claims 20 to 36, wherein the pixel search comprises a whole pixel search and/or a fractional pixel search.
38. The apparatus according to any of the claims 20 to 37, wherein the current block and the corresponding block are macroblocks of 16x16 pixels.
39. A computer-readable storage medium for storing program instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 19.
40. An electronic device, comprising:
an apparatus for video encoding as claimed in any one of claims 20 to 38.
CN202080004137.8A 2020-02-18 2020-02-18 Method and apparatus for video encoding Pending CN112514392A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/075620 WO2021163862A1 (en) 2020-02-18 2020-02-18 Video encoding method and device

Publications (1)

Publication Number Publication Date
CN112514392A (en)

Family

ID=74952824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004137.8A Pending CN112514392A (en) 2020-02-18 2020-02-18 Method and apparatus for video encoding

Country Status (2)

Country Link
CN (1) CN112514392A (en)
WO (1) WO2021163862A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114630128A (en) * 2022-05-17 2022-06-14 苇创微电子(上海)有限公司 Image compression and decompression method and system based on line data blocking rearrangement
CN115529459A (en) * 2022-10-10 2022-12-27 格兰菲智能科技有限公司 Central point searching method and device, computer equipment and storage medium
WO2022267667A1 (en) * 2021-06-24 2022-12-29 Zhejiang Dahua Technology Co., Ltd. Systems and methods for inter frame prediction of a video
WO2023078016A1 (en) * 2021-11-08 2023-05-11 翱捷科技股份有限公司 Motion estimation method and apparatus suitable for camera moving scenario
CN116208775A (en) * 2023-03-03 2023-06-02 格兰菲智能科技有限公司 Motion estimation method, motion estimation device, computer equipment and hardware encoder

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000350209A (en) * 1999-06-01 2000-12-15 Samsung Electronics Co Ltd Method and device for high-speed movement estimation for real-time moving video encoding
CN101873500A (en) * 2009-04-24 2010-10-27 华为技术有限公司 Interframe prediction encoding method, interframe prediction decoding method and equipment
WO2019148117A1 (en) * 2018-01-29 2019-08-01 Vid Scale, Inc. Frame-rate up conversion with low complexity
CN110786013A (en) * 2017-06-30 2020-02-11 华为技术有限公司 Search region for motion vector refinement
US20200236387A1 (en) * 2017-10-09 2020-07-23 Huawei Technologies Co., Ltd. Limited memory access window for motion vector refinement

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8265157B2 (en) * 2007-02-07 2012-09-11 Lsi Corporation Motion vector refinement for MPEG-2 to H.264 video transcoding
CN100551071C (en) * 2008-02-29 2009-10-14 西北工业大学 Integer pixel quick mixing search method based on the center prediction
US9769494B2 (en) * 2014-08-01 2017-09-19 Ati Technologies Ulc Adaptive search window positioning for video encoding
US10291925B2 (en) * 2017-07-28 2019-05-14 Intel Corporation Techniques for hardware video encoding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000350209A (en) * 1999-06-01 2000-12-15 Samsung Electronics Co Ltd Method and device for high-speed movement estimation for real-time moving video encoding
CN101873500A (en) * 2009-04-24 2010-10-27 华为技术有限公司 Interframe prediction encoding method, interframe prediction decoding method and equipment
CN110786013A (en) * 2017-06-30 2020-02-11 华为技术有限公司 Search region for motion vector refinement
US20200236387A1 (en) * 2017-10-09 2020-07-23 Huawei Technologies Co., Ltd. Limited memory access window for motion vector refinement
WO2019148117A1 (en) * 2018-01-29 2019-08-01 Vid Scale, Inc. Frame-rate up conversion with low complexity

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022267667A1 (en) * 2021-06-24 2022-12-29 Zhejiang Dahua Technology Co., Ltd. Systems and methods for inter frame prediction of a video
WO2023078016A1 (en) * 2021-11-08 2023-05-11 翱捷科技股份有限公司 Motion estimation method and apparatus suitable for camera moving scenario
CN114630128A (en) * 2022-05-17 2022-06-14 苇创微电子(上海)有限公司 Image compression and decompression method and system based on line data blocking rearrangement
CN114630128B (en) * 2022-05-17 2022-07-22 苇创微电子(上海)有限公司 Image compression and decompression method and system based on line data blocking rearrangement
CN115529459A (en) * 2022-10-10 2022-12-27 格兰菲智能科技有限公司 Central point searching method and device, computer equipment and storage medium
CN115529459B (en) * 2022-10-10 2024-02-02 格兰菲智能科技有限公司 Center point searching method, center point searching device, computer equipment and storage medium
CN116208775A (en) * 2023-03-03 2023-06-02 格兰菲智能科技有限公司 Motion estimation method, motion estimation device, computer equipment and hardware encoder

Also Published As

Publication number Publication date
WO2021163862A1 (en) 2021-08-26

Similar Documents

Publication Publication Date Title
RU2722536C1 Derivation of reference mode values and encoding and decoding of information representing prediction modes
CN112514392A (en) Method and apparatus for video encoding
US11070807B2 (en) Data encoding apparatus and data encoding method
CN110870314A (en) Multiple predictor candidates for motion compensation
WO2015052273A1 (en) Method and apparatus for displacement vector component prediction in video coding and decoding
TWI468018B (en) Video coding using vector quantized deblocking filters
WO2008149327A2 (en) Method and apparatus for motion-compensated video signal prediction
WO2019075336A1 (en) Method and apparatus for adaptive transform in video encoding and decoding
CN112514401A (en) Method and device for loop filtering
CN112544081A (en) Method and device for loop filtering
CN112514390A (en) Method and apparatus for video encoding
WO2017074539A1 (en) Parallel arithmetic coding techniques
US9420308B2 (en) Scaled motion search section with parallel processing and method for use therewith
WO2021056212A1 (en) Method and apparatus for video encoding and decoding
CN113767626A (en) Video enhancement method and device
TWI825751B (en) Method and apparatuses for video encoding
US8189672B2 (en) Method for interpolating chrominance signal in video encoder and decoder
CN112204973A (en) Method and device for video coding and decoding
CN112534824B (en) Method and apparatus for video encoding
WO2021081905A1 (en) Image prediction and video coding methods, apparatus, mobile platform, and storage medium
WO2021056210A1 (en) Video encoding and decoding method and apparatus, and computer-readable storage medium
WO2023070388A1 (en) Video encoding method and apparatus, and computer-readable storage medium
US20160165235A1 (en) Method of encoding image data, encoder using the method, and application processor including the encoder
KR20220157765A (en) Video Encoder and the operating method thereof
CN113261279A (en) Method for determining prediction value, encoder, decoder, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210316)