WO2022061563A1 - Video encoding method and apparatus, and computer-readable storage medium - Google Patents


Info

Publication number
WO2022061563A1
Authority
WO
WIPO (PCT)
Prior art keywords: image block, encoded, block, prediction, blocks
Application number
PCT/CN2020/117038
Other languages: English (en), French (fr)
Inventor
王悦名 (Wang Yueming)
郑萧桢 (Zheng Xiaozhen)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/117038
Publication of WO2022061563A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes

Definitions

  • the present application relates to the technical field of image coding, and in particular, to a video coding method, apparatus, and computer-readable storage medium.
  • Video compression and coding technology is one of the key technologies in video transmission.
  • the high-efficiency video compression technology compresses video to a large extent, which can effectively reduce the demand for network transmission bandwidth.
  • the intra-frame prediction method is usually used to eliminate the spatial redundancy of the image, and the inter-frame prediction method is used to eliminate the temporal redundancy.
  • the intra-frame prediction method refers to using the spatial correlation of the video to predict the current pixel from the already encoded pixels of the current image, so as to remove the spatial redundancy of the video, and then using the prediction residual as the input of the subsequent encoding steps for the image block to be encoded.
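The idea above can be illustrated with a minimal sketch (not the patent's method): predict the current block from already encoded neighbouring pixels, here with a hypothetical DC-style mean of the top row and left column, one of several modes a real codec would try, and keep only the prediction residual for the later encoding stages.

```python
top = [50, 52]                    # reconstructed pixels in the row above the block
left = [51, 53]                   # reconstructed pixels in the column to its left
block = [[53, 54], [55, 56]]      # original pixels of the current block

# DC-style prediction: every pixel is predicted as the mean of the neighbours.
dc = round(sum(top + left) / (len(top) + len(left)))   # (102 + 104) / 4 = 51.5 -> 52
prediction = [[dc] * 2 for _ in range(2)]
residual = [[block[r][c] - prediction[r][c] for c in range(2)] for r in range(2)]
# Only the residual (plus the chosen mode) goes on to transform and quantization.
```

Because the residual values are much smaller than the original pixel values, they compress better, which is exactly the spatial redundancy being removed.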
  • an image (or picture) is divided into a plurality of image blocks for coding, and the processing of these image blocks depends on their size, spatial position and coding mode specified by the encoder.
  • the intra-frame prediction method to encode the current image block to be encoded
  • the current image block to be encoded depends on the adjacent encoded image blocks to its left and above it, so when calculating the predicted value of each image block to be encoded (except the image blocks at the image boundary), the encoder must wait for the image blocks to its left and above to finish encoding before proceeding; such strong dependencies may cause encoding delays.
  • each pipeline stage performs different encoding steps in parallel, so that multiple image blocks to be encoded can be processed in parallel, thereby improving encoding efficiency.
  • since the current image block to be encoded has a strong dependence on the reconstructed pixels of adjacent image blocks, it is possible that when one pipeline stage needs the reconstructed pixels of an adjacent image block to process the current image block to be encoded, another pipeline stage is still processing that adjacent image block and its reconstructed pixels have not yet been generated, so the chip cannot operate in parallel, resulting in insufficient processing speed and coding efficiency of the finally realized chip.
  • the approach in the related art is to use the original pixels of adjacent image blocks to replace the reconstructed pixels for processing, but this obviously leads to inaccurate prediction results.
  • one of the objectives of the present application is to provide a video encoding method, device and computer-readable storage medium.
  • an embodiment of the present application provides a video encoding method, including:
  • according to M prediction modes of intra-frame prediction, the reference pixels of adjacent image blocks of the current image block to be encoded are used to generate M predicted image blocks corresponding to the current image block to be encoded under the M prediction modes, where M is a natural number;
  • the coding costs corresponding to the M prediction modes are obtained respectively, and the target prediction mode with the smallest coding cost is determined from the M prediction modes;
  • the reference pixels of the current to-be-coded image block are generated according to the predicted image block corresponding to the target prediction mode and the partial frequency coefficients for use by the next to-be-coded image block.
  • embodiments of the present application provide a video encoding apparatus, including one or more processors, working individually or together, where the processors include multiple pipeline stages, and a memory for storing executable instructions;
  • according to M prediction modes of intra-frame prediction, the reference pixels of adjacent image blocks of the current image block to be encoded are used to generate M predicted image blocks corresponding to the current image block to be encoded under the M prediction modes, where M is a natural number;
  • the coding costs corresponding to the M prediction modes are obtained respectively, and the target prediction mode with the smallest coding cost is determined from the M prediction modes;
  • the reference pixels of the current to-be-coded image block are generated according to the predicted image block corresponding to the target prediction mode and the partial frequency coefficients for use by the next to-be-coded image block.
  • embodiments of the present application provide a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed by a processor, implement the method described in the first aspect.
  • the video encoding method, device, and computer-readable storage medium provided by the embodiments of the present application can improve the accuracy of prediction results.
  • FIG. 1 is a schematic diagram of a video communication system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a video encoding process provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of parallel processing of multiple pipeline stages provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a video encoding method provided by an embodiment of the present application.
  • FIGS. 5 and 6 are schematic diagrams of image block division provided by an embodiment of the present application.
  • FIG. 7 , FIG. 8A and FIG. 8B are schematic diagrams of residual block division provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another video encoding method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a video encoding apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a typical video communication system.
  • the sending end 100 includes a video capturing device 101, a video encoding device 102 and a sending device 103.
  • the video capturing device 101 sends the captured video to the video encoding device 102 for compression encoding of the image information, which is then sent out through the sending device 103.
  • the receiving end 200 includes a receiving device 201, a video display device 203 and a video decoding device 202.
  • the receiving device 201 receives the compressed video data sent by the sending end 100, the video decoding device 202 decodes the received video data to recover the images, and the decoded images are displayed on the video display device 203.
  • the video encoding method in the embodiment of the present application is mainly applied to the video encoding device in the video communication system, and the intra-frame prediction method is used to compress and encode multimedia information to achieve the purpose of removing video spatial redundancy.
  • FIG. 2 shows a coding flow chart.
  • the prediction includes two types of intra-frame prediction and inter-frame prediction, the purpose of which is to use the prediction block information to remove redundant information of the current image to be encoded.
  • Each frame in the video can be regarded as an independent image, and there will be a certain spatial redundancy in the image.
  • for example, the sky area that often appears in the background of an image or video has very similar internal pixels, and such areas provide large compression potential for the encoding of images or videos.
  • Intra-frame prediction is used to remove the spatial redundancy within each frame.
  • Intra-frame prediction uses the information of the current frame to obtain prediction block data; the process includes dividing the to-be-encoded image into several to-be-encoded image blocks, and then using the encoded image blocks adjacent to the current image block to be encoded to generate the prediction block of the current image block to be encoded.
  • the correlation between two adjacent frames will be very high, therefore, temporal redundancy between frames will be formed.
  • Inter-frame prediction is used to remove the temporal redundancy between frames. Inter-frame prediction uses the information of a reference frame to obtain prediction block data: the image to be encoded is divided into several image blocks to be encoded, and for each of them, the image block that best matches the current image block to be encoded is searched for in the reference frame as the prediction block; the relative displacement between the prediction block and the current image block to be encoded is the motion vector. The reference frame can be an encoded image adjacent to the image to be encoded.
  • An image frame that uses only intra-frame prediction mode in encoding is called an I-frame
  • an image frame that uses both intra-frame prediction and inter-frame prediction is called a P-frame or B-frame.
  • a residual block is obtained by subtracting the corresponding pixel values of the prediction block from those of the image block to be encoded.
  • the transformation transforms the residual block from the time domain to the frequency domain, so that the residual block can be further analyzed in the frequency domain; a transformation matrix can be used to transform the residual block.
  • the transformation of the residual block usually adopts a two-dimensional transform: at the encoding end, the residual values in the residual block are multiplied by an NxN transformation matrix and its transpose respectively, and the transform coefficients are obtained after the multiplication.
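A hedged sketch of the two-dimensional transform just described, C = T R Tᵀ: an orthonormal DCT-II matrix is built directly here for clarity, whereas real codecs use fixed integer approximations of it.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    t = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            t[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    t[0, :] *= np.sqrt(1.0 / n)   # DC row scaling
    t[1:, :] *= np.sqrt(2.0 / n)  # AC row scaling
    return t

def transform_residual(residual):
    # Multiply by the transform matrix and its transpose, as described above.
    n = residual.shape[0]
    t = dct_matrix(n)
    return t @ residual @ t.T

residual = np.array([[4.0, 4.0], [4.0, 4.0]])
coeffs = transform_residual(residual)
# For a constant residual, all energy lands in the DC coefficient (top-left).
```

The constant block makes the frequency-domain picture easy to see: one large DC coefficient and zero AC coefficients.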
  • the video content comes from the real world, and it cannot ensure that all the information contained in it can be perceived by the human eye. Therefore, the video can be appropriately simplified according to the characteristics of the human eye perceiving the light signal to remove the visual redundancy.
  • Quantization is used to remove this visual redundancy: the transform coefficients obtained after transformation are quantized with quantization parameters to obtain quantized coefficients, and the quantization process further improves coding efficiency.
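A minimal sketch of scalar quantization and inverse quantization of a transform coefficient, with a hypothetical quantization step; real quantizers add scaling and rounding offsets, omitted here.

```python
def quantize(coeff, qstep):
    # Map a coefficient to an integer level; information is lost here.
    return round(coeff / qstep)

def dequantize(level, qstep):
    # Reconstruct the coefficient from the transmitted level.
    return level * qstep

level = quantize(8.3, 4)      # 8.3 / 4 ~ 2.08 -> level 2
recon = dequantize(level, 4)  # reconstructs 8; the 0.3 lost is quantization error
```

Larger steps give fewer distinct levels (fewer bits) at the cost of larger reconstruction error, which is the rate/distortion trade-off quantization controls.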
  • the values of the corresponding signals do not follow a single uniform distribution, but usually reach their maximum or minimum at one or several special points; the redundancy arising from this is statistical redundancy.
  • Entropy coding is used to remove this statistical redundancy: the quantized coefficients are entropy coded, assigning shorter codewords to values with higher probability and longer codewords to values with lower probability, which improves compression efficiency and removes the statistical redundancy.
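A toy illustration of that idea: frequent symbols get shorter codewords. The codebook below is a hand-picked prefix code for three hypothetical quantized values, not the entropy coder of any real standard.

```python
symbols = [0, 0, 0, 0, 0, 1, 1, 2]        # highly skewed quantized values
codebook = {0: "0", 1: "10", 2: "11"}     # prefix-free; shortest code for the most common value

bits_vlc = sum(len(codebook[s]) for s in symbols)   # 5*1 + 2*2 + 1*2 = 11 bits
bits_fixed = len(symbols) * 2                       # 2 bits/symbol fixed-length = 16 bits
```

The variable-length code spends 11 bits where a fixed-length code spends 16, which is the statistical redundancy being removed.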
  • the code stream obtained by entropy coding and the coding mode information after coding are stored or sent to the decoding end.
  • the quantized coefficients undergo inverse quantization and inverse transformation to obtain a reconstructed residual block, and the reconstructed residual block is then added to the corresponding prediction block to obtain a reconstructed frame.
  • the reconstructed frame is used as a reference frame for inter-frame prediction of other images to be encoded.
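The reconstruction path just described can be sketched as follows; the transform/inverse-transform pair is taken as identity here to keep the example short, and the quantization step is an assumed value.

```python
import numpy as np

qstep = 4
prediction = np.array([[100, 102], [101, 103]])
residual = np.array([[5, -3], [2, 0]])

levels = np.round(residual / qstep)          # quantize (what would be transmitted)
recon_residual = levels * qstep              # inverse quantize at the decoder side
reconstructed = prediction + recon_residual  # reconstructed block
# `reconstructed` approximates prediction + residual; it, not the original
# pixels, is what later blocks and frames reference.
```

Building references from reconstructed rather than original pixels keeps encoder and decoder in sync, since the decoder never sees the originals.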
  • at the decoding end, the entropy-coded code stream is first obtained and entropy decoding is performed to obtain the corresponding residual block, and the corresponding prediction block is obtained according to the decoded motion vector or intra prediction information.
  • the video coding process includes the steps of prediction, transformation, quantization, inverse transformation, inverse quantization, entropy coding and loop filtering, among which the prediction step is further divided into intra-frame prediction and inter-frame prediction steps.
  • intra-frame prediction since the intra-frame prediction algorithm of the high-efficiency video coding standard calculates the prediction mode of the to-be-coded image block of each size, it relies on the coded image blocks on the left and above it to generate predictions.
  • before calculating the prediction block of each to-be-encoded image block (except those located at the image boundary), the hardware encoder must wait for the image blocks to its left and above to be encoded first. Since the image block to be encoded has a strong dependence on the reconstructed pixels of adjacent encoded image blocks, the hardware encoder cannot parallelize the intra-frame prediction algorithm internally, resulting in insufficient processing speed and efficiency of the final hardware encoder.
  • FIG. 3 is a schematic diagram of the pipeline stage division of a hardware encoder.
  • the pipeline stages are divided into 5 stages: integer pixel search, sub-pixel search, intra prediction, mode decision, and entropy coding and filtering, where the integer pixel search and sub-pixel search pipeline stages perform steps related to inter-frame search, the intra prediction pipeline stage performs steps related to intra-frame search, and the mode decision pipeline stage performs steps related to mode decision.
  • in the intra prediction pipeline stage, several prediction modes are selected as the most probable prediction mode set; further prediction mode discrimination is performed in the mode decision pipeline stage, traversing the prediction modes in the most probable prediction mode set to select the optimal intra prediction mode.
  • Each pipeline stage is processed in parallel, in the example shown in Figure 3:
  • N is an integer.
  • the N+1th image block to be encoded undergoes integer pixel search in the integer pixel search pipeline stage, while the Nth image block to be encoded undergoes sub-pixel search in the sub-pixel search pipeline stage.
  • next, the N+2th image block to be encoded undergoes integer pixel search in the integer pixel search pipeline stage, the N+1th image block to be encoded undergoes sub-pixel search in the sub-pixel search pipeline stage, and the Nth image block to be encoded undergoes intra prediction in the intra prediction pipeline stage.
  • then, the N+3th image block to be encoded undergoes integer pixel search in the integer pixel search pipeline stage, the N+2th image block to be encoded undergoes sub-pixel search in the sub-pixel search pipeline stage, the N+1th image block to be encoded undergoes intra prediction in the intra prediction pipeline stage, and the Nth image block to be encoded undergoes mode decision in the mode decision pipeline stage.
  • then, the N+4th image block to be encoded undergoes integer pixel search in the integer pixel search pipeline stage, the N+3th image block to be encoded undergoes sub-pixel search in the sub-pixel search pipeline stage, the N+2th image block to be encoded undergoes intra prediction in the intra prediction pipeline stage, the N+1th image block to be encoded undergoes mode decision in the mode decision pipeline stage, and the Nth image block to be encoded undergoes entropy coding and filtering in the entropy coding and filtering pipeline stage.
  • when the N+1th image block to be encoded undergoes intra prediction processing in the intra prediction pipeline stage, the reconstructed pixels derived from the prediction block of the Nth image block to be encoded need to be used, while the Nth image block is still undergoing mode decision processing in the mode decision pipeline stage.
  • if the prediction block of the Nth image block to be encoded cannot yet be obtained, the N+1th image block to be encoded cannot continue to be processed and must wait for the prediction block of the Nth image block, so the hardware encoder cannot achieve internal parallelization when implementing intra-frame prediction, resulting in insufficient processing speed and efficiency of the final hardware encoder.
  • the above division of pipeline stages is only for illustration, and different pipeline stages may be divided according to actual needs, which is not limited in this embodiment.
  • an embodiment of the present application provides a video encoding method, which can be applied to a chip including multiple pipeline stages; the chip can be installed in a video encoding device (or a hardware encoder), and the video encoding device uses the pipeline stages to perform video encoding processing. The video encoding method of the embodiment of the present application may be performed in one of the pipeline stages, for example, in the intra prediction pipeline stage of the exemplary division shown in FIG. 3.
  • the video encoding method in the embodiment of the present application may also be executed on a software encoder, which is not limited in the embodiment of the present application.
  • the reference pixels of the current image block to be encoded may be generated in one of the pipeline stages, for example, in the intra-frame prediction pipeline stage, so that during the parallel operation of the multiple pipeline stages of the video encoding device, after the Nth image block to be encoded undergoes its first prediction mode decision in the intra prediction pipeline stage and its reference pixels are generated by the video encoding method provided in this embodiment, the Nth image block to be encoded enters the next pipeline stage, such as the mode decision pipeline stage, while the N+1th image block to be encoded enters the intra prediction pipeline stage for processing. Since the reference pixels of the Nth image block to be encoded were already generated in the intra prediction pipeline stage, the N+1th image block to be encoded can use them directly when performing its first prediction mode decision in the intra prediction pipeline stage.
  • FIG. 4 is a schematic flowchart of a video encoding method provided by an embodiment of the present application.
  • the method can be applied to a video encoding device, and the method includes:
  • step S101: according to M prediction modes of intra-frame prediction, reference pixels of adjacent image blocks of the current image block to be encoded are used to generate M predicted image blocks corresponding to the current image block to be encoded under the M prediction modes; M is a natural number.
  • step S102: M residual blocks are acquired based on the differences between the current image block to be encoded and each of the M predicted image blocks.
  • step S103: the coding costs corresponding to the M prediction modes are obtained respectively according to the M residual blocks, and a target prediction mode with the smallest coding cost is determined from the M prediction modes.
  • step S104: partial frequency coefficients of the residual block corresponding to the target prediction mode are obtained.
  • the reference pixels of the current to-be-coded image block are generated according to the predicted image block corresponding to the target prediction mode and the partial frequency coefficients for use by the next to-be-coded image block.
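Steps S101 to S103 can be sketched end to end as follows. This is a hedged illustration, not the patent's implementation: SAD stands in for whatever coding cost the encoder actually uses, and the two prediction modes are hypothetical stand-ins supplied as callables.

```python
import numpy as np

def choose_target_mode(block, predictors):
    """predictors: list of M callables, each returning a predicted block."""
    best = None
    for mode, predict in enumerate(predictors):   # S101: M predicted blocks
        pred = predict()
        residual = block - pred                   # S102: M residual blocks
        cost = np.abs(residual).sum()             # S103: coding cost (SAD here)
        if best is None or cost < best[0]:
            best = (cost, mode, pred, residual)
    return best[1], best[2], best[3]              # target mode + inputs for S104

block = np.array([[10, 12], [14, 16]])
mode, pred, residual = choose_target_mode(
    block,
    [lambda: np.full((2, 2), 13),               # hypothetical DC-style mode
     lambda: np.array([[10, 12], [14, 15]])])   # hypothetical angular mode
# The second mode wins: its SAD is 1 versus 8 for the first.
```

The winning mode's predicted block and residual are exactly what step S104 consumes to derive the partial frequency coefficients and the reference pixels.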
  • generating the reference pixels of the current image block to be encoded according to the predicted image block corresponding to the target prediction mode and the partial frequency coefficients includes: directly using the obtained DC coefficient to generate the reference pixels of the current image block to be encoded; or performing quantization and inverse quantization on the obtained DC coefficient, and generating the reference pixels of the current image block to be encoded from the quantized and inverse-quantized DC coefficient.
  • the reference pixels of the current image block to be encoded may be generated according to the prediction block corresponding to the target prediction mode and the quantized and inverse-quantized DC coefficients.
  • the reference pixel of the current image block to be encoded may be generated according to the prediction block corresponding to the target prediction mode and the obtained DC coefficient (ie, the DC coefficient that has not undergone quantization and inverse quantization processing).
  • the to-be-coded image block refers to a prediction unit to be subjected to intra-frame prediction.
  • the basic coding unit used in the prediction link is a prediction unit (Prediction Unit, hereinafter referred to as PU), and all operations related to prediction are in units of PU.
  • the direction of intra prediction, the motion vector difference and reference frame index of inter prediction, motion vector prediction, motion estimation and motion compensation are all processed based on PU.
  • different video compression standards use different sizes for the image blocks to be encoded, and their names may also differ; specific settings can be made according to the actual application scenario, which is not restricted here. Those skilled in the art will appreciate that when other standards are used for encoding, the technical solutions described in this application are also applicable to similar technical problems.
  • the current image block to be encoded is one of the images divided into L ⁇ L image blocks, where L is a natural number.
  • the basic encoding unit is an image block to be encoded with a size of 64 ⁇ 64.
  • according to the predictive coding mode, it may be further divided into 1 64x64 image block to be encoded, or 4 32x32 image blocks to be encoded, or 16 16x16 image blocks to be encoded, or 64 8x8 image blocks to be encoded, or 256 4x4 image blocks to be encoded, or a combination of several to-be-encoded image blocks of different sizes; please refer to FIG. 5, which is a schematic diagram of an example division of a 64x64 image block to be encoded.
  • the 64x64 image block to be encoded on the left is finally divided into 1 coded image block of 32x32, 10 coded image blocks of 16x16, 7 coded image blocks of 8x8 and 4 coded image blocks of 4x4.
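A quick sanity check that the example partition in Fig. 5 tiles the whole 64x64 area:

```python
# {block size: count} for the example division of the 64x64 block
partition = {32: 1, 16: 10, 8: 7, 4: 4}
total = sum(count * size * size for size, count in partition.items())
assert total == 64 * 64   # 1024 + 2560 + 448 + 64 = 4096
```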
  • the M prediction modes may be all or some of the prediction modes of intra-frame prediction; for example, the intra-frame prediction modes in HEVC (High Efficiency Video Coding, a video compression standard) include 33 angular modes, the DC mode and the Planar mode, 35 in total.
  • the M prediction modes may refer to these 35 prediction modes; that is, 35 candidate prediction pixel blocks are calculated from the encoded image blocks adjacent to the image block to be encoded according to the calculation formulas corresponding to the 35 prediction modes, and then the optimal prediction mode is selected from the 35 prediction modes according to an optimization principle.
  • alternatively, when for the actual application scenario the mode discrimination over all 35 prediction modes would consume too many hardware resources and too much time, the M prediction modes may be some of the 35 prediction modes, for example 10, 15 or 20 of them; or all prediction modes of intra-frame prediction can be divided into at least 2 groups, with the M prediction modes being all the prediction modes of one group. The selection process is not restricted and can be chosen according to the actual application scenario.
  • the M prediction modes may include the luma component prediction modes and/or the chroma component prediction modes of intra prediction (where "and/or" means both or either).
  • only the chroma block of the image block to be encoded may adopt the video encoding method of the embodiment of the present application, or only the luma block may adopt it, or both the chroma block and the luma block may adopt it; this is not limited in the embodiment of the present application and can be selected according to the actual application scenario.
  • the M prediction modes may include all luma component prediction modes or some luma component prediction modes, and/or all chroma component prediction modes or some chroma component prediction modes for intra prediction. Specific choices can be made according to actual application scenarios.
  • after determining the M prediction modes of intra-frame prediction, the video encoding device generates, using the reference pixels of the adjacent image blocks of the current image block to be encoded and the calculation formulas corresponding to each prediction mode, the predicted image blocks corresponding to the current image block to be encoded under the various prediction modes; the reference pixels of the adjacent image blocks of the current image block to be encoded are pixels generated by the video encoding method of the embodiment of the present application.
  • the size of the image block to be encoded may be divided according to the actual application scenario; please refer to FIG. 6.
  • FIG. 6 shows three image blocks, namely image block 10, image block 20 and image block 30; the image blocks can be further divided, and taking image block 20 as an example, it is further divided into sub-image block 21, sub-image block 22, sub-image block 23 and sub-image block 24.
  • the image block 10 is an already encoded image block
  • the image block 20 and the image block 30 are image blocks to be encoded
  • the current image block to be encoded is the image block 20
  • the adjacent image block of the current image block to be encoded is the image block 10
  • the reference pixels of the image block 10 are pixels generated by using the video encoding method of the embodiment of the present application
  • when the image block 20 is used as the current image block to be encoded, the video encoding method of the embodiment of the present application can be used.
  • the sub-image block 21 is an encoded image block
  • the sub-image block 22, the sub-image block 23, and the sub-image block 24 are image blocks to be encoded.
  • the current image block to be encoded is the sub-image block 22
  • the reference pixels of the sub-image blocks 21 are pixels generated by using the video encoding method of the embodiment of the present application
  • when the sub-image block 22 is used as the current image block to be encoded, the reference pixels of its adjacent image blocks can be used to generate the reference pixels of the sub-image block 22, which are used by the next image block to be encoded (that is, the sub-image block 23).
  • in step S102, after acquiring the M predicted image blocks of the current image block to be encoded, the video encoding device acquires M residual blocks based on the differences between the current image block to be encoded and each of the M predicted image blocks; specifically, the M residual blocks are the results of subtracting each of the M predicted image blocks from the current image block to be encoded, and the residual value at each position of a residual block is the result of subtracting the predicted pixel value at the corresponding position of the predicted image block from the original pixel value at the corresponding position of the current image block to be encoded.
  • in step S103, the video encoding apparatus obtains the coding costs corresponding to the M prediction modes according to the M residual blocks respectively, and determines the prediction mode with the smallest coding cost among the M prediction modes as the target prediction mode. In this embodiment, selecting the mode with the smallest coding cost as the target prediction mode is beneficial to obtaining an accurate prediction result, thereby improving coding efficiency.
  • the coding cost includes, but is not limited to, at least one of the following: rate-distortion optimization (RDO) cost, sum of absolute differences (SAD), sum of absolute transformed differences (SATD), mean squared error (MSE), sum of squared differences (SSD), mean absolute difference (MAE), or the number of encoded bits.
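Hedged sketches of three of the costs listed above (SAD, SSD and MSE), computed between an original block and a predicted block; the other metrics follow the same block-difference pattern.

```python
import numpy as np

def sad(orig, pred):
    # Sum of absolute differences
    return np.abs(orig - pred).sum()

def ssd(orig, pred):
    # Sum of squared differences
    return ((orig - pred) ** 2).sum()

def mse(orig, pred):
    # Mean squared error
    return ssd(orig, pred) / orig.size

orig = np.array([[10, 12], [14, 16]], dtype=np.int64)
pred = np.array([[11, 12], [13, 18]], dtype=np.int64)
# SAD = 1 + 0 + 1 + 2 = 4; SSD = 1 + 0 + 1 + 4 = 6; MSE = 6 / 4 = 1.5
```

SAD is the cheapest to compute in hardware; SATD adds a small transform before summing, trading cost for a better estimate of post-transform bits.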
  • in step S104, after determining the target prediction mode, the video encoding apparatus obtains partial frequency coefficients of the residual block corresponding to the target prediction mode, where the partial frequency coefficients may be the DC coefficient corresponding to the residual block, or the DC coefficient and some AC coefficients corresponding to the residual block.
  • the reference pixel of the current image block to be encoded obtained according to the video encoding method of the embodiment of the present application is not equal to the final reconstructed pixel corresponding to the prediction block of the current image block to be encoded.
  • as illustrated in Fig. 3, because the final reconstructed pixels corresponding to the prediction block of the current image block to be encoded are acquired in the mode decision pipeline stage, the acquisition time is long; meanwhile, the next image block to be encoded needs those reconstructed pixels in the intra prediction pipeline stage, but they have not yet been generated at that time.
  • the video encoding method of the present application can be used to first generate the reference pixels of the current image block to be encoded in the intra prediction pipeline stage (for example, the estimated reconstruction pixels) to avoid consuming too many encoding resources and encoding time.
  • the reference pixels of the current image block to be encoded are superior to the original pixels of the current image block to be encoded, but inferior to the final reconstructed pixels corresponding to the prediction block of the current image block to be encoded.
  • the acquisition duration of the reference pixels (for example, the estimated reconstructed pixels) of the image block currently to be encoded is also shorter than the acquisition duration of the final reconstructed pixels corresponding to the prediction block of the current image block to be encoded, so as to avoid consuming too many encoding resources , the next image block to be coded does not need to wait for the generation of reference pixels of the current image block to be coded, thereby helping to improve coding efficiency.
  • In one implementation, transforming the residual block from the time domain to the frequency domain to obtain transform coefficients may take too much encoding time. To further reduce encoding time, and considering that the DC coefficient carries most of the information in the residual block, the video encoding apparatus can obtain the DC coefficient of the residual block corresponding to the target prediction mode directly in the time domain, without transforming the residual block (for example, via a discrete sine transform and/or discrete cosine transform) into the frequency domain. Specifically, in the time domain, the video encoding apparatus computes the sum of all residual values in the residual block corresponding to the target prediction mode and the number of residual values, and obtains the DC coefficient from the ratio of the two. In this embodiment no transform step is needed and the DC coefficient is obtained directly in the time domain, which effectively reduces transform processing time and helps improve coding efficiency.
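The time-domain shortcut described above can be sketched in a few lines; this is an illustrative interpretation, taking the DC coefficient as the ratio of the residual sum to the residual count exactly as the text states (i.e. the mean residual), with no DST/DCT step.

```python
import numpy as np

def dc_coefficient_time_domain(residual_block):
    """Estimate the DC coefficient directly in the time domain:
    the sum of all residual values divided by their count,
    with no transform to the frequency domain."""
    total = residual_block.sum()
    count = residual_block.size
    return total / count
```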
  • In another implementation, the video encoding apparatus may instead transform the residual block from the time domain to the frequency domain, and obtain the DC coefficient and some of the AC coefficients of the residual block in the frequency domain.
  • In this embodiment, obtaining the DC coefficient together with some AC coefficients captures more information than obtaining only the DC coefficient, so the subsequently generated reference pixels of the current image block make the prediction-mode decision for the next image block more accurate.
  • It can be understood that in practice the coefficients to obtain can be chosen according to actual needs: for the shortest possible encoding time, the DC coefficient of the residual block corresponding to the target prediction mode can be obtained in the time domain; for a more accurate prediction result, the DC coefficient and some AC coefficients can be obtained in the frequency domain. The embodiments of the present application place no limitation on this choice.
  • Finally, in step S105, the video encoding apparatus generates the reference pixels of the current image block to be encoded from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients; these reference pixels are then available to the next image block to be encoded.
  • In this embodiment, the next image block does not need to wait for the prediction block of the current image block, which helps keep the parallel processing of the multiple pipeline stages running normally; and because the next image block uses the reference pixels of the current image block for its prediction-mode decision, the accuracy of the prediction result is improved.
  • The size of the image block to be encoded may be divided according to the actual application scenario. In one example, referring to Fig. 6, when image block 20 is the current image block to be encoded, the next image block to be encoded is image block 30; when sub-image block 21 is the current image block to be encoded, the next image block to be encoded is sub-image block 22.
  • In one implementation, when only the DC coefficient of the residual block corresponding to the target prediction mode is acquired, the video encoding apparatus generates the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the DC coefficient. The DC coefficient may be used directly to generate the reference pixels of the current image block, or it may first be quantized and inverse-quantized, with the processed DC coefficient then used to generate the reference pixels.
  • Specifically, the video encoding apparatus may add the DC coefficient, either without or after quantization and inverse quantization, to the pixel value of each pixel of the predicted image block, so as to obtain the reference pixels of the current image block to be encoded.
  • According to an embodiment, generating the reference pixels of the current image block from the prediction block corresponding to the target prediction mode and the DC coefficient includes: adding the coefficient value of the DC coefficient, either without or after quantization and inverse quantization, to the pixel value of each pixel of the prediction block, to obtain a plurality of reconstructed pixel values of a plurality of estimated reconstructed pixels; and using the plurality of estimated reconstructed pixels as the reference pixels of the current image block to be encoded.
  • In this embodiment, since no transform step is required, the encoding time can be effectively shortened and coding efficiency improved.
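The pixel-plus-DC construction above can be sketched as follows; this is a minimal illustration, and the uniform quantization step `qstep` is a hypothetical parameter introduced here only to show the optional quantize/inverse-quantize path, not a value specified by the patent.

```python
import numpy as np

def estimated_reference_pixels(pred_block, dc, qstep=None):
    """Add the DC coefficient of the residual block to every pixel of
    the prediction block, yielding estimated reconstructed pixels that
    serve as reference pixels for the next block to be encoded.
    If qstep is given, the DC value is first quantized and then
    inverse-quantized with a uniform step (illustrative only)."""
    if qstep is not None:
        dc = round(dc / qstep) * qstep  # quantize, then inverse-quantize
    # Clip to the valid 8-bit pixel range.
    return np.clip(pred_block + dc, 0, 255).astype(np.uint8)
```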
  • In another implementation, when the DC coefficient and some AC coefficients of the residual block corresponding to the target prediction mode are acquired, the video encoding apparatus generates the reference pixels of the current image block from the prediction block corresponding to the target prediction mode, the DC coefficient, and those AC coefficients.
  • In this embodiment, because the DC coefficient and some AC coefficients are obtained, more information is available, so the accuracy of the prediction result can be further improved.
  • In an embodiment, to obtain a more accurate prediction result, when acquiring the partial frequency coefficients the residual block corresponding to the target prediction mode may be divided into at least two sub-residual blocks, and the DC coefficients of the at least two sub-residual blocks obtained respectively. Since the differences between neighboring pixels are usually small, dividing the residual block into smaller blocks and computing the DC coefficients separately improves the accuracy of the DC coefficient obtained for each sub-residual block.
  • Further, when acquiring the reference pixels of the current image block, the current block to be encoded may be divided into at least two sub-blocks to be encoded, in one-to-one correspondence with the at least two sub-residual blocks; and the predicted image block corresponding to the target prediction mode may be divided into at least two sub-predicted image blocks, likewise in one-to-one correspondence with the sub-residual blocks. Then, for each sub-block to be encoded, its reference pixels are generated from the corresponding sub-predicted image block and the DC coefficient of the corresponding sub-residual block; finally, the reference pixels of the current image block are obtained from the reference pixels of all the sub-blocks to be encoded.
  • In this embodiment, the reference pixels of each sub-block to be encoded can be acquired in parallel, which improves the efficiency of acquiring the reference pixels of the current image block and further improves coding efficiency.
  • It can be understood that the embodiments of the present application place no limitation on how the residual block is divided. In one example, FIG. 7 shows the current image block 100 to be encoded, the predicted image block 200 corresponding to the target prediction mode, and the residual block 300 corresponding to the target prediction mode. The residual block 300 can be divided into two sub-residual blocks, sub-residual block 301 and sub-residual block 302. The current image block 100 can then be divided into two sub-blocks to be encoded in one-to-one correspondence with the two sub-residual blocks, namely sub-block 101 and sub-block 102, and the predicted image block 200 can likewise be divided into two sub-predicted image blocks in one-to-one correspondence with the two sub-residual blocks, namely sub-predicted image block 201 and sub-predicted image block 202. The reference pixels of sub-block 101 are then the result of adding the DC coefficient of sub-residual block 301 to sub-predicted image block 201, and the reference pixels of sub-block 102 are the result of adding the DC coefficient of sub-residual block 302 to sub-predicted image block 202.
  • The reference pixels of sub-block 101 and sub-block 102 can be obtained in parallel, which improves the efficiency of acquiring the reference pixels of the current image block and further improves coding efficiency.
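The per-sub-block construction can be sketched as below; this is an illustrative interpretation (two horizontal sub-blocks, DC taken as the sub-block's mean residual), and the thread pool only demonstrates that the per-sub-block work is independent and parallelizable.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def subblock_reference_pixels(pred_block, residual_block, splits=2):
    """Split the residual and prediction blocks into `splits` horizontal
    sub-blocks, compute a per-sub-block DC coefficient (mean residual),
    add it to the matching sub-prediction block, and reassemble."""
    sub_preds = np.array_split(pred_block, splits, axis=0)
    sub_resis = np.array_split(residual_block, splits, axis=0)

    def one(args):
        p, r = args
        return p + r.mean()  # sub-prediction block + its DC coefficient

    # Each sub-block depends only on its own data, so the work can
    # proceed in parallel.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(one, zip(sub_preds, sub_resis)))
    return np.vstack(parts)
```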
  • In FIG. 8A, the residual block 300 can also be divided into 4 sub-residual blocks, and in FIG. 8B the residual block 300 can also be divided into 6 sub-residual blocks; the division can be chosen according to the actual application scenario, which this embodiment does not limit.
  • FIG. 9 is a schematic flowchart of another video encoding method according to an exemplary embodiment of the present application.
  • The method can be applied to the video encoding apparatus and includes:
  • In step S201, according to M prediction modes of intra prediction, the reference pixels of the adjacent image blocks of the current image block to be encoded are used to generate M predicted image blocks corresponding to the current image block under the M prediction modes; M is a natural number. This step is similar to step S101 and is not repeated here.
  • In step S202, M residual blocks are obtained based on the differences between the current image block to be encoded and the M predicted image blocks. Similar to step S102; not repeated here.
  • In step S203, the encoding costs corresponding to the M prediction modes are obtained from the M residual blocks, and the target prediction mode with the smallest encoding cost is determined from the M prediction modes. Similar to step S103; not repeated here.
  • In step S204, partial frequency coefficients of the residual block corresponding to the target prediction mode are obtained. Similar to step S104; not repeated here.
  • In step S205, the partial frequency coefficients are quantized and inverse-quantized to obtain processed frequency coefficients.
  • The reference pixels of the current image block to be encoded are then generated from the prediction block corresponding to the target prediction mode and the processed frequency coefficients, for use by the next image block to be encoded.
  • In one implementation, generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients includes: using the obtained DC coefficient directly to generate the reference pixels of the current image block; or quantizing and inverse-quantizing the obtained DC coefficient and generating the reference pixels of the current image block from the quantized-and-inverse-quantized DC coefficient.
  • For example, the reference pixels of the current image block may be generated from the prediction block corresponding to the target prediction mode and the quantized-and-inverse-quantized DC coefficient.
  • Alternatively, the reference pixels of the current image block may be generated from the prediction block corresponding to the target prediction mode and the obtained DC coefficient (that is, the DC coefficient that has not undergone quantization and inverse quantization).
  • In this embodiment, after the partial frequency coefficients of the residual block corresponding to the target prediction mode are obtained, they are quantized and inverse-quantized to obtain processed frequency coefficients. Quantization removes visual redundancy, so the accuracy of the prediction result can be further improved.
  • The video encoding apparatus then generates the reference pixels of the current image block from the prediction block corresponding to the target prediction mode and the processed frequency coefficients, so that when the next image block performs intra prediction using the reference pixels of the current image block, the prediction result is more accurate.
  • In one implementation, when only the DC coefficient of the residual block corresponding to the target prediction mode is acquired, the video encoding apparatus quantizes and inverse-quantizes the DC coefficient to remove visual redundancy, which can further improve the accuracy of the prediction result. It should be noted that if the DC coefficient is a fixed-point number, the amplification effect of the quantization and inverse-quantization operations on the DC coefficient needs to be considered during those processes.
  • In another implementation, when the DC coefficient and some AC coefficients are acquired, the video encoding apparatus quantizes and inverse-quantizes both the DC coefficient and those AC coefficients to remove visual redundancy, which can further improve the accuracy of the prediction result.
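The quantize-then-inverse-quantize round trip of step S205 can be sketched as a uniform scalar quantizer; the step size `qstep` is a hypothetical parameter for illustration, and real codecs derive it from a quantization parameter rather than using it directly.

```python
def quantize_inverse_quantize(coeffs, qstep):
    """Uniform scalar quantization followed by inverse quantization,
    as applied to the partial frequency coefficients (the DC coefficient
    and any retained AC coefficients). Mapping each value to a
    quantization level and back discards fine detail the eye is
    unlikely to notice (visual redundancy)."""
    processed = []
    for c in coeffs:
        level = int(round(c / qstep))   # forward quantization
        processed.append(level * qstep)  # inverse quantization
    return processed
```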
  • Referring to FIG. 10, an embodiment of the present application further provides a video encoding apparatus 30, including one or more processors 31 that work individually or together, the processors 31 comprising multiple pipeline stages, and a memory 32 for storing executable instructions.
  • When executing the executable instructions, the processor 31 performs the following steps in one of the pipeline stages: according to M prediction modes of intra prediction, using the reference pixels of the adjacent image blocks of the current image block to be encoded to generate M predicted image blocks corresponding to the current image block under the M prediction modes, M being a natural number; obtaining M residual blocks based on the differences between the current image block and the M predicted image blocks; obtaining the encoding costs corresponding to the M prediction modes from the M residual blocks, and determining the target prediction mode with the smallest encoding cost from the M prediction modes; obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode; and generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients, for use by the next image block to be encoded.
  • The processor 31 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The memory 32 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
  • The memory 32 may be an internal storage unit of the video encoding apparatus 30, such as a hard disk or internal memory.
  • The memory 32 may also be an external storage device of the video encoding apparatus 30, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the video encoding apparatus 30. Further, the memory 32 may include both an internal storage unit of the video encoding apparatus 30 and an external storage device. The memory 32 is used to store computer programs and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
  • The processor 31 is specifically configured to acquire the DC coefficient of the residual block corresponding to the target prediction mode in the time domain.
  • The processor 31 is specifically configured to: in the time domain, compute the sum of all residual values and the number of residual values in the residual block corresponding to the target prediction mode, and obtain the DC coefficient from the ratio of the two.
  • The processor 31 is specifically configured to generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the DC coefficient.
  • The partial frequency coefficients include the DC coefficient, or the DC coefficient and some AC coefficients.
  • The processor 31 is further configured to: quantize and inverse-quantize the partial frequency coefficients to obtain processed frequency coefficients; and generate the reference pixels of the current image block to be encoded from the processed frequency coefficients.
  • The M residual blocks are respectively the results of subtracting the M predicted image blocks from the current image block to be encoded.
  • The M prediction modes are all or some of the prediction modes corresponding to intra prediction.
  • The M prediction modes include luma component prediction modes and/or chroma component prediction modes of intra prediction.
  • The encoding cost includes at least one of the following: the rate-distortion optimized cost of the residual block, the sum of absolute differences, the sum of absolute transformed differences, the mean squared error (MSE), the sum of squared differences, the mean absolute error, or the number of encoded bits.
  • The processor is specifically configured to: divide the residual block corresponding to the target prediction mode into at least two sub-residual blocks, and obtain the DC coefficients of the at least two sub-residual blocks respectively;
  • divide the current block to be encoded into at least two sub-blocks to be encoded in one-to-one correspondence with the at least two sub-residual blocks; divide the predicted image block corresponding to the target prediction mode into at least two sub-predicted image blocks in one-to-one correspondence with the at least two sub-residual blocks; for each sub-block to be encoded, generate the reference pixels of the sub-block from the corresponding sub-predicted image block and the DC coefficient of the corresponding sub-residual block;
  • and obtain the reference pixels of the current image block to be encoded from the reference pixels of all the sub-blocks to be encoded.
  • The current image block to be encoded is one of the L × L image blocks into which an image is divided, where L is a natural number.
  • The video encoding method, apparatus, and computer-readable storage medium provided by the embodiments of the present application can quickly generate the reference pixels of the current image block to be encoded for use by the next image block to be encoded, so that the next image block does not need to wait for the reconstructed pixels of adjacent image blocks to be generated. This keeps each pipeline stage processing in parallel, which helps improve coding efficiency; moreover, because predictions are made using these reference pixels rather than original pixels, the prediction results obtained are more accurate.
  • Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments.
  • The device embodiments described above are only illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video encoding method, apparatus, and computer-readable storage medium. The method includes: according to M prediction modes of intra prediction, using the reference pixels of the adjacent image blocks of the current image block to be encoded to generate M predicted image blocks corresponding to the current image block under the M prediction modes, M being a natural number (S101); obtaining M residual blocks based on the differences between the current image block and the M predicted image blocks (S102); obtaining the encoding costs corresponding to the M prediction modes from the M residual blocks, and determining the target prediction mode with the smallest encoding cost from the M prediction modes (S103); obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode (S104); and generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients, for use by the next image block to be encoded (S105). The method can improve the accuracy of prediction results.

Description

Video encoding method, apparatus, and computer-readable storage medium

Technical Field

The present application relates to the technical field of image coding, and in particular to a video encoding method, apparatus, and computer-readable storage medium.

Background Art

Video compression and coding technology is one of the key technologies in video transmission; substantially compressing video with efficient compression techniques can effectively reduce the demand on network transmission bandwidth.
At present, in the video encoding process, intra prediction is usually used to remove the spatial redundancy of an image, and inter prediction is used to remove temporal redundancy. Specifically, intra prediction exploits the correlation in the spatial domain of the video by using already-encoded pixels of the current image to predict the current pixels, thereby removing spatial redundancy; the prediction residual is then taken as the input for the subsequent encoding of the image block in the next encoding step.

Existing video coding standards divide an image (or picture) into multiple image blocks for encoding, and the processing of these image blocks depends on their size, spatial position, and the coding mode specified by the encoder. When the image block currently to be encoded is encoded with intra prediction, the prediction value must be produced from the already-encoded image blocks adjacent to it. For example, with a top-to-bottom, left-to-right prediction direction, the current image block depends on the already-encoded blocks to its left and above, so the prediction value of every image block to be encoded (except those at the image boundary) can only be computed after the blocks to its left and above have finished encoding. This strong dependency may cause encoding delay.

In particular, when video encoding is performed in a chip (or hardware encoder) with multiple pipeline stages, each stage performs a different encoding step in parallel, so multiple image blocks can be processed in parallel to improve coding efficiency. However, because the current image block depends strongly on the reconstructed pixels of its adjacent blocks, one pipeline stage may need the reconstructed pixels of an adjacent block while another stage is still processing that adjacent block and its reconstructed pixels have not yet been generated. The chip then cannot parallelize, and the processing speed and coding efficiency of the resulting chip are insufficient. The related art handles this by substituting the original pixels of adjacent blocks for the reconstructed pixels, but that obviously makes the prediction results inaccurate.
Summary of the Invention

In view of this, one of the objectives of the present application is to provide a video encoding method, apparatus, and computer-readable storage medium.

In a first aspect, an embodiment of the present application provides a video encoding method, including:

according to M prediction modes of intra prediction, using the reference pixels of the adjacent image blocks of the current image block to be encoded to generate M predicted image blocks corresponding to the current image block under the M prediction modes, M being a natural number;

obtaining M residual blocks based on the differences between the current image block and the M predicted image blocks;

obtaining the encoding costs corresponding to the M prediction modes from the M residual blocks, and determining the target prediction mode with the smallest encoding cost from the M prediction modes;

obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode; and

generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients, for use by the next image block to be encoded.
In a second aspect, an embodiment of the present application provides a video encoding apparatus, including one or more processors working individually or together, the processors comprising multiple pipeline stages, and a memory for storing executable instructions;

when executing the executable instructions, the processor performs the following steps in one of the pipeline stages:

according to M prediction modes of intra prediction, using the reference pixels of the adjacent image blocks of the current image block to be encoded to generate M predicted image blocks corresponding to the current image block under the M prediction modes, M being a natural number;

obtaining M residual blocks based on the differences between the current image block and the M predicted image blocks;

obtaining the encoding costs corresponding to the M prediction modes from the M residual blocks, and determining the target prediction mode with the smallest encoding cost from the M prediction modes;

obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode; and

generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients, for use by the next image block to be encoded.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the method of the first aspect.

The video encoding method, apparatus, and computer-readable storage medium provided by the embodiments of the present application can improve the accuracy of prediction results.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a video communication system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a video encoding process according to an embodiment of the present application;

FIG. 3 is a schematic diagram of parallel processing by multiple pipeline stages according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a video encoding method according to an embodiment of the present application;

FIG. 5 and FIG. 6 are schematic diagrams of image block division according to an embodiment of the present application;

FIG. 7, FIG. 8A, and FIG. 8B are schematic diagrams of residual block division according to an embodiment of the present application;

FIG. 9 is a schematic flowchart of another video encoding method according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application.
Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

The video encoding method in the embodiments of the present application is applied to the compression and encoding of multimedia information, which includes video, still images, moving images, and the like. This application takes video communication as an example. FIG. 1 is a schematic diagram of a typical video communication system. As shown in FIG. 1, a transmitting end 100 includes a video capture apparatus 101, a video encoding apparatus 102, and a transmitting apparatus 103; the video capture apparatus 101 sends captured video to the video encoding apparatus 102 for compression and encoding of the image information, which is then sent out through the transmitting apparatus 103. A receiving end 200 includes a receiving apparatus 201, a video display apparatus 203, and a video decoding apparatus 202; the receiving apparatus 201 receives the compressed video data sent by the transmitting end 100, the video decoding apparatus 202 decodes the received video data to recover images, and the decoded images are displayed on the video display apparatus 203. The video encoding method in the embodiments of the present application is mainly applied to the video encoding apparatus in a video communication system, using intra prediction to compress and encode multimedia information so as to remove spatial redundancy in the video.
Video encoding is briefly explained below. In general, the video encoding process includes steps such as prediction, transform, quantization, and entropy coding; FIG. 2 shows an encoding flowchart. Prediction comes in two types, intra prediction and inter prediction, whose purpose is to use prediction block information to remove redundant information from the image currently to be encoded.

Each frame of a video can be regarded as an independent image, and a certain amount of spatial redundancy exists within an image. For example, a sky region that often appears in the background of an image or video has extremely similar internal pixels; such regions provide great compression potential for image or video coding. Intra prediction is used to remove the spatial redundancy within each frame. Intra prediction uses information of the current frame to obtain prediction block data; the process includes dividing the image to be encoded into several image blocks to be encoded, and then, for each image block, using the already-encoded image blocks adjacent to it to generate the prediction block of the current image block.

To keep video playback continuous so that the human eye perceives no pauses between frames, videos generally use a frame sampling rate of 25 frames per second or more; that is, the time interval between two consecutive frames is less than 1/25 = 0.04 s. When moving objects in the video are not too fast, the correlation between adjacent frames is very high, creating temporal redundancy between frames. Inter prediction is used to remove this temporal redundancy; it uses information of reference frames to obtain prediction block data. The process includes dividing the image to be encoded into several image blocks, and then, for each image block, searching the reference frame for the image block that best matches the current image block as the prediction block; the relative displacement between the prediction block and the current image block is the motion vector. The reference frame may be an already-encoded image adjacent to the image to be encoded.

An image frame that uses only intra prediction in encoding is called an I frame; an image frame that uses both intra prediction and inter prediction is called a P or B frame. After the prediction block is obtained by intra or inter prediction, the corresponding pixel values of the prediction block are subtracted from the image block to be encoded to obtain the residual block.
The transform converts the residual block from the time domain to the frequency domain so that it can be further analyzed there; a transform matrix can be used for this. The transform of the residual block usually uses a two-dimensional transform: at the encoding end, the residual values in the residual block are multiplied by an N × N transform matrix and its transpose, and the products are the transform coefficients.

Video content comes from the real world, and not all the information it contains can be perceived by the human eye, so the video can be appropriately simplified according to the characteristics of human visual perception of light signals to remove visual redundancy. Quantization is used to remove this eye-based visual redundancy: the transform coefficients are quantized with a quantization parameter to obtain quantized coefficients, and the quantization process can further improve coding efficiency.

In general, signals in the real world, especially the various parameter signals in video, do not follow a single uniform distribution over their value spaces, but usually take extreme values at one or several special points; the redundancy in this process is statistical redundancy. Entropy coding is used to remove this statistical redundancy: the quantized coefficients are entropy coded, assigning shorter codewords to values with higher probability and longer codewords to values with lower probability, thereby improving compression efficiency.

Finally, the bitstream obtained by entropy coding and the encoded coding-mode information, such as the intra prediction mode and motion vector information, are stored or sent to the decoding end. In addition, the quantized coefficients go through inverse quantization and inverse transform to obtain the reconstructed residual block, which is added to the corresponding prediction block to obtain the reconstructed frame; after loop filtering, the reconstructed frame serves as a reference frame for inter prediction of other images to be encoded.

At the image decoding end, the entropy-coded bitstream is first entropy decoded to obtain the corresponding residual block; the corresponding prediction block is obtained according to the decoded motion vector or intra prediction information, and the reconstructed values of the pixels in the current image block are obtained from the prediction block and the residual block.
To improve processing speed and resource utilization, video encoding in a hardware encoder is usually divided into multiple pipeline stages, with the encoding process split into multiple steps processed in parallel by different stages. In general, the video encoding process includes prediction, transform, quantization, inverse transform, inverse quantization, entropy coding, and loop filtering, where the prediction step is further divided into intra prediction and inter prediction. In intra prediction, because the intra prediction algorithm of the High Efficiency Video Coding standard relies on the already-encoded blocks to the left and above when computing the prediction mode of a block of any size, the prediction block of each image block to be encoded (except those at the image boundary) can only be computed after the blocks to its left and above have finished encoding. Because of this strong dependency on the reconstructed pixels of adjacent encoded blocks, a hardware encoder implementing the algorithm cannot parallelize within the intra prediction algorithm, so the processing speed and efficiency of the resulting hardware encoder are insufficient.

In one example, referring to FIG. 3, FIG. 3 is a schematic diagram of the pipeline-stage division of a hardware encoder. The pipeline has five stages: integer-pixel search, fractional-pixel search, intra prediction, mode decision, and entropy coding and filtering. The integer-pixel search and fractional-pixel search stages carry out the inter-search steps; the intra prediction stage carries out the intra-search steps; the mode decision stage performs transform, quantization, inverse transform, and inverse quantization on one or more modes obtained from inter search and intra search. In the intra prediction stage, a first prediction-mode decision is performed: all prediction modes are traversed to select the set of most probable prediction modes. Specifically, all prediction modes are traversed to obtain the residual block data under each mode, the encoding cost of each mode is obtained from that residual block data, and the several modes with the smallest costs form the most probable mode set. In the mode decision stage, a further prediction-mode decision traverses the modes in the most probable mode set to select the optimal intra prediction mode. The stages process in parallel; in the example shown in FIG. 3:

At time T1, the N-th image block to be encoded undergoes integer-pixel search in the integer-pixel search stage; N is an integer.

At time T2, the (N+1)-th image block undergoes integer-pixel search, and the N-th image block undergoes fractional-pixel search.

At time T3, the (N+2)-th image block undergoes integer-pixel search, the (N+1)-th undergoes fractional-pixel search, and the N-th undergoes intra prediction.

At time T4, the (N+3)-th image block undergoes integer-pixel search, the (N+2)-th undergoes fractional-pixel search, the (N+1)-th undergoes intra prediction, and the N-th undergoes mode decision.

At time T5, the (N+4)-th image block undergoes integer-pixel search, the (N+3)-th undergoes fractional-pixel search, the (N+2)-th undergoes intra prediction, the (N+1)-th undergoes mode decision, and the N-th undergoes entropy coding and filtering.
It can be seen that at time T4, when the (N+1)-th image block undergoes intra prediction in the intra prediction stage, it needs the prediction block of the N-th image block; but the N-th image block is undergoing mode decision in the mode decision stage, and its prediction block is not yet available. The (N+1)-th image block therefore cannot continue and must wait for the prediction block of the N-th image block, so the hardware encoder cannot parallelize internally when implementing intra prediction, and the processing speed and efficiency of the resulting hardware encoder are insufficient. Of course, it can be understood that the above pipeline division is only an example; different pipeline stages can be divided according to actual needs, and this embodiment places no limitation on this.

In addition, to achieve parallel processing, in the related art, when the (N+1)-th image block undergoes intra prediction in the intra prediction stage, since the prediction block of the N-th image block cannot yet be obtained, the original pixels of the N-th image block are used for the first prediction-mode decision; but this obviously makes the prediction result inaccurate.
On this basis, an embodiment of the present application provides a video encoding method. The method can be applied to a chip including multiple pipeline stages; the chip can be installed in a video encoding apparatus (or hardware encoder), and the video encoding apparatus uses the pipeline stages for video encoding. The video encoding method of the embodiments of the present application can be performed in one of the pipeline stages, for example in the intra prediction stage of the pipeline stages exemplified in FIG. 3. Alternatively, the video encoding method of the embodiments of the present application can also be performed on a software encoder; the embodiments of the present application place no limitation on this.

With the video encoding method provided by the embodiments of the present application, the reference pixels of the image currently to be encoded can be generated in one of the pipeline stages, for example in the intra prediction stage. Thus, during the multi-stage parallel operation of the video encoding apparatus, after the N-th image block performs the first prediction-mode decision in the intra prediction stage and its reference pixels have been generated by the video encoding method provided by the embodiments of the present application, the N-th image block enters the next stage, for example the mode decision stage, while the (N+1)-th image block enters the intra prediction stage. Because the reference pixels of the N-th image block were generated in the intra prediction stage, the (N+1)-th image block can directly use them for its first prediction-mode decision, without waiting for the prediction block generated for the N-th image block in the mode decision stage. This ensures the parallel processing of the intra prediction stage and helps improve coding efficiency; moreover, compared with using the original pixels of adjacent blocks, using the reference pixels of adjacent blocks (generated by the video encoding method of the embodiments of the present application) for prediction-mode decision gives more accurate prediction results.

The process of generating the reference pixels of the current image block to be encoded with the video encoding method of the present application is described in detail below. Referring to FIG. 4, FIG. 4 is a schematic flowchart of a video encoding method provided by an embodiment of the present application. The method can be applied to a video encoding apparatus and includes:
In step S101, according to M prediction modes of intra prediction, the reference pixels of the adjacent image blocks of the current image block to be encoded are used to generate M predicted image blocks corresponding to the current image block under the M prediction modes; M is a natural number.

In step S102, M residual blocks are obtained based on the differences between the current image block and the M predicted image blocks.

In step S103, the encoding costs corresponding to the M prediction modes are obtained from the M residual blocks, and the target prediction mode with the smallest encoding cost is determined from the M prediction modes.

In step S104, partial frequency coefficients of the residual block corresponding to the target prediction mode are obtained.

In step S105, the reference pixels of the current image block are generated from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients, for use by the next image block to be encoded. In one implementation, generating the reference pixels of the current image block from the predicted image block corresponding to the target prediction mode and the partial frequency coefficients includes: using the obtained DC coefficient directly to generate the reference pixels of the current image block; or quantizing and inverse-quantizing the obtained DC coefficient and generating the reference pixels of the current image block from the quantized-and-inverse-quantized DC coefficient. For example, the reference pixels of the current image block may be generated from the prediction block corresponding to the target prediction mode and the quantized-and-inverse-quantized DC coefficient; or from the prediction block corresponding to the target prediction mode and the obtained DC coefficient (that is, the DC coefficient that has not undergone quantization and inverse quantization).
Regarding step S101, the image block to be encoded refers to a prediction unit on which intra prediction is to be performed. The basic coding unit used in the prediction step is the prediction unit (PU), and all prediction-related operations are performed in units of PUs. For example, the intra prediction direction, the inter prediction motion vector difference and reference frame index, motion vector prediction, motion estimation, and motion compensation are all processed on the basis of PUs. It can be understood that in practice, different video compression standards divide image blocks of different sizes and may name them differently; the sizes can be set according to the actual application scenario, and the embodiments of the present application place no limitation on this. Those skilled in the art will appreciate that when other standards are used for encoding, the technical solutions described in this application are equally applicable to similar technical problems.

The current image block to be encoded is one of the L × L image blocks into which an image is divided, where L is a natural number. As an example of video encoding, the basic coding unit is a 64x64 image block to be encoded; in the intra prediction coding mode it may be further divided into one 64x64 block, or four 32x32 blocks, or sixteen 16x16 blocks, or sixty-four 8x8 blocks, or 256 4x4 blocks, or a combination of blocks of different sizes. In one example, referring to FIG. 5, FIG. 5 is a schematic diagram of an example division of a 64x64 image block to be encoded. The 64x64 block on the left is, after encoding, finally divided into one 32x32 encoded block, ten 16x16 encoded blocks, seven 8x8 encoded blocks, and four 4x4 encoded blocks.

In an embodiment, the M prediction modes may be all or some of the prediction modes corresponding to intra prediction. For example, the intra prediction modes in HEVC (High Efficiency Video Coding, a video compression standard) include 33 angular modes, the DC mode, and the Planar mode. When performing intra prediction on each image block to be encoded, the M prediction modes may be these 35 modes: 35 candidate predicted pixel blocks are computed from the adjacent encoded blocks according to the formulas corresponding to the 35 modes, and the optimal mode is then selected from the 35 modes according to an optimization criterion. Alternatively, depending on the actual application scenario and considering that deciding among 35 modes consumes substantial hardware resources and time, the M prediction modes may be some of the 35 modes, for example 10, 15, or 20 of them; or all intra prediction modes may be divided into at least two groups, with the M prediction modes being all the modes of one group. Of course, the present application places no limitation on the specific selection process, which can be made according to the actual application scenario.

In an embodiment, the M prediction modes may include luma component prediction modes and/or chroma component prediction modes of intra prediction ("and/or" means both or either). In this embodiment, the video encoding method of the embodiments of the present application may be applied only to the chroma blocks of the image block to be encoded, or only to its luma blocks, or to both its chroma and luma blocks; the embodiments of the present application place no limitation on this, and a specific choice can be made according to the actual application scenario.

Further, the M prediction modes may include all or some of the luma component prediction modes of intra prediction, and/or all or some of the chroma component prediction modes. A specific choice can be made according to the actual application scenario.
In an embodiment, after the M prediction modes of intra prediction are determined, the video encoding apparatus uses the reference pixels of the adjacent image blocks of the current image block and the formula corresponding to each prediction mode to generate the predicted image blocks of the current image block under the various prediction modes; here, the reference pixels of the adjacent image blocks refer to pixels generated by the video encoding method of the embodiments of the present application.

The size of the image block to be encoded can be divided according to the actual application scenario. Referring to FIG. 6, FIG. 6 shows three image blocks: image block 10, image block 20, and image block 30. An image block can be divided further; taking image block 20 as an example, it is further divided into sub-image block 21, sub-image block 22, sub-image block 23, and sub-image block 24.

In one example, image block 10 is an encoded block, image blocks 20 and 30 are blocks to be encoded, the current image block to be encoded is image block 20, and its adjacent image block is image block 10, whose reference pixels were generated by the video encoding method of the embodiments of the present application. Image block 20, as the current image block, can then use the reference pixels of adjacent blocks (including image block 10) to generate its own reference pixels through the video encoding method of the embodiments of the present application, for use by the next image block to be encoded (that is, image block 30).

In one example, sub-image block 21 is an encoded block, and sub-image blocks 22, 23, and 24 are blocks to be encoded. If the current image block to be encoded is sub-image block 22, its adjacent image block is sub-image block 21, whose reference pixels were generated by the video encoding method of the embodiments of the present application. Sub-image block 22, as the current image block, can then use the reference pixels of adjacent blocks (including sub-image block 21) to generate its own reference pixels through the video encoding method of the embodiments of the present application, for use by the next image block to be encoded (that is, sub-image block 23).

Next, in step S102, after obtaining the M predicted image blocks of the current image block, the video encoding apparatus obtains M residual blocks based on the differences between the current image block and the M predicted image blocks. Specifically, the M residual blocks are respectively the results of subtracting the M predicted image blocks from the current image block; the residual value at each position of a residual block is the result of subtracting the predicted pixel value at the corresponding position of the predicted image block from the original pixel value at the corresponding position of the current image block.
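The per-position subtraction of step S102 can be sketched as follows; this is a minimal NumPy illustration, with a signed dtype chosen so that negative residual values are preserved.

```python
import numpy as np

def residual_blocks(current_block, predicted_blocks):
    """For each of the M predicted image blocks, the residual block is
    the original pixel values of the current block minus the predicted
    pixel values at the same positions."""
    cur = current_block.astype(np.int16)  # signed, to allow negatives
    return [cur - p.astype(np.int16) for p in predicted_blocks]
```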
在获取所述M个残差块之后,在步骤S103中,所述视频编码装置根据所述M个残差块分别获取所述M种预测模块对应的编码代价,并从所述M种预测模式中确定所述编码代价最小的一个作为目标预测模式。本实施例中,选择编码代价最小的一个作为目标预测模式,有利于获取准确的预测结果,从而提高编码效率。
在一个例子中,所述编码代价包括但不限于以下至少一项:率失真优化(Rate DistortionOptimized,RDO)、绝对误差(Sum of Absolute Difference,SAD)、变换后的绝对误差和(Sum of Absolute Transformed Difference,SATD)、平均平方误差(MSE)、差值的平方和(sum of squared difference,SSD)、平均绝对差值(MAE)或者编码比特数。
进一步地,在步骤S104中,在确定所述目标预测模式之后,所述视频编码装置获取所述目标预测模式对应的残差块的部分频率系数,其中,所述部分频率系数可以是所述残差块对应的DC系数,或者所述部分频率系数也可以是所述残差块对应的DC系数和部分AC系数。
其中,这里需要强调的是,依据本申请实施例的视频编码方法获取的当前待编码图像块的参考像素并不等同于所述当前待编码图像块的预测块对应的最终的重建像素,这里结合图3进行说明:正因为所述当前待编码图像块的预测块对应的最终的重建像素是在模式决策流水级中获取的,其获取时间过长。与此同时,下一个待编码图像块在帧内预测流水级中需要用到所述当前待编码图像块的预测块对应的重建像素,但所述当前待编码图像块的预测块对应的重建像素还未获取到,从而造成无法并行化处理的问题,基于此,可以利用本申请的视频编码方法先在帧内预测流水级中生成所述当前待编码图像块的参考像素(例如,估计的重建像素),以避免耗费过多的编码资源以及编码时长。所述当前待编码图像块的参考像素优于所述当前待编码图像块的原始像素,但次于所述当前待编码图像块的预测块对应的最终的重建像素,从另一个角度讲,所述当前待编码图像块的参考像素(例如,估计的重建像素)的获取时长也小于所述当前待编码图像块的预测块对应的最终的重建像素的获取时长,从而避免耗费过多的编码资源,也能使下一个待编码图像块无需等待当前待编码图像块的参考像素的生成,从而有利于提高编码效率。
Based on this, in one implementation, it is considered that transforming the residual block from the time domain to the frequency domain to obtain the transform coefficients may cost too much coding time. To further reduce coding time, and considering that the DC coefficient carries most of the information in the residual block, the video encoding apparatus may obtain the DC coefficient of the residual block corresponding to the target prediction mode directly in the time domain, without transforming the residual block (for example, by a discrete sine transform and/or a discrete cosine transform) to the frequency domain. Specifically, in the time domain, the video encoding apparatus computes the sum of all residual values in the residual block corresponding to the target prediction mode as well as the number of residual values, and obtains the DC coefficient as the ratio of the two. In this embodiment, since no transform step is needed and the DC coefficient is obtained directly in the time domain, the transform processing time is effectively reduced, which helps improve coding efficiency.
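The time-domain DC computation described above amounts to averaging the residual block, with no DST/DCT involved. A minimal sketch (the function name is an illustrative assumption):

```python
# Minimal sketch of the time-domain DC computation: the ratio of the sum of
# all residual values to the number of residual values, with no DST/DCT
# transform step.

def dc_coefficient(residual):
    values = [v for row in residual for v in row]
    return sum(values) / len(values)
```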
In another implementation, the video encoding apparatus may transform the residual block from the time domain to the frequency domain and obtain, in the frequency domain, the DC coefficient and some AC coefficients of the residual block. In this embodiment, compared with obtaining the DC coefficient alone, obtaining the DC coefficient together with some AC coefficients preserves more information, so the subsequently generated reference pixels of the current image block are more accurate when assisting the next image block in prediction-mode decision.
It can be understood that, in a practical application scenario, the coefficients to be obtained may be chosen as needed: for the shortest possible coding time, the DC coefficient of the residual block corresponding to the target prediction mode may be obtained in the time domain; for a more accurate prediction result, the DC coefficient and some AC coefficients may be obtained in the frequency domain. The embodiments of the present application impose no limitation in this regard. Finally, in step S105, the video encoding apparatus generates the reference pixels of the current image block to be encoded from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients; these reference pixels are available for use by the next image block to be encoded. In this embodiment, the next image block need not wait for the prediction block of the current image block, which helps keep the parallel processing of the multiple pipeline stages running; moreover, the next image block uses the reference pixels of the current image block for prediction-mode decision, which helps improve the accuracy of the prediction result.
The size of an image block to be encoded may be partitioned according to the actual application scenario. In one example, referring to FIG. 6, when image block 20 is the current image block to be encoded, the next image block to be encoded is image block 30; when sub-image block 21 is the current image block to be encoded, the next image block to be encoded is sub-image block 22.
In one implementation, where only the DC coefficient of the residual block corresponding to the target prediction mode is obtained, the video encoding apparatus generates the reference pixels of the current image block to be encoded from the prediction image block corresponding to the target prediction mode and the DC coefficient. The DC coefficient may be used directly to generate the reference pixels, or the DC coefficient may first be quantized and dequantized and the processed DC coefficient then used. Specifically, the video encoding apparatus may add the DC coefficient — either without quantization and dequantization, or after quantization and dequantization — to the pixel value of each pixel of the prediction image block, thereby obtaining the reference pixels of the current image block to be encoded. According to an embodiment of the present invention, generating the reference pixels of the current image block from the prediction block corresponding to the target prediction mode and the DC coefficient includes: adding, to the pixel value of each pixel of the prediction block corresponding to the target prediction mode, only the coefficient value of the DC coefficient (with or without quantization and dequantization) to obtain a plurality of reconstructed pixel values of a plurality of estimated reconstructed pixels; and taking the plurality of estimated reconstructed pixels as the reference pixels of the current image block to be encoded. In this embodiment, since no transform step is required, the coding time can be effectively shortened and the coding efficiency improved.
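A sketch of the reference-pixel generation just described: the (possibly quantized-and-dequantized) DC value is added to every pixel of the prediction block to form the estimated reconstructed pixels. The clipping to the 8-bit range [0, 255] is an added assumption not stated in the text, and the function name is hypothetical.

```python
# Sketch of generating reference pixels (estimated reconstructed pixels):
# only the DC value -- as obtained, or after quantization/dequantization --
# is added to each pixel of the prediction block. Clipping to [0, 255] is
# an added assumption for 8-bit pixels.

def reference_pixels(pred, dc):
    return [[max(0, min(255, p + round(dc))) for p in row] for row in pred]
```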
In another implementation, where the DC coefficient and some AC coefficients of the residual block corresponding to the target prediction mode have been obtained, the video encoding apparatus generates the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode, the DC coefficient, and the AC coefficients. In this embodiment, since the DC coefficient and some AC coefficients are obtained, i.e., more information is obtained, the accuracy of the prediction result can be further improved.
In an embodiment, to obtain a more accurate prediction result, when obtaining the partial frequency coefficients of the residual block corresponding to the target prediction mode, that residual block may be divided into at least two sub-residual blocks whose DC coefficients are obtained separately. In this embodiment, considering that neighboring pixels usually do not differ greatly, dividing the residual block into smaller blocks and computing a DC coefficient for each helps improve the accuracy of the DC coefficient obtained for each sub-residual block.
Further, when obtaining the reference pixels of the current image block to be encoded, the current block to be encoded may be divided into at least two sub-blocks to be encoded in one-to-one correspondence with the at least two sub-residual blocks, and the prediction image block corresponding to the target prediction mode may be divided into at least two sub-prediction image blocks in one-to-one correspondence with the at least two sub-residual blocks; then, for each sub-block to be encoded, its reference pixels are generated from the corresponding sub-prediction image block and the DC coefficient of the corresponding sub-residual block; finally, the reference pixels of the current image block are obtained from the reference pixels of all the sub-blocks. In this embodiment, the reference pixels of each sub-block can be obtained in parallel, which helps improve the efficiency of obtaining the reference pixels of the current image block and further improves coding efficiency.
It can be understood that the embodiments of the present application impose no limitation on how the residual block is divided. In one example, referring to FIG. 7, which shows the current image block 100 to be encoded, the prediction image block 200 corresponding to the target prediction mode, and the residual block 300 corresponding to the target prediction mode: the residual block 300 may be divided into two sub-residual blocks, namely sub-residual block 301 and sub-residual block 302; the current image block 100 may accordingly be divided into two corresponding sub-blocks to be encoded, namely sub-block 101 and sub-block 102; and the prediction image block 200 may likewise be divided into two corresponding sub-prediction image blocks, namely sub-prediction image block 201 and sub-prediction image block 202. The reference pixels of sub-block 101 are then the result of adding the DC coefficient of sub-residual block 301 to sub-prediction image block 201, and the reference pixels of sub-block 102 are the result of adding the DC coefficient of sub-residual block 302 to sub-prediction image block 202. The reference pixels of sub-blocks 101 and 102 can be obtained in parallel, which helps improve the efficiency of obtaining the reference pixels of the current image block and further improves coding efficiency.
Alternatively, referring to FIG. 8A and FIG. 8B, the residual block 300 may also be divided into 4 sub-residual blocks (FIG. 8A) or 6 sub-residual blocks (FIG. 8B); the specific division may be chosen according to the actual application scenario, and this embodiment imposes no limitation in this regard.
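The sub-block variant of FIGS. 7–8 — one DC per sub-residual block, each part reconstructed independently — can be sketched as follows. The 2-way row-wise split is one assumed partition among the many the text allows, and all names are hypothetical.

```python
# Hypothetical sketch of the sub-block variant: the residual and prediction
# blocks are split row-wise into halves, one DC is computed per sub-residual
# block, and each half's reference pixels are built independently (so the
# halves could run in parallel).

def split_rows(block, parts=2):
    h = len(block) // parts
    return [block[i * h:(i + 1) * h] for i in range(parts)]

def sub_block_reference_pixels(pred, residual, parts=2):
    out = []
    for sub_pred, sub_res in zip(split_rows(pred, parts), split_rows(residual, parts)):
        vals = [v for row in sub_res for v in row]
        dc = round(sum(vals) / len(vals))          # per-sub-block DC
        out.extend([[p + dc for p in row] for row in sub_pred])
    return out
```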
Referring to FIG. 9, which is a flow diagram of another video encoding method according to an exemplary embodiment of the present application, the method may be applied to the video encoding apparatus and includes:
In step S201, based on M intra prediction modes, using the reference pixels of a neighboring image block of the current image block to be encoded to generate the M prediction image blocks of the current image block respectively corresponding to the M prediction modes, M being a natural number. This is similar to step S101 and is not repeated here.
In step S202, obtaining M residual blocks based on the differences between the current image block to be encoded and the M prediction image blocks, respectively. This is similar to step S102 and is not repeated here.
In step S203, obtaining, from the M residual blocks, the coding costs corresponding to the M prediction modes, respectively, and determining, among the M prediction modes, the target prediction mode with the smallest coding cost. This is similar to step S103 and is not repeated here.
In step S204, obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode. This is similar to step S104 and is not repeated here.
In step S205, quantizing and then dequantizing the partial frequency coefficients to obtain processed frequency coefficients.
In step S206, generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the processed frequency coefficients, for use by the next image block to be encoded. In one implementation, generating the reference pixels of the current image block from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients includes: using the obtained DC coefficient directly to generate the reference pixels of the current image block; or quantizing and dequantizing the obtained DC coefficient and generating the reference pixels of the current image block from the quantized-and-dequantized DC coefficient. For example, the reference pixels of the current image block may be generated from the prediction block corresponding to the target prediction mode and the quantized-and-dequantized DC coefficient; alternatively, they may be generated from the prediction block corresponding to the target prediction mode and the obtained DC coefficient (i.e., the DC coefficient without quantization and dequantization).
In this embodiment, after the partial frequency coefficients of the residual block corresponding to the target prediction mode are obtained, they are quantized and then dequantized to obtain the processed frequency coefficients. The quantization removes visual redundancy, which further improves the accuracy of the prediction result. Further, the video encoding apparatus generates the reference pixels of the current image block from the prediction block corresponding to the target prediction mode and the processed frequency coefficients, so that when the next image block to be encoded uses these reference pixels for intra prediction, its prediction result is more accurate.
In one implementation, where only the DC coefficient of the residual block corresponding to the target prediction mode is obtained, the video encoding apparatus quantizes and then dequantizes the DC coefficient, thereby removing visual redundancy and further improving the accuracy of the prediction result. It should be noted that if the DC coefficient is a fixed-point number, the amplification of the DC coefficient by the quantization and dequantization operations needs to be taken into account during those operations.
In another implementation, where the DC coefficient and some AC coefficients of the residual block corresponding to the target prediction mode have been obtained, the video encoding apparatus quantizes and then dequantizes the DC coefficient and the AC coefficients, thereby removing visual redundancy and further improving the accuracy of the prediction result.
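The quantize-then-dequantize step of S205 can be illustrated with a plain scalar quantizer. The step size of 4 is an arbitrary illustrative value, not a codec-defined parameter, and the function names are hypothetical.

```python
# Sketch of step S205: scalar quantization followed by dequantization of
# the DC coefficient before it is used to build reference pixels. The step
# size is an arbitrary illustrative value.

def quantize(coeff, step):
    return round(coeff / step)

def dequantize(level, step):
    return level * step

dc = 13.0
dc_hat = dequantize(quantize(dc, step=4), step=4)  # round-trip introduces quantization error
```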
Correspondingly, referring to FIG. 10, an embodiment of the present application further provides a video encoding apparatus 30, including one or more processors 31, working individually or jointly, each processor 31 including multiple pipeline stages; and a memory 32 for storing executable instructions.
When executing the executable instructions, the processor 31 performs the following steps in one of the pipeline stages:
based on M intra prediction modes, generating, using the reference pixels of a neighboring image block of the current image block to be encoded, the M prediction image blocks of the current image block respectively corresponding to the M prediction modes, M being a natural number;
obtaining M residual blocks based on the differences between the current image block to be encoded and the M prediction image blocks, respectively;
obtaining, from the M residual blocks, the coding costs corresponding to the M prediction modes, respectively, and determining, among the M prediction modes, the target prediction mode with the smallest coding cost;
obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode;
generating, from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients, the reference pixels of the current image block to be encoded, for use by the next image block to be encoded.
The processor 31 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 32 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, etc. The memory 32 may be an internal storage unit of the video encoding apparatus 30, such as a hard disk or internal memory; it may also be an external storage device of the video encoding apparatus 30, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the video encoding apparatus 30. Further, the memory 32 may include both an internal storage unit and an external storage device of the video encoding apparatus 30. The memory 32 is used to store computer programs as well as other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
In an embodiment, the processor 31 is specifically configured to: obtain, in the time domain, the DC coefficient of the residual block corresponding to the target prediction mode.
In an embodiment, the processor 31 is specifically configured to: compute, in the time domain, the sum of all residual values in the residual block corresponding to the target prediction mode as well as the number of residual values, and obtain the DC coefficient as the ratio of the two.
In an embodiment, the processor 31 is specifically configured to: generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the DC coefficient.
In an embodiment, the partial frequency coefficients include the DC coefficient, or the DC coefficient and some AC coefficients.
In an embodiment, the processor 31 is further configured to: quantize and then dequantize the partial frequency coefficients to obtain processed frequency coefficients; and generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the processed frequency coefficients.
In an embodiment, the M residual blocks are respectively the results of subtraction between the M prediction image blocks and the current image block to be encoded.
In an embodiment, the M prediction modes are all or some of the prediction modes of intra prediction.
In an embodiment, the M prediction modes include the luma-component prediction modes and/or the chroma-component prediction modes of intra prediction.
In an embodiment, the coding cost includes at least one of the following: a rate-distortion optimized cost of the residual block, the sum of absolute differences, the sum of absolute transformed differences, the mean squared error (MSE), the sum of squared differences, the mean absolute error, or the number of coding bits.
In an embodiment, the processor is specifically configured to: divide the residual block corresponding to the target prediction mode into at least two sub-residual blocks, and obtain the DC coefficients of the at least two sub-residual blocks separately; divide the current block to be encoded into at least two sub-blocks to be encoded in one-to-one correspondence with the at least two sub-residual blocks, and divide the prediction image block corresponding to the target prediction mode into at least two sub-prediction image blocks in one-to-one correspondence with the at least two sub-residual blocks; for each sub-block to be encoded, generate the reference pixels of the sub-block from the corresponding sub-prediction image block and the DC coefficient of the corresponding sub-residual block; and obtain the reference pixels of the current image block to be encoded from the reference pixels of all the sub-blocks.
In an embodiment, the current image block to be encoded is one of L×L image blocks into which an image is divided, L being a natural number.
The video encoding method, apparatus, and computer-readable storage medium provided by the embodiments of the present application can quickly generate the reference pixels of the current image block to be encoded for use by the next image block to be encoded, so that the next image block need not wait for the generation of the reconstructed pixels of a neighboring image block, thereby keeping the pipeline stages processing in parallel and helping improve coding efficiency; moreover, compared with predicting from the original pixels of a neighboring image block, predicting from the reference pixels of a neighboring image block yields a more accurate prediction result.
As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present application, make changes in the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (30)

  1. A video encoding method, comprising:
    based on M intra prediction modes, generating, using reference pixels of a neighboring image block of a current image block to be encoded, M prediction image blocks of the current image block respectively corresponding to the M prediction modes, M being a natural number;
    obtaining M residual blocks based on differences between the current image block to be encoded and the M prediction image blocks, respectively;
    obtaining, from the M residual blocks, coding costs corresponding to the M prediction modes, respectively, and determining, among the M prediction modes, a target prediction mode with the smallest coding cost;
    obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode; and
    generating, from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients, reference pixels of the current image block to be encoded, for use by a next image block to be encoded.
  2. The method according to claim 1, wherein obtaining the partial frequency coefficients of the residual block corresponding to the target prediction mode consists only of:
    obtaining, in the time domain, the DC coefficient of the residual block corresponding to the target prediction mode.
  3. The method according to claim 1, wherein obtaining the partial frequency coefficients of the residual block corresponding to the target prediction mode consists only of:
    computing, in the time domain, the sum of all residual values in the residual block corresponding to the target prediction mode as well as the number of residual values, and obtaining the DC coefficient as the ratio of the two.
  4. The method according to claim 2 or 3, wherein generating the reference pixels of the current image block to be encoded from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients comprises:
    using the obtained DC coefficient directly to generate the reference pixels of the current image block to be encoded; or
    quantizing and dequantizing the obtained DC coefficient, and generating the reference pixels of the current image block to be encoded from the quantized-and-dequantized DC coefficient.
  5. The method according to claim 4, wherein generating the reference pixels of the current image block to be encoded from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients comprises:
    generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the obtained DC coefficient; or
    generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the quantized-and-dequantized DC coefficient.
  6. The method according to claim 5, wherein generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the DC coefficient comprises:
    adding, to the pixel value of each pixel of the prediction block corresponding to the target prediction mode, only the coefficient value of the DC coefficient, to obtain a plurality of reconstructed pixel values of a plurality of estimated reconstructed pixels; and
    taking the plurality of estimated reconstructed pixels as the reference pixels of the current image block to be encoded.
  7. The method according to claim 1, wherein the partial frequency coefficients comprise a DC coefficient, or the partial frequency coefficients comprise a DC coefficient and some AC coefficients.
  8. The method according to claim 1, further comprising, before generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the partial frequency coefficients:
    quantizing and then dequantizing the partial frequency coefficients to obtain processed frequency coefficients;
    wherein generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the partial frequency coefficients comprises:
    generating the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the processed frequency coefficients.
  9. The method according to claim 1, wherein the M residual blocks are respectively results of subtraction between the M prediction image blocks and the current image block to be encoded.
  10. The method according to claim 1, wherein the M prediction modes are all or some of the prediction modes of intra prediction.
  11. The method according to claim 1, wherein the M prediction modes comprise luma-component prediction modes and/or chroma-component prediction modes of intra prediction.
  12. The method according to claim 1, wherein the coding cost comprises at least one of the following: a rate-distortion optimized cost, a sum of absolute differences, a sum of absolute transformed differences, a mean squared error (MSE), a sum of squared differences, a mean absolute error, or a number of coding bits.
  13. The method according to claim 1, wherein obtaining the partial frequency coefficients of the residual block corresponding to the target prediction mode comprises:
    dividing the residual block corresponding to the target prediction mode into at least two sub-residual blocks, and obtaining DC coefficients of the at least two sub-residual blocks separately;
    wherein generating the reference pixels of the current image block to be encoded from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients comprises:
    dividing the current block to be encoded into at least two sub-blocks to be encoded in one-to-one correspondence with the at least two sub-residual blocks, and dividing the prediction image block corresponding to the target prediction mode into at least two sub-prediction image blocks in one-to-one correspondence with the at least two sub-residual blocks;
    for each sub-block to be encoded, generating reference pixels of the sub-block from the corresponding sub-prediction image block and the DC coefficient of the corresponding sub-residual block; and
    obtaining the reference pixels of the current image block to be encoded from the reference pixels of all the sub-blocks to be encoded.
  14. The method according to claim 1, wherein the current image block to be encoded is one of L×L image blocks into which an image is divided, L being a natural number.
  15. The method according to claim 1, wherein the method is applied to a chip comprising multiple pipeline stages, and the method is performed in one of the pipeline stages.
  16. A video encoding apparatus, comprising one or more processors, working individually or jointly, each processor comprising multiple pipeline stages; and a memory for storing executable instructions;
    wherein the processor, when executing the executable instructions, performs the following steps in one of the pipeline stages:
    based on M intra prediction modes, generating, using reference pixels of a neighboring image block of a current image block to be encoded, M prediction image blocks of the current image block respectively corresponding to the M prediction modes, M being a natural number;
    obtaining M residual blocks based on differences between the current image block to be encoded and the M prediction image blocks, respectively;
    obtaining, from the M residual blocks, coding costs corresponding to the M prediction modes, respectively, and determining, among the M prediction modes, a target prediction mode with the smallest coding cost;
    obtaining partial frequency coefficients of the residual block corresponding to the target prediction mode; and
    generating, from the prediction image block corresponding to the target prediction mode and the partial frequency coefficients, reference pixels of the current image block to be encoded, for use by a next image block to be encoded.
  17. The apparatus according to claim 16, wherein the processor is specifically configured to: obtain, in the time domain, the DC coefficient of the residual block corresponding to the target prediction mode.
  18. The apparatus according to claim 16, wherein the processor is specifically configured to: compute, in the time domain, the sum of all residual values in the residual block corresponding to the target prediction mode as well as the number of residual values, and obtain the DC coefficient as the ratio of the two.
  19. The apparatus according to claim 17 or 18, wherein the processor is specifically configured to:
    use the obtained DC coefficient directly to generate the reference pixels of the current image block to be encoded; or
    quantize and dequantize the obtained DC coefficient, and generate the reference pixels of the current image block to be encoded from the quantized-and-dequantized DC coefficient.
  20. The apparatus according to claim 19, wherein the processor is specifically configured to: generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the obtained DC coefficient; or
    generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the quantized-and-dequantized DC coefficient.
  21. The apparatus according to claim 19, wherein the processor is specifically configured to:
    add, to the pixel value of each pixel of the prediction block corresponding to the target prediction mode, only the coefficient value of the DC coefficient, to obtain a plurality of reconstructed pixel values of a plurality of estimated reconstructed pixels; and
    take the plurality of estimated reconstructed pixels as the reference pixels of the current image block to be encoded.
  22. The apparatus according to claim 16, wherein the partial frequency coefficients comprise a DC coefficient, or the partial frequency coefficients comprise a DC coefficient and some AC coefficients.
  23. The apparatus according to claim 16, wherein the processor is further configured to: quantize and then dequantize the partial frequency coefficients to obtain processed frequency coefficients; and generate the reference pixels of the current image block to be encoded from the prediction block corresponding to the target prediction mode and the processed frequency coefficients.
  24. The apparatus according to claim 16, wherein the M residual blocks are respectively results of subtraction between the M prediction image blocks and the current image block to be encoded.
  25. The apparatus according to claim 16, wherein the M prediction modes are all or some of the prediction modes of intra prediction.
  26. The apparatus according to claim 16, wherein the M prediction modes comprise luma-component prediction modes and/or chroma-component prediction modes of intra prediction.
  27. The apparatus according to claim 16, wherein the coding cost comprises at least one of the following: a rate-distortion optimized cost of the residual block, a sum of absolute differences, a sum of absolute transformed differences, a mean squared error (MSE), a sum of squared differences, a mean absolute error, or a number of coding bits.
  28. The apparatus according to claim 16, wherein the processor is specifically configured to:
    divide the residual block corresponding to the target prediction mode into at least two sub-residual blocks, and obtain DC coefficients of the at least two sub-residual blocks separately;
    divide the current block to be encoded into at least two sub-blocks to be encoded in one-to-one correspondence with the at least two sub-residual blocks, and divide the prediction image block corresponding to the target prediction mode into at least two sub-prediction image blocks in one-to-one correspondence with the at least two sub-residual blocks;
    for each sub-block to be encoded, generate reference pixels of the sub-block from the corresponding sub-prediction image block and the DC coefficient of the corresponding sub-residual block; and
    obtain the reference pixels of the current image block to be encoded from the reference pixels of all the sub-blocks to be encoded.
  29. The apparatus according to claim 16, wherein the current image block to be encoded is one of L×L image blocks into which an image is divided, L being a natural number.
  30. A computer-readable storage medium having computer instructions stored thereon, which, when executed by a processor, implement the method according to any one of claims 1 to 15.
PCT/CN2020/117038 2020-09-23 2020-09-23 Video encoding method and apparatus, and computer-readable storage medium WO2022061563A1 (zh)
