CN117998091A - Video coding method, device, equipment and storage medium


Info

Publication number: CN117998091A
Application number: CN202410209694.3A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Prior art keywords: mode, target, sub, coding unit, prediction
Other languages: Chinese (zh)
Inventors: 高敏, 陈靖
Current and original assignee: Shuhang Technology Beijing Co ltd
Application filed by Shuhang Technology Beijing Co ltd; priority to CN202410209694.3A; published as CN117998091A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose a video coding method, apparatus, device, and storage medium. The method comprises the following steps: acquiring the group-of-pictures length of the group of pictures in which a target frame is located and a temporal identifier of the target frame, where the temporal identifier characterizes the reference level of the target frame; determining candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier; selecting a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode; and compressing the target frame based on the target prediction mode and the target transform mode. The embodiments of the present application can improve the encoding speed of an encoder.

Description

Video coding method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a video encoding method, apparatus, device, and storage medium.
Background
Video coding is the process of converting a video signal into digital data and compressing it while maintaining visual quality as high as possible. The goal of video coding is to reduce the storage space and transmission bandwidth required by video data, so that video content can be efficiently stored, transmitted, and processed under limited resources. In video coding, a video signal is decomposed into a series of image frames or inter-frame differences (prediction residuals), which are then processed and compressed by various techniques and algorithms.
In the video encoding process, on the premise of ensuring compression efficiency, the video is expected to be encoded as fast as possible, so that playback stuttering is reduced and playback is smoother. Therefore, how to effectively improve the encoding speed is a technical problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a video encoding method, apparatus, device, and storage medium, which can improve the encoding speed of an encoder.
In one aspect, an embodiment of the present application provides a video encoding method, including:
acquiring the group-of-pictures length of the group of pictures in which a target frame is located and a temporal identifier of the target frame, where the temporal identifier characterizes the reference level of the target frame;
determining candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier;
selecting a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode;
and compressing the target frame based on the target prediction mode and the target transform mode.
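The four steps above can be sketched as follows. This is a minimal, hypothetical sketch of the claimed flow: the text does not name concrete functions, so the candidate-mode, cost, and compression callables are injected stand-ins.

```python
def encode_frame(frame, gop_length, temporal_id, prediction_modes,
                 candidate_fn, cost_fn, compress_fn):
    """Hypothetical top-level flow of the claimed method.

    candidate_fn, cost_fn, and compress_fn are injected stand-ins,
    not names from the patent text.
    """
    # Steps 1-2: derive candidate transform modes from GOP length and temporal_id.
    candidates = candidate_fn(gop_length, temporal_id)
    # Step 3: pick the (prediction mode, transform mode) pair with minimum cost.
    best_pred, best_xform = min(
        ((p, t) for p in prediction_modes for t in candidates),
        key=lambda pt: cost_fn(frame, pt[0], pt[1]),
    )
    # Step 4: compress the frame with the selected pair.
    return compress_fn(frame, best_pred, best_xform)
```

The injected callables make the control flow explicit: the candidate set is fixed once per frame, and the search runs only over that restricted set.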
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is greater than or equal to a first preset threshold and the temporal identifier is greater than a second preset threshold, determining the non-sub-block transform mode as the candidate transform mode of the target frame.
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, obtaining the target prediction mode of the parent node of a target coding unit in the target frame, where the target coding unit refers to any coding unit in the target frame;
and determining the candidate transform modes of the target coding unit according to the target prediction mode of the parent node.
In one embodiment, determining the candidate transform modes of the target coding unit according to the target prediction mode of the parent node includes:
if the target prediction mode of the parent node is the skip prediction mode, determining the non-sub-block transform mode as the candidate transform mode of the target coding unit.
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, determining the non-sub-block transform mode and at least one sub-block transform mode as candidate transform modes of a target coding unit in the target frame, where the target coding unit refers to any coding unit in the target frame;
and for any one of the at least one preset prediction mode, if, while traversing the candidate transform modes under that prediction mode, the non-sub-block transform mode is traversed before any sub-block transform mode and the number of non-zero coefficients in the non-sub-block transform mode is smaller than a third preset threshold, stopping the traversal of the at least one sub-block transform mode.
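A minimal sketch of this early-termination rule, assuming the non-sub-block mode is evaluated first. The threshold value and the result structure below are illustrative assumptions, not values from the text.

```python
from dataclasses import dataclass

@dataclass
class TransformResult:
    mode: str            # "SBT-OFF" or one of the 8 SBT modes
    nonzero_coeffs: int  # number of non-zero transform coefficients

THIRD_PRESET_THRESHOLD = 4  # illustrative value only

def should_skip_sbt_modes(sbt_off: TransformResult) -> bool:
    # Few non-zero coefficients under SBT-OFF mean the residual is already
    # cheap to code, so the sub-block modes are unlikely to win.
    return sbt_off.nonzero_coeffs < THIRD_PRESET_THRESHOLD
```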
In one embodiment, the method further includes:
for any one of the at least one preset prediction mode, if, while traversing the candidate transform modes under that prediction mode, the non-sub-block transform mode is traversed before any sub-block transform mode, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode lies in a preset area of the target coding unit, stopping the traversal of the at least one sub-block transform mode.
In one embodiment, the method further includes:
if the ratio of the abscissa of the last non-zero coefficient in the non-sub-block transform mode to the width of the target coding unit is smaller than a fourth preset threshold, and the ratio of the ordinate of that coefficient to the height of the target coding unit is smaller than a fifth preset threshold, determining that the last non-zero coefficient in the non-sub-block transform mode lies in the preset area of the target coding unit.
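This position test reduces to two ratio comparisons; a sketch, with the fourth and fifth preset thresholds set to illustrative values (the text does not specify them):

```python
def last_nonzero_in_preset_area(x: int, y: int, width: int, height: int,
                                t4: float = 0.25, t5: float = 0.25) -> bool:
    # The last non-zero coefficient at (x, y) lies in the preset (top-left,
    # low-frequency) region when both normalized coordinates are small.
    # t4 and t5 are illustrative assumptions for the 4th/5th thresholds.
    return (x / width) < t4 and (y / height) < t5
```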
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, traversing the non-sub-block transform mode and at least one sub-block transform mode under the at least one preset prediction mode;
and determining, based on at least two prediction modes that have already been traversed, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes.
In one embodiment, determining, based on the at least two traversed prediction modes, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes includes:
if the prediction mode with the minimum rate-distortion cost among the at least two traversed prediction modes is skip prediction, stopping the traversal of the at least one sub-block transform mode in the remaining prediction modes.
In one embodiment, determining, based on the at least two traversed prediction modes, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes includes:
if the prediction mode with the minimum rate-distortion cost among the at least two traversed prediction modes is not skip prediction, the target transform mode under that prediction mode is the non-sub-block transform mode, and the number of non-zero coefficients under that prediction mode is smaller than a sixth preset threshold, stopping the traversal of the at least one sub-block transform mode in the remaining prediction modes.
In one embodiment, the preset prediction modes include one or both of the affine transformation prediction mode and the geometric partition (geo) prediction mode.
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, obtaining the width and the height of a target coding unit in the target frame, where the target coding unit refers to any coding unit in the target frame;
and if the width of the target coding unit is greater than its height, determining at least one vertically divided sub-block transform mode and the non-sub-block transform mode as candidate transform modes of the target frame.
In one embodiment, determining the candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier includes:
if the group-of-pictures length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, obtaining the width and the height of a target coding unit in the target frame, where the target coding unit refers to any coding unit in the target frame;
and if the width of the target coding unit is smaller than its height, determining at least one horizontally divided sub-block transform mode and the non-sub-block transform mode as candidate transform modes of the target frame.
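The orientation-based restriction of the two embodiments above can be sketched as follows. The SBT mode names are illustrative placeholders (the text does not enumerate them), and the square-CU branch is an assumption, since the text covers only wide and tall coding units:

```python
VERTICAL_SBT = ("SBT-V-HALF", "SBT-V-QUARTER")      # placeholder names
HORIZONTAL_SBT = ("SBT-H-HALF", "SBT-H-QUARTER")    # placeholder names

def candidates_by_orientation(width: int, height: int):
    # Wide CUs keep only vertically divided SBT modes; tall CUs keep only
    # horizontally divided ones. Square CUs are not covered by the text,
    # so the full set is kept here as an assumption.
    if width > height:
        return ("SBT-OFF",) + VERTICAL_SBT
    if width < height:
        return ("SBT-OFF",) + HORIZONTAL_SBT
    return ("SBT-OFF",) + VERTICAL_SBT + HORIZONTAL_SBT
```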
In another aspect, an embodiment of the present application provides a video encoding apparatus, including:
an acquisition unit, configured to acquire the group-of-pictures length of the group of pictures in which a target frame is located and the temporal identifier of the target frame, where the temporal identifier characterizes the reference level of the target frame;
a processing unit, configured to determine candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier; select a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode; and compress the target frame based on the target prediction mode and the target transform mode.
In another aspect, an embodiment of the present application provides a computer device, including a processor, a storage device, and a communication interface that are connected to each other, where the storage device is configured to store a computer program that enables the computer device to execute the method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following steps:
acquiring the group-of-pictures length of the group of pictures in which a target frame is located and the temporal identifier of the target frame, where the temporal identifier characterizes the reference level of the target frame;
determining candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier;
selecting a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode;
and compressing the target frame based on the target prediction mode and the target transform mode.
In another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the video encoding method described above.
In the embodiments of the present application, candidate transform modes of the target frame are determined according to the group-of-pictures length of the group of pictures in which the target frame is located and the temporal identifier of the target frame, and the target prediction mode and target transform mode of the target frame are selected from the candidate transform modes under at least one preset prediction mode. Compared with the conventional approach of traversing the non-sub-block transform mode and the 8 sub-block transform modes (i.e., 9 transform modes in total) under each prediction mode, fewer transform modes need to be evaluated, so the encoding speed of the encoder is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the SBT modes provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of block partitioning provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a coding tree unit divided into a plurality of coding units according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the temporal_id of each video frame in a GOP of length 16 provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the skip prediction mode provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the affine prediction mode provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the partitioning of a coding unit in the geo prediction mode according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the inter prediction process in the geo prediction mode provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of a video encoding method according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following describes the background art related to the embodiments of the present application so as to better understand the technical solutions of the embodiments of the present application.
1. Sub-block transform
To improve the coding efficiency of inter prediction blocks (i.e., coding units that use inter prediction modes), the latest video coding standard, Versatile Video Coding (VVC)/H.266, introduces the Sub-Block Transform (SBT) mode. The sub-block transform mode is one of the transform modes; unlike the conventional transform mode, the encoder encodes only part of the residual of a coding unit, and the residual of the remaining part defaults to 0. To determine the residual region that needs to be encoded, VVC/H.266 introduces 8 SBT modes. FIG. 1 gives a schematic representation of these 8 SBT modes, where the gray areas represent the residual regions to be encoded; the residuals of the white areas do not need to be encoded and are set to 0 by default. Therefore, when an inter prediction block is to use the sub-block transform mode, the encoder selects an optimal SBT mode from the 8 SBT modes as the SBT mode that the inter prediction block finally uses.
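The 8 SBT configurations arise from three binary choices; a sketch in which the field names are illustrative, following the VVC design of split direction × split ratio × which sub-block carries the residual:

```python
from itertools import product

# Each SBT mode: split direction, size of the residual sub-block relative
# to the CU, and which of the two sub-blocks carries the coded residual.
# Field names are illustrative, not taken from the standard's syntax.
SBT_MODES = [
    {"direction": d, "residual_ratio": r, "residual_part": p}
    for d, r, p in product(("vertical", "horizontal"), (0.5, 0.25), ("first", "second"))
]
```

Enumerating the modes this way makes it clear why a full search multiplies the transform work per prediction mode by nine (8 SBT modes plus SBT-OFF).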
Optionally, during encoding, the encoder may first check the conventional transform mode (i.e., the non-sub-block transform mode, referred to herein as the SBT-OFF mode) for a given prediction residual block. In SBT-OFF mode, the encoder encodes the entire prediction residual block, i.e., all residual regions of the coding unit, and calculates its rate-distortion cost. The encoder may also check the 8 SBT modes: for each, it encodes the corresponding residual region of the coding unit and calculates the rate-distortion cost. Finally, the encoder selects the transform mode with the minimum rate-distortion cost among the SBT-OFF mode and the 8 SBT modes as the optimal transform mode of the coding unit.
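The selection step described above amounts to taking a minimum over nine precomputed rate-distortion costs; a sketch, where the cost values in the usage example are placeholders:

```python
def select_transform_mode(rd_costs: dict) -> str:
    # rd_costs maps mode name -> rate-distortion cost; the encoder keeps
    # the cheapest of SBT-OFF and the 8 SBT modes.
    return min(rd_costs, key=rd_costs.get)
```

For example, `select_transform_mode({"SBT-OFF": 12.0, "SBT-3": 10.5, "SBT-7": 11.0})` would pick `"SBT-3"`.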
2. Block partitioning in VVC
To increase the flexibility of block partitioning, VVC introduces the Quad Tree (QT), Binary Tree (BT), and Ternary Tree (TT). Binary-tree and ternary-tree partitions can be performed in two directions. Taking the block partition diagram shown in FIG. 2 as an example: the first diagram from left to right shows binary-tree partitioning in the vertical direction, the second shows binary-tree partitioning in the horizontal direction, the third shows ternary-tree partitioning in the vertical direction, and the fourth shows ternary-tree partitioning in the horizontal direction.
In the partitioning process, each coding unit may select QT partitioning, BT partitioning, or TT partitioning. FIG. 3 shows a schematic diagram of a coding tree unit (CTU) divided into a plurality of coding units (CUs), where bold block edges represent QT partitions and the remaining edges represent TT and BT partitions.
In the block partitioning process, the encoder uses the rate-distortion cost to search for the optimal block partition, i.e., it selects the block partition with the smallest rate-distortion cost. The rate-distortion cost is J = D + λ × R, where D (distortion) is the difference between the original pixels and the reconstructed pixels, R (rate) is the number of bits consumed to represent the transform coefficients and mode information, and λ is a constant.
To calculate the distortion and rate of each coding unit, the encoder needs to traverse all possible prediction modes for each coding unit, e.g., intra prediction modes and inter prediction modes, where the inter prediction modes may include the skip, merge, affine, and geo prediction modes, etc., and to transform, quantize, dequantize, and inverse-transform the prediction residual of each prediction mode. Since the sub-block transform mode is one of the transform modes, the encoder compares SBT-OFF with the 8 SBT modes for each prediction mode to select the optimal transform mode among them. The 8 sub-block transform modes introduced by VVC can therefore greatly increase the time complexity of the encoder.
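The cost being minimized and the resulting search size can be written out directly; the λ value and mode counts in the test below are illustrative:

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    # J = D + lambda * R: the quantity the encoder minimizes when
    # comparing partitions, prediction modes, and transform modes.
    return distortion + lam * rate_bits

def transform_trials(num_prediction_modes: int, num_sbt_modes: int = 8) -> int:
    # With SBT-OFF plus the 8 SBT modes, each prediction mode requires
    # (1 + 8) = 9 transform evaluations, hence N * 9 trials in total.
    return num_prediction_modes * (1 + num_sbt_modes)
```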
3. Picture group (Group of Pictures, GOP)
A GOP, or group of pictures, consists of a series of consecutive video frames, where the first frame may be a key frame, such as an intra-predicted frame (I-frame) or a forward-predicted frame (P-frame), and the subsequent frames may be bi-predicted frames (B-frames). In video coding, the GOP is the basic unit of inter prediction. Each video frame in a GOP has a temporal_id (i.e., temporal identifier) indicating the position and timing relationship of the frame in the GOP. The temporal_id characterizes the reference level of the frame: the smaller the temporal_id, the higher the reference level and the more frames reference this frame, i.e., the more important it is, because such frames are key frames or reference frames that strongly affect both video quality and compression efficiency; conversely, the larger the temporal_id, the lower the reference level, the fewer frames reference it, and the less important the frame. During encoding, the encoder determines the coding order and reference relations of the frames according to their temporal_id to achieve the optimal compression effect; during decoding, the decoder likewise restores the timing relations of the frames according to their temporal_id to play the video correctly. Illustratively, a given GOP is divided by dichotomy to obtain the temporal_id of each frame. FIG. 4 shows the temporal_id of each video frame in a GOP of length 16, where the level on the left represents the temporal_id: the temporal_id of the 16th frame in the GOP is 0 (level0 in FIG. 4), that of the 8th frame is 1 (level1), those of the 4th and 12th frames are 2 (level2), those of the 2nd, 6th, 10th and 14th frames are 3 (level3), and those of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th and 15th frames are 4 (level4). The 0th frame in FIG. 4 is the last frame of the previous GOP.
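The bisection assignment of temporal_id can be computed in closed form for power-of-two GOP lengths; the following sketch merely reproduces the layout of FIG. 4 and is not a normative derivation:

```python
def temporal_id(frame_index: int, gop_length: int) -> int:
    # frame_index is 1-based within the GOP; gop_length must be a power of two.
    # Each halving level of the bisection adds one to temporal_id, which is
    # equivalent to counting the trailing zero bits of the frame index.
    max_level = gop_length.bit_length() - 1                 # log2(gop_length)
    trailing_zeros = (frame_index & -frame_index).bit_length() - 1
    return max_level - min(trailing_zeros, max_level)
```

For a GOP of length 16 this yields temporal_id 0 for frame 16, 1 for frame 8, 2 for frames 4 and 12, 3 for frames 2, 6, 10, 14, and 4 for the odd frames, matching FIG. 4.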
4. Skip prediction mode
The skip prediction mode is a prediction mode used to skip the encoding process. If the difference between the current coding unit and a reference coding unit is small, the current coding unit may select the skip prediction mode: the encoding process of the current coding unit is skipped, and the pixel values of the reference coding unit are used directly as the pixel values of the current coding unit. That is, in skip prediction mode the encoder does not transform and quantize the current coding unit, and no residual is written to the output bitstream. At the decoding end, the decoder reconstructs the pixel values of the current coding unit from the reference coding unit and the motion vector. Because the skip prediction mode does not need to encode the residual of the current coding unit, the coding time and bit rate can be greatly reduced, improving coding efficiency.
As shown in fig. 5, when the skip prediction mode is used by the coding unit indicated by the gray region, the pixel value of the reference coding unit pointed by the motion vector MV may be directly used as the reconstructed pixel value of the current coding unit.
5. Affine prediction mode
The affine prediction mode is also referred to as the affine transformation prediction mode. Taking the schematic diagram of the affine prediction mode shown in FIG. 6 as an example, it is a prediction mode for describing complex motions such as rotation and scaling, in which the gray-lined rectangular area is the current coding unit and the black-lined rectangular area is the reference coding unit.
Since the motion described by the affine transformation prediction mode is relatively complex, the distribution of the prediction residual in this mode is often complex as well. That is, in this prediction mode the prediction residual of every region is equally important, and SBT is therefore often not the optimal transform mode.
6. Geo prediction mode
The geo prediction mode is also referred to as a geometric partition prediction mode. As shown in fig. 7, for example, a schematic diagram of a division manner of a coding unit in a geo-prediction mode is shown, and when the geo-prediction mode is used, the coding unit is divided into two irregularly shaped regions.
As an example, FIG. 8 shows a schematic diagram of the inter prediction process in the geo prediction mode, in which each region generates its prediction from a different reference frame. The prediction residual distribution in the geo mode therefore also tends to be irregular, so for the geo prediction mode, SBT, whose residual regions are strictly regular, is often not the optimal transform mode.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The video coding method provided by the embodiment of the application can be applied to a client, a server or computer equipment. The client or server may include a video player, an image encoding plug-in, etc., and may be installed or integrated in a content distribution platform or browser that may be running in a computer device. Computer devices include, but are not limited to, smartphones, cameras, wearable devices, or computers, among others.
In one example, the video encoding method provided by the embodiments of the present application can be applied to a short-video service scenario, in which a video uploaded by a creator needs to be transcoded by a server and the transcoded video is sent to users for viewing. So that users can watch the creator's video as soon as possible, the present application expects the video to be encoded as fast as possible while ensuring compression efficiency. Specifically, the server may obtain the video uploaded by the creator and encode it. In the process of encoding a target frame (i.e., any frame in the video), the server may acquire the group-of-pictures length of the group of pictures in which the target frame is located and the temporal identifier of the target frame; determine candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier; select a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode; and compress the target frame based on the target prediction mode and the target transform mode, thereby encoding the target frame.
In another example, the video encoding method provided by the embodiments of the present application can be applied to a live-streaming service scenario: to reduce stuttering and make the live stream smoother, the encoder is required to encode the captured video at the fastest possible speed, so improving the encoding speed of the video encoder is a technical problem to be solved. Specifically, the anchor client may capture video of the anchor's environment in real time and send it to the server. The server may encode the video and transmit the encoded video to the viewer clients. In the process of encoding a target frame (i.e., any frame in the video), the server may acquire the group-of-pictures length of the group of pictures in which the target frame is located and the temporal identifier of the target frame; determine candidate transform modes of the target frame according to the group-of-pictures length and the temporal identifier; select a target prediction mode and a target transform mode of the target frame from the candidate transform modes under at least one preset prediction mode; and compress the target frame based on the target prediction mode and the target transform mode, thereby encoding the target frame.
In specific embodiments of the present application, where user-related data such as video or images is involved, user approval or consent is required when the embodiments are applied to specific products or technologies, and the collection, use, and processing of the related data must comply with local laws, regulations, and standards.
Based on the above description, please refer to FIG. 9, which is a schematic flowchart of a video encoding method provided by an embodiment of the present application; the video encoding method may be executed by a client, a server, or a computer device. The video encoding method shown in FIG. 9 includes, but is not limited to, steps S901 to S904:
S901: acquire the group-of-pictures length of the group of pictures in which a target frame is located and the temporal identifier of the target frame, where the temporal identifier characterizes the reference level of the target frame.
The target frame may be any frame in the received bitstream, or any frame in the received video.
The group-of-pictures length of the GOP in which the target frame is located may refer to the number of video frames included in that GOP. Taking FIG. 4 as an example, assuming the target frame is any one of the 1st to 16th frames in FIG. 4, the GOP in which the target frame is located is the GOP shown in FIG. 4; since this GOP includes 16 frames, its group-of-pictures length is 16.
The temporal identifier of the target frame is the temporal_id mentioned in the above embodiments: each video frame in a GOP has one temporal_id, which indicates the position and timing relationship of the frame in the GOP. The smaller the temporal_id, the higher the reference level, the more frames reference this frame as a reference frame, and the more likely the frame is a key frame or reference frame with an important impact on both video quality and compression efficiency, i.e., the more important the frame. Conversely, the larger the temporal_id, the lower the reference level, the fewer frames reference it, and the less important the frame. The temporal_id corresponds to the level on the left of FIG. 4: the temporal_id of the 16th frame in the GOP is 0 (level0 in FIG. 4), that of the 8th frame is 1 (level1), those of the 4th and 12th frames are 2 (level2), those of the 2nd, 6th, 10th and 14th frames are 3 (level3), and those of the 1st, 3rd, 5th, 7th, 9th, 11th, 13th and 15th frames are 4 (level4).
S902, determining a candidate transform mode of the target frame according to the group-of-pictures length and the temporal identifier.
In this embodiment, intermediate information in the encoding process (e.g., the group-of-pictures length and the temporal identifier) can be fully utilized to adaptively skip sub-block transform modes that are unlikely to be optimal, and the transform modes other than the skipped sub-block transform modes are determined as the candidate transform modes of the target frame.
Optionally, determining the candidate transform mode of the target frame according to the group-of-pictures length and the temporal identifier may include: if the group-of-pictures length is greater than or equal to a first preset threshold and the temporal identifier is greater than a second preset threshold, determining the non-sub-block transform mode as the candidate transform mode of the target frame.
The first preset threshold and the second preset threshold may be preset values; for example, the first preset threshold may be 32 or 16, and the second preset threshold may be 4 or 3.
In this embodiment, a group-of-pictures length greater than or equal to the first preset threshold indicates that the consecutive video frames in the GOP form a very smooth video segment, so the prediction effect is good and the SBT mode is less likely to be the optimal mode. In addition, a temporal identifier greater than the second preset threshold indicates that the referenced level of the target frame is low, i.e., the target frame is unlikely to be a key frame or a reference frame and is therefore less important, so the SBT mode is again less likely to be optimal. Based on this, if the group-of-pictures length is greater than or equal to the first preset threshold and the temporal identifier is greater than the second preset threshold, the above 8 SBT modes may be skipped: the non-sub-block transform mode is determined as the candidate transform mode of the target frame, and only the non-sub-block transform modes in the various possible prediction modes are traversed to determine the optimal prediction mode (i.e., the target prediction mode) and the optimal transform mode in it (i.e., the target transform mode), which in this embodiment is the non-sub-block transform mode. Because it is not necessary to traverse 9 transform modes in each prediction mode, but only one transform mode (the non-sub-block transform mode) per prediction mode, the embodiment of the application can improve the encoding speed.
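A minimal sketch of this frame-level gate follows. The mode names, the list representation, and the default thresholds (16 and 3) are assumptions for illustration; the text only fixes them as preset values.

```python
SBT_MODES = [
    "SBT_HOR_HALF_POS0", "SBT_HOR_HALF_POS1",
    "SBT_HOR_QUAD_POS0", "SBT_HOR_QUAD_POS1",
    "SBT_VER_HALF_POS0", "SBT_VER_HALF_POS1",
    "SBT_VER_QUAD_POS0", "SBT_VER_QUAD_POS1",
]

def frame_transform_candidates(gop_length: int, temporal_id: int,
                               threshold_gop: int = 16,
                               threshold_tid: int = 3) -> list:
    """Keep only the non-sub-block transform when the GOP is long and
    the frame's referenced level is low; otherwise keep all 9 modes and
    fall through to the per-coding-unit checks."""
    if gop_length >= threshold_gop and temporal_id > threshold_tid:
        return ["NON_SBT"]            # skip all 8 SBT modes
    return ["NON_SBT"] + SBT_MODES    # 9 candidates in total
```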
On the contrary, a group-of-pictures length smaller than the first preset threshold indicates that the inter-frame prediction effect of the video frames in the GOP is poor and their prediction residuals are large, so the SBT modes need to be considered. Likewise, a temporal identifier smaller than or equal to the second preset threshold indicates that the referenced level of the target frame is high, i.e., the target frame is likely to be a key frame or a reference frame with an important influence on video quality and compression efficiency; when encoding such a frame, the optimal transform mode needs to be determined from the non-sub-block transform mode and the 8 SBT modes to ensure encoding quality. Based on this, if the group-of-pictures length of the GOP in which the target frame is located is smaller than the first preset threshold, or the temporal identifier of the target frame is smaller than or equal to the second preset threshold, the SBT modes are not skipped outright when determining the optimal transform mode of the target frame; instead, encoding is performed in any one of the following eight manners.
1. Skip the SBT modes of the target coding unit according to the mode information of the parent node of the target coding unit.
Specifically, if the group-of-pictures length is smaller than the first preset threshold, or the temporal identifier is smaller than or equal to the second preset threshold, the target prediction mode of the parent node of a target coding unit in the target frame is acquired, wherein the target coding unit refers to any coding unit in the target frame; and the candidate transform mode of the target coding unit is determined according to the target prediction mode of the parent node.
The parent node of the target coding unit refers to the coding block to which the target coding unit belongs. The target frame may be recursively divided into smaller coding blocks by one or more of quadtree, binary-tree or ternary-tree partitioning, down to the smallest coding block, i.e., the coding unit. As shown in fig. 3, the 64x64 super-block is partitioned into 4 32x32 coding blocks; the first 32x32 coding block is partitioned into 4 16x16 coding blocks, wherein the first 16x16 coding block is partitioned into 4 8x8 coding blocks (labeled 1-4 in fig. 3), none of which is partitioned further; the second 16x16 coding block is partitioned into 2 8x16 coding blocks (labeled 5 and 6 in fig. 3); the third 16x16 coding block (labeled 7 in fig. 3) is not partitioned; and the fourth 16x16 coding block is partitioned into 2 4x16 coding blocks (labeled 8 and 10 in fig. 3) and 1 8x16 coding block (labeled 9 in fig. 3). Assuming the target coding unit is coding block 1, 2, 3 or 4 in fig. 3, its parent node is the first 16x16 coding block; assuming it is coding block 5 or 6, its parent node is the second 16x16 coding block; assuming it is coding block 7, its parent node is the first 32x32 coding block; assuming it is coding block 8, 9 or 10, its parent node is the fourth 16x16 coding block.
In this embodiment, the candidate transform mode of the target coding unit may be determined according to the optimal prediction mode of its parent node, i.e., whether to skip the SBT modes of the target coding unit depends on whether the optimal prediction mode of the parent node is the first prediction mode. Specifically, if the target prediction mode of the parent node is the first prediction mode, the SBT modes of the target coding unit may be skipped, that is, the candidate transform mode of the target coding unit is the non-sub-block transform mode; if it is not the first prediction mode, encoding may be performed in any of the following seven manners.
Alternatively, the first prediction mode may be a skip prediction mode, that is, the manner of determining the candidate transform mode of the target coding unit according to the target prediction mode of the parent node may include: if the target prediction mode of the parent node is the skip prediction mode, the non-sub-block transform mode is determined as a candidate transform mode of the target coding unit.
In this embodiment, the skip prediction mode directly takes the reconstructed pixel values of the reference coding block as the reconstructed pixel values of the current coding block, so no prediction residual needs to be added, i.e., the residual is 0. Therefore, when the optimal prediction mode of the parent node of the target coding unit is the skip prediction mode, the content of the parent node is simple and the residual is concentrated in one corner, so the target coding unit does not need to check the SBT modes; that is, an SBT mode is unlikely to be the optimal transform mode of the target coding unit. In this case, the candidate transform mode of the target coding unit can be determined to be the non-sub-block transform mode: only the non-sub-block transform modes in the various possible prediction modes are traversed to determine the optimal prediction mode (i.e., the target prediction mode) and the optimal transform mode in it (i.e., the target transform mode), which in this embodiment is the non-sub-block transform mode. Because it is not necessary to traverse 9 transform modes in each prediction mode, but only one, the embodiment of the application can improve the encoding speed.
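The parent-node gate of manner 1 can be sketched as follows (illustrative; the mode name "SKIP" and the list representation of the candidate set are assumptions, not the encoder's actual data structures):

```python
def cu_transform_candidates(parent_best_mode: str, all_modes: list) -> list:
    """Manner 1: if the parent block's best prediction mode is SKIP,
    its content is simple and residuals are negligible, so only the
    non-sub-block transform is kept for this coding unit."""
    if parent_best_mode == "SKIP":
        return ["NON_SBT"]
    return all_modes  # fall through to the other seven manners
```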
2. In the process of the target coding unit traversing each prediction mode, if the encoder has checked the non-sub-block transform mode before checking the SBT modes and the number of non-zero coefficients in the non-sub-block transform mode is smaller than a third preset threshold, the target coding unit will not check the SBT modes.
Specifically, if the group-of-pictures length is smaller than the first preset threshold, or the temporal identifier is smaller than or equal to the second preset threshold, the non-sub-block transform mode and at least one sub-block transform mode are determined as candidate transform modes of a target coding unit in the target frame, wherein the target coding unit refers to any coding unit in the target frame; and for any one of at least one preset prediction mode, in the process of traversing the candidate transform modes in that prediction mode, if the traversal order of the non-sub-block transform mode precedes that of any sub-block transform mode and the number of non-zero coefficients in the non-sub-block transform mode is smaller than the third preset threshold, the traversal of the at least one sub-block transform mode is stopped.
For example, the at least one preset prediction mode refers to at least one prediction mode to be traversed by the target coding unit, such as an intra prediction mode, skip prediction mode, merge prediction mode, affine prediction mode, geo prediction mode, and so on. In the process of the target coding unit traversing the various possible prediction modes, for any prediction mode, suppose the target coding unit first checks the non-sub-block transform mode in that prediction mode and then checks the 8 sub-block transform modes, i.e., it first encodes the entire residual area of the target coding unit and calculates the rate-distortion cost, and then encodes the corresponding residual areas according to the 8 sub-block transform modes and calculates their rate-distortion costs. While checking the non-sub-block transform mode, it may be determined whether the number of non-zero coefficients in the non-sub-block transform mode is smaller than the third preset threshold; if so, the check of the 8 sub-block transform modes is stopped. If the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, encoding may be performed in any of the seven other manners.
Non-zero coefficients refer to transformed and quantized coefficients that are not zero. After the prediction residual is transformed and quantized, transformed-and-quantized coefficients are generated; a non-zero coefficient indicates that a residual exists in the region of the target coding unit where that coefficient is located. A number of non-zero coefficients smaller than the third preset threshold indicates that the area where residuals exist is small, i.e., the content of the target coding unit is simple, so the target coding unit does not need to check the SBT modes; that is, an SBT mode is unlikely to be the optimal transform mode. Therefore, when the traversal order of the non-sub-block transform mode precedes that of any sub-block transform mode and the number of non-zero coefficients in the non-sub-block transform mode is smaller than the third preset threshold, the candidate transform mode of the target coding unit in that prediction mode can be determined to be the non-sub-block transform mode, and only the non-sub-block transform mode in that prediction mode is traversed. Because it is not necessary to traverse the 9 transform modes in that prediction mode, but only one, the embodiment of the present application can increase the encoding speed.
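The early-out of manner 2 can be sketched as a per-prediction-mode transform traversal loop. This is illustrative: `rd_cost` and `count_nonzero` stand in for encoder internals and are assumed callbacks, and "NON_SBT" ordered first in the candidate list models the "non-sub-block transform checked before SBT" condition.

```python
def traverse_transforms(prediction_mode, transforms,
                        count_nonzero, rd_cost, threshold):
    """Return the (transform, cost) pair with minimum RD cost, stopping
    before any SBT mode when the non-sub-block transform leaves fewer
    than `threshold` non-zero coefficients (manner 2)."""
    best = None
    for tm in transforms:                 # "NON_SBT" is ordered first
        cost = rd_cost(prediction_mode, tm)
        if best is None or cost < best[1]:
            best = (tm, cost)
        if tm == "NON_SBT" and count_nonzero(prediction_mode) < threshold:
            break                         # skip the 8 SBT modes
    return best
```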
3. During the target coding unit traversing each prediction mode, if the encoder has checked the non-sub-block transform mode before checking the SBT modes, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode is located in a preset area of the target coding unit, the SBT modes will not be checked.
Specifically, for any one of at least one preset prediction mode, if the traversal sequence of the non-sub-block transform mode precedes the traversal sequence of any sub-block transform mode in the process of traversing the candidate transform mode in any one prediction mode, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode is located in a preset area of the target coding unit, the traversal of at least one sub-block transform mode is stopped.
For example, the at least one preset prediction mode refers to at least one prediction mode to be traversed by the target coding unit, such as an intra prediction mode, skip prediction mode, merge prediction mode, affine prediction mode, geo prediction mode, and so on. In the process of the target coding unit traversing the various possible prediction modes, for any prediction mode, suppose the target coding unit first checks the non-sub-block transform mode in that prediction mode and then checks the 8 sub-block transform modes. While checking the non-sub-block transform mode, it may be determined whether the number of non-zero coefficients is smaller than the third preset threshold; if the number is greater than or equal to the third preset threshold, it may further be determined whether the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit, and if so, the check of the 8 sub-block transform modes may be stopped. If the last non-zero coefficient is not located in the preset area of the target coding unit, any one of the other seven manners may be used for encoding.
A number of non-zero coefficients greater than or equal to the third preset threshold indicates that the area where residuals exist is large. However, if the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit, the residuals are concentrated in that preset area, i.e., the content of the target coding unit is still simple, so the target coding unit does not need to check the SBT modes; that is, an SBT mode is unlikely to be the optimal transform mode. Therefore, when the traversal order of the non-sub-block transform mode precedes that of any sub-block transform mode, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient is located in the preset area of the target coding unit, the candidate transform mode of the target coding unit in that prediction mode can be determined to be the non-sub-block transform mode, and only the non-sub-block transform mode in that prediction mode is traversed. Because it is not necessary to traverse the 9 transform modes in that prediction mode, but only one, the embodiment of the present application can increase the encoding speed.
Optionally, if the ratio of the abscissa of the last non-zero coefficient in the non-sub-block transform mode to the width of the target coding unit is smaller than a fourth preset threshold, and the ratio of the ordinate of the last non-zero coefficient in the non-sub-block transform mode to the height of the target coding unit is smaller than a fifth preset threshold, determining that the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit.
Exemplarily, if the position (x, y) of the last non-zero coefficient in the non-sub-block transform mode satisfies x/W < T4 and y/H < T5, where T4 is the fourth preset threshold and T5 is the fifth preset threshold, it indicates that the last non-zero coefficient in the non-sub-block transform mode is located in the preset region of the target coding unit, where W represents the width of the target coding unit, H represents the height of the target coding unit, x represents the abscissa of the last non-zero coefficient in the non-sub-block transform mode, and y represents the ordinate of the last non-zero coefficient in the non-sub-block transform mode.
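The position test of manner 3 follows directly from those two inequalities. In this sketch, the 1/4 defaults for the fourth and fifth thresholds are illustrative assumptions; the text only states that they are preset values.

```python
def last_coeff_in_preset_region(x: int, y: int, width: int, height: int,
                                t4: float = 0.25, t5: float = 0.25) -> bool:
    """Manner 3: the last non-zero coefficient (x, y) lies in the preset
    (top-left) region of a W x H coding unit when x/W < T4 and y/H < T5,
    indicating the residuals are concentrated there."""
    return x / width < t4 and y / height < t5
```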
4. Skip the SBT modes in subsequent prediction modes according to the coding information of the prediction modes that have already been checked.
Specifically, if the group-of-pictures length is smaller than the first preset threshold, or the temporal identifier is smaller than or equal to the second preset threshold, the non-sub-block transform mode and at least one sub-block transform mode are traversed under the preset prediction modes; based on at least two prediction modes that have been traversed, it is determined whether to stop traversing the at least one sub-block transform mode in the other prediction modes.
Optionally, if the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is the skip prediction mode, the traversing of at least one sub-block transformation mode in other prediction modes is stopped. That is, if the optimal prediction mode among the prediction modes that have been checked is a skip prediction mode, the target encoding unit will skip the SBT mode of the subsequent prediction modes.
For example, the at least one preset prediction mode refers to at least one prediction mode to be traversed by the target coding unit, such as an intra prediction mode, skip prediction mode, merge prediction mode, affine prediction mode, geo prediction mode, and so on. The at least one prediction mode may be traversed in a traversal order and the rate-distortion cost of each calculated. If at least two prediction modes have been traversed and the skip prediction mode is among them, the rate-distortion costs of the traversed prediction modes may be compared; if the prediction mode with the minimum rate-distortion cost among the traversed prediction modes is the skip prediction mode, then for the prediction modes not yet traversed, only the non-sub-block transform modes therein may be traversed.
The skip prediction mode directly takes the reconstructed pixel values of the reference coding block as the reconstructed pixel values of the current coding block, so no prediction residual needs to be added, i.e., the residual is 0. Therefore, if the prediction mode with the minimum rate-distortion cost among the traversed prediction modes is the skip prediction mode, the content of the target coding unit is simple and the target coding unit does not need to check the SBT modes in the other prediction modes; that is, an SBT mode is unlikely to be the optimal transform mode. In this case, only the non-sub-block transform modes in the other, not-yet-traversed prediction modes may be traversed. Because it is not necessary to traverse the 9 transform modes in those prediction modes, but only one (the non-sub-block transform mode), the embodiment of the application can improve the encoding speed.
Alternatively, if the prediction mode with the minimum rate distortion cost of the traversed at least two prediction modes is not the skip prediction mode, any one of the other seven modes may be used for encoding.
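The best-mode-so-far gate of manner 4 can be sketched as follows (illustrative; `traversed` as a dict from mode name to best RD cost is an assumed representation of the encoder's intermediate results):

```python
def prune_remaining_modes(traversed: dict, remaining_transforms: list) -> list:
    """Manner 4: once at least two prediction modes have been checked,
    if the cheapest so far is SKIP, keep only the non-sub-block
    transform for every prediction mode not yet traversed."""
    if len(traversed) >= 2 and min(traversed, key=traversed.get) == "SKIP":
        return ["NON_SBT"]
    return remaining_transforms
```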
5. If the optimal prediction mode among the prediction modes that have been checked is not the skip prediction mode, the transform mode in that optimal prediction mode is not an SBT mode, and the number of non-zero coefficients in that optimal prediction mode is smaller than a sixth preset threshold, the target coding unit will skip the SBT modes in the preset prediction modes.
Specifically, if the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is not the skip prediction mode, the target transformation mode in the prediction mode with the minimum rate distortion cost is a non-sub-block transformation mode, and the number of non-zero coefficients in the prediction mode with the minimum rate distortion cost is smaller than a sixth preset threshold, stopping traversing at least one sub-block transformation mode in the preset prediction mode.
For example, the at least one preset prediction mode refers to at least one prediction mode to be traversed by the target coding unit, such as an intra prediction mode, skip prediction mode, merge prediction mode, affine prediction mode, geo prediction mode, and so on. The at least one prediction mode may be traversed in a traversal order and the rate-distortion cost of each calculated. If at least two prediction modes have been traversed, the rate-distortion costs of the traversed prediction modes may be compared. If the prediction mode with the minimum rate-distortion cost is not the skip prediction mode, the transform mode with the minimum rate-distortion cost in that prediction mode (i.e., the target transform mode) may be determined; if that transform mode is the non-sub-block transform mode, it may then be determined whether the number of non-zero coefficients in that prediction mode is smaller than the sixth preset threshold, and if so, the traversal of the at least one sub-block transform mode in the preset prediction modes may be stopped.
Alternatively, the preset prediction modes may include one or both of the affine transform prediction mode and the geometric partitioning (geo) prediction mode.
In this embodiment, since the motion described by the affine transform prediction mode is relatively complex, the distribution of prediction residuals in this prediction mode tends to be complex as well; that is, the prediction residual of each region is equally important, so an SBT transform is often not the optimal transform mode. In addition, in the geo prediction mode, each region generates its prediction coding unit from a different reference frame, so the prediction residual distribution in the geo mode also tends to be irregular; for the geo prediction mode, the SBT modes, whose partitioned regions are very regular, are therefore often not the optimal transform mode. Based on this, if the prediction mode with the minimum rate-distortion cost among the traversed prediction modes is not the skip prediction mode, the target transform mode in that prediction mode is the non-sub-block transform mode, and the number of non-zero coefficients in that prediction mode is smaller than the sixth preset threshold, an SBT mode is unlikely to be the optimal transform mode of the target coding unit in the affine transform prediction mode and the geo prediction mode; in that case, only the non-sub-block transform modes in the affine transform prediction mode and the geo prediction mode need be traversed.
Because it is not necessary to traverse the 9 transform modes in the affine transform prediction mode and the geo prediction mode, but only one transform mode (the non-sub-block transform mode) in each, the embodiment of the application can improve the encoding speed.
Alternatively, if the optimal prediction mode among the checked prediction modes is not the skip prediction mode and the transform mode in that optimal prediction mode is an SBT mode, the target coding unit may encode in any one of the following three manners. Or, if the optimal prediction mode among the checked prediction modes is not the skip prediction mode, the transform mode in that optimal prediction mode is not an SBT mode, and the number of non-zero coefficients in that optimal prediction mode is greater than or equal to the sixth preset threshold, the target coding unit may encode in any one of the seven other manners.
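The three-part condition of manner 5 can be sketched as a single predicate (illustrative; the mode and transform names are assumptions standing in for the encoder's internal identifiers):

```python
def skip_sbt_for_affine_and_geo(best_mode: str, best_transform: str,
                                nonzero_coeffs: int, threshold: int) -> bool:
    """Manner 5: skip the SBT modes in the affine and geo prediction
    modes when the best already-checked prediction mode is not SKIP,
    its best transform is not an SBT mode, and its number of non-zero
    coefficients is below the sixth preset threshold."""
    return (best_mode != "SKIP"
            and best_transform == "NON_SBT"
            and nonzero_coeffs < threshold)
```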
6. If the width W of the target coding unit is greater than its height H, i.e., the target coding unit is a horizontal rectangular block, the target coding unit will skip the horizontally split SBT modes, i.e., the SBT_HOR_HALF_POS0, SBT_HOR_HALF_POS1, SBT_HOR_QUAD_POS0 and SBT_HOR_QUAD_POS1 modes in fig. 1.
Specifically, if the group-of-pictures length is smaller than the first preset threshold, or the temporal identifier is smaller than or equal to the second preset threshold, the width and the height of a target coding unit in the target frame are acquired, wherein the target coding unit refers to any coding unit in the target frame; and if the width of the target coding unit is greater than its height, at least one vertically split sub-block transform mode and the non-sub-block transform mode are determined as candidate transform modes of the target frame.
The inventors have found that such a target coding unit was already produced by horizontal splitting, so there is no need to split horizontally again at transform time. Based on this, when the width of the target coding unit is greater than its height, the horizontally split SBT modes in each prediction mode are skipped. Because it is not necessary to traverse 9 transform modes in each prediction mode, but only five (the non-sub-block transform mode and the four vertically split SBT modes), the embodiment of the present application can improve the encoding speed.
Alternatively, if the width of the target coding unit is less than or equal to the height of the target coding unit, the target coding unit may be coded in any of the other seven manners.
7. If the width W of the target coding unit is smaller than the height H of the target coding unit, i.e. the target coding unit is a vertical rectangular block, the target coding unit will skip the vertically split SBT modes, i.e. SBT_VER_HALF_POS0 mode, SBT_VER_HALF_POS1 mode and SBT_VER_QUAD_POS0 mode, SBT_VER_QUAD_POS1 mode in FIG. 1.
Specifically, if the group-of-pictures length is smaller than the first preset threshold, or the temporal identifier is smaller than or equal to the second preset threshold, the width and the height of a target coding unit in the target frame are acquired, wherein the target coding unit refers to any coding unit in the target frame; and if the width of the target coding unit is smaller than its height, at least one horizontally split sub-block transform mode and the non-sub-block transform mode are determined as candidate transform modes of the target frame.
The inventors have found that such a target coding unit was already produced by vertical splitting, so there is no need to split vertically again at transform time. Based on this, when the width of the target coding unit is smaller than its height, the vertically split SBT modes in each prediction mode are skipped. Because it is not necessary to traverse 9 transform modes in each prediction mode, but only five (the non-sub-block transform mode and the four horizontally split SBT modes), the embodiment of the present application can improve the encoding speed.
Alternatively, if the width of the target coding unit is greater than or equal to the height of the target coding unit, the target coding unit may be coded in any of seven other manners.
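Manners 6 and 7 can be combined into one shape-based pruning sketch. Combining both branches in a single function is an illustrative choice (the text treats them as separate manners), and the mode-name strings are assumptions:

```python
HOR_SBT = ["SBT_HOR_HALF_POS0", "SBT_HOR_HALF_POS1",
           "SBT_HOR_QUAD_POS0", "SBT_HOR_QUAD_POS1"]
VER_SBT = ["SBT_VER_HALF_POS0", "SBT_VER_HALF_POS1",
           "SBT_VER_QUAD_POS0", "SBT_VER_QUAD_POS1"]

def prune_sbt_by_shape(width: int, height: int) -> list:
    """Manners 6 and 7: a wide CU (W > H) skips the four horizontally
    split SBT modes; a tall CU (W < H) skips the four vertically split
    ones; a square CU keeps all nine candidates."""
    if width > height:
        return ["NON_SBT"] + VER_SBT
    if width < height:
        return ["NON_SBT"] + HOR_SBT
    return ["NON_SBT"] + HOR_SBT + VER_SBT
```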
8. If the group-of-pictures length of the GOP in which the target frame is located is smaller than the first preset threshold, or the temporal identifier of the target frame is smaller than or equal to the second preset threshold, it is further determined whether the target coding unit of the target frame satisfies any one of the above seven manners. If the target coding unit satisfies none of them, the target prediction mode and the target transform mode are determined in the conventional manner, i.e., the 9 transform modes are traversed in at least one preset prediction mode, and the prediction mode with the minimum rate-distortion cost and the transform mode with the minimum rate-distortion cost in that prediction mode are determined.
S903, selecting a target prediction mode and a target transformation mode of the target frame from candidate transformation modes in at least one preset prediction mode.
In one implementation, if the picture group length is greater than or equal to the first preset threshold and the temporal identifier is greater than the second preset threshold, the non-sub-block transform mode is determined as the candidate transform mode of the target frame. The encoder may therefore traverse the non-sub-block transform mode in at least one preset prediction mode and calculate its rate-distortion cost, determine the prediction mode with the smallest rate-distortion cost as the target prediction mode of the target frame, and determine the non-sub-block transform mode as the target transform mode. The target frame may then be encoded based on the non-sub-block transform mode in the target prediction mode.
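With a single transform candidate, the search above degenerates to comparing prediction modes under one fixed transform. A hedged sketch follows; the mode names and the `rd_cost` callback are assumptions standing in for the encoder's rate-distortion machinery:

```python
def select_modes_non_sbt_only(prediction_modes, rd_cost):
    """When the non-sub-block transform is the only candidate, only the
    prediction modes are compared; the target transform mode is fixed.

    rd_cost(pred, transform) -> float is assumed to be supplied by the
    encoder; smaller is better."""
    target_pred = min(prediction_modes,
                      key=lambda p: rd_cost(p, "non_sbt"))
    return target_pred, "non_sbt"
```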
In one implementation, if the optimal prediction mode of the parent node of the target coding unit is the first prediction mode, the SBT modes of the target coding unit may be skipped. The encoder may therefore traverse the non-sub-block transform mode in at least one preset prediction mode and calculate its rate-distortion cost, determine the prediction mode with the smallest rate-distortion cost as the target prediction mode of the target coding unit, and determine the non-sub-block transform mode as the target transform mode. The target coding unit may then be encoded based on the non-sub-block transform mode in the target prediction mode.
In one implementation, while the target coding unit traverses each prediction mode, if the encoder has checked the non-sub-block transform mode before checking the SBT modes and the number of non-zero coefficients in the non-sub-block transform mode is smaller than a third preset threshold, the target coding unit no longer checks the SBT modes. After all prediction modes have been traversed, the prediction mode with the smallest rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transform mode with the smallest rate-distortion cost in the target prediction mode can be determined as the target transform mode. The target coding unit may then be encoded based on the target transform mode in the target prediction mode.
In one implementation, if the encoder has checked the non-sub-block transform mode before checking the SBT modes while the target coding unit traverses each prediction mode, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode is located in a preset area of the target coding unit, the SBT modes are no longer checked. After all prediction modes have been traversed, the prediction mode with the smallest rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transform mode with the smallest rate-distortion cost in the target prediction mode can be determined as the target transform mode. The target coding unit may then be encoded based on the target transform mode in the target prediction mode.
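The two coefficient-based early stops above (few non-zero coefficients, or the last non-zero coefficient inside the preset area) can be sketched together. The preset-area test is expressed here as the coordinate-ratio check that the embodiments define later; all threshold values are assumptions:

```python
def can_skip_sbt(num_nonzero, last_coeff_xy, cu_width, cu_height,
                 third_threshold=4, fourth_threshold=0.25,
                 fifth_threshold=0.25):
    """After the non-sub-block transform has been checked, decide whether
    the SBT modes can be skipped for this coding unit.

    Skip when the residual has very few non-zero coefficients, or when
    the last non-zero coefficient lies in the preset (top-left) area,
    meaning the residual energy is already well concentrated."""
    if num_nonzero < third_threshold:
        return True
    x, y = last_coeff_xy
    return (x / cu_width < fourth_threshold
            and y / cu_height < fifth_threshold)
```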
In one implementation, if the picture group length is smaller than the first preset threshold or the temporal identifier is smaller than or equal to the second preset threshold, the non-sub-block transform mode and the at least one sub-block transform mode are traversed in at least one preset prediction mode; if the prediction mode with the smallest rate-distortion cost among the at least two prediction modes already traversed is the skip prediction mode, traversal of the at least one sub-block transform mode in the remaining prediction modes is stopped. After all prediction modes have been traversed, the prediction mode with the smallest rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transform mode with the smallest rate-distortion cost in the target prediction mode can be determined as the target transform mode. The target coding unit may then be encoded based on the target transform mode in the target prediction mode.
In one implementation, if the optimal prediction mode among the already-checked prediction modes is not the skip prediction mode, the transform mode of that optimal prediction mode is not an SBT mode, and the number of non-zero coefficients in that optimal prediction mode is smaller than a sixth preset threshold, the target coding unit skips the SBT modes in the remaining preset prediction modes. After all prediction modes have been traversed, the prediction mode with the smallest rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transform mode with the smallest rate-distortion cost in the target prediction mode can be determined as the target transform mode. The target coding unit may then be encoded based on the target transform mode in the target prediction mode.
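The two cross-prediction-mode early stops above (best mode so far is the skip prediction mode; or it is a non-SBT mode with few non-zero coefficients) can be sketched as one predicate. The sixth threshold value is an assumption:

```python
def stop_sbt_in_remaining_predictions(best_is_skip_mode, best_transform,
                                      best_num_nonzero, sixth_threshold=6):
    """Decide, after at least two prediction modes have been checked,
    whether SBT checks in the remaining prediction modes can be skipped.

    Condition 1: the best mode so far is the skip prediction mode.
    Condition 2: the best mode is not skip, its transform is not SBT,
    and it left fewer non-zero coefficients than the sixth threshold."""
    if best_is_skip_mode:
        return True
    return best_transform != "sbt" and best_num_nonzero < sixth_threshold
```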
In one implementation, if the width W of the target coding unit is greater than the height H of the target coding unit, i.e., the target coding unit is a horizontal rectangular block, then the target coding unit will skip the horizontally partitioned SBT mode in all prediction modes. Therefore, after all the prediction modes are traversed, the prediction mode with the minimum rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transformation mode with the minimum rate-distortion cost in the target prediction mode can be determined as the target transformation mode. And further, the target coding unit may be coded based on the target transform mode in the target prediction mode.
In one implementation, if the width W of the target coding unit is smaller than the height H of the target coding unit, i.e. the target coding unit is a vertical rectangular block, the target coding unit will skip the vertically partitioned SBT mode in all prediction modes. Therefore, after all the prediction modes are traversed, the prediction mode with the minimum rate-distortion cost can be determined as the target prediction mode of the target coding unit, and the transformation mode with the minimum rate-distortion cost in the target prediction mode can be determined as the target transformation mode. And further, the target coding unit may be coded based on the target transform mode in the target prediction mode.
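The shape-based pruning in the two implementations above can be sketched as follows; the SBT mode names and the nine-mode candidate list are illustrative, not the encoder's actual identifiers:

```python
def shape_pruned_transform_modes(cu_width, cu_height):
    """Prune SBT candidates by block shape.

    A wide block (W > H) was produced by a horizontal split, so the
    horizontally partitioned SBT modes are skipped; a tall block
    (W < H) was produced by a vertical split, so the vertically
    partitioned SBT modes are skipped. Square blocks keep all nine."""
    non_sbt = ["non_sbt"]
    sbt_h = ["sbt_h_half", "sbt_h_quarter",
             "sbt_h_half_flip", "sbt_h_quarter_flip"]
    sbt_v = ["sbt_v_half", "sbt_v_quarter",
             "sbt_v_half_flip", "sbt_v_quarter_flip"]
    if cu_width > cu_height:
        return non_sbt + sbt_v      # skip horizontally partitioned SBT
    if cu_width < cu_height:
        return non_sbt + sbt_h      # skip vertically partitioned SBT
    return non_sbt + sbt_h + sbt_v  # square block: all nine candidates
```

Either pruned list contains five modes, matching the 9-to-5 reduction described earlier.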
In one implementation, the target coding unit may determine the target prediction mode and the target transform mode in the conventional manner, that is, by traversing the nine transform modes in at least one preset prediction mode and selecting the prediction mode with the smallest rate-distortion cost together with the transform mode that has the smallest rate-distortion cost in that prediction mode; the target coding unit may then be encoded based on the target transform mode in the target prediction mode.
S904, compressing the target frame based on the target prediction mode and the target transform mode.
In this embodiment of the application, the candidate transform modes of the target frame are determined according to the picture group length of the picture group in which the target frame is located and the temporal identifier of the target frame, and the target prediction mode and the target transform mode of the target frame are selected from the candidate transform modes in at least one preset prediction mode. The picture group length and the temporal identifier are thus used to adaptively skip transform modes that cannot become the optimal mode, which improves the encoding speed.
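The overall flow of S901-S904 can be sketched end to end; the threshold values, mode names, and the `rd_cost` callback are illustrative assumptions:

```python
def choose_frame_modes(gop_length, temporal_id, prediction_modes, rd_cost,
                       first_threshold=8, second_threshold=1):
    """End-to-end sketch: derive candidate transform modes from the
    picture group length and temporal identifier, then minimize the
    rate-distortion cost over (prediction mode, transform mode) pairs."""
    if gop_length >= first_threshold and temporal_id > second_threshold:
        candidates = ["non_sbt"]                       # prune all SBT modes
    else:
        candidates = ["non_sbt"] + [f"sbt_{i}" for i in range(8)]
    return min(((p, t) for p in prediction_modes for t in candidates),
               key=lambda pair: rd_cost(*pair))
```

The returned pair is the target prediction mode and target transform mode used to compress the frame.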
An embodiment of the present application further provides a computer storage medium in which program instructions are stored; when executed, the program instructions implement the corresponding methods described in the above embodiments.
Referring to fig. 10 again, fig. 10 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application.
In one implementation of the video encoding apparatus of the embodiment of the present application, the video encoding apparatus includes the following structure.
An obtaining unit 1001, configured to obtain the picture group length of the picture group where a target frame is located, and a time identifier of the target frame, where the time identifier is used to represent a referenced level of the target frame;
A processing unit 1002, configured to determine a candidate transformation mode of the target frame according to the frame group length and the time identifier; selecting a target prediction mode and a target transformation mode of the target frame from the candidate transformation modes in at least one preset prediction mode; and compressing the target frame based on the target prediction mode and the target transformation mode.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
And if the length of the picture group is greater than or equal to a first preset threshold value and the time mark is greater than a second preset threshold value, determining a non-sub-block transformation mode as a candidate transformation mode of the target frame.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
if the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, a target prediction mode of a parent node of a target coding unit in the target frame is obtained, wherein the target coding unit refers to any coding unit in the target frame;
And determining a candidate transformation mode of the target coding unit according to the target prediction mode of the parent node.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target coding unit based on the target prediction mode of the parent node, comprising:
and if the target prediction mode of the parent node is the skip prediction mode, determining the non-sub-block transformation mode as the candidate transformation mode of the target coding unit.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, determining a non-sub-block transformation mode and at least one sub-block transformation mode as candidate transformation modes of a target coding unit in the target frame, wherein the target coding unit refers to any coding unit in the target frame;
And aiming at any prediction mode in the at least one preset prediction mode, if the traversing sequence of the non-sub-block transformation mode is prior to the traversing sequence of any sub-block transformation mode in the process of traversing the candidate transformation mode in the any prediction mode, and the number of non-zero coefficients in the non-sub-block transformation mode is smaller than a third preset threshold, stopping traversing the at least one sub-block transformation mode.
In an embodiment, the processing unit 1002 is further configured to, for any one of the at least one preset prediction modes, in traversing the candidate transform modes in the any one prediction mode, stop traversing the at least one sub-block transform mode if the traversal order of the non-sub-block transform modes precedes the traversal order of any one sub-block transform mode, the number of non-zero coefficients in the non-sub-block transform modes is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit.
In an embodiment, the processing unit 1002 is further configured to determine that the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit if the ratio of the abscissa of the last non-zero coefficient in the non-sub-block transform mode to the width of the target coding unit is less than a fourth preset threshold and the ratio of the ordinate of the last non-zero coefficient in the non-sub-block transform mode to the height of the target coding unit is less than a fifth preset threshold.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, traversing a non-sub-block transformation mode and at least one sub-block transformation mode under the at least one preset prediction mode;
Based on the at least two prediction modes that have been traversed, it is determined whether to stop traversing at least one sub-block transform mode in the other prediction modes.
In one embodiment, the processing unit 1002 determines whether to stop traversing at least one sub-block transform mode in other prediction modes based on the at least two prediction modes that have been traversed, including:
if the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is a skip prediction mode, stopping traversing at least one sub-block transformation mode in other prediction modes.
In one embodiment, the processing unit 1002 determines whether to stop traversing at least one sub-block transform mode in other prediction modes based on the at least two prediction modes that have been traversed, including:
If the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is not skip prediction, the target transformation mode in the prediction mode with the minimum rate distortion cost is a non-sub-block transformation mode, and the number of non-zero coefficients in the prediction mode with the minimum rate distortion cost is smaller than a sixth preset threshold, stopping traversing at least one sub-block transformation mode in the preset prediction mode.
In one embodiment, the preset prediction modes include one or more of an affine transform prediction mode and a geometric partitioning prediction mode.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, acquiring the width of a target coding unit in the target frame and the height of the target coding unit, wherein the target coding unit refers to any coding unit in the target frame;
And if the width of the target coding unit is larger than the height of the target coding unit, determining at least one vertically partitioned sub-block transformation mode and the non-sub-block transformation mode as candidate transformation modes of the target frame.
In one embodiment, the processing unit 1002 determines a candidate transform mode for the target frame according to the group of pictures length and the temporal identification, including:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, acquiring the width of a target coding unit in the target frame and the height of the target coding unit, wherein the target coding unit refers to any coding unit in the target frame;
And if the width of the target coding unit is smaller than the height of the target coding unit, determining at least one horizontally partitioned sub-block transformation mode and the non-sub-block transformation mode as candidate transformation modes of the target frame.
In this embodiment of the present application, the processing unit 1002 determines the candidate transformation modes of the target frame according to the picture group length of the picture group in which the target frame is located and the time identifier of the target frame, and selects the target prediction mode and the target transformation mode of the target frame from the candidate transformation modes in at least one preset prediction mode. The picture group length and the time identifier are thus used to adaptively skip transformation modes that cannot become the optimal mode, so as to increase the encoding speed.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device of this embodiment includes structures such as a power supply module, and further includes a processor 1101, a storage device 1102, and a communication interface 1103. The processor 1101, the storage device 1102, and the communication interface 1103 may exchange data with one another, and the processor 1101 implements the corresponding video encoding method.
The storage device 1102 may include a volatile memory, such as a random-access memory (RAM); the storage device 1102 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 1102 may also include a combination of the above types of memory.
The processor 1101 may be a central processing unit (central processing unit, CPU). The processor 1101 may also be a combination of a CPU and a GPU. In the server, a plurality of CPUs and GPUs can be included as required to perform corresponding video encoding. In one embodiment, storage 1102 is used to store program instructions. The processor 1101 may invoke program instructions to implement the various methods as referred to above in embodiments of the present application.
In a first possible implementation manner, the processor 1101 of the computer device invokes the program instructions stored in the storage device 1102 to obtain the picture group length of the picture group in which the target frame is located and the time identifier of the target frame, where the time identifier is used to characterize a referenced level of the target frame; determine a candidate transformation mode of the target frame according to the length of the picture group and the time identifier; select a target prediction mode and a target transformation mode of the target frame from the candidate transformation modes in at least one preset prediction mode; and compress the target frame based on the target prediction mode and the target transformation mode.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
And if the length of the picture group is greater than or equal to a first preset threshold value and the time mark is greater than a second preset threshold value, determining a non-sub-block transformation mode as a candidate transformation mode of the target frame.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
if the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, a target prediction mode of a parent node of a target coding unit in the target frame is obtained, wherein the target coding unit refers to any coding unit in the target frame;
And determining a candidate transformation mode of the target coding unit according to the target prediction mode of the parent node.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target coding unit according to a target prediction mode of the parent node, perform the following operations:
and if the target prediction mode of the parent node is the skip prediction mode, determining the non-sub-block transformation mode as the candidate transformation mode of the target coding unit.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, determining a non-sub-block transformation mode and at least one sub-block transformation mode as candidate transformation modes of a target coding unit in the target frame, wherein the target coding unit refers to any coding unit in the target frame;
And aiming at any prediction mode in the at least one preset prediction mode, if the traversing sequence of the non-sub-block transformation mode is prior to the traversing sequence of any sub-block transformation mode in the process of traversing the candidate transformation mode in the any prediction mode, and the number of non-zero coefficients in the non-sub-block transformation mode is smaller than a third preset threshold, stopping traversing the at least one sub-block transformation mode.
In one embodiment, the processor 1101 is further configured to perform the following:
And aiming at any one of the at least one preset prediction mode, if the traversing sequence of the non-sub-block transformation mode is prior to the traversing sequence of any sub-block transformation mode in the process of traversing the candidate transformation mode under the any one prediction mode, the number of non-zero coefficients in the non-sub-block transformation mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transformation mode is positioned in the preset area of the target coding unit, stopping traversing the at least one sub-block transformation mode.
In one embodiment, the processor 1101 is further configured to perform the following:
if the ratio of the abscissa of the last non-zero coefficient in the non-sub-block transformation mode to the width of the target coding unit is smaller than a fourth preset threshold value, and the ratio of the ordinate of the last non-zero coefficient in the non-sub-block transformation mode to the height of the target coding unit is smaller than a fifth preset threshold value, determining that the last non-zero coefficient in the non-sub-block transformation mode is located in a preset area of the target coding unit.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, traversing a non-sub-block transformation mode and at least one sub-block transformation mode under the at least one preset prediction mode;
Based on the at least two prediction modes that have been traversed, it is determined whether to stop traversing at least one sub-block transform mode in the other prediction modes.
In one embodiment, the processor 1101 is configured to, when determining whether to stop traversing at least one sub-block transform mode in other prediction modes based on at least two prediction modes that have been traversed, perform the following operations:
if the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is a skip prediction mode, stopping traversing at least one sub-block transformation mode in other prediction modes.
In one embodiment, the processor 1101 is configured to, when determining whether to stop traversing at least one sub-block transform mode in other prediction modes based on at least two prediction modes that have been traversed, perform the following operations:
If the prediction mode with the minimum rate distortion cost in the traversed at least two prediction modes is not skip prediction, the target transformation mode in the prediction mode with the minimum rate distortion cost is a non-sub-block transformation mode, and the number of non-zero coefficients in the prediction mode with the minimum rate distortion cost is smaller than a sixth preset threshold, stopping traversing at least one sub-block transformation mode in the preset prediction mode.
In one embodiment, the preset prediction modes include one or more of an affine transform prediction mode and a geometric partitioning prediction mode.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, acquiring the width of a target coding unit in the target frame and the height of the target coding unit, wherein the target coding unit refers to any coding unit in the target frame;
And if the width of the target coding unit is larger than the height of the target coding unit, determining at least one vertically partitioned sub-block transformation mode and the non-sub-block transformation mode as candidate transformation modes of the target frame.
In one embodiment, the processor 1101 is configured to, when determining a candidate transform mode for the target frame according to the group of pictures length and the time identifier, perform the following operations:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, acquiring the width of a target coding unit in the target frame and the height of the target coding unit, wherein the target coding unit refers to any coding unit in the target frame;
And if the width of the target coding unit is smaller than the height of the target coding unit, determining at least one horizontally partitioned sub-block transformation mode and the non-sub-block transformation mode as candidate transformation modes of the target frame.
In this embodiment of the present application, the processor 1101 determines the candidate transformation modes of the target frame according to the picture group length of the picture group in which the target frame is located and the time identifier of the target frame, and selects the target prediction mode and the target transformation mode of the target frame from the candidate transformation modes in at least one preset prediction mode. The picture group length and the time identifier are thus used to adaptively skip transformation modes that cannot become the optimal mode, so as to increase the encoding speed.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium includes: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The above disclosure describes only some embodiments of the present application and is certainly not intended to limit the scope of the claims of the present application; those skilled in the art will understand that all or part of the above embodiments may be implemented, and equivalent changes made according to the claims of the present application still fall within the scope covered by the application.

Claims (16)

1. A video encoding method, comprising:
Acquiring the length of a picture group where a target frame is located and a time mark of the target frame, wherein the time mark is used for representing a referenced grade of the target frame;
determining a candidate transformation mode of the target frame according to the length of the picture group and the time mark;
Selecting a target prediction mode and a target transformation mode of the target frame from the candidate transformation modes in at least one preset prediction mode;
and compressing the target frame based on the target prediction mode and the target transformation mode.
2. The method of claim 1, wherein said determining a candidate transform mode for the target frame based on the group of pictures length and the temporal identification comprises:
And if the length of the picture group is greater than or equal to a first preset threshold value and the time mark is greater than a second preset threshold value, determining a non-sub-block transformation mode as a candidate transformation mode of the target frame.
3. The method of claim 1, wherein said determining a candidate transform mode for the target frame based on the group of pictures length and the temporal identification comprises:
if the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, a target prediction mode of a parent node of a target coding unit in the target frame is obtained, wherein the target coding unit refers to any coding unit in the target frame;
And determining a candidate transformation mode of the target coding unit according to the target prediction mode of the parent node.
4. The method of claim 3, wherein the determining the candidate transform mode for the target coding unit based on the target prediction mode for the parent node comprises:
and if the target prediction mode of the parent node is the skip prediction mode, determining the non-sub-block transformation mode as the candidate transformation mode of the target coding unit.
5. The method of claim 1, wherein said determining a candidate transform mode for the target frame based on the group of pictures length and the temporal identification comprises:
If the length of the picture group is smaller than a first preset threshold value or the time mark is smaller than or equal to a second preset threshold value, determining a non-sub-block transformation mode and at least one sub-block transformation mode as candidate transformation modes of a target coding unit in the target frame, wherein the target coding unit refers to any coding unit in the target frame;
And aiming at any prediction mode in the at least one preset prediction mode, if the traversing sequence of the non-sub-block transformation mode is prior to the traversing sequence of any sub-block transformation mode in the process of traversing the candidate transformation mode in the any prediction mode, and the number of non-zero coefficients in the non-sub-block transformation mode is smaller than a third preset threshold, stopping traversing the at least one sub-block transformation mode.
6. The method of claim 5, wherein the method further comprises:
for any prediction mode of the at least one preset prediction mode, if, in traversing the candidate transform modes in that prediction mode, the non-sub-block transform mode is traversed before any sub-block transform mode, the number of non-zero coefficients in the non-sub-block transform mode is greater than or equal to the third preset threshold, and the last non-zero coefficient in the non-sub-block transform mode is located in a preset area of the target coding unit, stopping traversing the at least one sub-block transform mode.
7. The method of claim 6, wherein the method further comprises:
if the ratio of the abscissa of the last non-zero coefficient in the non-sub-block transform mode to the width of the target coding unit is smaller than a fourth preset threshold, and the ratio of the ordinate of the last non-zero coefficient in the non-sub-block transform mode to the height of the target coding unit is smaller than a fifth preset threshold, determining that the last non-zero coefficient in the non-sub-block transform mode is located in the preset area of the target coding unit.
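The early-termination conditions of claims 5 through 7 can be sketched as a single predicate evaluated after the non-sub-block transform pass: the sub-block transforms are skipped when the residual is already sparse, or when the last non-zero coefficient falls in a small top-left region of the unit. The threshold and ratio values below are illustrative assumptions, not values stated in the claims.

```python
# Illustrative sketch of claims 5-7. Thresholds are hypothetical.
def should_skip_sbt(num_nonzero, last_coef_xy, cu_width, cu_height,
                    coef_thresh=4, x_ratio=0.25, y_ratio=0.25):
    if num_nonzero < coef_thresh:
        # Claim 5: very few non-zero coefficients after the whole-block
        # transform, so sub-block splits are unlikely to improve the cost.
        return True
    x, y = last_coef_xy
    if x / cu_width < x_ratio and y / cu_height < y_ratio:
        # Claims 6-7: the last non-zero coefficient lies in the top-left
        # preset area, i.e. residual energy is already well concentrated.
        return True
    return False
```

Because the non-sub-block transform is traversed first in these claims, the predicate can be checked once and the remaining transform candidates dropped wholesale.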
8. The method of claim 1, wherein the determining a candidate transform mode for the target frame based on the group-of-pictures length and the temporal identifier comprises:
if the group-of-pictures length is smaller than a first preset threshold or the temporal identifier is smaller than or equal to a second preset threshold, traversing a non-sub-block transform mode and at least one sub-block transform mode in the at least one preset prediction mode; and
determining, based on at least two prediction modes that have been traversed, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes.
9. The method of claim 8, wherein the determining, based on the at least two prediction modes that have been traversed, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes comprises:
if the prediction mode with the minimum rate-distortion cost among the at least two traversed prediction modes is a skip prediction mode, stopping traversing the at least one sub-block transform mode in the remaining prediction modes.
10. The method of claim 8, wherein the determining, based on the at least two prediction modes that have been traversed, whether to stop traversing the at least one sub-block transform mode in the remaining prediction modes comprises:
if the prediction mode with the minimum rate-distortion cost among the at least two traversed prediction modes is not the skip prediction mode, the target transform mode in the prediction mode with the minimum rate-distortion cost is a non-sub-block transform mode, and the number of non-zero coefficients in the prediction mode with the minimum rate-distortion cost is smaller than a sixth preset threshold, stopping traversing the at least one sub-block transform mode in the remaining preset prediction modes.
11. The method of claim 10, wherein the remaining preset prediction modes include one or both of an affine transform prediction mode and a geometric partitioning prediction mode.
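The cross-mode termination of claims 8 through 10 can be sketched as a decision taken after at least two prediction modes have been fully traversed: if the current best is skip, or the best is a whole-block transform with a sparse residual, the sub-block transform candidates are dropped from the remaining prediction modes (such as the affine and geometric partitioning modes of claim 11). The result-record layout, the mode labels, and the threshold below are illustrative assumptions.

```python
# Illustrative sketch of claims 8-10. Record keys and threshold are hypothetical.
def stop_sbt_in_remaining_modes(results, coef_thresh=6):
    """results: one dict per traversed prediction mode with keys
    'pred_mode', 'rd_cost', 'best_transform', 'num_nonzero'."""
    best = min(results, key=lambda r: r["rd_cost"])
    if best["pred_mode"] == "skip":
        # Claim 9: skip winning overall implies negligible residual energy.
        return True
    if (best["best_transform"] == "non_sub_block_transform"
            and best["num_nonzero"] < coef_thresh):
        # Claim 10: the whole-block transform already wins with a sparse
        # residual, so SBT in later prediction modes is unlikely to pay off.
        return True
    return False
```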
12. The method of claim 1, wherein the determining a candidate transform mode for the target frame based on the group-of-pictures length and the temporal identifier comprises:
if the group-of-pictures length is smaller than a first preset threshold or the temporal identifier is smaller than or equal to a second preset threshold, acquiring the width and the height of a target coding unit in the target frame, wherein the target coding unit is any coding unit in the target frame; and
if the width of the target coding unit is greater than the height of the target coding unit, determining at least one vertically split sub-block transform mode and a non-sub-block transform mode as candidate transform modes of the target frame.
13. The method of claim 1, wherein the determining a candidate transform mode for the target frame based on the group-of-pictures length and the temporal identifier comprises:
if the group-of-pictures length is smaller than a first preset threshold or the temporal identifier is smaller than or equal to a second preset threshold, acquiring the width and the height of a target coding unit in the target frame, wherein the target coding unit is any coding unit in the target frame; and
if the width of the target coding unit is smaller than the height of the target coding unit, determining at least one horizontally split sub-block transform mode and a non-sub-block transform mode as candidate transform modes of the target frame.
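The shape-based pruning of claims 12 and 13 can be sketched as follows: a wide unit keeps only vertically split sub-block transforms, a tall unit only horizontally split ones. The mode names are hypothetical, and the square case is an assumption of this sketch, since the claims only cover width greater than height and width smaller than height.

```python
# Illustrative sketch of claims 12-13. Mode names are hypothetical.
def shape_pruned_candidates(cu_width, cu_height):
    non_sbt = ["non_sub_block_transform"]
    vertical = ["sbt_ver_half", "sbt_ver_quarter"]
    horizontal = ["sbt_hor_half", "sbt_hor_quarter"]
    if cu_width > cu_height:
        # Claim 12: wide unit, keep only vertical splits.
        return non_sbt + vertical
    if cu_width < cu_height:
        # Claim 13: tall unit, keep only horizontal splits.
        return non_sbt + horizontal
    # Square unit: not covered by the claims; keep both sets (assumption).
    return non_sbt + vertical + horizontal
```

The rationale is that a split along the longer dimension matches the likely shape of the residual, so candidates splitting the shorter dimension can be skipped without testing.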
14. A video encoding device, the device comprising:
an acquisition unit, configured to acquire the group-of-pictures length of the group of pictures in which a target frame is located and a temporal identifier of the target frame, wherein the temporal identifier is used to represent a reference level of the target frame; and
a processing unit, configured to determine a candidate transform mode of the target frame according to the group-of-pictures length and the temporal identifier; select a target prediction mode and a target transform mode of the target frame from the candidate transform modes in at least one preset prediction mode; and compress the target frame based on the target prediction mode and the target transform mode.
15. A computer device comprising a processor, a storage device, and a communication interface, the processor, the storage device, and the communication interface being interconnected, wherein:
the storage device is configured to store a computer program, the computer program comprising program instructions; and
the processor is configured to invoke the program instructions to perform the video encoding method of any of claims 1 to 13.
16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the video encoding method of any of claims 1 to 13.
CN202410209694.3A 2024-02-26 2024-02-26 Video coding method, device, equipment and storage medium Pending CN117998091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410209694.3A CN117998091A (en) 2024-02-26 2024-02-26 Video coding method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117998091A true CN117998091A (en) 2024-05-07

Family

ID=90895028




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination