CN113438479B - Code rate control based on CTU information entropy - Google Patents


Info

Publication number
CN113438479B
CN113438479B (application CN202110676346.3A)
Authority
CN
China
Prior art keywords: ctu, current ctu, bit allocation, determining, current
Prior art date
Legal status: Active (assumed, not a legal conclusion)
Application number
CN202110676346.3A
Other languages
Chinese (zh)
Other versions
CN113438479A (en)
Inventor
刘志
张萌萌
Current Assignee: North China University of Technology
Original Assignee
North China University of Technology
Priority date
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202110676346.3A
Publication of CN113438479A
Application granted
Publication of CN113438479B

Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/124 Quantisation
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/19 Adaptive coding using optimisation based on Lagrange multipliers
    • H04N19/192 Adaptive coding in which the adaptation method, tool or type is iterative or recursive
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a method for video coding and decoding, which comprises the following steps: determining first target bit allocation of a current CTU according to prediction block information of each CU and original block information of each CU in the current CTU; determining a second target bit allocation of a current CTU according to the information entropy of the current CTU; determining a third target bit allocation of the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation of the current CTU; determining a Quantization Parameter (QP) to be used for the CTU according to a third target bit allocation for the current CTU.

Description

Code rate control based on CTU information entropy
Technical Field
The present invention relates to the field of image and video processing, and more particularly, to a method, apparatus and computer program product for rate control based on CTU information entropy, wherein CTU bit weights are calculated according to the information entropy, and bit allocation is implemented for CTUs according to the calculated weights.
Background
Digital video functionality may be incorporated into a variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones", video teleconferencing devices, video streaming devices, and the like.
Digital video devices implement video coding techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC), and extensions of such standards. By implementing such video coding techniques, video devices may more efficiently transmit, receive, encode, decode, and/or store digital video information.
In April 2010, the two international video coding standards organizations, VCEG and MPEG, formed the Joint Collaborative Team on Video Coding (JCT-VC) to jointly develop a high-efficiency video coding standard.
In 2013, JCT-VC completed the development of the HEVC (High Efficiency Video Coding) standard, also known as H.265, and subsequently released multiple versions in succession.
HEVC introduces completely new syntax elements: the Coding Unit (CU) is the basic unit of prediction, transform, quantization, and entropy coding; the Prediction Unit (PU) is the basic unit of intra/inter prediction; and the Transform Unit (TU) is the basic unit of transform and quantization. In addition, each CU defines a region that shares the same prediction mode (intra or inter).
As shown in fig. 1, HEVC may switch between intra-prediction mode and inter-prediction mode. In both modes, HEVC adopts the Coding Tree Unit (CTU) structure, which is the basic processing unit of HEVC encoding and decoding. A CTU consists of one luma CTB, two chroma CTBs, and the corresponding syntax elements. Fig. 2 shows the CTU structure after one LCU (largest coding unit) is coded. In HEVC, an LCU may contain only one Coding Unit (CU), or may be partitioned into CUs of different sizes using the CTU quadtree structure.
There are four CU sizes in HEVC: 64x64, 32x32, 16x16, and 8x8. The smaller the CU block, the deeper it lies in the CTU tree. CUs of size 64x64, 32x32, and 16x16 are in 2Nx2N mode (indicating that partitioning into smaller CUs is possible), while 8x8 CUs are in NxN mode (indicating that no further partitioning is possible). For intra prediction, a CU therefore falls into one of two PartModes (2Nx2N and NxN) depending on whether it can be split into smaller CUs.
In HEVC, the PU is the basic unit of intra/inter prediction. PU partitioning is CU-based, with five regular sizes: 64x64, 32x32, 16x16, 8x8, and 4x4. More specifically, the PU size depends on the PartMode: for 2Nx2N PartMode the PU size is the same as the CU, while an NxN-PartMode CU can be divided into four 4x4 sub-PUs. For a 2Nx2N CU, the selectable intra-prediction PU modes are 2Nx2N and NxN, and there are 8 selectable inter-prediction PU modes: 4 symmetric modes (2Nx2N, Nx2N, 2NxN, NxN) and 4 asymmetric modes (2NxnU, 2NxnD, nLx2N, nRx2N), where 2NxnU and 2NxnD split the CU top/bottom in ratios of 1:3 and 3:1 respectively, and nLx2N and nRx2N split it left/right in ratios of 1:3 and 3:1 respectively.
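To make the partition geometries concrete, the following sketch enumerates the (width, height) of each PU for the eight inter PartModes of a square CU. The helper name `pu_partitions` is hypothetical, chosen for illustration; it is not part of any standard API.

```python
def pu_partitions(cu_size):
    """Return the (width, height) PU shapes produced by each inter PartMode
    of a cu_size x cu_size CU (illustrative helper, not an HEVC API)."""
    n = cu_size // 2
    q = cu_size // 4  # quarter size, used by the 1:3 / 3:1 asymmetric modes
    return {
        "2Nx2N": [(cu_size, cu_size)],
        "Nx2N":  [(n, cu_size)] * 2,
        "2NxN":  [(cu_size, n)] * 2,
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(cu_size, q), (cu_size, cu_size - q)],  # 1:3 top/bottom
        "2NxnD": [(cu_size, cu_size - q), (cu_size, q)],  # 3:1 top/bottom
        "nLx2N": [(q, cu_size), (cu_size - q, cu_size)],  # 1:3 left/right
        "nRx2N": [(cu_size - q, cu_size), (q, cu_size)],  # 3:1 left/right
    }
```

For a 32x32 CU, for example, 2NxnU yields an 8-row top PU over a 24-row bottom PU, and every mode's PUs tile the full CU area.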
In HEVC, the Lagrangian rate-distortion optimization (RDO) of H.264/AVC is still used for mode selection, with the RD cost calculated for each intra mode:
J=D+λR (1)
where J is the Lagrangian cost (i.e., RD-cost), D represents the distortion of the current intra mode, R represents the number of bits needed to encode all information in the current prediction mode, and λ is the Lagrange multiplier. D is typically computed as the sum of absolute transformed differences (SATD), based on the Hadamard transform.
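A minimal sketch of equation (1) together with a Hadamard-based SATD for a 4x4 block is given below. This is illustrative only; real encoders use optimized fixed-point implementations and larger block sizes.

```python
def hadamard4(block):
    """Apply the 4x4 Hadamard transform T = H * block * H^T."""
    H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    tmp = [[sum(H[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4(orig, pred):
    """SATD: sum of absolute Hadamard-transformed differences."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    t = hadamard4(diff)
    return sum(abs(t[i][j]) for i in range(4) for j in range(4))

def rd_cost(distortion, bits, lam):
    """Equation (1): J = D + lambda * R."""
    return distortion + lam * bits
```

Identical original and predicted blocks give SATD 0, and the RD cost is simply the distortion plus λ times the bit count.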
Processing a frame of video requires first dividing it into multiple LCUs (64x64) and then encoding each LCU in turn. Each LCU is recursively divided; the encoder decides whether to continue dividing by computing the RD-cost at the current depth. An LCU may be divided into units as small as 8x8, as shown in fig. 2. The encoder judges whether to continue dividing by comparing RD-cost values across depths: if the sum of the coding costs of the 4 sub-CUs at the current depth is larger than that of the current CU, division stops; otherwise division continues until the partitioning is finished.
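The depth-wise split decision just described can be sketched as a recursion; `rd_cost_fn` and `split_fn` are assumed placeholders standing in for the encoder's actual RD evaluation and quadtree split.

```python
def best_partition(cu, depth, max_depth, rd_cost_fn, split_fn):
    """Recursively decide whether to split a CU: keep the split only if the
    sum of the four sub-CU costs is lower than the cost of coding the CU
    whole. Returns (best cost, whether the CU was split)."""
    cost_here = rd_cost_fn(cu, depth)
    if depth == max_depth:
        return cost_here, False
    cost_split = sum(
        best_partition(sub, depth + 1, max_depth, rd_cost_fn, split_fn)[0]
        for sub in split_fn(cu))
    return (cost_split, True) if cost_split < cost_here else (cost_here, False)
```

With a toy cost table where a 64x64 CU costs 100 but each 32x32 sub-CU costs 20, the recursion chooses to split (4 x 20 = 80 < 100); if the whole CU costs only 50, it does not.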
Those skilled in the art will readily appreciate that since the CTU is the tree coding structure obtained by CU-partitioning an LCU, CU partitioning of a CTU begins from the LCU, and the two terms are therefore often used interchangeably in the art.
In intra prediction, 35 prediction modes are available per PU. Using rough mode decision (RMD), three candidate modes are obtained for 64x64, 32x32, and 16x16 blocks and eight candidate modes for 8x8 and 4x4 blocks. The best candidate list for each PU size is obtained by merging the Most Probable Modes (MPMs) from neighboring blocks. The best intra prediction mode for the current PU is then selected by RDO. When intra prediction of all PUs contained in the current CU is completed, intra prediction of the current CU is complete. The partition with the smaller RD-cost is selected by comparing the RD-cost of the current CU against the total RD-cost of its four sub-CUs. When all CU partitions are completed, intra prediction of the current CTU is complete. For HEVC, coding one LCU requires intra prediction of 85 CUs (one 64x64 CU, four 32x32 CUs, sixteen 16x16 CUs, and sixty-four 8x8 CUs). Coding each CU requires intra prediction of one PU or four sub-PUs. The large number of CUs and PUs results in the high complexity of intra prediction.
To develop new technologies beyond HEVC, a new organization, the Joint Video Exploration Team, was established in 2015 and renamed the Joint Video Experts Team (JVET) in 2018. At the San Diego meeting in April 2018, JVET launched, on the basis of HEVC, the research on Versatile Video Coding (VVC, H.266): a new generation of video coding technology improving on H.265/HEVC, aiming to provide higher compression performance while optimizing for emerging applications such as 360° panoramic video and High Dynamic Range (HDR) video. The first edition of VVC was completed in August 2020 and officially released on the ITU-T website as the H.266 standard.
Relevant documents and test platforms for HEVC and VVC may be obtained from https://jvet.hhi.fraunhofer.de/, and relevant proposals for VVC can be found at http://it-sudparis.eu/jvet/get.
VVC continues the hybrid coding framework adopted since H.264; the general block diagram of its VTM8 encoder is shown in fig. 1. Inter and intra prediction coding eliminate correlation in the temporal and spatial domains; transform coding of the residual removes spatial correlation; entropy coding eliminates statistical redundancy. VVC focuses on researching new coding tools and techniques that improve video compression efficiency within this hybrid coding framework.
Although both VVC and HEVC use a tree structure for CTU partitioning, VVC uses a tree-structured CTU partitioning method different from HEVC's. Also, compared with HEVC, the maximum CTU size in VVC reaches 128x128.
As described above, in HEVC, the CTUs are partitioned into CUs (i.e., coding trees) using a quadtree structure. Decisions regarding intra-coding and inter-coding are made at leaf node CUs. In other words, one leaf node CU defines one area that shares the same prediction mode (e.g., intra prediction or inter prediction). Then, each leaf-CU may be further divided into 1, 2, or 4 prediction units PU according to the PU partition type. Within each PU, the same prediction process is used and the relevant information is sent to the decoder section on a PU basis. After the residual block is obtained by the PU-based prediction process, the leaf-CU may be divided into TUs according to another quadtree-like structure that is similar to the coding tree of the CU.
In VVC, a quadtree splitting structure with nested multi-type trees using binary and ternary trees is employed. One example of such a nested multi-type tree is the quadtree-binary tree (QTBT) structure. The QTBT structure comprises two levels: a first level partitioned according to quadtree splitting, and a second level partitioned according to binary tree splitting. The root node of the QTBT structure corresponds to the CTU, and leaf nodes of the binary tree correspond to Coding Units (CUs). The separation of the CU, PU, and TU concepts is removed in VVC. A CTU is first partitioned by a quadtree and then further partitioned by a multi-type tree. As shown in fig. 3, VVC specifies 4 multi-type tree partitioning patterns: horizontal binary, vertical binary, horizontal ternary, and vertical ternary splitting. The leaf nodes of the multi-type tree are called Coding Units (CUs), and unless a CU is too large for the maximum transform length, the CU is used for prediction and transform processing without further partitioning. This means that in most cases, the CU, PU, and TU have the same block size in the quadtree splitting structure with nested multi-type trees; the exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU. Fig. 4 illustrates a particular embodiment of CTU-to-CU partitioning with a quadtree splitting structure with nested multi-type trees, where bold boxes represent quadtree partitioning and the remaining edges represent multi-type tree partitioning.
After CU partitioning, the video data of the CU, which represents prediction and/or residual information, as well as other information, is encoded. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information typically represents the sample-by-sample difference between the samples of the CU before encoding and the samples of the prediction block.
To predict a CU, a prediction block for the CU may be typically formed by inter prediction or intra prediction. Inter-prediction typically refers to predicting a CU from data of a previously coded picture, while intra-prediction typically refers to predicting a CU from previously coded data of the same picture. To perform inter prediction, a prediction block may be generated using one or more motion vectors. A motion search may typically be performed, for example, in terms of the difference between a CU and a reference block, to identify a reference block that closely matches the CU. A difference metric may be computed using Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Mean Absolute Differences (MAD), Mean Squared Differences (MSD), or other such difference computations to determine whether the reference block closely matches the current CU. In some examples, the current CU may be predicted using unidirectional prediction or bidirectional prediction.
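The difference metrics named above (SAD, SSD, MAD, MSD) have straightforward definitions; a minimal sketch over flattened sample arrays:

```python
def sad(a, b):
    """Sum of Absolute Differences between two equal-length sample arrays."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of Squared Differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mad(a, b):
    """Mean Absolute Difference: SAD normalized by the sample count."""
    return sad(a, b) / len(a)

def msd(a, b):
    """Mean Squared Difference: SSD normalized by the sample count."""
    return ssd(a, b) / len(a)
```

In a motion search, the reference block minimizing one of these metrics against the current CU would be selected as the closest match.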
VVC also provides an affine motion compensation mode, which can be considered as an inter prediction mode. In affine motion compensation mode, two or more motion vectors representing non-translational motion (such as zoom-in or zoom-out, rotation, perspective motion, or other irregular motion types) may be determined.
To perform intra prediction, an intra prediction mode for generating a prediction block may be selected. VVC provides 67 intra prediction modes, including various directional modes as well as planar and DC modes. Typically, the selected intra-prediction mode specifies the neighboring samples of the current block (e.g., a block of a CU) from which the samples of the current block are predicted. Assuming that CTUs and CUs are coded in raster scan order (left-to-right, top-to-bottom, or right-to-left, top-to-bottom), these samples typically lie above, above-left, or to the left of the current block in the same picture.
Data representing a prediction mode of the current block is encoded. For example, for an inter prediction mode, the video encoder 200 may encode data indicating which of various available inter prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter prediction, a motion vector may be encoded using Advanced Motion Vector Prediction (AMVP) or merge mode, for example. Similar modes may be used to encode motion vectors for the affine motion compensation mode.
After prediction, such as intra prediction or inter prediction of a block, residual data of the block may be calculated. Residual data, such as a residual block, represents the sample-by-sample difference between the block and the prediction block formed using the corresponding prediction mode. One or more transforms may be applied to the residual block to produce transformed data in the transform domain rather than the sample domain. For example, a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform may be applied to the residual video data. In addition, video encoder 200 may apply a secondary transform after the first transform, e.g., a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), etc. Transform coefficients are produced after applying one or more transforms.
As described above, after any transform to produce transform coefficients, quantization of the transform coefficients may be performed based on a quantization parameter (QP). Quantization generally refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent them, thereby providing further compression. By performing the quantization process, the bit depth associated with some or all of the coefficients may be reduced. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m. In some examples, quantization is performed as a bit-wise right shift of the value to be quantized. The QP is typically signaled in header information using a syntax element.
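A simplified sketch of the n-bit to m-bit rounding via right shift follows. This illustrates the general idea only, under the assumption of a plain shift with a rounding offset; HEVC's actual quantization additionally involves scaling matrices and a QP-dependent step size.

```python
def quantize_coeff(value, n_bits, m_bits):
    """Round an n-bit magnitude down to roughly m bits via a bit-wise right
    shift with a rounding offset (simplified sketch, not HEVC's scheme)."""
    shift = n_bits - m_bits
    if shift <= 0:
        return value
    offset = 1 << (shift - 1)  # adds half a step before truncation
    magnitude = (abs(value) + offset) >> shift
    return magnitude if value >= 0 else -magnitude
```

For example, quantizing the 8-bit value 100 down to 4 bits shifts right by 4 after adding the offset 8.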
After quantization, the transform coefficients may be scanned, producing a one-dimensional vector from the two-dimensional matrix of quantized transform coefficients. The scan may be designed to place the higher-energy (and therefore lower-frequency) coefficients at the front of the vector and the lower-energy (higher-frequency) coefficients at the back. In some examples, the quantized transform coefficients are scanned with a predefined scan order to produce a serialized vector, which is then entropy encoded. In other examples, an adaptive scan may be performed. After scanning the quantized transform coefficients to form the one-dimensional vector, the vector may be entropy encoded, e.g., according to Context Adaptive Binary Arithmetic Coding (CABAC); values of syntax elements describing metadata associated with the encoded video data may likewise be entropy encoded for use by video decoder 300 in decoding the video data.
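One common predefined scan order walks the matrix along anti-diagonals, so that the low-frequency coefficients near the top-left corner come first; a sketch (illustrative, not the exact scan order of any particular standard):

```python
def diagonal_scan(block):
    """Serialize an NxN coefficient matrix along anti-diagonals so that
    the top-left (low-frequency) coefficients lead the output vector."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):      # s = i + j indexes each anti-diagonal
        for i in range(n):
            j = s - i
            if 0 <= j < n:
                out.append(block[i][j])
    return out
```

On a 3x3 matrix the scan visits position (0,0) first and (2,2) last, grouping coefficients by distance from the DC term.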
During the encoding process, syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, or other syntax data, such as Sequence Parameter Sets (SPS), Picture Parameter Sets (PPS), or Video Parameter Sets (VPS), may be generated, for example, in a picture header, a block header, a slice header. A video decoder may similarly decode such syntax data to determine how to decode the corresponding video data. These pieces of information may be referred to as "header information".
In this way, a bitstream may be generated that includes encoded video data (e.g., syntax elements that describe a division from a picture to a block (e.g., a CU) and prediction and/or residual information for the block).
An important feature of VVC is its support for 360-degree video, which is mainly applied in the field of virtual reality. A virtual reality video captures the image information of an entire scene with professional cameras; the footage is stitched by software and played on dedicated equipment. It also offers the viewer various ways to manipulate the image, such as zooming in, zooming out, and moving in any direction to view the scene, thereby simulating and reproducing the real environment of the scene. In a virtual reality system, multiple cameras capture a 360-degree scene and all views are stitched together into a spherical video, creating a 360-degree video. To encode 360-degree video, the spherical video must be projected into planar video to suit widely used coding standards such as H.264/AVC and H.265/HEVC. Various projection formats have been proposed, such as equirectangular projection (ERP), adjusted equal-area projection (AEP), cube map (CMP), equiangular cube map (EAC), truncated square pyramid (TSP), compact octahedron (COHP), and compact icosahedron (CISP). Among these formats, ERP is simple and widely used: it maps meridians to vertical lines with constant spacing and parallels to horizontal lines with constant spacing, thereby converting the spherical video into planar video. VVC supports a coding tool called horizontal wrap-around motion compensation, which can significantly improve the subjective quality of the ERP projection format.
In this study, we use the ERP projection format to illustrate various aspects of our proposed method. It should be apparent that these aspects are equally applicable to other projection formats and video coding standards for creating 360 degree video.
The concept of entropy originated in the 19th century: it was first proposed as a thermodynamic concept in 1865 by the German physicist Clausius, though its nature was not yet explained at that time. With the development of statistical physics, the Austrian physicist Boltzmann combined entropy with statistical theory to further explore its essence, holding that entropy is a measure of the degree of disorder of molecular motion. In the late 1940s, Shannon (C. E. Shannon) borrowed the thermodynamic concept and named the average non-redundant information content of a message its information entropy; the nature of entropy then gradually became clear: it is a mathematical measure of the uncertainty in a system. With its development and application, entropy has made important contributions in number theory, probability theory, control theory, life sciences and medicine, and other fields. The concept of entropy has broad coverage; in different applications it can serve as a measure of the degree of disorder or non-uniformity in an object or a system.
Information entropy is a measure of the amount of information or uncertainty in a system; it quantifies the uncertainty of a random variable through its probability distribution, and its introduction laid the theoretical foundation of modern information theory. For a discrete random variable X with n possible values, where the i-th value occurs with probability p_i, the uncertainty of the random variable can be quantitatively described by the information entropy, defined as:

H(X) = -k Σ_{i=1..n} p_i log p_i (2)
In equation (2), when the logarithm base is 2, e, or 10, the unit of information entropy is the bit, the nat, or the dit, respectively. Typically k = 1 and the base is 2. The entropy value is closely related to the probabilities of the different events: for a deterministic event, the entropy takes its minimum value 0; when all events are equally probable, each occurring with probability p = 1/n, the uncertainty is highest and the entropy takes its maximum value log n. In other states, the further the distribution deviates from the equiprobable state, the smaller the entropy; conversely, the closer each state is to equiprobable, the larger the entropy.
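Equation (2) and the extreme cases just described can be sketched directly; zero-probability terms contribute nothing by the usual convention:

```python
import math

def entropy(probs, base=2.0, k=1.0):
    """Information entropy H(X) = -k * sum(p_i * log(p_i)), equation (2).
    Terms with p_i = 0 are skipped (they contribute 0 in the limit)."""
    return -k * sum(p * math.log(p, base) for p in probs if p > 0)
```

A certain event gives H = 0, and n equiprobable outcomes give the maximum H = log n (for example, four outcomes with base 2 give 2 bits).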
Information entropy, as a mathematical measure of uncertainty, has the following fundamental properties:
(1) Non-negativity:
The amount of information cannot be negative, so:
H(X) = H(p1, p2, …, pn) ≥ 0 (3)
(2) Symmetry:
The entropy is independent of the order of the variables; changing the order does not affect the entropy value:
H(p1, p2, …, pn) = H(pa1, pa2, …, pan) (4)
In equation (4), (pa1, pa2, …, pan) is any permutation of (p1, p2, …, pn).
(3) Extremality:
H(X) = H(p1, p2, …, pn) ≤ log n (5)
In equation (5), equality holds only when every p_i = 1/n; the entropy is maximal under the equiprobable distribution.
(4) Certainty:
H(1, 0) = H(1, 0, 0) = H(1, 0, …, 0) = 0 (6)
When some component p_i of (p1, p2, …, pn) equals 1, the term p_i log p_i is 0, as are all the others: when one outcome occurs with certainty, no other outcome can occur, and the entropy is 0.
(5) Additivity:
H(XY) = H(X) + H(Y) (7)
The joint entropy of two statistically independent sources X and Y equals the sum of their individual entropies.
In conventional methods, including HEVC and VVC, CTU-level rate control allocates target bits according to the Mean Absolute Difference (MAD), which is obtained through prediction; but when the image content changes drastically or a scene change occurs, the prediction of the MAD can incur large errors.
Disclosure of Invention
Addressing this technical problem in conventional CTU-level rate control, the present method is designed around the characteristics of CTU-level rate control and ultimately realizes bit allocation to CTUs according to weights.
The given target bitrate and the content characteristics of the video itself are the main factors that affect the bitrate allocation performance. The present disclosure therefore proposes a new CTU-level rate allocation scheme to improve coding performance by considering video content characteristics.
In the present disclosure, the terms "target bit allocation", "allocated bits", "target bits" are used interchangeably, and those skilled in the art will understand that these terms and other similar terms have similar meanings and refer to rate control for CTUs, frames and/or GOPs.
According to an aspect of the present invention, a method for video coding and decoding is provided, including: obtaining a current Coding Tree Unit (CTU) of a current frame; performing Coding Unit (CU) partitioning on the current CTU; determining a coding type for each CU, and obtaining prediction block information of each CU according to the determined coding type;
and determining a first target bit allocation of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU.
In this aspect, it is advantageous that the first target bit allocation determination level is a CTU rather than a frame, thereby reducing the granularity of the target bit allocation determination so that the determination of the target bit allocation is more consistent with local features in the video frame.
In this respect, it is advantageous that the determination level of the first target bit allocation is CTU instead of CU, so that the amount of calculation can be reduced, avoiding an unnecessarily large increase in the amount of calculation.
In this aspect, it is advantageous that the first target bit allocation is made based on a degree of difference between the predicted and original frames of the current CTU, so that the determination of the target bit allocation for each CTU corresponds to a degree of coding distortion for that CTU, thereby enabling an adaptive determination of the target bit allocation for the CTU based on actual intra or inter coding conditions.
According to a further aspect, the method further comprises: determining the information entropy of the current CTU; determining a second target bit allocation of the current CTU according to the information entropy of the current CTU; determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU; determining a Quantization Parameter (QP) to be used for the CTU according to a third target bit allocation for the current CTU; applying the QP for the current CTU to respective CUs in the current CTU.
In this respect, it is advantageous that the information entropy of the CTU is taken into account in the target bit allocation determination for the CTU, in addition to the element of the difference between the original frame and the predicted frame (referred to herein as CTU), to improve the accuracy of the target bit allocation determination.
In this aspect, it is advantageous that the level of determination of the target bit allocation is CTU rather than frame, thereby reducing the granularity of the target bit allocation determination so that the determination of the target bit allocation is more consistent with local features in the video frame.
In this respect, it is advantageous that the determination level of the target bit allocation is CTU instead of CU, so that the amount of calculation can be reduced, avoiding an unnecessarily large increase in the amount of calculation.
According to another aspect of the invention, the video is a 360 degree video, and wherein determining the second target bit allocation for the current CTU is further based on an ERP video weight of the 360 degree video.
According to another aspect of the invention, the video is a 360 degree video, and wherein determining the second target bit allocation for the current CTU further takes into account a degree of distortion of the current CTU in a latitudinal direction.
According to another aspect of the invention, the video is a 360 degree video, and determining the second target bit allocation for the current CTU is further based on: a metric corresponding to a latitudinal distortion of the current CTU in the 360 degree video.
According to another aspect of the invention, the metric corresponding to the latitudinal distortion of the current CTU in the 360 degree video is based on the ERP video weight of the 360 degree video.
In this respect, it is advantageous to take into account the position of the current CTU in the latitudinal direction, and thus the degree of distortion of said current CTU in the latitudinal direction, when determining the target bit allocation.
According to another aspect of the invention, determining the first target bit allocation for the current CTU is further based on target bits of a current frame.
According to another aspect of the invention, determining the first target bit allocation for the current CTU is further based on target bits of a current GOP.
According to another aspect of the present invention, determining the first target bit allocation of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU further comprises: determining a mean absolute difference (MAD) of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU; determining bit allocation weights for the current CTU based on the determined MAD; determining a weight of the current CTU relative to all uncoded CTUs based on bit allocation weights of the current CTU and bit allocation weights of all uncoded CTUs; determining a first target bit of the current CTU based on a weight of the current CTU relative to all uncoded CTUs.
In this aspect, it is advantageous that statistically optimal target bit allocation results can be obtained when determining the first target bit of the current CTU.
According to another aspect of the present invention, determining a Quantization Parameter (QP) to be used for the current CTU according to a third target bit allocation of the CTU further comprises: determining a bit per pixel value (bpp) of the current CTU according to a third target bit allocation of the current CTU; determining a Lagrangian factor of the current CTU according to a bit per pixel value (bpp) of the current CTU; determining the QP according to a Lagrangian factor of the current CTU.
According to an aspect of the present invention, a computing device capable of performing video coding is presented, comprising: a memory storing executable code for video coding; one or more processors or video codecs to execute executable code stored in memory for video coding to perform the video coding operations of: obtaining a current Coding Tree Unit (CTU) of a current frame; Coding Unit (CU) partitioning the current CTU; determining a coding type for each CU, and obtaining prediction block information of each CU according to the determined coding type; determining first target bit allocation of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU; determining the information entropy of the current CTU; determining a second target bit allocation of the current CTU according to the information entropy of the current CTU; determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU; determining a Quantization Parameter (QP) to be used for the current CTU according to a third target bit allocation for the CTU; applying the QP for the current CTU to respective CUs in the current CTU.
According to another aspect of the invention, the computing device is one or more of a system on a chip (SOC), a computer, a server, a cloud server.
According to one aspect of the invention, a computer-readable medium is provided having stored thereon processor-executable instructions that, when executed by a processor, implement the following video encoding operations: obtaining a current Coding Tree Unit (CTU) of a current frame; Coding Unit (CU) partitioning the current CTU; determining a coding type for each CU, and obtaining prediction block information of each CU according to the determined coding type; determining first target bit allocation of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU; determining the information entropy of the current CTU; determining a second target bit allocation of the current CTU according to the information entropy of the current CTU; determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU; determining a Quantization Parameter (QP) to be used for the current CTU according to a third target bit allocation for the CTU; applying the QP for the current CTU to respective CUs in the current CTU.
According to another aspect of the invention, the coefficients used to compute the lagrangian factor are adaptively updated for each frame and each CTU. Advantageously, such real-time updating enables adaptive adjustment of the lagrangian factor calculation for the real-time content of the encoded frames and CTUs, thus optimizing the calculation of QP.
Drawings
Fig. 1 shows an embodiment of a general block diagram of a generic encoder for HEVC/VVC.
Fig. 2 shows a schematic diagram of a Coding Tree (CTU) in HEVC.
Fig. 3 illustrates a multi-type tree partitioning pattern for VVC.
Fig. 4 illustrates a particular embodiment of CTU-to-CU partitioning of a quad-tree partitioning structure of a VVC with nested multi-type trees.
Fig. 5 shows a flowchart of a method for rate control at CTU level based on both information entropy and the difference between a predicted block and an original block at CTU level according to an embodiment of the present invention.
Fig. 6 shows a schematic diagram of a device for implementing the encoding method of an embodiment of the present invention.
Detailed Description
Various aspects are now described with reference to the drawings. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It may be evident, however, that such aspect(s) may be practiced without these specific details.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, such as but not limited to hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal.
The invention provides a method, a device, a codec and a processor-readable storage medium for VVC (H.266). The invention provides a new CTU-level code rate allocation scheme by considering the video content characteristics so as to improve the coding performance. More particularly, the present invention relates to a method, an apparatus, a codec and a computer program product for rate control based on CTU information entropy, wherein CTU bit weights are calculated according to the information entropy and bit allocation is implemented to CTUs according to the calculated weights.
As detailed below, according to one embodiment of the present invention, a third target bit allocation for the current CTU may be determined based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU; and may determine a Quantization Parameter (QP) to be used for the current CTU according to a third target bit allocation for the CTU. Wherein the first target bit allocation may be based at least in part on the prediction block information of the CU and the original block information of the respective CU, and the second target bit allocation may be based at least in part on the information entropy of the current CTU.
Advantageously, the first target bit allocation is made based on the degree of difference between the predicted and original frames of the current CTU, so that the determination of the target bit allocation for each CTU corresponds to the degree of coding distortion of that CTU, thereby enabling an adaptive CTU target bit allocation determination based on actual intra or inter coding conditions. The information entropy of the CTU is taken into account in the second target bit allocation determination for the CTU, which improves the accuracy of the target bit allocation determination in addition to the element of the difference between the original frame and the predicted frame (referred to herein as CTU).
In one embodiment, the encoded video may be 360 degree video. In a preferred embodiment, the second target bit allocation may also be advantageously based at least in part on a degree of distortion of the current CTU in the latitudinal direction. In a particular embodiment, the degree of distortion of the current CTU in the latitudinal direction may be represented by a metric corresponding to the latitudinal distortion of the current CTU in the 360 degree video.
In a preferred embodiment, the coefficients used to compute the lagrangian factor are adaptively updated for each frame and each CTU. Advantageously, such real-time updating enables adaptive adjustment of the lagrangian factor calculation for the real-time content of the encoded frames and CTUs, thus optimizing the calculation of QP.
In VVC, the classical R-λ rate-control model of HEVC is retained and extended, as shown in equation (1) above. The basis of the R-λ model is the hyperbolic model between R and D in video coding, as in equation (8), where C and K are model parameters.
D(R)=CR -K (8)
In the R-λ model, λ is the slope of the R-D curve, and differentiating equation (8) gives:
λ = −dD/dR = C·K·R^(−K−1) = α·bpp^β (9)
where α and β are parameters related to the video content. Generally, the bits to be allocated to the first frame of the video sequence must be considered first, because the quality of the first frame can greatly affect subsequent coding quality and efficiency. In practical applications, some a priori knowledge may be used to set the encoding parameters of the first frame. Because the necessary a priori information is lacking, the bits required by the first frame are difficult to predict; to guarantee video quality, a strategy of allocating more bits to the first frame is generally adopted. First, the average bits per frame R_picAvg is defined according to equation (10), where R_tar is the target bitrate and FR is the frame rate.
R_picAvg = R_tar / FR (10)
According to an embodiment of the present invention, for GOP (group of pictures) level bit allocation, generally the allocated bits for each GOP should be the same, but since the actually coded bits and the target bits always have differences, the GOP level bit allocation introduces a sliding window SW to eliminate the error between the actually coded bits and the target bits of each GOP, and the allocated bits for each GOP are calculated as shown in equation (11):
T_GOP = (R_picAvg × (N_coded + SW) − R_coded) / SW × N_GOP (11)
Here, T_GOP is the current GOP target bits (also referred to as "current GOP allocated bits" in this disclosure), R_picAvg is the average bits per frame from equation (10), N_coded is the number of coded frames, R_coded is the number of bits consumed by the coded frames, and N_GOP is the number of frames contained in each GOP. The numerator of the equation is the frame target bits with the SW smoothing correction, multiplied by the number of frames in the GOP, and the denominator is the smoothing-window size SW. After simplification, R_picAvg·N_coded − R_coded is the error term, which is corrected over a length of SW. In one embodiment, SW is a predefined sliding-window size, which may be set to 40, for example.
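The GOP-level allocation with the sliding window can be sketched as follows; the function and variable names are ours, chosen to mirror the symbols in the text:

```python
def gop_target_bits(r_tar, fr, n_coded, r_coded, n_gop, sw=40):
    """Equation (11): GOP-level bit allocation with a sliding window SW.

    r_tar   -- target bitrate (bits/s)
    fr      -- frame rate
    n_coded -- number of frames already coded
    r_coded -- bits consumed by the coded frames
    n_gop   -- number of frames per GOP
    sw      -- sliding-window size smoothing the error between
               actually-coded bits and target bits (default 40)
    """
    r_pic_avg = r_tar / fr  # equation (10): average bits per frame
    # Numerator: SW-corrected frame target bits times the GOP length;
    # r_pic_avg * n_coded - r_coded is the accumulated error term.
    return (r_pic_avg * (n_coded + sw) - r_coded) / sw * n_gop

# Example: 2 Mbps at 30 fps; 60 frames coded exactly on budget, so the
# error term is zero and the GOP simply gets n_gop * r_pic_avg bits.
t_gop = gop_target_bits(2_000_000, 30, 60, 60 * 2_000_000 / 30, 8)
```

When the encoder has overspent (r_coded larger than the budget), the error is spread over SW future frames rather than charged to the next GOP alone.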
According to one embodiment of the present invention, for frame-level bit allocation, the allocated bits for the current frame are calculated as equation (12):
T_pic = (T_GOP − Coded_GOP) × ω_currpic / Σ_{AllNotCodedPic} ω_pic (12)
Here, T_pic represents the allocated bits of the current frame (also referred to as "current frame target bits" in this disclosure), T_GOP is the current GOP target bits, Coded_GOP is the number of bits consumed by the coded frames of the current GOP, ω_pic is the bit allocation weight of a particular frame, and ω_currpic is the bit allocation weight of the current frame. In this equation, the ratio of the two ω terms represents the proportion assigned to each frame. "AllNotCodedPic" in the denominator denotes all uncoded frames.
Note that the allocated bits of the I frame are determined by bpp (bits per pixel).
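The frame-level split of equation (12) can be sketched as follows (an illustrative helper, with names of our choosing):

```python
def frame_target_bits(t_gop, coded_gop, w_currpic, w_uncoded):
    """Equation (12): split the remaining GOP budget among the uncoded
    frames in proportion to their bit-allocation weights.

    t_gop      -- current GOP target bits
    coded_gop  -- bits already consumed by coded frames of this GOP
    w_currpic  -- weight of the current frame
    w_uncoded  -- weights of all not-yet-coded frames (incl. current)
    """
    return (t_gop - coded_gop) * w_currpic / sum(w_uncoded)

# Example: 120000 bits remain in the GOP and the current frame carries
# half of the total uncoded weight.
t_pic = frame_target_bits(150_000, 30_000, 2.0, [2.0, 1.0, 1.0])
# -> 60000.0 bits for the current frame
```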
According to a specific embodiment of the present invention, for bit allocation at the CTU level, similar to the frame-level bit allocation strategy, the bits allocated to each CTU are calculated by equation (13).
T′_CTU = (T_currpic − Bit_H − Coded_pic) × ω_currCTU / Σ_{AllNotCodedCTU} ω_CTU (13)
Here, T_currpic represents the current frame target bits (also referred to as "current frame allocated bits" in this disclosure), Coded_pic represents the number of bits consumed by the coded CTUs of the current frame, and Bit_H is the number of bits required for encoding all header information, predicted from the bits used by the corresponding coded frames. ω_CTU is the bit allocation weight of a particular CTU, and ω_currCTU is the bit allocation weight of the current CTU, calculated from the prediction error (the mean absolute difference (MAD) between the original and predicted signals) of the CTU at the corresponding position of the coded frame belonging to the same hierarchical level.
According to an embodiment of the present invention, the MAD may be calculated as shown in equation (14). According to an embodiment of the invention, ω_currCTU may be derived from the square of the computed MAD.
MAD = (1/N) × Σ_{i=1}^{N} |P_pre(i) − P_org(i)| (14)
Here, P_pre and P_org are the pixel values of the predicted-image CTU and the original-image CTU, respectively, and N is the number of pixels in the CTU.
In the present disclosure, equation (13) considers the weights of the current CTU in all the unencoded CTUs in the frame, so it can statistically obtain the best target bit allocation result.
In this disclosure, the allocated bits T′_CTU for each CTU derived from equation (13) (alternatively also referred to herein as a "target bit allocation") are referred to as the first, MAD-based target bit allocation of the CTU. The main advantages of calculating and using this first target bit allocation include the following. First, the determination level of the target bit allocation is the CTU instead of the frame, which reduces the granularity of the target bit allocation determination and makes it more consistent with local features in the video frame. Second, the determination level is the CTU instead of the CU, which reduces the amount of calculation and avoids an unnecessarily large increase in it; the applicant has found experimentally that if the determination level of the target bit allocation is reduced to the CU, the average amount of computation during encoding increases by more than 10 times, while the number of encoding bits at similar rate distortion is reduced by less than 1%. Third, the first target bit allocation is made based on the degree of difference between the predicted and original frames of the current CTU, so that the determination of the target bit allocation for each CTU corresponds to the degree of coding distortion of that CTU, enabling an adaptive determination of the CTU target bit allocation based on the actual intra or inter coding conditions.
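Equations (13) and (14) together can be sketched as follows; the helper names are ours, and the MAD-squared weight follows the embodiment described in the text:

```python
def ctu_mad(pred, orig):
    """Equation (14): mean absolute difference between the predicted
    and original CTU pixels (equal-length pixel sequences)."""
    return sum(abs(p - o) for p, o in zip(pred, orig)) / len(orig)

def first_target_bits(t_currpic, bit_h, coded_pic, mad_curr, mad_uncoded):
    """Equation (13): MAD-based first target bit allocation T'_CTU.

    t_currpic   -- current frame target bits
    bit_h       -- predicted header bits of the frame
    coded_pic   -- bits consumed by already-coded CTUs of the frame
    mad_curr    -- MAD of the current CTU
    mad_uncoded -- MADs of all not-yet-coded CTUs (incl. current)

    The bit-allocation weight of a CTU is the square of its MAD.
    """
    w_curr = mad_curr ** 2
    w_sum = sum(m ** 2 for m in mad_uncoded)
    return (t_currpic - bit_h - coded_pic) * w_curr / w_sum

mad = ctu_mad([10, 12, 8, 9], [10, 10, 10, 10])  # -> 1.25
# Four uncoded CTUs with equal MAD each get a quarter of the remaining budget.
t1 = first_target_bits(50_000, 2_000, 8_000, 2.0, [2.0, 2.0, 2.0, 2.0])
```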
Next, the information entropy of each CTU will be considered in the determination of the target bit allocation for the CTU.
The CTU is an upper level unit block of coding units in the VVC, and is typically 128 × 128 in size. When the texture of the macroblock is smooth, the distribution of the gray levels is concentrated around a certain gray level, and the entropy of the CTU is smaller. When the macro block contains a lot of fine textures and rich image details, the gray distribution of the macro block is disordered and is closer to the gray uniform distribution than the smooth block, so that the entropy value of the macro block is larger.
In one embodiment of the present invention, the formula defining the information entropy of each CTU is as follows:
H(CTU) = −Σ_{i=0}^{2^b−1} p_i × log2 p_i (15)
where p_i is the probability of gray level i appearing in the CTU and b is the bit depth of the video sequence. When b is 8, the CTU entropy of 8-bit video lies between 0 and 8; when b is 10, the CTU entropy of 10-bit video lies between 0 and 10. Since 360 degree video is often encoded with 10 bits, the gray values lie between 0 and 1023.
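Equation (15) amounts to the entropy of the CTU's gray-level histogram; a minimal sketch (helper name ours):

```python
from collections import Counter
import math

def ctu_entropy(pixels, bit_depth=10):
    """Equation (15): information entropy of a CTU from its gray-level
    histogram; the result lies in [0, bit_depth]."""
    n = len(pixels)
    hist = Counter(pixels)  # gray level -> occurrence count
    h = -sum((c / n) * math.log2(c / n) for c in hist.values())
    assert 0.0 <= h <= bit_depth
    return h

# A flat (smooth-texture) CTU has entropy 0; a CTU whose pixels are all
# distinct reaches log2(n) for n samples, matching the text's observation
# that detailed textures yield larger entropy than smooth blocks.
assert ctu_entropy([512] * 16) == 0.0
assert ctu_entropy(list(range(16))) == 4.0  # 16 equiprobable levels -> 4 bits
```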
In one embodiment of the invention, the CTU-level bit allocation is designed according to the information-entropy weight and is defined as T″_CTU; it is also referred to herein as the second, CTU-information-entropy-based target bit allocation of the CTU.
T″_CTU = (T_Pic − Bit_H) × H′_currCTU / Σ_{AllCTU} H′_CTU (16)
In equation (16), the weight H′_CTU of the current CTU is designed using the ERP projection weight and the CTU information entropy (as in equations (17) and (18)), where H′_currCTU in the numerator represents the weight of the current CTU, and H′_CTU in the denominator represents the weight of each CTU, so the denominator is the sum of all CTU weights H′_CTU. T_Pic is the total number of bits of a frame of the picture (i.e., the target frame bits), and Bit_H is the number of bits required for encoding all header information, predicted from the bits used by the corresponding coded frames. Since the CTU size in VVC is 128x128, in order to omit a pixel-traversal process and reduce the encoding complexity, only the weights of the pixel rows in the vertical direction of the CTU participate in the calculation of the weight H′_CTU. In equation (17), p indexes the pixel rows of the CTU in the vertical direction; in equation (18), h_pos is the ordinate of the upper-left corner of the CTU. Since the width of a 360-degree video test sequence is an integer multiple of 128, p is an integer; N is the pixel height of a frame, and the cosine corresponds to the ERP projection weight.
H′_CTU = H(CTU) × Σ_{p=0}^{127} w(h_pos + p) (17)
w(h_pos + p) = cos((h_pos + p + 0.5 − N/2) × π / N) (18)
Thus, the allocated bits T″_CTU for each CTU (alternatively also referred to herein as a "target bit allocation") are obtained according to equation (16), based on the information entropy of each CTU. In one embodiment, T″_CTU is further based on the ERP projection weights of the 360-degree video. It is referred to herein as the second target bit allocation of the CTU, based on the information entropy of the CTU.
The advantage of using equation (17) includes fully taking into account the latitudinal distortion of the CTU in a 360 degree video frame.
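A sketch of the entropy-plus-ERP weighting follows. It assumes the ERP latitude weight takes the common cosine form w(i) = cos((i + 0.5 − N/2)·π/N) used for equirectangular projection; the function names are ours:

```python
import math

def erp_weight(row, frame_height):
    """ERP latitude weight of a pixel row: near 1 at the equator
    (frame middle), near 0 at the poles (frame top/bottom)."""
    return math.cos((row + 0.5 - frame_height / 2) * math.pi / frame_height)

def ctu_weight(entropy, h_pos, frame_height, ctu_size=128):
    """Weight H'_CTU combining the CTU entropy with the ERP weights of
    its vertical pixel rows, starting at ordinate h_pos."""
    return entropy * sum(erp_weight(h_pos + p, frame_height)
                         for p in range(ctu_size))

def second_target_bits(t_pic, bit_h, w_curr, w_all):
    """Equation (16): entropy-based second target bit allocation T''_CTU."""
    return (t_pic - bit_h) * w_curr / sum(w_all)

# For equal entropy, a CTU at the equator outweighs one at the pole,
# so polar CTUs (heavily stretched by ERP) receive fewer bits.
N = 1024
w_equator = ctu_weight(5.0, N // 2 - 64, N)
w_pole = ctu_weight(5.0, 0, N)
assert w_equator > w_pole
```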
In addition, the main advantages of calculating and using this second target bit allocation include: first, taking the CTU information entropy into account in the target bit allocation determination for the CTU, as a supplement to the element of the difference between the original frame and the predicted frame (referred to herein as the CTU), which improves the accuracy of the target bit allocation determination; second, the determination level of the target bit allocation is the CTU instead of the frame, which reduces the granularity of the target bit allocation determination and makes it more consistent with local features in the video frame; third, the determination level of the target bit allocation is the CTU instead of the CU, so that the amount of calculation can be reduced, avoiding an unnecessarily large increase in the amount of calculation.
According to an embodiment of the present invention, the final bit allocation T_CTU can be designed as shown in equation (19); it is referred to herein as the third target bit allocation of the CTU. μ is a weight coefficient with a value between 0 and 1 and an initial value of 0.5, so that the error between the original and predicted images, the distribution of frame information, and the ERP video weight are all taken into account.
T_CTU = μ × T′_CTU + (1 − μ) × T″_CTU (19)
According to a specific embodiment of the present invention, the determined third target bit allocation T_CTU of each CTU is used to determine the Quantization Parameter (QP) to be used in the encoding of that CTU.
According to an embodiment of the present invention, the coding parameters are updated according to the obtained target code rate. In the rate-control model, by the relationship between λ and R in equation (9), λ can be calculated directly from R through α and β. However, α and β are parameters related to the content characteristics of the sequence, and their values differ significantly for different content. The λ of the frame and of the CTU is calculated using equation (21).
bpp_w = T_CTU / N_pixel (20)
λ = α × bpp_w^β (21)
α_new = α + δ_α × (ln λ_real − ln λ) × α (22)
β_new = β + δ_β × (ln λ_real − ln λ) × ln bpp_w (23)
Here, the bits-per-pixel value bpp_w of each CTU is obtained according to equation (20), N_pixel is the number of pixels of the CTU, and T_CTU is the third target bit allocation of each CTU. According to a specific embodiment of the invention, the parameters α and β differ for each frame and each CTU. According to an embodiment of the present invention, α and β may be continuously updated using equations (22) and (23) during encoding in order to achieve content adaptation. In equations (22) and (23), α and β are the original values, α_new and β_new are the updated values, λ_real is the Lagrangian factor actually used by the coded frame, and λ is the Lagrangian factor calculated according to equation (21). In one embodiment, δ_α and δ_β are constants; for example, they may take the values 0.1 and 0.05, respectively.
According to one embodiment of the present invention, once λ is determined, the Quantization Parameter (QP) of the CTU is determined using equation (24), where c_1 and c_2 are constant parameters.
QP = c_1 × ln λ + c_2 (24)
The calculated QP may then be used for the quantization reference of each CU in the current CTU.
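The pipeline from the combined allocation (19) down to the QP (24), together with the adaptive α/β update (22)-(23), can be sketched as follows. The constants c_1 = 4.2005, c_2 = 13.7122 and the initial α, β values are the ones commonly used in the HEVC/VVC R-λ rate-control literature; the patent only states that c_1 and c_2 are constants, so treat them as assumptions:

```python
import math

def ctu_qp(t1, t2, n_pixel, alpha, beta, mu=0.5, c1=4.2005, c2=13.7122):
    """Equations (19)-(21) and (24): mix the two target bit allocations,
    derive bpp and lambda, then map lambda to a QP."""
    t_ctu = mu * t1 + (1 - mu) * t2      # equation (19)
    bpp_w = t_ctu / n_pixel              # equation (20)
    lam = alpha * bpp_w ** beta          # equation (21)
    qp = round(c1 * math.log(lam) + c2)  # equation (24)
    return t_ctu, lam, qp

def update_params(alpha, beta, lam, lam_real, bpp_w,
                  delta_a=0.1, delta_b=0.05):
    """Equations (22)-(23): adapt alpha and beta to the coded content."""
    err = math.log(lam_real) - math.log(lam)
    alpha_new = alpha + delta_a * err * alpha
    beta_new = beta + delta_b * err * math.log(bpp_w)
    return alpha_new, beta_new

# Example for one 128x128 CTU with the usual literature seeds for alpha/beta.
t_ctu, lam, qp = ctu_qp(t1=9_000, t2=11_000, n_pixel=128 * 128,
                        alpha=3.2003, beta=-1.367)
```

When the actually-used λ matches the predicted one, `update_params` leaves α and β unchanged; otherwise the log-domain error nudges both toward the sequence's real R-λ behavior.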
To address the technical problem in CTU-level rate control in conventional methods, the invention designs a rate-control method around the CTU-level rate-control characteristics and ultimately allocates bits to CTUs according to weights.
The given target bitrate and the content characteristics of the video itself are the main factors that affect the bitrate allocation performance. The present disclosure therefore proposes a new CTU-level rate allocation scheme to improve coding performance by considering video content characteristics.
Fig. 5 shows a flow chart of a video coding and decoding method for rate control at CTU level based on both information entropy and the difference between a predicted block and an original block at CTU level according to an embodiment of the present invention. The method can be applied to a video encoding end and can also be similarly applied to a video decoding end. In a particular embodiment, the method may be implemented by the processor of FIG. 6 executing executable instructions in a memory.
As shown in block 501, the method may include: a current frame is obtained, and a current Coding Tree Unit (CTU) of the current frame is obtained. In the example of VVC, the maximum size of the CTU is 128x128, and this maximum size is used by the present invention for the detailed discussion, but those skilled in the art will readily appreciate that any other maximum CTU size may be used.
As shown in block 503, the method may include: a Coding Unit (CU) partition is performed on a current CTU, and a coding type is determined for each CU.
As shown in block 505, the method may include: prediction block information for each CU is obtained according to the determined coding type.
In blocks 503 and 505, in one embodiment, the type of encoding may be intra-coding or inter-coding. When intra-coded, prediction block information for a CU is formed from the coded blocks within the same frame as the current CTU. When inter-coded, a CU within one or more frames different from the current CTU and the corresponding one or more motion vectors are used to form prediction block information for the CU. In some examples, the prediction block information for a CU may be formed using unidirectional prediction or bi-directional prediction.
As shown in block 507, the method may include: and determining a first target bit allocation of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU. In a specific embodiment, equation (13) may be used to calculate the first target bit allocation for the current CTU.
In block 507, the first target bit allocation determination level is advantageously a CTU rather than a frame, thereby reducing the granularity of the target bit allocation determination so that the determination of the target bit allocation is more consistent with local features in the video frame.
In block 507, the first target bit allocation is advantageously determined at the level of the CTU rather than the CU, thereby reducing the amount of computation and avoiding unnecessarily large increases in the amount of computation.
In block 507, advantageously, the first target bit allocation is made based on the degree of difference between the predicted and original frames of the current CTU, so that the determination of the target bit allocation for each CTU corresponds to the degree of coding distortion of that CTU, thereby enabling the determination of the target bit allocation for the CTU based on the adaptation of the actual intra-or inter-frame coding case.
According to a specific embodiment of the present invention, determining the first target bit allocation of the current CTU in block 507 may further comprise: determining the mean absolute difference (MAD) of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU; determining bit allocation weights for the current CTU based on the determined MAD; determining a weight of the current CTU relative to all the unencoded CTUs based on the bit allocation weights of the current CTU and the bit allocation weights of all the unencoded CTUs; a first target bit of the current CTU is determined based on a weight of the current CTU relative to all uncoded CTUs. In this aspect, it is advantageous that the optimal target bit allocation result can be statistically obtained when determining the first target bit of the current CTU. In one particular embodiment, the MAD may be calculated using equation (14).
In block 507, determining a first target bit allocation for the current CTU is further based on the target bits of the current frame, according to a specific embodiment of the present invention.
In block 507, according to another aspect of the invention, determining the first target bit allocation for the current CTU is further based on the target bits of the current GOP, according to a specific embodiment of the invention.
As shown in block 509, the method may include: determining the information entropy of the current CTU; and determining a second target bit allocation of the current CTU according to the information entropy of the current CTU. In a particular embodiment, equation (16) may be used to calculate the second target bit allocation for the current CTU.
In block 509, the information entropy of the CTU is advantageously taken into account when determining the target bit allocation for the CTU; alongside the difference between the original frame and the predicted frame of the CTU considered in block 507, this improves the accuracy of the target bit allocation determination.
In block 509, it is advantageous that the target bit allocation is determined at the CTU level rather than the frame level, refining the granularity of the determination so that the target bit allocation better matches local features in the video frame.

In block 509, it is also advantageous that the target bit allocation is determined at the CTU level rather than the CU level, so that the amount of computation is kept modest and an unnecessarily large increase in computational cost is avoided.
In a particular embodiment, the video is a 360 degree video, and wherein determining the second target bit allocation for the current CTU is further based on an ERP video weight of the 360 degree video. In a particular embodiment, determining the second target bit allocation for the current CTU further takes into account a degree of distortion of the current CTU in the latitudinal direction. In a particular embodiment, determining the second target bit allocation for the current CTU is further based on: a metric corresponding to a latitudinal distortion of the current CTU in the 360 degree video. In a specific embodiment, the metric corresponding to the latitudinal distortion of the current CTU in the 360-degree video is based on an ERP video weight of the 360-degree video. Advantageously, the position of the current CTU in the latitudinal direction is taken into full account when determining the target bit allocation, thereby taking into account the degree of distortion of the current CTU in the latitudinal direction. In a specific embodiment, the information entropy considering the degree of distortion of the current CTU in the latitudinal direction may be calculated using equation (17).
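The entropy-and-latitude scheme above can be sketched as follows. This is a hypothetical illustration, since equations (16) and (17) are not reproduced in this excerpt: the ERP weight uses the common WS-PSNR-style cosine-of-latitude form, which the patent's actual ERP weight may or may not match, and the proportional allocation is an assumed form.

```python
import numpy as np

def ctu_entropy(pixels):
    """Shannon information entropy (bits/sample) of an 8-bit CTU,
    computed from its sample histogram."""
    hist = np.bincount(pixels.ravel(), minlength=256)
    p = hist[hist > 0] / pixels.size
    return float(-(p * np.log2(p)).sum())

def erp_row_weight(row, frame_height):
    """WS-PSNR-style ERP weight: cosine of the latitude of a pixel row.
    Rows near the poles (top/bottom of the ERP frame) get small weights,
    reflecting their stretching-induced distortion."""
    return float(np.cos((row + 0.5 - frame_height / 2)
                        * np.pi / frame_height))

def second_target_bits(entropy, entropies_uncoded, remaining_bits):
    """Allocate bits in proportion to (possibly latitude-weighted)
    entropy among the current and not-yet-coded CTUs."""
    total = entropy + sum(entropies_uncoded)
    return remaining_bits * entropy / total
```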
As shown in block 511, the method may include: a third target bit allocation for the current CTU is determined based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU.
As shown in block 513, the method may include: determining a Quantization Parameter (QP) to be used for a current CTU according to a third target bit allocation for the CTU.
In a specific embodiment, determining a Quantization Parameter (QP) to be used for the current CTU according to a third target bit allocation of the CTU may further include: determining a bit per pixel value (bpp) of the current CTU according to a third target bit allocation of the current CTU; determining a Lagrangian factor of the current CTU according to a bit per pixel value (bpp) of the current CTU; the QP is determined according to the lagrangian factor of the current CTU. In one particular embodiment, equations (20) - (21) and (24) may be used to determine the QP.
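The bpp-to-Lagrangian-to-QP chain described above matches the well-known R-lambda rate control model. Below is a sketch under the assumption that equations (20)-(21) and (24) follow the standard form; the default alpha/beta values, the lambda-to-QP constants, and the QP clipping range are the HEVC reference-software conventions, not values taken from this patent.

```python
import math

def qp_from_target_bits(target_bits, ctu_pixels,
                        alpha=3.2003, beta=-1.367):
    """R-lambda model: target bits -> bpp -> lambda -> QP.
    alpha/beta defaults and the 4.2005*ln(lambda)+13.7122 mapping are
    the HM reference initial values; clipping to [0, 51] is illustrative."""
    bpp = target_bits / ctu_pixels            # bits per pixel
    lam = alpha * (bpp ** beta)               # Lagrangian factor
    qp = round(4.2005 * math.log(lam) + 13.7122)
    return max(0, min(51, qp))
```

As expected of a rate control model, fewer target bits yield a larger QP (coarser quantization).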
As shown in block 515, the method may include: applying the QP for the current CTU to respective CUs in the current CTU.
In one embodiment, after prediction, such as intra prediction or inter prediction, residual data for a block may be calculated. One or more transforms may be applied to the residual block to produce transform coefficients in the transform domain rather than the sample domain. After any transform to produce transform coefficients, quantization of the transform coefficients may be performed according to a quantization parameter (QP). Quantization generally refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent the coefficients, thereby providing further compression. By performing the quantization process, the bit depth associated with some or all of the coefficients may be reduced. For example, an n-bit value may be rounded to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, a bit-wise right shift of the value to be quantized may be performed. The quantized coefficients are then typically entropy coded, for example using run-based syntax elements, and the QP is signaled in header information.
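The n-bit-to-m-bit rounding by right shift mentioned above can be illustrated with a small helper. This is a hypothetical simplification: real codecs fold QP-dependent scaling and rounding offsets into this step, while the sketch shows only the shift with a symmetric rounding offset.

```python
def quantize_rightshift(coeff, shift):
    """Quantize an integer coefficient by an arithmetic right shift with
    a rounding offset, reducing an n-bit magnitude to roughly
    (n - shift) bits. Handles negative coefficients symmetrically."""
    offset = 1 << (shift - 1)          # rounds to nearest, ties away from 0
    if coeff >= 0:
        return (coeff + offset) >> shift
    return -((-coeff + offset) >> shift)
```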
In a preferred embodiment of the present invention, the coefficients used to compute the lagrangian factors are adaptively updated for each frame and each CTU. Advantageously, such real-time updating enables adaptive adjustment of the lagrangian factor calculation for the real-time content of the encoded frames and CTUs, thus optimizing the calculation of QP. In one particular embodiment, this adaptive adjustment may be performed using equations (22) and (23).
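Equations (22) and (23) are not reproduced in this excerpt; the sketch below uses the least-mean-squares update form familiar from lambda-domain rate control, which the adaptive update described above resembles but may differ from in its constants and details.

```python
import math

def update_alpha_beta(alpha, beta, bpp_real, lam_used,
                      delta_a=0.10, delta_b=0.05):
    """LMS-style update of the R-lambda model coefficients after a frame
    or CTU is coded: the prediction error in the log-lambda domain drives
    the correction. delta_a/delta_b step sizes are conventional values
    from lambda-domain rate control, not taken from this patent."""
    lam_est = alpha * (bpp_real ** beta)          # lambda the model predicts
    err = math.log(lam_used) - math.log(lam_est)  # log-domain prediction error
    alpha_new = alpha + delta_a * err * alpha
    beta_new = beta + delta_b * err * math.log(bpp_real)
    return alpha_new, beta_new
```

When the model already predicts the lambda actually used, the error is zero and the coefficients are unchanged; otherwise both drift toward the observed rate-lambda behavior.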
Fig. 6 shows an apparatus usable for video coding, the apparatus comprising: a processor and a memory, the memory including processor-executable code for implementing the various methods of the present invention.
According to another aspect, the present disclosure may also relate to an encoder for implementing the above-described encoding method. The encoder may be dedicated hardware. According to another aspect, the disclosure may also relate to a corresponding decoder for decoding an encoded video stream. According to another aspect, the present disclosure may also relate to a video codec for the above encoding method or decoding method.
According to a specific embodiment of the present invention, the device may be one or more of a System On Chip (SOC), a computer, a server, and a cloud server.
According to another aspect, the present disclosure may also relate to a computer program product for performing the methods described herein. According to a further aspect, the computer program product has a non-transitory storage medium having stored thereon computer code/instructions that, when executed by a processor, may implement the various operations described herein.
Although the above is discussed mainly with respect to VVC, those skilled in the art will readily understand that the present invention can be applied to other video coding standards, as long as those standards provide 360-degree video support and rate control mechanisms similar to those of VVC.
In this disclosure, the terms "picture" and "frame" are often used interchangeably unless it is explicitly indicated that a certain feature or operation is specific to a "frame".
When implemented in hardware, the video encoder may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Additionally, at least one processor may include one or more modules operable to perform one or more of the steps and/or operations described above.
When implemented in hardware, the video encoder or the device containing the video codec may be a System On Chip (SOC).
When the video encoder is implemented in hardware circuitry, such as an ASIC, FPGA, or the like, it may include various circuit blocks configured to perform various functions. Those skilled in the art can design and implement the circuits in various ways to achieve the various functions disclosed herein, depending on various constraints imposed on the overall system.
While the foregoing disclosure discusses illustrative aspects and/or embodiments, it should be noted that many changes and modifications could be made herein without departing from the scope of the described aspects and/or embodiments as defined by the appended claims. Furthermore, although elements of the described aspects and/or embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated. Additionally, all or a portion of any aspect and/or embodiment may be utilized with all or a portion of any other aspect and/or embodiment, unless stated to the contrary.

Claims (9)

1. A method for video coding, comprising:
obtaining a current Coding Tree Unit (CTU) of a current frame;
performing Coding Unit (CU) segmentation on the current CTU;
determining a coding type for each CU, and obtaining prediction block information for each CU according to the determined coding type;
determining first target bit allocation of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU;
determining the information entropy of the current CTU;
determining a second target bit allocation of the current CTU according to the information entropy of the current CTU;
determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU;
determining a quantization parameter QP to be used for the CTU according to a third target bit allocation of the current CTU;
applying the QP for the current CTU to respective CUs in the current CTU.
2. The method of claim 1,
wherein the video is a 360 degree video, and wherein determining the second target bit allocation for the current CTU further takes into account a degree of distortion of the current CTU in a latitudinal direction.
3. The method according to claim 1 or 2,
wherein determining the first target bit allocation for the current CTU is further based on target bits of the current frame.
4. The method of any one of claims 1 or 2,
wherein determining the first target bit allocation for the current CTU is further based on target bits of a current GOP.
5. The method of any one of claims 1 or 2,
wherein determining the first target bit allocation of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU further comprises:
determining an average absolute difference (MAD) of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU;
determining a bit allocation weight of the current CTU based on the determined MAD;
determining a weight of the current CTU relative to all uncoded CTUs based on bit allocation weights of the current CTU and bit allocation weights of all uncoded CTUs;
determining a first target bit of the current CTU based on a weight of the current CTU relative to all uncoded CTUs.
6. The method of any of claims 1 or 2, wherein determining a quantization parameter, QP, to be used for the current CTU according to a third target bit allocation for the CTU further comprises:
determining a bit per pixel value bpp of the current CTU according to a third target bit allocation of the current CTU;
determining a Lagrangian factor of the current CTU according to a bit per pixel value bpp of the current CTU;
determining the QP according to a Lagrangian factor of the current CTU.
7. A computing device capable of performing video coding, comprising:
a memory storing executable code for video coding;
one or more processors or video codecs that execute the executable code stored in the memory to perform the following video encoding operations:
obtaining a current Coding Tree Unit (CTU) of a current frame;
performing Coding Unit (CU) segmentation on the current CTU;
determining a coding type for each CU, and obtaining prediction block information of each CU according to the determined coding type;
determining first target bit allocation of the current CTU according to prediction block information of each CU and original block information of each CU in the current CTU;
determining the information entropy of the current CTU;
determining a second target bit allocation of the current CTU according to the information entropy of the current CTU;
determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU;
determining a quantization parameter QP to be used for the CTU according to a third target bit allocation of the current CTU;
applying the QP for the current CTU to respective CUs in the current CTU.
8. The computing device of claim 7,
wherein the computing device is one or more of a system on a chip (SOC), a computer, a server, a cloud server, a programmable logic device.
9. A computer readable medium having stored thereon processor-executable instructions that, when executed by a processor, implement video encoding operations comprising:
obtaining a current Coding Tree Unit (CTU) of a current frame;
performing Coding Unit (CU) segmentation on the current CTU;
determining a coding type for each CU, and obtaining prediction block information of each CU according to the determined coding type;
determining a first target bit allocation of the current CTU according to the prediction block information of each CU and the original block information of each CU in the current CTU;
determining the information entropy of the current CTU;
determining a second target bit allocation of the current CTU according to the information entropy of the current CTU;
determining a third target bit allocation for the current CTU based on a weighted average of the first target bit allocation and the second target bit allocation for the current CTU;
determining a quantization parameter QP to be used for the CTU according to a third target bit allocation of the current CTU;
applying the QP for the current CTU to respective CUs in the current CTU.
CN202110676346.3A 2021-06-13 2021-06-13 Code rate control based on CTU information entropy Active CN113438479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110676346.3A CN113438479B (en) 2021-06-13 2021-06-13 Code rate control based on CTU information entropy


Publications (2)

Publication Number Publication Date
CN113438479A CN113438479A (en) 2021-09-24
CN113438479B true CN113438479B (en) 2022-09-20


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113973205A (en) * 2021-10-21 2022-01-25 重庆邮电大学 Code rate control bit distribution method based on video content characteristics and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007101791A1 (en) * 2006-03-06 2007-09-13 Thomson Licensing Method and apparatus for bit rate control in scalable video signal encoding using a rate-distortion optimisation
CN104185024A (en) * 2014-09-16 2014-12-03 福州大学 HEVC quantization parameter optimizing method based on total code rate and information entropy model
CN105554503A (en) * 2016-02-02 2016-05-04 同济大学 HEVC encoding unit level code rate control method
CN107707918A (en) * 2017-10-26 2018-02-16 北京佳讯飞鸿电气股份有限公司 Optimized algorithm based on the control of HEVC/H.265 average bit rates
CN108040256A (en) * 2017-12-29 2018-05-15 广州海昇计算机科技有限公司 It is a kind of based on bit rate control method H.265, system and device
CN111901597A (en) * 2020-08-05 2020-11-06 杭州当虹科技股份有限公司 CU (CU) level QP (quantization parameter) allocation algorithm based on video complexity
CN112188208A (en) * 2020-09-18 2021-01-05 浙江大华技术股份有限公司 Macro block level code rate control method and related device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant