CN113453019A - Method and apparatus for decoding video data

Info

Publication number: CN113453019A
Application number: CN202110709660.7A
Authority: CN
Inventors: V.谢廖金; 赵欣; 陈建乐; A.赛义德; M.卡切维奇
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2016-05-03
Filing date: 2017-05-03
Publication date: 2021-09-28
Also published as: CN109076230A; JP6960943B2; KR102575798B1; EP3453176B1; US20200236030A1; US10708164B2; US11496385B2; KR20190003950A; CN109076230B; CA3018197A1; JP2019515561A; EP3453176A1; TWI755394B; TW201742458A; BR112018072617A2; WO2017192705A1; US20170324643A1

Abstract

This disclosure provides an example device for decoding video data, comprising: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a maximum possible value for a secondary transform syntax element for a block of the video data; entropy decode a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarize the value of the secondary transform syntax element using a common binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and inverse transform the transform coefficients of the block using the determined secondary transform.

Description

Method and apparatus for decoding video data

This application is a divisional of the invention patent application filed on May 3, 2017, with application number 201780026951.8, entitled "Method and apparatus for decoding video data."

This application claims the benefit of each of the following U.S. provisional applications:

United States Provisional Application No. 62/331,290, filed May 3, 2016;

United States Provisional Application No. 62/332,425, filed in May 2016;

United States Provisional Application No. 62/337,310, filed May 16, 2016;

United States Provisional Application No. 62/340,949, filed May 24, 2016; and

United States Provisional Application No. 62/365,853, filed July 22, 2016,

the entire contents of each of which are hereby incorporated by reference.

Technical Field

The present disclosure relates to video coding.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard, and extensions of these standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video coding techniques.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent to video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as Coding Tree Units (CTUs), Coding Units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in inter-coded (P or B) slices of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.

Spatial or temporal prediction results in a predictive block for the block to be coded. The residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. The intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients, which may then be quantized. Quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
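As a non-limiting illustration of the scanning step mentioned above, the following sketch flattens a two-dimensional array of quantized coefficients into a one-dimensional vector. The anti-diagonal order and the helper name `diagonal_scan` are assumptions chosen for illustration; an actual codec defines its own scan orders.

```python
def diagonal_scan(block):
    """Flatten a square 2-D coefficient array, one anti-diagonal at a time.

    Illustrative scan order only; not the order mandated by any standard.
    """
    n = len(block)
    out = []
    for s in range(2 * n - 1):      # s = row + col identifies an anti-diagonal
        for row in range(n):
            col = s - row
            if 0 <= col < n:
                out.append(block[row][col])
    return out


# Example quantized coefficients: energy is concentrated near the top-left.
quantized = [
    [9, 4, 1, 0],
    [3, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = diagonal_scan(quantized)
print(scanned)
```

Because the non-zero coefficients cluster at the start of the scanned vector, the trailing run of zeros can be represented compactly by the subsequent entropy coding stage.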

Disclosure of Invention

In general, this disclosure describes techniques related to entropy coding (encoding or decoding) a secondary transform syntax element of a block of video data. Secondary transform syntax elements may include, for example, non-separable secondary transform (NSST) syntax elements, rotational transform syntax elements, and so on. In general, entropy coding of these syntax elements may include binarization or inverse binarization. The binarization or inverse binarization scheme may be unified such that the same scheme is applied regardless of the maximum possible value of the secondary transform syntax element. The techniques of this disclosure may further include coding (encoding or decoding) signaling unit syntax elements, where a signaling unit may include two or more neighboring blocks. The signaling unit syntax elements may precede each of the blocks, or be placed immediately before (in coding order) the block to which the signaling unit syntax elements apply.

In one example, a method of decoding video data includes: determining a maximum possible value for a secondary transform syntax element for a block of video data; entropy decoding a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarizing the value of the secondary transform syntax element using a common inverse binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and inverse transforming the transform coefficients of the block using the determined secondary transform.

In another example, a device for decoding video data includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a maximum possible value for a secondary transform syntax element for a block of the video data; entropy decode a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarize the value of the secondary transform syntax element using a common binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and inverse transform the transform coefficients of the block using the determined secondary transform.

In another example, a device for decoding video data includes: means for determining a maximum possible value for a secondary transform syntax element for a block of video data; means for entropy decoding a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; means for inverse binarizing the value of the secondary transform syntax element using a common inverse binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and means for inverse transforming the transform coefficients of the block using the determined secondary transform.

In another example, a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) has stored thereon instructions that, when executed, cause one or more processors to: determine a maximum possible value for a secondary transform syntax element for a block of video data; entropy decode a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarize the value of the secondary transform syntax element using a common inverse binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and inverse transform the transform coefficients of the block using the determined secondary transform.

In another example, a method of encoding video data includes: transforming intermediate transform coefficients of a block of video data using a secondary transform; determining a maximum possible value for a secondary transform syntax element for the block, a value of the secondary transform syntax element representing the secondary transform; binarizing the value of the secondary transform syntax element using a common binarization scheme regardless of the maximum possible value; and entropy encoding the binarized value for the secondary transform syntax element of the block to form a binarized value representing the secondary transform for the block.

In another example, a device for encoding video data includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: transform intermediate transform coefficients of a block of the video data using a secondary transform; determine a maximum possible value for a secondary transform syntax element for the block, a value of the secondary transform syntax element representing the secondary transform; binarize the value of the secondary transform syntax element using a common binarization scheme regardless of the maximum possible value; and entropy encode the binarized value for the secondary transform syntax element of the block to form a binarized value representing the secondary transform for the block.
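By way of a non-limiting sketch, the binarization and inverse binarization steps recited in the examples above might look as follows. Truncated unary is assumed here as the common scheme purely for illustration, and the function names are hypothetical. Entropy coding of the bins (e.g., CABAC) is omitted, so the bins are passed directly from the binarizer to the inverse binarizer.

```python
def truncated_unary(value, c_max):
    """Binarize value as `value` one-bins, followed by a zero-bin unless value == c_max."""
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


def inverse_truncated_unary(bins, c_max):
    """Recover the value from its bins; returns (value, number_of_bins_consumed)."""
    value = 0
    consumed = 0
    while value < c_max:
        bin_value = bins[consumed]
        consumed += 1
        if bin_value == 0:          # a zero-bin terminates the unary prefix
            break
        value += 1
    return value, consumed


# The same pair of functions is used whether the maximum possible value is 3 or 4
# (assumed example values); only the c_max argument changes.
round_trip_ok = all(
    inverse_truncated_unary(truncated_unary(v, m), m)[0] == v
    for m in (3, 4)
    for v in range(m + 1)
)
print(round_trip_ok)
```

In this sketch the maximum possible value only bounds how many bins are read; it does not select between different binarization schemes.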

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Drawings

Fig. 1 is a block diagram depicting an example video encoding and decoding system that may utilize techniques for binarizing secondary transform indices.

Fig. 2 is a block diagram showing an example of a video encoder that may implement techniques for binarizing a secondary transform index.

Fig. 3 is a block diagram of an example entropy encoding unit that may be configured to perform CABAC in accordance with the techniques of this disclosure.

Fig. 4 is a block diagram showing an example of a video decoder that may implement techniques for binarizing a secondary transform index.

Fig. 5 is a block diagram of an example entropy decoding unit that may be configured to perform CABAC in accordance with the techniques of this disclosure.

Fig. 6 is a flow diagram depicting an example method of encoding video data, in accordance with the techniques of this disclosure.

Fig. 7 is a flow diagram depicting an example of a method of decoding video data, in accordance with the techniques of this disclosure.

Detailed Description

Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC (Advanced Video Coding)), and ITU-T H.265 (also known as HEVC or "High Efficiency Video Coding"), including extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC), and Screen Content Coding (SCC). The techniques of this disclosure may be applied to these or future video coding standards, such as the Joint Video Exploration Team (JVET) test model (also referred to as the Joint Exploration Model, or JEM), which is undergoing development activity beyond HEVC. Video coding standards also include proprietary video codecs, such as Google VP8, VP9, and VP10, and video codecs developed by other organizations (e.g., the Alliance for Open Media).

The JVET test model includes an intra prediction method called position dependent intra prediction combination (PDPC). The JVET test model also includes a non-separable secondary transform (NSST) tool. Both the PDPC tool and the NSST tool use syntax elements (e.g., indices) to indicate whether the corresponding tool is applied and which variation is used. For example, an index of 0 may mean that the tool is not used.

The maximum number of NSST indices for a block of video data may depend on the intra-prediction mode or partition size of the block. In one example, if the intra-prediction mode is PLANAR or DC and the partition size is 2N × 2N, then the maximum number of NSST indices is 3; otherwise, the maximum number of NSST indices is 4. Under the JVET test model, two types of binarization are used to represent the NSST index: truncated unary binarization is used if the maximum value is 3; otherwise, fixed binary binarization is applied. In the JVET test model, if the PDPC index is not equal to 0, NSST is not applied and no NSST index is signaled.
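The two-scheme behavior described above for the JVET test model can be sketched as follows. This is an illustrative reading only, under several stated assumptions: the "maximum number of NSST indices" is taken as a count of index values (so indices run from 0 to count minus 1), PLANAR and DC are given the HEVC mode numbers 0 and 1, the fixed binary scheme is taken to use two bins, and all function names are hypothetical.

```python
PLANAR, DC = 0, 1   # assumed mode numbers, following HEVC's intra mode numbering


def num_nsst_indices(intra_mode, part_size):
    """Count of NSST index values for a block, per the rule described in the text."""
    if intra_mode in (PLANAR, DC) and part_size == "2Nx2N":
        return 3
    return 4


def truncated_unary(value, c_max):
    """`value` one-bins, followed by a zero-bin unless value == c_max."""
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


def fixed_length(value, num_bits):
    """Plain binary representation, most significant bin first."""
    return [(value >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def binarize_two_schemes(index, count):
    """Scheme selection depends on the count (the non-unified baseline)."""
    if count == 3:
        return truncated_unary(index, c_max=count - 1)
    return fixed_length(index, num_bits=2)   # assumption: 2 bins cover 4 index values


count_planar = num_nsst_indices(PLANAR, "2Nx2N")
count_angular = num_nsst_indices(26, "2Nx2N")   # 26: an arbitrary non-PLANAR, non-DC mode
print(binarize_two_schemes(2, count_planar))
print(binarize_two_schemes(2, count_angular))
```

Under these assumptions the same index value yields different bin strings depending on the block's prediction mode and partition size, which is the dependency that a unified binarization removes.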

This disclosure describes a variety of techniques that may be applied, alone or in any combination, to improve coding of NSST syntax elements, such as NSST indices and/or NSST flags. For example, these techniques may improve the operation of a video encoder or video decoder, and thereby improve bitstream efficiency, in that they may reduce the bitrate of the bitstream relative to the current JVET test model.

Fig. 1 is a block diagram showing an example video encoding and decoding system 10 that may utilize techniques for binarizing secondary transform indices. As shown in Fig. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and so forth. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may include a router, switch, base station, or any other equipment that may be useful for facilitating communication from source device 12 to destination device 14.

In some examples, the encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 14 may access the encoded video data via any standard data connection, including an internet connection. Such a connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of Fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques for binarizing a secondary transform index. In other examples, the source device and the destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The depicted system 10 of Fig. 1 is merely one example. The techniques for binarizing a secondary transform index may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, commonly referred to as a "CODEC". Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.

Computer-readable medium 16 may include: transitory media, such as wireless broadcast or wired network transmission; or a storage medium (i.e., a non-transitory storage medium) such as a hard disk, a flash drive, a compact disc, a digital video disc, a blu-ray disc, or other computer-readable medium. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., transmit over a network. Similarly, a computing device of a media production facility (e.g., a disc stamping facility) may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media in various forms.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, also known as ITU-T H.265. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of these standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in Fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. Where applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.

In general, according to ITU-T H.265, a video picture may be divided into a sequence of Coding Tree Units (CTUs) (or Largest Coding Units (LCUs)) that may include both luma samples and chroma samples. Alternatively, a CTU may include monochrome data (i.e., only luma samples). Syntax data within the bitstream may define a size of the CTU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive CTUs in coding order. A video picture may be partitioned into one or more slices. Each CTU may be split into Coding Units (CUs) according to a quadtree. In general, a quadtree data structure contains one node per CU, with the root node corresponding to the CTU. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.

Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag that indicates whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf CU. In this disclosure, four sub-CUs of a leaf CU will also be referred to as leaf CUs even if there is no explicit splitting of the original leaf CU. For example, if a CU of size 16 × 16 is not split further, the four 8 × 8 sub-CUs will also be referred to as leaf CUs although the 16 × 16 CU was never split.

A CU has a similar purpose to a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a CTU may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf CU. Syntax data associated with a coded bitstream may define the maximum number of times a CTU may be split (referred to as the maximum CU depth), and may also define the minimum size of the coding nodes. Accordingly, the bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, Prediction Unit (PU), or Transform Unit (TU) in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
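As a non-limiting illustration of the recursive quadtree splitting described above, the following sketch consumes split flags in depth-first order and returns the resulting leaf CUs. The function name, the flag ordering, and the example sizes (a 64 × 64 CTU with an 8 × 8 minimum coding node) are assumptions for illustration and do not reproduce any standard's syntax.

```python
def parse_coding_quadtree(x, y, size, min_size, split_flags):
    """Return leaf CUs as (x, y, size) tuples.

    split_flags is an iterator of 0/1 values in depth-first order. No flag is
    read for a node that is already at the minimum size, since it cannot split.
    """
    if size > min_size and next(split_flags) == 1:
        half = size // 2
        leaves = []
        for dy in (0, half):            # visit the four sub-CUs in raster order
            for dx in (0, half):
                leaves += parse_coding_quadtree(
                    x + dx, y + dy, half, min_size, split_flags
                )
        return leaves
    return [(x, y, size)]


# Split the CTU once, then split only its top-right 32x32 sub-CU again.
flags = iter([1, 0, 1, 0, 0, 0, 0, 0, 0])
leaf_cus = parse_coding_quadtree(0, 0, 64, 8, flags)
for cu in leaf_cus:
    print(cu)
```

The leaf CUs tile the CTU exactly: three 32 × 32 leaves and four 16 × 16 leaves together cover all 64 × 64 samples.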

A CU includes a coding node and the Prediction Units (PUs) and Transform Units (TUs) associated with the coding node. The size of a CU corresponds to the size of the coding node and is generally square in shape. The size of a CU may range from 8 × 8 pixels up to the size of the CTU, which has a maximum size of, e.g., 64 × 64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned CTU, although this may not always be the case. TUs are typically the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). The pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

A leaf CU may include one or more Prediction Units (PUs). In general, a PU represents a spatial region corresponding to all or part of a corresponding CU, and may include data used to retrieve and/or generate reference samples for the PU. In addition, the PU contains data related to prediction. For example, when a PU is intra-mode encoded, data for the PU may be included in a residual quad-tree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU. The RQT may also be referred to as a transform tree. In some examples, the intra-prediction mode may be signaled in the leaf CU syntax instead of the RQT. As another example, when a PU is inter-mode encoded, the PU may include data defining motion information (e.g., one or more motion vectors) for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list for the motion vector (e.g., list 0, list 1, or list C).

A leaf-CU having one or more PUs may also include one or more Transform Units (TUs). The transform units may be specified using RQTs (also referred to as TU quad-tree structures), as discussed above. For example, the split flag may indicate whether a leaf CU is split into four transform units. Each transform unit may then be further split into additional sub-TUs. When a TU is not further split, it may be referred to as a leaf-TU. Typically, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, the same intra-prediction mode is typically applied to calculate predicted values for all TUs of a leaf-CU. For intra coding, a video encoder may use an intra-prediction mode to calculate a residual value for each leaf-TU as the difference between the portion of the CU corresponding to the TU and the original block. TUs are not necessarily limited to the size of a PU. Thus, TU may be larger or smaller than PU. For intra coding, a PU may be co-located with a corresponding leaf-TU for the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.

Furthermore, the TUs of a leaf-CU may also be associated with a respective quad-tree data structure, referred to as a residual quad-tree (RQT). That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. The root node of a TU quad-tree typically corresponds to a leaf CU, while the root node of a CU quad-tree typically corresponds to a CTU (or LCU). TUs of an RQT that are not split are referred to as leaf-TUs. In general, unless otherwise noted, this disclosure uses the terms CU and TU to refer to leaf-CU and leaf-TU, respectively.

A video sequence typically comprises a series of video frames or pictures that start with a Random Access Point (RAP) picture. A video sequence may include syntax data in a Sequence Parameter Set (SPS) that represents characteristics of the video sequence. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes, and their sizes may differ according to a specified coding standard.

As an example, prediction may be performed for PUs of various sizes. Assuming that the size of a particular CU is 2N × 2N, intra prediction may be performed on PU sizes of 2N × 2N or N × N, and inter prediction may be performed on symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. Asymmetric partitioning for inter prediction may also be performed for PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is horizontally partitioned with a 2N × 0.5N PU on top and a 2N × 1.5N PU on bottom.
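
As a non-limiting illustration, the partition sizes described above may be tabulated as in the following sketch. The function name `pu_sizes` and the string labels for the partition modes are hypothetical and are used only for this illustration; N is assumed to be even so that 0.5N is an integer.

```python
def pu_sizes(mode, n):
    """Return the (width, height) of each PU of a 2N x 2N CU.

    Hypothetical helper for illustration only; `mode` is one of the
    partition labels used in the text, and `n` is N (assumed even).
    """
    s = 2 * n      # full CU side, 2N
    q = n // 2     # 0.5N, the 25% share of the partitioned direction
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        # Asymmetric modes: one direction is split 25% / 75%.
        "2NxnU": [(s, q), (s, s - q)],   # small PU on top
        "2NxnD": [(s, s - q), (s, q)],   # small PU on bottom
        "nLx2N": [(q, s), (s - q, s)],   # small PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # small PU on the right
    }
    return table[mode]
```

For N = 16, the "2NxnU" entry yields a 32 × 8 PU above a 32 × 24 PU, which corresponds to the 2N × 0.5N and 2N × 1.5N PUs described above.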

In this disclosure, "N × N" and "N by N" may be used interchangeably to refer to the pixel size of a video block in the vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.

After using intra-predictive or inter-predictive coding of PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. The PU may comprise syntax data describing a method or mode of generating predictive pixel data in the spatial domain, also referred to as the pixel domain, and the TU may comprise coefficients in the transform domain after applying a transform, such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form the TUs to include quantized transform coefficients that represent residual data for the CU. That is, video encoder 20 may calculate residual data (in the form of residual blocks), transform the residual blocks to generate blocks of transform coefficients, and then quantize the transform coefficients to form quantized transform coefficients. Video encoder 20 may form TUs that include the quantized transform coefficients, as well as other syntax information (e.g., split information for the TUs).

As mentioned above, video encoder 20 may perform quantization of the transform coefficients after any transform used to generate the transform coefficients. Quantization generally refers to the process of quantizing transform coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be downscaled to an m-bit value during quantization, where n is greater than m.

After quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients in front of the array, and lower energy (and therefore higher frequency) coefficients behind the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
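
As a non-limiting illustration of serializing a two-dimensional block into a one-dimensional vector, the following sketch visits positions along anti-diagonals so that low-frequency (top-left) coefficients come first. The function name `diagonal_scan` is hypothetical, and this simple order is an assumption made for illustration; it is not the exact scan order of any particular coding standard.

```python
def diagonal_scan(block):
    """Serialize a 2-D block (list of rows) along anti-diagonals.

    Illustrative scan only: positions are ordered by x + y, then by
    row, so the top-left (low-frequency) coefficients come first.
    """
    height, width = len(block), len(block[0])
    order = sorted(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda pos: (pos[0] + pos[1], pos[0]),
    )
    return [block[y][x] for y, x in order]
```

With such an order, a block whose non-zero quantized coefficients are concentrated in the top-left corner yields a vector whose trailing entries are zeros, which is favorable for the subsequent entropy coding.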

To perform CABAC, video encoder 20 may assign contexts within the context model to symbols to be transmitted. The context may be related to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, using VLC may achieve bit savings over, for example, using codewords of equal length for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.

In general, video decoder 30 performs a substantially similar, but reciprocal, process to that performed by video encoder 20 to decode the encoded data. For example, video decoder 30 inverse quantizes and inverse transforms the coefficients of the received TU to reproduce the residual block. Video decoder 30 uses the signaled prediction mode (either intra-prediction or inter-prediction) to form a predicted block. Video decoder 30 then combines the predicted block and the residual block (pixel-by-pixel) to reproduce the original block. Additional processing may be performed, such as performing a deblocking process to reduce visual artifacts along block boundaries. In addition, video decoder 30 may use CABAC to decode syntax elements in a substantially similar, but reciprocal manner to the CABAC encoding process of video encoder 20.

In accordance with the techniques of this disclosure, a video coder (e.g., video encoder 20 or video decoder 30) may unify binarization of NSST syntax elements. For example, a video coder may be configured to use only one binarization (e.g., truncated or truncated unary binarization). For blocks in which NSST syntax elements are coded, the maximum value of the NSST syntax element may be defined (and thus determined by the video coder) according to the intra mode and, optionally, according to a block size condition. For example, the video coder may apply truncated unary binarization for the NSST index, where the maximum value equals 3 if the current intra mode is non-angular (e.g., PLANAR or DC, or optionally LM mode for the chroma components), and otherwise the maximum value equals 4. In addition, the video coder may apply a block size condition. For example, the video coder may determine that, if the current block is square or width × height is less than some threshold (e.g., 64), the maximum value is equal to 3.
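
As a non-limiting illustration, the following sketch determines the maximum value of the NSST index and applies a single truncated unary binarization regardless of that maximum. All names (`nsst_max_value`, `truncated_unary`, `inverse_truncated_unary`, the mode strings, and the `use_size_condition` switch) are hypothetical and are assumptions made for this illustration only.

```python
NON_ANGULAR_MODES = {"PLANAR", "DC", "LM"}  # LM is optional per the text

def nsst_max_value(intra_mode, width, height,
                   use_size_condition=False, threshold=64):
    """Maximum NSST index: 3 for non-angular modes, otherwise 4.

    The optional block size condition also yields 3 when the block is
    square or width * height is below the threshold.
    """
    if intra_mode in NON_ANGULAR_MODES:
        return 3
    if use_size_condition and (width == height or width * height < threshold):
        return 3
    return 4

def truncated_unary(value, max_value):
    """Binarize: `value` ones, then a terminating zero unless value == max."""
    if not 0 <= value <= max_value:
        raise ValueError("value out of range")
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins

def inverse_truncated_unary(bins, max_value):
    """Inverse binarize: count leading ones, stopping at max_value."""
    value = 0
    reader = iter(bins)
    while value < max_value and next(reader) == 1:
        value += 1
    return value
```

In this sketch the same binarization routine is used for both maxima; only the `max_value` argument changes, so the terminating zero bin is omitted exactly when the index equals the maximum.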

In one example, the video coder may context-entropy code each bin, or only certain predetermined bins (e.g., an ordinal first number of bins), of the binarized codeword. The video coder may entropy code bins other than the predetermined bins without context modeling (e.g., in bypass mode). If NSST is applied separately to luma and chroma, context modeling may be applied separately for luma and chroma. Alternatively, bins of the binarized codeword may share contexts between luma and chroma. For example, the context for the first bin, which indicates whether the NSST index is 0 (meaning that NSST is not applied), may be shared between the luma and chroma components, and other bins may have separate contexts for luma and chroma.

In another example, the context modeling for the NSST index may depend on the maximum value the NSST index may have. For example, if the maximum value may be 3 or 4, one context set may be used to signal an NSST index for a maximum value of 3 and another context set may be used to signal an NSST index for a maximum value of 4. Similar context sets may be defined for other maxima that the NSST index may have, and more than two maxima may be used.

Optionally, the context for the first bin (which indicates that the NSST index is equal to 0 or not equal to 0) may be shared across all sets of contexts, or may be shared across sets of contexts corresponding to the same color component (e.g., for luma, chroma or both chroma components, or all color components).
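
As a non-limiting illustration, context selection that depends on the maximum value of the NSST index, with an optionally shared context for the first bin, may be sketched as follows. The function name `nsst_bin_context` and the tuple-based context keys are hypothetical and are assumptions made for this illustration only.

```python
def nsst_bin_context(bin_index, max_value, share_first_bin=True):
    """Return a key identifying the context model for one bin.

    One context set is kept per maximum value (e.g., 3 or 4). When
    `share_first_bin` is true, the first bin (index equal to 0 or not)
    uses a single context shared across all sets.
    """
    if bin_index == 0 and share_first_bin:
        return ("shared", 0)
    return ("max_%d" % max_value, bin_index)
```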

In the current JVET test model, no NSST is applied and no NSST index is signaled if the PDPC index is not equal to 0. This process of avoiding NSST and not signaling NSST indices may reduce coding complexity. However, this disclosure recognizes that the processes currently implemented in the JVET test model do not necessarily achieve the best coding results and may not achieve the desired tradeoff between coder complexity and bit rate.

According to the techniques of this disclosure, when the NSST index of a block has a non-zero value (that is, NSST is applied to the current block), a video coder (e.g., video encoder 20 or video decoder 30) need not apply and/or code (e.g., signal) a position-dependent intra prediction combination (PDPC) syntax element for the block. This may result in similar decoder complexity, but the resulting compression efficiency may be higher, since the NSST method generally has better efficiency than PDPC. In this case, the PDPC index may be signaled at a location in the bitstream that follows the NSST index.

Additionally or alternatively, the NSST index context may be based on the PDPC index. For example, if the PDPC index is 0, one context may be used to entropy code the NSST index, and if the PDPC index is not 0, another context may be used to entropy code the NSST index. In another example, each PDPC index may have its own context used to entropy code the NSST index. Additionally or alternatively, the context of the NSST index may jointly depend on the PDPC index of the current block and other elements, such as prediction mode, block size, and so on. Similarly, the context of the PDPC index may jointly depend on the NSST index of the current block and other elements, such as prediction mode, block size, and so on.

Alternatively, the same method may be applied if the NSST index is coded in the bitstream before the PDPC index. In this case, in the above method, NSST and PDPC are exchanged in the description. For example, if the NSST index is 0, one context may be used to entropy code the PDPC index, and if the NSST index is not 0, another context may be used to entropy code the PDPC index. In another example, each NSST index may have its own context used to entropy code the PDPC index. Additionally or alternatively, the context of the PDPC index may jointly depend on the NSST index of the current block and other elements, such as prediction mode, block size, and so on. Similarly, the context of the NSST index may jointly depend on the PDPC index of the current block and other elements, such as prediction mode, block size, and so on.

The PDPC techniques mentioned herein may be extended to any other techniques related to intra/inter prediction techniques, and/or the NSST techniques mentioned herein may be extended to any techniques related to transform techniques. Syntax element (index/flag/mode) signaling for prediction techniques may interact with syntax element (index/flag/mode) signaling for transform techniques. The interaction may be, but is not limited to, the context of the prediction technology syntax depending on the context of the transform technology syntax, or vice versa.

In addition, the video coder may be configured to apply the techniques discussed above to other coding modes, including but not limited to PDPC or Motion Parameter Inheritance (MPI) modes.

The NSST index may be signaled and shared for multiple components. For example, one NSST index may be signaled and shared for a luma (Y) component, a blue-tone chroma (Cb) component, and a red-tone chroma (Cr) component. Alternatively, one NSST index may be signaled and shared for Cb and Cr components (a separate NSST index may be signaled for Y component). In some examples, NSST index signaling is dependent on some conditions when one NSST index is shared for multiple components, and the NSST index is not signaled but is derived as a default value (e.g., 0) when these conditions are met for each of the included components, or when these conditions are met for several (not all) of the included components, or when these conditions are met for any included components.

These conditions may include, but are not limited to: the number of non-zero coefficients (or the sum of the absolute values of the non-zero coefficients) when a block is not coded by certain coding modes, and these certain coding modes include, but are not limited to, transform skip mode and/or LM mode and/or cross component prediction mode.

The block in the above example may be a block for each component considered independently, or it may be a related block for some color components (e.g., a related block for Cb and Cr), or it may be a block for all available components (e.g., a block for Y, Cb and Cr). In one example, the conditions may be jointly applied to those blocks together.

For example, when a condition applies to multiple components (e.g., Cb and Cr), then the condition may include, but is not limited to, the sum of the number of non-zero coefficients (or the sum of the absolute values of the non-zero coefficients) for each included component block not being coded by certain coding modes, and these certain coding modes include, but are not limited to, transform skip mode and/or LM mode and/or cross component prediction mode, among others.

In some examples, when multiple NSST indices are signaled, and each NSST index is signaled for one or more components, the multiple NSST indices may be jointly binarized into one syntax element, and one binarization and/or context modeling may be applied to this jointly coded one syntax element. For example, a flag may first be coded to indicate whether there is at least one non-zero NSST index (meaning that NSST is applied to at least one component). After the flag, multiple NSST indices are binarized into one syntax element and coded. Some signaling redundancy may be removed in this example. For example, if the flag indicates that there is at least one non-zero NSST index, the last signaled NSST index may be inferred to be non-zero if all previous indices have values equal to 0.

In the above example, a joint NSST index signaling technique may be applied to signal the NSST index for a group of blocks. A flag may be signaled for a group to indicate that there is at least one block using a non-zero NSST index (in this case, the flag is equal to 1), or that all blocks have a zero NSST index (in this case, the flag is equal to 0). Signaling redundancy may also be removed for the last NSST index in the group, taking into account that the last NSST index cannot equal 0 if all previous indices are equal to 0. In another example, if only two NSST index values (0 or 1) are possible, the last index need not be signaled if all previous indices are equal to 0, and may instead be inferred to be equal to 1. In another example, if more than two NSST index values are possible, the last index may be reduced by 1 before signaling if all previous indices are equal to 0.
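
As a non-limiting illustration of the redundancy removal described above, the following sketch codes a group flag followed by the NSST indices of the group, reducing the last index by 1 when all previous indices are zero. The function names `encode_group` and `decode_group` and the representation of the bitstream as a list of symbols are hypothetical and are assumptions made for this illustration only; binarization and entropy coding are omitted.

```python
def encode_group(indices):
    """Return [flag, symbols...] for a group of NSST indices."""
    flag = 1 if any(indices) else 0
    symbols = [flag]
    if flag == 0:
        return symbols          # all indices are zero; nothing more to send
    last = len(indices) - 1
    for i, index in enumerate(indices):
        if i == last and not any(indices[:last]):
            # The last index cannot be 0 here, so send index - 1.
            symbols.append(index - 1)
        else:
            symbols.append(index)
    return symbols

def decode_group(symbols, count):
    """Recover `count` NSST indices from the symbols of encode_group."""
    if symbols[0] == 0:
        return [0] * count
    indices = []
    for i in range(count):
        value = symbols[1 + i]
        if i == count - 1 and not any(indices):
            value += 1          # undo the reduction applied by the encoder
        indices.append(value)
    return indices
```

When only the values 0 and 1 are possible, the reduced last symbol is always 0 and could therefore be omitted entirely, consistent with the inference described above.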

The above techniques may be used in any combination.

The NSST index is used as an example. The same techniques may be applied to any primary or secondary transform index, flag, or syntax element signaling. For example, these techniques may be applied to signaling rotational transform (ROT) indices.

Likewise, the PDPC index is also used as an example. The same techniques may be applied to any intra or inter prediction index, flag, or syntax element signaling. For example, these techniques may be applied to signal a Motion Parameter Inheritance (MPI) index.

In some examples, video encoder 20 and/or video decoder 30 may perform transform-related syntax coding (e.g., encoding/signaling or decoding/interpretation) at a special structural unit, which may be referred to as a Signaling Unit (SU). Generally, a signaling unit comprises a plurality of blocks. For example, the signaling unit may correspond to a single quadtree-binary tree (QTBT) of a QTBT framework. Alternatively, the signaling unit may correspond to a group of blocks, each of the blocks corresponding to a different respective QTBT.

In the QTBT framework, the signaling unit may be partitioned according to a multi-type tree comprising a first portion partitioned according to a quadtree (in which each node is partitioned into zero or four child nodes), each leaf node of the quadtree being further partitioned using binary tree partitioning (in which each node is partitioned into zero or two child nodes). Each node that is partitioned into zero child nodes is considered a leaf node of the corresponding tree.

As discussed above, various syntax elements (e.g., NSST index, PDPC index, prediction mode, block size, etc.) may be jointly signaled for a group of blocks. Such joint signaling may be generally described as "signaling data at the signaling unit level," where a signaling unit includes multiple blocks to which data is signaled at the signaling unit level, and this data is applied to each block included in the signaling unit.

A problem may arise when the signaling units form part of a non-I slice, such as a P slice or a B slice. In these or other non-I slices, the slices may include some blocks predicted using intra-mode and other blocks predicted using inter-mode. However, some tools may be applied to only one of intra or inter modes, but not both. Thus, signaling some syntax at the signaling unit level for mixed blocks (intra versus inter) may be inefficient, especially when the tools are not applied to a certain prediction mode.

Accordingly, this disclosure also describes a variety of techniques that may be used alone or in combination with each other and/or with the techniques discussed above. Certain techniques of this disclosure may be applied to resolve a mix of inter-predicted blocks and intra-predicted blocks in non-I slices, yet still have signaling for signaling unit blocks. The video coder may use blocks arranged in the signaling unit in such a way that the signaling unit contains only blocks affected by signaling performed at the signaling unit level.

For example, transforms may be of two types: a primary (or first) transform and a secondary transform. According to the JVET model, the primary transform may be a Discrete Cosine Transform (DCT) or an Enhanced Multiple Transform (EMT), and the secondary transform may be, for example, NSST or ROT. It should be understood that DCT, EMT, NSST, and ROT are merely examples, and the techniques of this disclosure are not limited to these transforms; other transforms may also be used (in addition or in the alternative).

Assume, for purposes of example, that an EMT flag or EMT index is signaled at the signaling unit level; those syntax elements have values that identify which particular transform is used for the blocks included in the signaling unit. Blocks may be predicted in intra, inter, or skip mode. The signaled EMT flag or EMT index may be efficient for intra-predicted blocks, but may be less efficient or inefficient for inter-predicted blocks. In this case, the signaling unit may be restricted to either or both of the following types of block composition: 1) intra-predicted blocks and skip predicted blocks; and/or 2) inter-predicted blocks and skip predicted blocks.

According to this example, transform-related syntax signaled at the signaling unit level would be efficient for intra-coded blocks. Skip mode is based on the assumption that the residual is 0 and no transform is needed, so the signaled transform would not affect the skip predicted blocks, and there would be no inter-coded blocks in this signaling unit. Similarly, according to the signaling unit composition, transform-related syntax signaled at the signaling unit level for inter-predicted blocks would be efficient for inter-predicted blocks, but it would not affect skip mode, and no intra-coded blocks would be present in this signaling unit.

By arranging signaling units in accordance with the techniques of this disclosure, certain syntax elements may become redundant. In the above example, the prediction mode is not needed if the signaling unit type (#1 or #2) is signaled at the signaling unit level in addition to the transform syntax elements. In this case, the prediction mode need not be signaled for each block included in the signaling unit, and the prediction mode can be inferred from the signaling unit type. In one example, the signaling unit type may be signaled as a separate syntax element having contexts specific to that syntax element, or the prediction mode syntax element may be reused and signaled to indicate the signaling unit type.

As another example, the signaling units may include blocks arranged according to either or both of the following arrangements: 1) intra-predicted blocks, skip predicted blocks, and inter-predicted blocks with residual equal to 0 (zero blocks); and/or 2) inter-predicted blocks, skip predicted blocks, and intra-predicted blocks with zero residual.

In the first example discussed above, the Coded Block Flag (CBF) syntax element (indicating whether the block includes zero residual or one or more non-zero residual values, i.e., whether the block is "coded") need not be signaled per inter-predicted block for signaling unit type 1, and need not be signaled per intra-predicted block for signaling unit type 2, since only zero blocks are possible.

In yet another example, the signaling unit may be constructed as follows: (1) an intra-predicted block, a skipped predicted block, and an inter-coded block (zero block) with residual equal to 0, and a block coded with transform skipping; and/or (2) inter-predicted blocks, skipped predicted blocks, and intra-predicted blocks with zero residual, as well as blocks coded with transform skipping.

Similarly, as in the above example, the CBF syntax elements need not be signaled per block for the blocks included in the signaling unit.

In the above examples, the signaling units are classified into two types: an "intra-related" type and an "inter-related" type. However, it may still be possible that a mixture of intra and inter blocks shares similar tool decisions; for example, the transform type may be the same for both types of predicted blocks. The signaling unit types can then be further extended to the following three types: (1) intra-predicted blocks, and inter-predicted blocks with zero residual (skip blocks, inter blocks with zero residual, or transform-skip inter blocks); (2) inter-predicted blocks, and intra blocks with zero residual or transform-skip intra blocks; and (3) a mix of inter and intra blocks allowed without restriction.

In this example, some redundant syntax elements, such as prediction mode or CBF syntax, may not need to be signaled per block (i.e., within each block included in the signaling unit) for signaling unit types 1 and 2. Instead, video encoder 20 may encode those syntax elements once at the signaling unit level and video decoder 30 may decode those syntax elements once at the signaling unit level, and the coded values may be applied to each block included in the signaling unit.

In the above examples, EMT, or a primary transform, is used as an example. In a similar manner, a secondary transform (e.g., NSST or ROT) may be signaled at the signaling unit level, and redundant syntax elements (e.g., prediction mode or CBF syntax) may be signaled at the signaling unit level, so that those elements need not be signaled at the block level.

Video encoder 20 and video decoder 30 may context code (e.g., using CABAC) transform-decision-related syntax elements using context modeling. Transform-related syntax elements, such as flags or indices into a transform set (for example, but not limited to, EMT flags, NSST flags, EMT indices, NSST indices, and so forth), may be context coded. The context may be defined according to the number of non-zero transform coefficients in the block, the sum of the absolute values of the non-zero transform coefficients, and/or the positions of the non-zero transform coefficients inside the TU (e.g., whether there is only one non-zero DC coefficient).

In addition, the number of non-zero coefficients may be classified into some sub-groups; for example, the number of non-zero coefficients within a range is one subgroup, another range of values is another subgroup, and so on. The context may be defined in terms of subgroups.
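
As a non-limiting illustration, mapping the number of non-zero coefficients to a subgroup, and using the subgroup as a context index, may be sketched as follows. The function name `coeff_count_context` and the subgroup boundaries (1, 3, 7) are hypothetical values chosen only for this illustration.

```python
import bisect

def coeff_count_context(num_nonzero, upper_bounds=(1, 3, 7)):
    """Map a non-zero coefficient count to a subgroup (context) index.

    With the illustrative bounds (1, 3, 7), counts of 0..1 map to
    subgroup 0, 2..3 to subgroup 1, 4..7 to subgroup 2, and larger
    counts to subgroup 3.
    """
    return bisect.bisect_left(upper_bounds, num_nonzero)
```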

Additionally, the context may be defined based on the position of the last non-zero coefficient in the block, may also be defined based on the first non-zero coefficient in the block, and/or may additionally be defined based on the value of the last and/or first coefficient in the block or its sign (negative or positive).

Signaling of the number of non-zero coefficients is described below. Currently, in HEVC or JVET, the position of the last non-zero coefficient and a significance map (e.g., 0 indicating that a coefficient is zero and 1 indicating that a coefficient is non-zero, or vice versa) are signaled for the transform coefficients to indicate which coefficients are non-zero up to the last non-zero coefficient.

However, if the block has only a few coefficients, the current signaling of JVET and HEVC may not be efficient. For example, if a transform block has only one non-zero coefficient and that coefficient is not at the beginning of the block, then the last position already indicates the position of that coefficient; however, a significance map, which contains all zeros, is still signaled.

This disclosure also describes techniques related to signaling an additional syntax element having a value indicating a number of non-zero coefficients in a transform block. Video encoder 20 may signal the value of this syntax element, and video decoder 30 may decode the value of this syntax element to determine the number of non-zero transform coefficients in the transform block. This syntax element value may be signaled using any binarization, such as unary code, truncated unary code, Golomb code, exponential Golomb code, Rice code, fixed-length binary code, truncated binary code, and so on. For truncated binarization, the maximum element may be the number of possible coefficients up to the last position coefficient.

In one example, this new syntax element may be signaled after the last non-zero coefficient position for the transform block. In another example, this new syntax element may be signaled before the last non-zero coefficient position. In the latter case, a flag may indicate whether the block has only one DC coefficient.

Because both the last non-zero coefficient position and the number of non-zero coefficients are signaled, the techniques of this disclosure may result in a reduction in the size of the coded significance map that forms part of the bitstream. For example, when signaling a significance map, the number of non-zero coefficients already signaled in the map may be counted; when that count is equal to the signaled number of non-zero coefficients minus 1, there is no need to continue signaling the significance map for the block, since the only possible remaining non-zero coefficient is the last non-zero coefficient in the block.
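
As a non-limiting illustration of the early termination described above, the following sketch codes a significance map for coefficients in scan order and stops once all non-zero coefficients other than the last one have been indicated. The function names `encode_sig_map` and `decode_sig_map`, the forward traversal of the scan, and the representation of the bitstream as a list of flags are hypothetical and are assumptions made for this illustration only.

```python
def encode_sig_map(coeffs):
    """Return (last_pos, count, flags) for coefficients in scan order."""
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    if not nonzero_positions:
        raise ValueError("block has no non-zero coefficients")
    last_pos, count = nonzero_positions[-1], len(nonzero_positions)
    flags, seen = [], 0
    for pos in range(last_pos):
        if seen == count - 1:
            break   # only the last coefficient remains; stop signaling
        flag = 1 if coeffs[pos] != 0 else 0
        flags.append(flag)
        seen += flag
    return last_pos, count, flags

def decode_sig_map(last_pos, count, flags):
    """Rebuild the significance map up to and including last_pos."""
    sig = [0] * (last_pos + 1)
    sig[last_pos] = 1   # the last position is non-zero by definition
    reader, seen = iter(flags), 0
    for pos in range(last_pos):
        if seen == count - 1:
            break       # remaining positions before last_pos are zero
        sig[pos] = next(reader)
        seen += sig[pos]
    return sig
```

For a block with a single non-zero coefficient, no significance flags are coded at all in this sketch, which addresses the all-zero significance map mentioned above.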

In one example, the syntax element mentioned above may be a flag (a one-coefficient flag) that indicates whether a transform block has only one non-zero coefficient. This flag may be signaled after the position of the last non-zero coefficient and may also depend on that position. For example, if the last non-zero coefficient is the first coefficient (DC) in the block, it is already known that only one coefficient is possible and the one-coefficient flag is not needed. Similarly, the flag may be signaled only when the position of the last non-zero coefficient is greater than some threshold. For example, if the last non-zero coefficient position is some distance from the beginning of the block, the one-coefficient flag is signaled.

The context model selection for the one-coefficient flag may depend on the position of the last non-zero coefficient in the block, the distance of that last position from the beginning of the block, the last non-zero coefficient value, and/or the sign of that value, either alone or in any combination.

The one-coefficient flag may be signaled after the position of the last non-zero coefficient; in another alternative, after the position of the last non-zero coefficient and its value; and in yet another alternative, after the position of the last non-zero coefficient, its value, and its sign. The choice may depend on which context model is applied (see above).

In yet another example, the one-coefficient flag may be signaled before the last non-zero coefficient position and may indicate whether the block has only one DC (first transform coefficient) coefficient. In this example, the signaling of the last non-zero coefficient position may depend on that flag, and the position is signaled when the flag has a value representing "disabled," which means that there is more than one non-zero coefficient or that the one coefficient is not a DC coefficient. In addition, the last position signaling may be modified by subtracting 1 from the position coordinates, since if the one-coefficient flag is disabled, a last position equal to the DC coefficient cannot be signaled; otherwise, that flag would be enabled.
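
As a non-limiting illustration of signaling the flag before the last position, the following sketch codes a flag that is enabled when the block contains only a DC coefficient and otherwise codes the last position reduced by 1. The function names `encode_last` and `decode_last` are hypothetical, and the last position is represented as a single index in scan order rather than as position coordinates; both are assumptions made for this illustration only.

```python
def encode_last(coeffs):
    """Return (flag, coded_last) for coefficients in scan order.

    flag == 1 means the block has only a DC coefficient, in which case
    no last position is coded (coded_last is None).
    """
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    if not nonzero_positions:
        raise ValueError("block has no non-zero coefficients")
    if nonzero_positions == [0]:
        return 1, None
    # Flag disabled: the last position is at least 1, so code last - 1.
    return 0, nonzero_positions[-1] - 1

def decode_last(flag, coded_last):
    """Recover the last non-zero position from the flag and coded value."""
    return 0 if flag == 1 else coded_last + 1
```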

When such a one-coefficient flag is signaled and has a value representing "enabled" (i.e., the block has only one non-zero coefficient), a significance map may not be needed, and only the position of the last coefficient and its value and sign may be signaled. Accordingly, video encoder 20 may signal only the position of the last coefficient, and video decoder 30 may receive only data representative of the position of the last coefficient and determine that subsequent data of the bitstream applies to a different set of syntax elements (e.g., syntax elements of the same block, but unrelated to the transform coefficient data, or syntax elements of subsequent blocks).

The one-coefficient flag may be conditionally signaled based on which transform type (e.g., DCT or EMT) is used, and may depend on the EMT flag or EMT index. In addition, signaling of the one-coefficient flag may depend on: whether a secondary transform (e.g., NSST or ROT) is used in the block; secondary transform syntax, such as an NSST flag, NSST index, ROT flag, or ROT index; and so on. For example, if a secondary transform is used, the flag may not be signaled.

The more detailed examples described for one non-zero coefficient flag may apply to the case when more than one non-zero coefficient value is signaled in a block.

Video encoder 20 and video decoder 30 may switch between different transform types based on non-zero coefficients. Two different types of transforms may be used, e.g., one type being a separable transform and the other type being a non-separable transform. For use with each type of transform, some restrictions may be added, i.e., only non-zero coefficients may be present for certain positions inside the transform unit. In this way, the selected type of transform is not signaled, but video decoder 30 may derive the selected type of transform from the positions of non-zero coefficients inside the transform unit after decoding the coefficients. By deriving the transform type rather than receiving explicit signaling, the encoded video bitstream size may be reduced, which may thereby improve bitstream efficiency without introducing excessive complexity into video decoder 30 and without losing the quality of the resulting decoded video data. In addition, providing multiple types of transforms in this manner may result in even further improvements in bitstream efficiency in that the resulting transform types may, on average, better compress the residual data.

In one example, if there is at least one non-zero coefficient following the Nth coefficient in scan order (where N may be predefined or derived based on some condition), the separable transform is applied; otherwise (all non-zero coefficients exist only within the first N coefficients in scan order), the non-separable transform is applied.
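The derivation rule in this example can be sketched as follows. This is a minimal illustration only: the function name `derive_transform_type`, the string labels, and the representation of the block as a flat list of coefficients in scan order are assumptions made for this sketch, not part of the disclosure.

```python
from typing import Sequence


def derive_transform_type(coeffs_in_scan_order: Sequence[int], n: int) -> str:
    """Derive the transform type from non-zero coefficient positions.

    Hypothetical helper: returns "separable" if any non-zero coefficient
    follows the first n coefficients in scan order, else "non_separable".
    """
    # Coefficients at 0-based index >= n are those after the Nth coefficient.
    if any(c != 0 for c in coeffs_in_scan_order[n:]):
        return "separable"
    return "non_separable"
```

Because the decoder evaluates the same rule on the decoded coefficients, no explicit transform-type syntax element is needed in this example.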

In another example, the type of transform is still signaled by a flag/index, but the context model used for entropy coding (entropy encoding or entropy decoding) the coefficients at different locations may depend on the value of the signaled flag/index.

In another example, the flag or index used to indicate the transform selection mentioned above is signaled after the Nth coefficient or after all coefficients. The flag or index may be context coded, where the context depends on the position of the last non-zero coefficient. For example, the context may depend on whether the last non-zero coefficient occurs before or after the Nth coefficient. If the last non-zero coefficient is at the Nth coefficient itself, the context model may be associated with either of the groups mentioned earlier (before or after the Nth coefficient), or a separate context may be assigned.
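A small sketch of this context selection is shown below. The context numbering (0, 1, 2), the 1-based position convention, and the choice to fold the boundary case into the "before" group when no separate context is used are all assumptions made for illustration.

```python
def select_flag_context(last_nonzero_pos: int, n: int,
                        separate_boundary_context: bool = False) -> int:
    """Pick a context index for coding the transform flag/index.

    Hypothetical helper. Positions are 1-based in scan order.
    Context 0: last non-zero coefficient before the Nth coefficient.
    Context 1: last non-zero coefficient after the Nth coefficient.
    Context 2: optional separate context for the boundary case.
    """
    if last_nonzero_pos < n:
        return 0
    if last_nonzero_pos > n:
        return 1
    # Boundary case: either share a group's context (assumed: "before")
    # or use a dedicated context.
    return 2 if separate_boundary_context else 0
```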

Video encoder 20 may encode/signal syntax elements for a signaling unit, while video decoder 30 may decode and interpret values of the syntax elements of the signaling unit. As described earlier, syntax elements may be signaled at the signaling unit level. However, some syntax elements may not be applicable to each block included into the signaling unit.

For example, a secondary transform (e.g., NSST) may be applied only to intra-predicted blocks that have non-zero coefficients. It may be the case that there are no blocks in the signaling unit to which the secondary transform is to be applied. In such cases, signaling NSST information (e.g., an NSST index or NSST flag) for this signaling unit is not needed and would simply waste bits. In another example, a first transform (e.g., EMT) is applied to non-zero residual blocks. It may also be the case that all blocks included in a signaling unit have zero residual, so that signaling EMT information (e.g., an EMT flag or EMT index) is not needed for this signaling unit and would simply waste bits.

In some examples, video encoder 20 may defer signaling of the signaling unit syntax until the first block included in the signaling unit to which such signaling applies. In other words, the signaling unit syntax is not signaled for blocks at the beginning of the signaling unit in scan order to which this signaling is not applicable. Likewise, video decoder 30 applies the value of the signaling unit syntax element only to blocks that follow the signaling unit syntax element in the signaling unit.

For example, video encoder 20 may not signal some types of information applicable to all blocks within a signaling unit until there is a block in the signaling unit to which the information applies. Similarly, video decoder 30 may not parse some types of information applicable to all blocks within the signaling unit until there is a block in the signaling unit to which the information applies. The information may be information identifying a particular coding tool, syntax element, and so forth.

As an example, video encoder 20 may signal and video decoder 30 may receive NSST information (index, flag, etc.) in the first intra block in the signaling unit with a non-zero residual. In another example, video encoder 20 may signal and video decoder 30 may receive EMT information (index, flag, etc.) at the first non-zero block in a signaling unit. These blocks may not necessarily be at the beginning of the corresponding signaling unit. In some examples, once a syntax element (e.g., information for a coding tool or other type of syntax element) is signaled for a first block using the syntax element, that information may be uniform for all blocks using the syntax element that follow that first block in block scanning order. However, this should not be seen as a requirement in all situations.

By deferring signaling of these syntax elements, bits associated with syntax elements may be saved if there are no blocks in the signaling unit that require these syntax elements or no blocks in the signaling unit for which such signaling is applicable, as compared to signaling and receiving techniques that always signal syntax elements at the signaling unit level regardless of whether the signaling unit includes any blocks for which the signaling unit syntax elements will apply.
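The deferred placement of NSST information described above can be sketched as follows. The `Block` record, the tuple-based representation of the emitted syntax, and the function name `emit_signaling_unit` are hypothetical and serve only to show the ordering; real bitstream syntax is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    # Hypothetical per-block summary used only for this sketch.
    is_intra: bool
    has_nonzero_residual: bool


def emit_signaling_unit(blocks: List[Block], nsst_index: int) -> List[Tuple[str, int]]:
    """Return the order in which block data and the NSST index are emitted.

    The NSST index is written immediately before the first intra block with
    a non-zero residual, and is omitted if no such block exists.
    """
    out: List[Tuple[str, int]] = []
    signaled = False
    for i, block in enumerate(blocks):
        if not signaled and block.is_intra and block.has_nonzero_residual:
            out.append(("nsst_index", nsst_index))
            signaled = True
        out.append(("block", i))
    return out
```

If the signaling unit contains no applicable block, the list holds only block entries, which corresponds to the bit savings described above.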

Video encoder 20 may utilize similar techniques to defer signaling of other syntax elements (not necessarily transform-related) at the signaling unit level, depending on the signaled information and the types of blocks included in the signaling unit to which such information applies. The above examples of deferring signaling and parsing of signaling unit information should not be considered limiting.

Various syntax elements may be considered specific to a signaling unit. Some syntax elements may be introduced for signaling units only and may not exist for other blocks. These syntax elements may be, for example, control flags and coding mode related parameters. In one example, the signaling unit syntax elements include any or all of the first transform (e.g., EMT) and/or secondary transform syntax elements (e.g., NSST or ROT flags and/or indices) as mentioned earlier, and these syntax elements need not be present for blocks that are larger than the signaling unit or not included in the signaling unit.

Alternatively or additionally, existing syntax elements of a block signaled for a signaling unit may have different value ranges or different semantics/interpretations than the same syntax elements signaled for blocks that are larger than the signaling unit or not included in the signaling unit. In one example, the non-zero coefficient threshold that identifies when to signal the first transform and secondary transform syntax elements may be different for a signaling unit than for other blocks. These thresholds may be greater or less than the corresponding thresholds for the other blocks.

For example, a secondary transform (e.g., NSST or ROT) index and/or flag may be signaled for a block in a signaling unit that has at least one non-zero transform coefficient, whereas a secondary transform index may be signaled for a block that is larger than the signaling unit or not included in the signaling unit only if the block has at least two non-zero coefficients. When no secondary transform index is signaled, video decoder 30 infers the value of the secondary transform index as, for example, equal to a default value (e.g., 0). The same technique may be applied to the first transform or any other transform.
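A decoder-side sketch of this threshold rule with default inference is given below. The thresholds 1 and 2 and the default value 0 come from the example above; the function names and the `read_index` callback standing in for the entropy decoder are assumptions for illustration.

```python
from typing import Callable


def min_nonzero_for_signaling(in_signaling_unit: bool) -> int:
    """Non-zero coefficient threshold from the example above."""
    return 1 if in_signaling_unit else 2


def parse_transform_index(num_nonzero: int, in_signaling_unit: bool,
                          read_index: Callable[[], int],
                          default: int = 0) -> int:
    """Parse the transform index if it is present, else infer the default.

    `read_index` is a hypothetical callback that decodes the index from
    the bitstream; it is called only when the index is actually signaled.
    """
    if num_nonzero >= min_nonzero_for_signaling(in_signaling_unit):
        return read_index()
    return default
```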

These signaling unit specific parameters may also be different depending on the slice type and/or the tile to which the signaling unit belongs. For example, I-slices, P-slices, and B-slices may have different signaling unit parameters, different value ranges, or different semantics/interpretations.

The signaling unit parameters described above are not limited to transforms, but may be used with any coding mode or introduced to any mode.

Video encoder 20 may further send syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, to video decoder 30, for example, in a picture header, a block header, a slice header, or other syntax data, such as a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), or a Video Parameter Set (VPS).

Where applicable, video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined video encoder/decoder (codec). A device that includes video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Fig. 2 is a block diagram showing an example of a video encoder 20 that may implement techniques for binarizing a secondary transform index. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of a number of spatial-based coding modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of temporal-based coding modes.

As shown in fig. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 20 includes mode select unit 40, reference picture memory 64, which may also be referred to as a Decoded Picture Buffer (DPB), summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62, if desired. In addition to deblocking filters, additional filters (in-loop or post-loop) may be used. These filters are not shown for simplicity, but may filter the output of summer 50 (as an in-loop filter) if desired.

During the encoding process, video encoder 20 receives a video frame or slice to be coded. A frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive encoding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive encoding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Furthermore, partition unit 48 may partition the block of video data into sub-blocks based on an evaluation of previous partition schemes in previous coding passes. For example, partitioning unit 48 may initially partition a frame or slice into CTUs and partition each of the CTUs into sub-CUs based on a rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further generate a quad-tree data structure indicating the partitioning of the CTUs into sub-CUs. Leaf-node CUs of a quad-tree may include one or more PUs and one or more TUs.

Mode select unit 40 may select one of the prediction modes (intra or inter), e.g., based on the error results, and provide the resulting predicted block to summer 50 to generate residual data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are shown separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit) that is relative to a current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
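The two difference metrics named above can be written out directly. This is a minimal sketch that assumes blocks are given as equally sized lists of rows of integer sample values; the function names are illustrative.

```python
from typing import List

Block2D = List[List[int]]


def sad(current: Block2D, candidate: Block2D) -> int:
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(c - p)
               for crow, prow in zip(current, candidate)
               for c, p in zip(crow, prow))


def ssd(current: Block2D, candidate: Block2D) -> int:
    """Sum of Squared Differences between two equally sized blocks."""
    return sum((c - p) ** 2
               for crow, prow in zip(current, candidate)
               for c, p in zip(crow, prow))
```

A motion search would evaluate such a metric for each candidate position and keep the candidate with the smallest value.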

Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to the locations of predictive blocks of the reference picture. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.

The motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vectors determined by motion estimation unit 42. Again, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may find the location of the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

As described above, as an alternative to inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction unit 46 may intra-predict the current block. In particular, intra-prediction unit 46 may determine the intra-prediction mode to be used to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode to use from the tested modes.

For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for various tested intra-prediction modes, and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and an original, unencoded block, which is encoded to produce the encoded block, and the bit rate (i.e., number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortion and rate of various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
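One common way to combine distortion and rate into a single figure is a Lagrangian cost, J = D + λ·R. The text above does not specify the exact formula, so the sketch below uses this formulation purely as an assumed stand-in; the mode names, the `lam` parameter, and the candidate dictionary are hypothetical.

```python
from typing import Dict, Tuple


def select_intra_mode(candidates: Dict[str, Tuple[float, float]],
                      lam: float) -> str:
    """Pick the mode with the lowest assumed cost J = D + lam * R.

    `candidates` maps a mode name to (distortion, rate_in_bits), as would
    be measured by trial-encoding the block in each tested mode.
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

With a larger `lam`, the selection favors modes that cost fewer bits; with `lam` equal to zero, it reduces to choosing the lowest-distortion mode.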

After selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information to entropy encoding unit 56 indicating the selected intra-prediction mode for the block. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include the following in the transmitted bitstream: configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables); definitions of encoding contexts for various blocks; and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to be used for each of the contexts.

Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component that performs this subtraction operation. Transform processing unit 52 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes transform coefficient values. Wavelet transforms, integer transforms, subband transforms, Discrete Sine Transforms (DST), or other types of transforms may be used instead of DCT. In any case, transform processing unit 52 applies a transform to the residual block, producing a block of transform coefficients. The transform may convert the residual information from the pixel domain to a transform domain, such as the frequency domain.

Additionally, in some examples, transform processing unit 52 may apply a secondary transform, such as a non-separable secondary transform (NSST), to the transform coefficients produced by the first transform, e.g., when the block is intra predicted. Transform processing unit 52 may also pass one or more values for the secondary transform syntax elements of the block to entropy encoding unit 56 for entropy encoding. Entropy encoding unit 56 may entropy encode these and/or other syntax elements (e.g., a secondary transform syntax element or other signaling unit syntax elements) as discussed in more detail below with respect to fig. 3 in accordance with the techniques of this disclosure.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter.
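To illustrate how a quantization parameter controls the quantization level, the sketch below uses the step-size relation familiar from H.264/HEVC-style codecs, in which the step size doubles for every increase of 6 in QP and equals 1 at QP 4. This relation, the plain rounding rule, and the function names are simplifying assumptions; the disclosure does not prescribe a particular quantizer.

```python
from typing import List


def qstep(qp: int) -> float:
    """Approximate quantization step size (assumed HEVC-style relation)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[int], qp: int) -> List[int]:
    """Scalar quantization with round-half-away-from-zero (a simplification)."""
    step = qstep(qp)
    return [(1 if c >= 0 else -1) * int(abs(c) / step + 0.5) for c in coeffs]


def dequantize(levels: List[int], qp: int) -> List[float]:
    """Inverse quantization: scale the levels back by the step size."""
    step = qstep(qp)
    return [level * step for level in levels]
```

Raising QP enlarges the step, so more coefficients collapse to small levels or zero, which lowers the bit rate at the cost of reconstruction accuracy.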

After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients (and any corresponding values for related syntax elements, such as a secondary transform syntax element, a signaling unit syntax element, a coding tool syntax element, an Enhanced Multiple Transform (EMT) syntax element, and so forth). For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the contexts may be based on neighboring blocks. Following the entropy coding of entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

In accordance with the techniques of this disclosure, video encoder 20 may encode certain syntax elements at the signaling unit level. A signaling unit typically includes syntax elements relating to two or more blocks of video data, such as a Coding Tree Block (CTB) or a Coding Unit (CU). For example, the blocks may correspond to different branches/nodes of a common QTBT structure, or to distinct QTBT structures.

As discussed above, in one example, video encoder 20 may defer signaling syntax elements of the signaling units until video encoder 20 encounters a block to which those signaling unit syntax elements relate. In this way, video encoder 20 may avoid encoding the signaling unit syntax elements entirely if the signaling unit does not ultimately include any blocks to which the signaling unit syntax elements relate. If the signaling unit does contain a block to which the signaling unit syntax element relates, video encoder 20 may encode these syntax elements to form portions of the bitstream that follow, in encoding/decoding order, blocks to which the signaling unit syntax element is not related and that precede the block to which the signaling unit syntax element relates. The signaling unit syntax elements may include any or all of NSST information (NSST flag and/or index), EMT information (EMT flag and/or index), and so on.

For example, mode select unit 40 may determine whether the intra-predicted block results in zero or a non-zero residual (as calculated by summer 50). Mode select unit 40 may wait for the determination of signaling unit syntax elements for a signaling unit until an intra-predicted block with a non-zero residual (i.e., a residual block with at least one non-zero coefficient) has been encoded. After identifying an intra-predicted block with a non-zero residual, mode select unit 40 may determine one or more signaling unit syntax elements to be encoded for a signaling unit that includes the intra-predicted block, and further, entropy encoding unit 56 may entropy encode values of the signaling unit syntax elements at positions that follow other blocks of the signaling unit in encoding/decoding order but that precede the intra-predicted block of the signaling unit.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain. In particular, summer 62 adds the reconstructed residual block to the motion compensated prediction block that was earlier generated by motion compensation unit 44 or intra-prediction unit 46 to generate a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
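The addition performed by summer 62 can be sketched as below. Clipping the sum to the valid sample range and the default bit depth of 8 are assumptions added for this illustration; the function name is hypothetical.

```python
from typing import List

Block2D = List[List[int]]


def reconstruct_block(prediction: Block2D, residual: Block2D,
                      bit_depth: int = 8) -> Block2D:
    """Add the reconstructed residual to the prediction, sample by sample.

    The result is clipped to [0, 2**bit_depth - 1] (an assumption here).
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```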

Video encoder 20 of fig. 2 represents an example of a video encoder that may be configured to: determine a maximum value of a secondary transform (e.g., non-separable secondary transform (NSST)) syntax element for a block of video data; and binarize a value of the secondary transform (e.g., NSST) syntax element based on the determined maximum value. Video encoder 20 may further entropy encode the value of the secondary transform (e.g., NSST) syntax element.

Fig. 3 is a block diagram of an example entropy encoding unit 56 that may be configured to perform CABAC in accordance with the techniques of this disclosure. Entropy encoding unit 56 initially receives syntax elements 118. If the syntax element 118 is already a binary-valued syntax element, the binarization step may be skipped. If the syntax element 118 is a non-binary-valued syntax element, the binarizer 120 binarizes the syntax element.

The binarizer 120 performs a mapping of non-binary values to a series of binary decisions. These binary decisions may be referred to as "bins". For example, for transform coefficient levels, the values of the levels may be decomposed into successive bins, each bin indicating whether the absolute value of the coefficient level is greater than a certain value. For example, for transform coefficients, bin 0 (sometimes referred to as a significance flag) indicates whether the absolute value of the transform coefficient level is greater than 0; bin 1 indicates whether the absolute value of the transform coefficient level is greater than 1; and so on. A unique mapping may be generated for each non-binary value syntax element.
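The decomposition of a coefficient level into successive "greater than" bins can be sketched as follows. The cap of three bins and the early stop after the first 0 bin are assumptions for this illustration; coding of the sign and of any remaining magnitude is not shown.

```python
from typing import List


def level_to_bins(level: int, max_bins: int = 3) -> List[int]:
    """Decompose |level| into bins: bin k is 1 if |level| > k, else 0.

    Hypothetical helper. Stops after the first 0 bin, since all later
    bins would also be 0, or after `max_bins` bins.
    """
    magnitude = abs(level)
    bins: List[int] = []
    for threshold in range(max_bins):
        flag = 1 if magnitude > threshold else 0
        bins.append(flag)
        if flag == 0:
            break
    return bins
```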

The binarizer 120 passes each bin to the binary arithmetic encoding side of the entropy encoding unit 56. That is, for a set of predetermined non-binary value syntax elements, each bin type (e.g., bin 0) is encoded before the next bin type (e.g., bin 1). In accordance with the techniques of this disclosure, when binarizing values of a secondary transform syntax element, such as a non-separable secondary transform (NSST) syntax element, of an intra-predicted block of video data, binarizer 120 may determine a maximum possible value for the secondary transform (e.g., NSST) syntax element of the block, e.g., based on an intra-prediction mode used to predict the block and/or other parameters, such as the size of the block.

In one example, if the intra-prediction mode for the block is DC, planar, or LM mode for the chroma component, the binarizer 120 determines that the maximum possible value of the NSST index is equal to 3, and otherwise that the maximum possible value of the NSST index is equal to 4. The binarizer 120 then binarizes the actual value of the NSST index based on the determined maximum possible value, using a common binarization technique regardless of the determined maximum possible value (e.g., using truncated unary binarization regardless of whether the determined maximum possible value of the NSST index is 3 or 4).
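The example above can be sketched as follows: the maximum value depends on the intra-prediction mode, while the same truncated unary scheme is used in both cases. The mode labels ("DC", "PLANAR", "LM") and the function names are hypothetical, and the bin convention (a run of 1s terminated by a 0, with the terminator dropped at the maximum) is the usual truncated unary convention, assumed here.

```python
from typing import List


def nsst_max_value(intra_mode: str) -> int:
    """Maximum possible NSST index per the example: 3 for DC/planar/LM, else 4."""
    return 3 if intra_mode in ("DC", "PLANAR", "LM") else 4


def truncated_unary(value: int, c_max: int) -> List[int]:
    """Truncated unary binarization of `value` with maximum `c_max`.

    Emits `value` ones followed by a terminating zero; the zero is omitted
    when value == c_max because the decoder can infer the end.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins
```

For instance, an index of 3 yields `[1, 1, 1]` when the maximum is 3 but `[1, 1, 1, 0]` when the maximum is 4; only the truncation point changes, not the scheme.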

Entropy encoding can be performed in either the regular mode or the bypass mode. In bypass mode, the bypass encoding engine 126 performs arithmetic coding using a fixed probability model (e.g., using Golomb-Rice or exponential Golomb coding). Bypass mode is typically used for more predictable syntax elements.

Entropy coding in regular mode CABAC involves performing context-based binary arithmetic coding. Regular mode CABAC is typically performed to encode bin values for which the probability of the value of a bin is predictable given the values of previously coded bins. The context modeler 122 determines the probability that a bin is the Least Probable Symbol (LPS). The context modeler 122 outputs the bin value and a context model (e.g., probability state σ) to the regular encoding engine 124. The context model may be an initial context model for a series of bins, or the context modeler 122 may determine the context model based on the coded values of previously encoded bins. The context modeler 122 may update the context state based on whether the previously coded bin is the Most Probable Symbol (MPS) or the LPS.

In accordance with the techniques of this disclosure, the context modeler 122 may be configured to determine a context model for entropy encoding a secondary transform syntax element (e.g., an NSST syntax element) based on the determined maximum possible value of the secondary transform syntax element discussed above.

After the context modeler 122 determines the context model and the probability state σ, the regular encoding engine 124 performs binary arithmetic coding (BAC) on the bin values using the context model. Alternatively, in bypass mode, the bypass encoding engine 126 bypass encodes the bin values from the binarizer 120. In either case, entropy encoding unit 56 outputs an entropy encoded bitstream that includes the entropy encoded data.

In this way, video encoder 20 of fig. 1 and 2 (and entropy encoding unit 56 thereof described with respect to fig. 3) represents an example of a video encoder that includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: transform intermediate transform coefficients of a block of video data using a secondary transform; determine a maximum possible value for a secondary transform syntax element for the block, the value of the secondary transform syntax element representing the secondary transform; binarize the value of the secondary transform syntax element using a common binarization scheme regardless of the maximum possible value; and entropy encode the binarized value of the secondary transform syntax element of the block to form an entropy-encoded value representing the secondary transform for the block.

Fig. 4 is a block diagram showing an example of a video decoder 30 that may implement techniques for binarizing a secondary transform index. In the example of fig. 4, video decoder 30 includes entropy decoding unit 70, motion compensation unit 72, intra prediction unit 74, inverse quantization unit 76, inverse transform unit 78, reference picture memory 82, and summer 80. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 (fig. 2).

In some examples, entropy decoding unit 70 decodes certain syntax elements of the signaling unit. For example, video decoder 30 may determine that two or more blocks of video data correspond to a common signaling unit. Entropy decoding unit 70 may entropy decode syntax elements for a signaling unit in accordance with the techniques of this disclosure. For example, entropy decoding unit 70 may entropy decode a secondary transform syntax element, such as a non-separable secondary transform (NSST) index and/or flag, an Enhanced Multiple Transform (EMT) syntax element, such as an EMT index and/or flag, and so on. Entropy decoding unit 70 may entropy decode signaling unit syntax elements that follow one or more blocks of the signaling unit but precede one or more other blocks of the signaling unit, and only apply the values of the signaling unit syntax elements to blocks that follow the syntax elements in decoding order.

Furthermore, video decoder 30 may infer certain data from the presence of these syntax elements, e.g., that the block immediately following these signaling unit syntax elements is intra-predicted and has a non-zero residual. Accordingly, video decoder 30 may determine that the relevant block-level syntax elements (e.g., those indicating that the block is intra-predicted and that the block is coded, i.e., has a non-zero residual) are not present in the bitstream, and thereby determine that subsequent data of the bitstream applies to other syntax elements.

Additionally, entropy decoding unit 70 may entropy decode the data as discussed in more detail below with respect to fig. 5. For example, in accordance with the techniques of this disclosure, entropy decoding unit 70 may inverse binarize the secondary transform syntax element value using a common binarization scheme (e.g., truncated unary binarization) regardless of the maximum possible value of the secondary transform syntax element value.
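Inverse binarization under truncated unary can be sketched as follows. The `read_bin` callback stands in for the entropy decoder and is hypothetical, as is the function name; the bin convention (1s terminated by a 0, with no terminator at the maximum) is assumed.

```python
from typing import Callable


def inverse_truncated_unary(read_bin: Callable[[], int], c_max: int) -> int:
    """Recover a value from truncated unary bins.

    Counts leading 1 bins, stopping at the first 0 bin or once `c_max`
    ones have been read (in which case no terminating bin is consumed).
    """
    value = 0
    while value < c_max and read_bin() == 1:
        value += 1
    return value
```

The same loop serves for any maximum value; only the `c_max` argument differs, which mirrors the use of one common scheme regardless of the maximum.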

Motion compensation unit 72 may generate prediction data based on the motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on the intra-prediction mode indicator received from entropy decoding unit 70.

During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, or intra-prediction mode indicators, among other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

When a video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 72 generates a predictive block for the video block of the current video slice (assuming the video block is inter-predicted) based on the motion vectors and other syntax elements received from entropy decoding unit 70. An inter-predictive block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in reference picture memory 82. Blocks of P and B slices may also be intra predicted.

Motion compensation unit 72 determines prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra or inter prediction) used to code the video blocks of a video slice, an inter-prediction slice type (e.g., a B slice or a P slice), construction information for one or more of the reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter-prediction status for each inter-coded video block of the slice, and other information used to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on the interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of video blocks to calculate interpolated values for sub-integer pixels of a reference block. In this case, motion compensation unit 72 may determine the interpolation filter used by video encoder 20 from the received syntax elements and use the interpolation filter to generate the predictive block.

Inverse quantization unit 76 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include using a quantization parameter QP_Y, calculated by video decoder 30 for each video block in the video slice, to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.
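As a rough illustration of how a quantization parameter controls the degree of inverse quantization, the following sketch assumes an HEVC-style relationship in which the quantization step size doubles for every increase of 6 in QP. The function names and the floating-point formula are illustrative assumptions; practical decoders typically use integer scaling tables, and this disclosure does not prescribe the arithmetic.

```python
def quant_step(qp: int) -> float:
    """Approximate step size, assuming the HEVC-style rule Qstep = 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)


def inverse_quantize(levels, qp):
    """Scale quantized coefficient levels back to approximate transform coefficients."""
    step = quant_step(qp)
    return [level * step for level in levels]


# Hypothetical quantized levels for one small block.
levels = [12, -3, 0, 1]
coeffs = inverse_quantize(levels, qp=22)  # the assumed step size at QP 22 is 2 ** 3
```

A larger QP gives a larger step size, so the same decoded levels map to coarser reconstructed coefficients.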

Inverse transform unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.

After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 78 with the corresponding predictive block generated by motion compensation unit 72. Summer 80 represents the component that performs this summation operation. Optionally, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of fig. 1.

Video decoder 30 of fig. 4 represents an example of a video decoder that may be configured to: determine a maximum value of a secondary transform (e.g., non-separable secondary transform (NSST)) syntax element for a block of video data; and binarize the value of the NSST syntax element based on the determined maximum value. Video decoder 30 may further entropy decode the value of the NSST syntax element.

Fig. 5 is a block diagram of an example entropy decoding unit 70 that may be configured to perform CABAC in accordance with the techniques of this disclosure. Entropy decoding unit 70 of fig. 5 performs CABAC in a manner reciprocal to that of entropy encoding unit 56 described with respect to fig. 3. Entropy decoding unit 70 receives entropy encoded bits from bitstream 218. Entropy decoding unit 70 provides the entropy encoded bits to context modeler 220 or bypass decoding engine 222 based on whether the entropy encoded bits were entropy encoded using bypass mode or regular mode. If the entropy encoded bits were entropy encoded in bypass mode, bypass decoding engine 222 uses bypass decoding (e.g., Golomb-Rice or exponential Golomb decoding) to entropy decode the entropy encoded bits.

If the entropy encoded bits were entropy encoded in regular mode, the context modeler 220 may determine a probability model for the entropy encoded bits, and the conventional decoding engine 224 may entropy decode the entropy encoded bits to generate bins of a non-binary valued syntax element (or the syntax element itself, if it is binary valued).
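To make the bypass path more concrete, the following sketch decodes one order-0 exponential Golomb codeword from bins that have already been recovered by a bypass engine. The arithmetic decoding step itself is omitted. The function name and the bin polarity (a prefix of zeros terminated by a one, as in the common ue(v) descriptor) are assumptions made for illustration; a particular codec may use a different order k or prefix polarity.

```python
def decode_exp_golomb0(bits):
    """Decode one order-0 exponential Golomb codeword from an iterable of bits.

    Assumed layout: k zero bits, a one bit, then k suffix bits.
    The decoded value is 2**k - 1 + suffix.
    """
    it = iter(bits)
    leading_zeros = 0
    while next(it) == 0:          # count the zero prefix up to the terminating one
        leading_zeros += 1
    suffix = 0
    for _ in range(leading_zeros):  # read k suffix bits, most significant first
        suffix = (suffix << 1) | next(it)
    return (1 << leading_zeros) - 1 + suffix


value = decode_exp_golomb0([0, 0, 1, 1, 1])  # prefix "00", marker "1", suffix "11"
```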

The context modeler 220 may use the techniques of this disclosure to determine context models and probability states for certain syntax elements, such as secondary transform syntax elements and/or Enhanced Multiple Transform (EMT) syntax elements (e.g., NSST index, NSST flag, EMT index, EMT flag, etc.). For example, the context modeler 220 may determine the context model based on the determined maximum possible value of the NSST syntax element. Entropy decoding unit 70 may determine the maximum possible value for the NSST syntax element based on, for example, the intra-prediction mode of the block to which the NSST syntax element corresponds and/or the size of the block.

After the context modeler 220 determines the context model and the probability state σ, the conventional decoding engine 224 performs binary arithmetic decoding on the bin values based on the determined context model.

After the bins are entropy decoded by the conventional decoding engine 224 or the bypass decoding engine 222, the inverse binarizer 230 may perform an inverse mapping to convert the bins back into values of non-binary valued syntax elements. In accordance with the techniques of this disclosure, inverse binarizer 230 may inverse binarize secondary transform syntax element values (e.g., NSST, ROT, and/or EMT values) using a common binarization scheme (e.g., truncated unary binarization), regardless of the maximum possible value of the secondary transform syntax element values.

For example, when inverse binarizing values of a secondary transform syntax element, such as a non-separable secondary transform (NSST) syntax element, of an intra-predicted block of video data, inverse binarizer 230 may determine a maximum possible value for the secondary transform (e.g., NSST) syntax element of the block, e.g., based on the intra-prediction mode used to predict the block and/or other parameters, such as the size of the block.

In one example, if the intra-prediction mode for the block is DC mode, planar mode, or LM mode for the chroma component, the inverse binarizer 230 determines that the maximum possible value of the NSST index is equal to 3; otherwise, the maximum possible value of the NSST index is equal to 4. The inverse binarizer 230 then inverse binarizes the actual value of the NSST index from the entropy decoded bin string based on the determined maximum possible value, using the same binarization technique whichever maximum possible value was determined (e.g., using truncated unary inverse binarization regardless of whether the determined maximum possible value of the NSST index is 3 or 4).
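The example above can be sketched as follows. This is a minimal illustration, assuming that intra-prediction modes are represented as strings and that bins arrive as a sequence of 0 and 1 values; the names `max_nsst_index` and `inverse_truncated_unary` are hypothetical and do not come from any codec library.

```python
# Modes for which the example above gives a maximum NSST index of 3.
NON_ANGULAR_MODES = {"DC", "PLANAR", "LM"}


def max_nsst_index(intra_mode: str) -> int:
    """Return the maximum possible NSST index for the given intra-prediction mode."""
    return 3 if intra_mode in NON_ANGULAR_MODES else 4


def inverse_truncated_unary(bins, c_max: int) -> int:
    """Count leading 1 bins, stopping at the first 0 or once c_max ones are read.

    The same routine is used whether c_max is 3 or 4; only the point at which
    the terminating 0 bin is omitted depends on c_max.
    """
    it = iter(bins)
    value = 0
    while value < c_max and next(it) == 1:
        value += 1
    return value


# The bin string "111" is complete when the maximum is 3.
index_a = inverse_truncated_unary([1, 1, 1], max_nsst_index("DC"))
# With a maximum of 4, the value 3 carries a terminating 0 bin.
index_b = inverse_truncated_unary([1, 1, 1, 0], max_nsst_index("ANGULAR_26"))
```

Both calls use one inverse binarization routine; the maximum possible value serves only as a parameter to it.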

In this way, video decoder 30 of figs. 1 and 4 (including entropy decoding unit 70 described with respect to fig. 5) represents an example of a video decoder that includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a maximum possible value for a secondary transform syntax element for a block of video data; entropy decode a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarize the value of the secondary transform syntax element using a common binarization scheme, regardless of the maximum possible value, to determine the secondary transform for the block; and inverse transform the transform coefficients of the block using the determined secondary transform.

Fig. 6 is a flow diagram depicting an example method of encoding video data, in accordance with the techniques of this disclosure. For purposes of example and explanation, the method of fig. 6 is explained with respect to video encoder 20 and its components as discussed above with respect to figs. 1, 2, and 3. However, it should be understood that in other examples, other video encoding devices may perform this method or similar methods consistent with the techniques of this disclosure.

Initially, video encoder 20 receives a block to be encoded (250). In this example, assume that mode select unit 40 of video encoder 20 determines to intra-predict a block (252). Although not shown in fig. 6, such a decision may include predicting a block using various prediction modes, including intra or inter prediction modes, and ultimately determining that the block is to be intra predicted using a particular intra prediction mode (e.g., angular mode or non-angular mode, such as DC, planar, or LM mode). Intra-prediction unit 46 of video encoder 20 then intra-predicts the block using the intra-prediction mode, producing a predicted block.

Summer 50 then computes a residual block (254). In particular, summer 50 computes pixel-by-pixel differences between the original block and the predicted block to compute a residual block, where each value (sample) of the residual block represents a corresponding pixel difference.
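The pixel-by-pixel difference described above can be sketched as follows, assuming blocks are represented as equally sized lists of rows of integer samples. The function name `residual_block` and the sample values are illustrative only.

```python
def residual_block(original, predicted):
    """Return the per-sample difference original - predicted for two 2-D blocks."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, predicted)
    ]


# Hypothetical 2x2 blocks of luma samples.
original = [[52, 55], [61, 59]]
predicted = [[50, 50], [60, 60]]
residual = residual_block(original, predicted)
```

Residual samples may be negative, which is why the residual is carried as signed values through the transform stages.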

Transform processing unit 52 then transforms the residual block using a first transform (e.g., DCT or EMT) (256) to generate intermediate transform coefficients. In this example, transform processing unit 52 also applies a secondary transform (e.g., NSST or ROT) to the intermediate transform coefficients resulting from the first transform (258). In some examples, transform processing unit 52 may select the secondary transform from a plurality of available secondary transforms. Accordingly, transform processing unit 52 may generate values for one or more secondary transform syntax elements (e.g., NSST flag, NSST index, ROT flag, ROT index, EMT flag, and/or EMT index) and provide these syntax element values to entropy encoding unit 56.

Quantization unit 54 quantizes the final transform coefficients produced by the secondary (or any subsequent) transform, and entropy encoding unit 56 entropy encodes the quantized transform coefficients (260), as well as other syntax elements of the block (e.g., syntax elements representing prediction modes, partition syntax elements representing the size of the block, etc.). In some examples, entropy encoding unit 56 also entropy encodes signaling unit syntax elements of a signaling unit that includes the block. If the block is the first block to which these signaling unit syntax elements apply, entropy encoding unit 56 may encode the signaling unit syntax elements and output the entropy encoded signaling unit syntax elements before outputting other block-based syntax elements for the block, as discussed above.

Entropy encoding unit 56 also entropy encodes the secondary transform syntax elements as discussed above. Specifically, the binarizer 120 binarizes the secondary transform syntax element according to the techniques of this disclosure (264). For example, the binarizer 120 may perform a particular binarization scheme (e.g., truncated unary binarization), regardless of the maximum possible value of the secondary transform syntax element.

The binarizer 120 may determine the maximum possible value of the secondary transform syntax element based on, for example, the intra-prediction mode used to intra-predict the block, as discussed above. For example, if the intra-prediction mode is a non-angular mode, the binarizer 120 may determine that the maximum possible value of the secondary transform syntax element is 3, but if the intra-prediction mode is an angular mode, the binarizer 120 may determine that the maximum possible value of the secondary transform syntax element is 4. Although this determination may be used during binarization, in some examples, it does not affect the actual binarization scheme (e.g., truncated unary binarization) that binarizer 120 uses to binarize the secondary transform syntax element value.
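On the encoder side, the same idea can be sketched as a single truncated unary routine whose only input besides the value is the maximum possible value. The function name `truncated_unary` and the list-of-bins representation are assumptions made for illustration.

```python
def truncated_unary(value: int, c_max: int):
    """Binarize value as `value` one bins followed by a zero bin.

    The terminating zero is omitted when value equals c_max, because the
    decoder can infer the end of the bin string from c_max alone.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in the range 0..c_max")
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


# Bin strings for every value under both maxima discussed above.
table_max3 = {v: truncated_unary(v, 3) for v in range(4)}
table_max4 = {v: truncated_unary(v, 4) for v in range(5)}
```

Under these assumptions the two tables agree for values 0 through 2 and differ only for the value 3, which needs a terminating zero bin when the maximum is 4.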

After binarization, the context modeler 122 may determine a context for entropy encoding the secondary transform syntax element (266). In some examples, the context modeler 122 selects the context based on the maximum possible value of the secondary transform syntax element, determined as discussed above. Conventional encoding engine 124 may then entropy encode the binarized value of the secondary transform syntax element using the determined context (268).

In this manner, the method of fig. 6 represents an example of a method of encoding video data, the method comprising: transforming intermediate transform coefficients of a block of video data using a secondary transform; determining a maximum possible value for a secondary transform syntax element for the block, a value of the secondary transform syntax element representing the secondary transform; binarizing the value of the secondary transform syntax element using a common binarization scheme regardless of the maximum possible value; and entropy encoding the binarized value of the secondary transform syntax element of the block to form an entropy encoded value representing the secondary transform for the block.

Fig. 7 is a flow diagram depicting an example of a method of decoding video data, in accordance with the techniques of this disclosure. For purposes of example and explanation, the method of fig. 7 is explained with respect to video decoder 30 and its components as discussed above with respect to figs. 1, 4, and 5. However, it should be understood that in other examples, other video decoding devices may perform this method or similar methods consistent with the techniques of this disclosure.

Initially, entropy decoding unit 70 entropy decodes the prediction information and quantized transform coefficients of the block of video data (280). In accordance with the techniques of this disclosure, entropy decoding unit 70 also entropy decodes the secondary transform syntax elements for the block. In particular, the context modeler 220 determines a context for entropy decoding the secondary transform syntax element (282). The context modeler 220 may determine the context based on the maximum possible value of the secondary transform syntax element. For example, if the intra-prediction mode is a non-angular mode (e.g., DC, planar, or LM mode), the context modeler 220 may determine that the maximum possible value of the secondary transform syntax element is 3, but otherwise, if the intra-prediction mode is an angular mode, the context modeler 220 may determine that the maximum possible value is 4. The context modeler 220 may then determine the context from the maximum possible value of the secondary transform syntax element. The conventional decoding engine 224 may then entropy decode data of the secondary transform syntax element using the determined context (284).
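One way the context determination might look is sketched below. The description states only that the context may be determined from the maximum possible value; the specific mapping here (one context set per maximum, with later bins sharing a context) is a hypothetical choice made for illustration, as are the names `max_possible_value` and `select_context`.

```python
# Modes treated as non-angular in the example above.
NON_ANGULAR_MODES = {"DC", "PLANAR", "LM"}


def max_possible_value(intra_mode: str) -> int:
    """Return 3 for non-angular modes and 4 for angular modes."""
    return 3 if intra_mode in NON_ANGULAR_MODES else 4


def select_context(max_value: int, bin_index: int):
    """Return a (context_set, context_index) pair for one bin.

    Assumption: blocks with a maximum of 3 use context set 0, blocks with a
    maximum of 4 use context set 1, and bins after the second share a context.
    """
    context_set = 0 if max_value == 3 else 1
    return context_set, min(bin_index, 2)


ctx_planar = select_context(max_possible_value("PLANAR"), bin_index=0)
ctx_angular = select_context(max_possible_value("ANGULAR_10"), bin_index=3)
```

Keeping separate context sets lets the probability estimates adapt independently for the two populations of blocks, which is one plausible motivation for conditioning on the maximum possible value.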

The inverse binarizer 230 may then inverse binarize the entropy decoded data for the secondary transform syntax element (286) to generate a value for the secondary transform syntax element. This value may represent, for example, whether a secondary transform is to be applied (e.g., NSST flag or ROT flag), and if so, which of a plurality of secondary transforms is to be applied (e.g., NSST index or ROT index).

Inverse quantization unit 76 then inverse quantizes the entropy decoded coefficients for the block (288). Inverse transform unit 78 may use the value of the secondary transform syntax element to determine whether to perform a secondary transform, and if so, which of a plurality of secondary transforms to apply. In fig. 7, it is assumed that a secondary transform is applied. Thus, inverse transform unit 78 initially inverse transforms the transform coefficients using the secondary transform (290) to generate intermediate transform coefficients, then inverse transforms the intermediate transform coefficients using a first transform (e.g., DCT or EMT) (292) to reproduce the residual block for the block.
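The ordering of the two inverse transforms can be illustrated with a small one-dimensional sketch. It assumes an orthonormal 4-point DCT-II as the first transform and a simple rotation of the two lowest-frequency coefficients as a stand-in for the secondary transform. The rotation is not an actual NSST or ROT matrix, and all names are hypothetical.

```python
import math

N = 4  # number of samples in this toy block


def dct_matrix(n=N):
    """Build an orthonormal n-point DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def rotation_matrix(theta, n=N):
    """Identity matrix with a rotation applied to the first two coefficients."""
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    m[0][0], m[0][1] = math.cos(theta), -math.sin(theta)
    m[1][0], m[1][1] = math.sin(theta), math.cos(theta)
    return m


def apply(matrix, vec):
    """Multiply a matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


PRIMARY = dct_matrix()
SECONDARY = rotation_matrix(math.pi / 6)


def forward(residual):
    """Encoder order: first transform, then secondary transform."""
    return apply(SECONDARY, apply(PRIMARY, residual))


def inverse(coeffs):
    """Decoder order: inverse secondary transform, then inverse first transform.

    Both matrices are orthogonal, so each inverse is the transpose.
    """
    intermediate = apply(transpose(SECONDARY), coeffs)
    return apply(transpose(PRIMARY), intermediate)


residual = [4.0, -2.0, 1.0, 0.0]
recovered = inverse(forward(residual))
```

Because the decoder undoes the transforms in the reverse of the encoder's order, `recovered` matches `residual` up to floating-point rounding (quantization is omitted here).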

Intra-prediction unit 74 also intra-predicts the block using the indicated intra-prediction mode (294) to generate a predicted block for the block. Summer 80 then combines the predicted block and the residual block pixel by pixel to generate a decoded block (296). Finally, video decoder 30 outputs the decoded block. Video decoder 30 may also store the decoded block in reference picture memory 82, e.g., for use in intra- or inter-predicting subsequently decoded blocks.
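The pixel-by-pixel combination can be sketched as follows. Clipping the sum to the valid sample range is an assumption added here, since it is common practice in video decoders but is not stated in this description; the function name and sample values are likewise illustrative.

```python
def reconstruct_block(predicted, residual, bit_depth=8):
    """Add residual samples to predicted samples, clipping to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


# Hypothetical 2x2 predicted block and decoded residual.
predicted = [[100, 250], [3, 128]]
residual = [[5, 10], [-7, 0]]
decoded = reconstruct_block(predicted, residual)
```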

In this manner, the method of fig. 7 represents an example of a method that includes the following operations: determining a maximum possible value for a secondary transform syntax element for a block of video data; entropy decoding a value of the secondary transform syntax element of the block to form a binarized value representing a secondary transform for the block; inverse binarizing the value of the secondary transform syntax element based on the determined maximum possible value to determine the secondary transform for the block; and inverse transforming transform coefficients of the block using the determined secondary transform.

It will be recognized that depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out entirely (e.g., not all described acts or events are necessary for the practice of the techniques). Further, in some examples, actions or events may be performed concurrently (e.g., via multi-threaded processing, interrupt processing, or multiple processors) rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. The computer-readable medium may include: computer-readable storage media, which correspond to tangible media (e.g., data storage media); or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather refer to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or collections of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method of decoding video data, the method comprising:

decoding transform coefficients of a transform block of a current block of video data;

determining a position of a non-zero value transform coefficient of a decoded transform coefficient in the transform block;

determining a type of transform to perform based on a location of a non-zero value transform coefficient in the transform block;

transforming the transform coefficients in the transform block using the type of transform to produce a residual block; and

decoding the current block using the residual block.

2. The method of claim 1, wherein determining the type of transform comprises determining the type of transform without decoding a value of a syntax element of video data that indicates the type of transform.

3. The method of claim 1, wherein determining the type of transformation comprises:

determining a number of non-zero-valued transform coefficients that follow an Nth transform coefficient in the transform block in scan order, where N is an integer value; and

determining the type of the transform according to a number of non-zero-valued transform coefficients that follow the Nth transform coefficient in a scan order.

4. The method of claim 3, further comprising determining the value of N to be a predefined value.

5. The method of claim 3, wherein determining the type of transformation comprises: determining the type of the transform as a non-separable transform when the number of non-zero value transform coefficients following the Nth transform coefficient is zero.

6. The method of claim 3, wherein determining the type of transformation comprises:

when the number of non-zero-valued transform coefficients following the Nth transform coefficient is greater than zero:

decoding a value of a syntax element indicating the type of the transform; and

determining the type of the transform according to the value of the syntax element; and

determining the type of the transform without decoding the value of the syntax element when the number of non-zero value transform coefficients following the Nth transform coefficient is equal to zero.

7. The method of claim 1, wherein decoding the transform coefficients comprises:

determining a context model for entropy decoding a value of one of the transform coefficients according to a value of a syntax element corresponding to a type of the transform; and

entropy decoding a value of one of the transform coefficients using the context model.

8. The method of claim 1, wherein decoding the current block comprises:

forming a prediction block for the current block; and

combining samples of the prediction block with samples of the residual block.

9. The method of claim 1, further comprising encoding the current block prior to decoding the current block.

10. The method of claim 9, wherein transforming the transform coefficients comprises applying a secondary transform of the type of transform to the transform coefficients.

11. A device for decoding video data, the device comprising:

a memory configured to store video data; and

one or more processors implemented in circuitry and configured to:

transforming transform coefficients in the transform block using the type of transform to produce a residual block; and

decoding the current block using the residual block.

12. The apparatus of claim 11, wherein the one or more processors are configured to determine the type of transform without decoding a value of a syntax element of the video data that indicates the type of transform.

13. The apparatus of claim 11, wherein to determine the type of transformation, the one or more processors are configured to:

14. The apparatus of claim 13, wherein the one or more processors are further configured to determine the value of N to be a predefined value.

15. The apparatus of claim 13, wherein the one or more processors are configured to: determining the type of the transform as a non-separable transform when the number of non-zero value transform coefficients following the Nth transform coefficient is zero.

16. The apparatus of claim 13, wherein to determine the type of transformation, the one or more processors are configured to: .

17. The apparatus of claim 11, wherein to decode the current block, the one or more processors are configured to:

form a prediction block for the current block; and

combine samples of the prediction block with samples of the residual block.

18. The apparatus of claim 11, wherein the one or more processors are further configured to encode the current block prior to decoding the current block.

19. The apparatus of claim 11, further comprising a display configured to display decoded video data.

20. The apparatus of claim 11, wherein the apparatus comprises one or more of: a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

21. The apparatus of claim 11, wherein the one or more processors are configured to apply a secondary transform of the type of transform to the transform coefficients.

22. An apparatus for decoding video data, the apparatus comprising:

means for decoding transform coefficients of a transform block of a current block of video data;

means for determining locations of non-zero-valued transform coefficients of decoded transform coefficients in the transform block;

means for determining a type of transform to perform based on a position of a non-zero value transform coefficient in the transform block;

means for transforming transform coefficients in the transform block using the type of transform to produce a residual block; and

means for decoding the current block using the residual block.

23. The apparatus of claim 22, wherein means for determining the type of transform comprises means for determining the type of transform without decoding a value of a syntax element of the video data that indicates the type of transform.

24. The apparatus of claim 22, wherein the means for determining the type of transformation comprises:

means for determining a number of non-zero-valued transform coefficients that follow an Nth transform coefficient in the transform block in a scan order, where N is an integer value; and

means for determining a type of the transform based on a number of non-zero-valued transform coefficients that follow the Nth transform coefficient in a scan order.

25. The apparatus of claim 22, wherein the means for transforming the transform coefficients comprises means for applying a secondary transform of the type of transform to the transform coefficients.

26. A computer-readable storage medium having instructions stored thereon that, when executed, cause a processor of an apparatus for decoding video data to:

decoding the current block using the residual block.