WO2022063035A1 - Context model selection method and apparatus, device and storage medium

Context model selection method and apparatus, device and storage medium

Info

Publication number
WO2022063035A1
Authority
WO
WIPO (PCT)
Prior art keywords
coding unit
target
division
syntax element
prediction
Prior art date
Application number
PCT/CN2021/118832
Other languages
French (fr)
Chinese (zh)
Inventor
朱晗
王英彬
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022063035A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/12: Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
    • H04N19/124: Quantisation
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • the embodiments of the present application relate to the technical field of audio and video, and in particular, to a method, apparatus, device, and storage medium for selecting a context model.
  • a video signal refers to a sequence of images comprising multiple frames. Because the data bandwidth of the digitized video signal is very high, it is difficult for computer equipment to directly store and process it, so it is necessary to use video compression technology to reduce the data bandwidth of the video signal.
  • the block division structure of the video is indicated by syntax elements, and these syntax elements are entropy coded before being written into the code stream, so that the higher-order information between the syntax elements can be used to further improve coding efficiency.
  • multiple context models are defined for each syntax element in the entropy coding process, and the probability estimation of the syntax elements can be realized through the context model.
  • syntax element is used to indicate the block division structure of the coding unit
  • context model is used to estimate the probability of the syntax element
  • FIG. 4 is a schematic diagram of a spatial positional relationship of an encoding unit provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of a streaming system provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a method for selecting a context model provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a prediction process of a structural prediction model provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an output vector of a structure prediction model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a selection process of a context model provided by an embodiment of the present application.
  • FIG. 12 is a block diagram of an apparatus for selecting a context model provided by an embodiment of the present application.
  • QP: Quantization Parameter
  • FIG. 2 shows a schematic diagram of a block division process provided by an embodiment of the present application.
  • the block division process can be applied to AVS (e.g., AVS3) or a next-generation video codec standard, which is not limited in this embodiment of the present application.
  • the block division process involves three division methods, namely QT (Quad Tree), BT (Binary Tree) and EQT (Extended Quad-Tree).
  • QT: Quad Tree
  • BT: Binary Tree
  • EQT: Extended Quad-Tree
  • BT and EQT each come in a horizontal division variant and a vertical division variant.
  • the VEQT division method divides the node into three sub-regions (left, middle and right) and further divides the middle sub-region into an upper-middle and a lower-middle sub-region, giving four child nodes: the width of the left and right child nodes is one quarter of the width of the node before the division and their height is equal to the height of the node before the division; the width of the upper-middle and lower-middle child nodes is 1/2 of the width of the node before the division and their height is 1/2 of the height of the node before the division.
  • the block partitioning structure of the video is indicated by syntax elements.
  • corresponding to the three division methods shown in FIG. 3 above, the following four syntax elements are defined:
  • bet_split_type_flag (BT/EQT division type flag): if the value of bet_split_type_flag is "0", BT division should be used for the BT/EQT division; if the value of bet_split_type_flag is "1", EQT division should be used for the BT/EQT division.
  • Figure 6 shows the placement of video encoders and video decoders in a streaming environment.
  • the subject matter disclosed in this application is equally applicable to other video-enabled applications, including, for example, videoconferencing, digital TV (television), and the storage of compressed video on digital media such as CD (Compact Disc), DVD (Digital Versatile Disc) and memory sticks.
  • encoded video data 304 (or encoded video codestream 304) is depicted as a thin line to emphasize its lower data volume, and it can be stored on the streaming server 305 for future use.
  • One or more streaming client subsystems, such as client subsystem 306 and client subsystem 308 in FIG. 6, may access streaming server 305 to retrieve copies 307 and 309 of encoded video data 304.
  • Client subsystem 306 may include, for example, video decoder 310 in electronic device 330.
  • the reference coding unit is used to provide a reference for the block division of the target coding unit, where the block division refers to dividing the structure of the coding unit.
  • This embodiment of the present application does not limit the block size relationship between the reference coding unit and the target coding unit.
  • the block size of the reference coding unit may be equal to the block size of the target coding unit, or may differ from it, for example by being smaller than the block size of the target coding unit.
  • This embodiment of the present application does not limit the positional relationship between the reference coding unit and the target coding unit.
  • the block division structure of the target coding unit is predicted from the block division structure of a coding unit (the reference coding unit) that has completed the encoding process or the reconstruction process, and the predicted block division structure is used to select the context model.
  • when the technical solution of the present application is applied to the video encoding process, the reference coding unit is a coding unit that has completed the encoding process; when the technical solution of the present application is applied to the video decoding process, the reference coding unit is a coding unit that has completed the reconstruction process.
  • the reference coding units used by the video decoder and the video encoder also need to be consistent, that is, the video decoder and the video encoder need to use reference coding units with the same location information.
  • the number of reference coding units is a positive integer greater than or equal to 2; after step 710, the method further includes: determining the rate-distortion cost of each reference coding unit; and selecting, according to the rate-distortion cost of each reference coding unit, a preferred coding unit from the at least two reference coding units, where the block division structure of the preferred coding unit is used to predict the block division structure of the target coding unit.
  • the rate-distortion (Rate-Distortion, RD) cost of the target coding unit under various division modes is also compared.
  • bit rate and distortion are usually negatively correlated, and higher compression rates bring lower bit rates, but also increase distortion, and vice versa.
  • a compromise choice is made, considering the rate-distortion cost.
  • the target coding unit is divided according to the block division structure with the least rate-distortion cost.
  • the number of preferred coding units is one; or, the number of preferred coding units is more than one.
  • This embodiment of the present application does not limit the selection method of the preferred coding unit.
  • the reference coding units may be sorted by rate-distortion cost from smallest to largest, and the first S reference coding units in that order are used as the preferred coding units, where S is a positive integer; or, after the rate-distortion cost of each reference coding unit is determined, the reference coding units whose rate-distortion cost is less than a preset threshold may be used as the preferred coding units.
  • the syntax element is used to indicate the block division structure of the coding unit
  • the context model is used to estimate the probability of the syntax element.
  • taking the block division structure of the target coding unit into account can add to or optimize the selection conditions of the context model, save the number of code stream bits required to transmit the syntax elements, and improve coding efficiency. Therefore, after the block division prediction structure of the target coding unit is obtained, it is added to the context model selection process of the syntax elements, so as to improve the efficiency of entropy coding and reduce the number of bits of the code stream.
  • a selection mode flag, such as spf_flag, may be pre-defined to indicate whether to adopt the context model selection method described in the embodiments of the present application.
  • the coding unit adjacent to the target coding unit may be selected as the reference coding unit from the coding units that satisfy the target condition.
  • the coding units adjacent to the target coding unit include the left coding unit, the upper coding unit and the upper left coding unit of the target coding unit.
  • the coding units adjacent to the target coding unit include: coding unit A and coding unit B.
  • determine the coding unit in at least one adjacent video frame that is consistent with the position information as the reference coding unit; that is, determine the coding unit at the position corresponding to the position information of the target coding unit in at least one adjacent video frame as the reference coding unit.
  • the reference coding unit may be determined by combining the coding units spatially adjacent to the target coding unit with the coding units temporally adjacent to the target coding unit; or stored coding units may be used to determine the reference coding units. It should be understood that these should all fall within the protection scope of the present application.
  • the description takes as an example the case in which the input of the structure prediction model includes only the block division structure of the reference coding unit.
  • the input of the structure prediction model may also include other coding information, such as QP (Quantization Parameter, quantization parameter) information, prediction (intra-frame prediction or inter-frame prediction) mode information, etc., it should be understood that these should all fall within the protection scope of the present application.
  • the reference coding unit of the first coding unit is determined in the same way as the reference coding unit of the target coding unit is determined in the subsequent process of using the structure prediction model to predict the block division structure of the target coding unit. Ensuring that the reference coding unit is determined in the same way during model training and during use can improve the prediction accuracy of the structure prediction model.
  • determining the index increment value of the context model adopted by the target syntax element, where the index increment value is used to indicate the context model, includes: determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition of the index increment value of the context model adopted by the target syntax element.
  • the ctxIdxInc of qt_split_flag, the ctxIdxInc of bet_split_flag, the ctxIdxInc of bet_split_type_flag, and the ctxIdxInc of bet_split_dir_flag are respectively introduced and explained exemplarily.
  • the block division prediction structure of the target coding unit is added to determine a more accurate context model for the syntax elements.
  • the structure shown in FIG. 14 does not constitute a limitation on the computer device 140, which may include more or fewer components than those shown, combine some components, or adopt a different component arrangement.
  • a chip comprising programmable logic circuits and/or program instructions for implementing the selection method of the context model described above when the chip is run on a computer device.
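
The preferred-coding-unit selection described in the definitions above (sort the reference coding units by rate-distortion cost and keep the first S, or keep those below a preset threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `ReferenceCU` type, the function names, and the cost values are hypothetical, and the rate-distortion costs are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceCU:
    """A reference coding unit with a precomputed rate-distortion cost (hypothetical type)."""
    name: str
    rd_cost: float


def select_top_s(units, s):
    """Keep the S reference coding units with the smallest rate-distortion cost."""
    if s < 1:
        raise ValueError("S must be a positive integer")
    return sorted(units, key=lambda u: u.rd_cost)[:s]


def select_below_threshold(units, threshold):
    """Keep every reference coding unit whose cost is below a preset threshold."""
    return [u for u in units if u.rd_cost < threshold]


# Illustrative costs only; real costs come from the encoder's rate-distortion analysis.
units = [
    ReferenceCU("left", 12.5),
    ReferenceCU("upper", 9.0),
    ReferenceCU("upper_left", 20.0),
]
top_two = select_top_s(units, 2)
cheap = select_below_threshold(units, 10.0)
```

Both strategies return a list, which matches the statement that the number of preferred coding units may be one or more than one.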

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed are a context model selection method and apparatus, a device, and a storage medium, belonging to the technical field of audio and video. The method comprises: determining a reference coding unit of a target coding unit; according to the reference coding unit, predicting a block division structure of the target coding unit to obtain a predicted block division structure of the target coding unit; and on the basis of the predicted block division structure of the target coding unit, determining a context model respectively used by at least one syntax element involved in the block division of the target coding unit. In the present application, a selection condition of a context model is optimized, so that the entropy coding efficiency is improved, the number of bits of a code stream is reduced, and the video compression efficiency is improved.

Description

Context model selection method, apparatus, device and storage medium
This application claims priority to Chinese patent application No. 202011009881.5, filed on September 23, 2020 and entitled "Context model selection method, apparatus, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of audio and video, and in particular to a context model selection method, apparatus, device and storage medium.
Background
A video signal is a sequence of images comprising multiple frames. Because a digitized video signal has a very high data bandwidth, it is difficult for computer devices to store and process it directly, so video compression technology is needed to reduce the data bandwidth of the video signal.
Video compression is achieved through video coding. Some mainstream video coding technologies adopt a hybrid coding framework that performs a series of operations and processing on the input original video signal. At the encoding end, the encoder performs block division, predictive coding, transform coding and quantization, and entropy coding or statistical coding on the input original video signal (video sequence) to obtain a video code stream. The code stream is then encapsulated into a video track, and the video track is further encapsulated into a video file, so that the video file is stored in the encoder in a structure that is easier to parse. At the decoding end, the decoder performs the inverse operations, such as decapsulation and decoding, on the encoded images to present the video content. In the related art, to compress the video further, the block division structure of the video is indicated by syntax elements, and these syntax elements are entropy coded before being written into the code stream, so that higher-order information between the syntax elements can be used to improve coding efficiency. In addition, because different syntax elements have different probability distributions, multiple context models are defined for each syntax element in the entropy coding process, and the context models are used to estimate the probabilities of the syntax elements.
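
To illustrate why defining several context models for one syntax element can help, the sketch below compares the ideal code length of a binary flag coded with a single adaptive model against the same flag coded with a model chosen by a context index. It is a simplified illustration under stated assumptions: the count-based estimator stands in for the probability state machine of a real arithmetic coder, and the flag sequence and context indices are made-up data.

```python
import math


class ContextModel:
    """Adaptive estimate of P(bin == 1) from running counts.

    Illustrative only; a real CABAC-style coder uses a different update rule.
    """

    def __init__(self):
        self.counts = [1, 1]  # smoothed counts of 0s and 1s seen so far

    def cost_and_update(self, bin_value):
        """Return the ideal code length in bits for bin_value, then adapt."""
        p = self.counts[bin_value] / sum(self.counts)
        self.counts[bin_value] += 1
        return -math.log2(p)


def total_bits(bins, contexts, num_models):
    """Code each bin with the model selected by its context index."""
    models = [ContextModel() for _ in range(num_models)]
    return sum(models[ctx].cost_and_update(b) for b, ctx in zip(bins, contexts))


# Hypothetical data: the flag usually equals its context index,
# with a mismatch at every tenth position.
contexts = [0, 1] * 50
bins = [c if i % 10 else 1 - c for i, c in enumerate(contexts)]

single = total_bits(bins, [0] * len(bins), 1)   # one model for all bins
conditioned = total_bits(bins, contexts, 2)     # one model per context
```

On this data the conditioned total is smaller than the single-model total, because each per-context model sees a more skewed distribution and can assign shorter codes to the likely value.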
When designing a context model, both the coding efficiency and the implementation complexity of the probability model must be considered.
Summary
Embodiments of the present application provide a context model selection method, apparatus, device and storage medium, which can be used to improve the efficiency of entropy coding, reduce the number of bits of the code stream, and improve video compression. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a context model selection method, the method comprising:
determining a reference coding unit of a target coding unit;
predicting, according to the reference coding unit, the block division structure of the target coding unit to obtain a block division prediction structure of the target coding unit;
determining, based on the block division prediction structure of the target coding unit, the context model respectively adopted by at least one syntax element involved in the block division of the target coding unit;
wherein the syntax element is used to indicate the block division structure of a coding unit, and the context model is used to estimate the probability of the syntax element.
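
The three steps of the method above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions rather than the method as claimed: the majority vote is a stand-in for the structure prediction model, the mapping from the prediction to a context index increment is a hypothetical rule, and the function names and neighbour labels are illustrative. The flag names follow the syntax elements named in this document.

```python
from collections import Counter

# Split types named after the division methods in the text; "NONE" means no split.
SPLIT_TYPES = ("NONE", "QT", "BT", "EQT")


def determine_reference_units(neighbours):
    """Step 1: keep the already-coded neighbours that are available (not None)."""
    return {pos: split for pos, split in neighbours.items() if split is not None}


def predict_split(reference_units):
    """Step 2: predict the target's split type.

    A majority vote is used here as a stand-in (assumption) for the
    structure prediction model.
    """
    if not reference_units:
        return "NONE"
    votes = Counter(reference_units.values())
    # Break ties deterministically by the order in SPLIT_TYPES.
    return max(SPLIT_TYPES, key=lambda s: (votes[s], -SPLIT_TYPES.index(s)))


def select_context_models(predicted_split):
    """Step 3: map the prediction to a context index increment per syntax element.

    Hypothetical rule: the increment is 1 when the prediction suggests the
    flag will be 1, otherwise 0.
    """
    return {
        "qt_split_flag": int(predicted_split == "QT"),
        "bet_split_flag": int(predicted_split in ("BT", "EQT")),
        "bet_split_type_flag": int(predicted_split == "EQT"),
    }


neighbours = {"left": "QT", "upper": "QT", "upper_left": None}
refs = determine_reference_units(neighbours)
predicted = predict_split(refs)
ctx_idx_inc = select_context_models(predicted)
```

Because the decoder must reach the same context choice as the encoder, every input to these steps has to come from coding units that are already available on both sides.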
In another aspect, an embodiment of the present application provides a context model selection apparatus, the apparatus comprising:
a unit determination module, configured to determine a reference coding unit of a target coding unit;
a structure prediction module, configured to predict, according to the reference coding unit, the block division structure of the target coding unit to obtain a block division prediction structure of the target coding unit;
a model determination module, configured to determine, based on the block division prediction structure of the target coding unit, the context model respectively adopted by at least one syntax element involved in the block division of the target coding unit;
wherein the syntax element is used to indicate the block division structure of a coding unit, and the context model is used to estimate the probability of the syntax element.
In yet another aspect, an embodiment of the present application provides a computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the above context model selection method.
In yet another aspect, an embodiment of the present application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the above context model selection method.
In a further aspect, an embodiment of the present application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above context model selection method.
In a further aspect, an embodiment of the present application provides a chip comprising programmable logic circuits and/or program instructions which, when the chip runs on a computer device, implement the above context model selection method.
The technical solutions provided in the embodiments of the present application can bring the following beneficial effects:
The block division structure of a coding unit is predicted from the reference coding unit of that coding unit, and the prediction result is then added to the selection process of the context models of the syntax elements. This adds to or optimizes the selection conditions of the context model, which improves the efficiency of entropy coding and reduces the number of bits of the code stream. Moreover, because a more accurate probability estimate is obtained merely by adding the prediction result of the block division structure, the embodiments of the present application help improve video compression efficiency.
Description of the Drawings
FIG. 1 is a schematic diagram of a video encoding process provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a block division process provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of block division methods provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the spatial positional relationship of coding units provided by an embodiment of the present application;
FIG. 5 is a block diagram of a communication system provided by an embodiment of the present application;
FIG. 6 is a block diagram of a streaming system provided by an embodiment of the present application;
FIG. 7 is a flowchart of a context model selection method provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a structure prediction model provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the prediction process of a structure prediction model provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an output vector of a structure prediction model provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a context model selection process provided by an embodiment of the present application;
FIG. 12 is a block diagram of a context model selection apparatus provided by an embodiment of the present application;
FIG. 13 is a block diagram of a context model selection apparatus provided by another embodiment of the present application;
FIG. 14 is a structural block diagram of a computer device provided by an embodiment of the present application.
Detailed Description
First, video coding technology is briefly introduced with reference to FIG. 1, which shows a schematic diagram of a video encoding process provided by an embodiment of the present application.
A video signal is an image sequence comprising one or more frames. A frame is a representation of the spatial information of the video signal. Taking the YUV format as an example, a frame includes one luma sample matrix (Y) and two chroma sample matrices (Cb and Cr). In terms of how it is acquired, a video signal is either captured by a camera or generated by a computer. Because the statistical characteristics of the two differ, the corresponding compression coding methods may also differ.
Some mainstream video coding technologies, such as H.265/HEVC (High Efficiency Video Coding), H.266/VVC (Versatile Video Coding), and AVS (Audio Video coding Standard, for example AVS3), adopt a hybrid coding framework that performs the following series of operations on the input original video signal:
1. Block partition structure: The input image is divided into several non-overlapping processing units, each of which undergoes a similar compression operation. Such a processing unit is called a CTU (Coding Tree Unit) or an LCU (Largest Coding Unit). A CTU can be divided further, more finely, into one or more basic coding units, each called a CU (Coding Unit). The CU is the most basic element of a coding stage; for prediction, a CU needs to be further divided into different PUs (Prediction Units). The following describes the coding methods that may be applied to each CU.
2. Predictive coding: This includes intra prediction, inter prediction, and other modes. The original video signal is predicted from a selected, already reconstructed video signal, and the difference is the residual video signal. The encoder must choose, among the many possible predictive coding modes, the one most suitable for the current CU and signal that choice to the decoder. In intra prediction, the prediction signal comes from a region of the same picture that has already been encoded and reconstructed. In inter prediction, the prediction signal comes from already encoded pictures other than the current picture (called reference pictures).
3. Transform and quantization: The residual video signal undergoes a transform such as the DFT (Discrete Fourier Transform) or the DCT (Discrete Cosine Transform), which converts it into the transform domain, where it is represented by transform coefficients. The transform-domain signal then undergoes a lossy quantization operation that discards some information, so that the quantized signal is better suited to compact representation. Some video coding standards offer more than one transform, so the encoder must also select one transform for the current CU and signal it to the decoder. The fineness of quantization is usually determined by the QP (Quantization Parameter). A larger QP means that coefficients over a wider range of values are quantized to the same output, which usually causes greater distortion and a lower bit rate. Conversely, a smaller QP means that coefficients over a narrower range of values are quantized to the same output, which usually causes less distortion and a higher bit rate.
4. Entropy coding (or statistical coding): The quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and a binarized (0 or 1) compressed bitstream is finally output (also called the "video bitstream"; for convenience, the term "video bitstream" is used below). Other information produced by encoding, such as the selected modes and motion vectors, also needs to be entropy coded to reduce the bit rate. Statistical coding is a lossless coding method that effectively reduces the bit rate needed to express the same signal. Common statistical coding methods include variable length coding (VLC) and context-adaptive binary arithmetic coding (CABAC).
5. Loop filtering: An already encoded picture is passed through inverse quantization, inverse transform, and prediction compensation (the inverse of operations 2 to 4 above) to obtain a reconstructed decoded picture. Because of quantization, some information in the reconstructed picture differs from the original picture, which produces distortion. Filtering the reconstructed picture, for example with a deblocking filter, SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter), can effectively reduce the distortion caused by quantization. Because the filtered reconstructed pictures serve as references for subsequently encoded pictures and are used to predict future signals, the filtering operation is called loop filtering, that is, filtering inside the coding loop.
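The quantization trade-off described in operation 3 can be made concrete with a minimal sketch. It uses a plain uniform quantizer whose step size stands in for the effect of QP. The function names and the sample coefficients are illustrative assumptions, and the standard-specific mapping from QP to step size is not reproduced here.

```python
# Illustrative uniform quantizer; `step` stands in for the effect of QP.
# The actual QP-to-step mapping of a codec is standard-specific and is not modeled.

def quantize(coeffs, step):
    """Map each transform coefficient to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from integer levels."""
    return [level * step for level in levels]


def max_abs_error(coeffs, step):
    """Largest reconstruction error caused by quantizing with `step`."""
    recon = dequantize(quantize(coeffs, step), step)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


coeffs = [103.0, -47.0, 12.0, 5.0, -3.0, 1.0]  # made-up sample coefficients
fine_levels = quantize(coeffs, 4)     # small step: small coefficients survive
coarse_levels = quantize(coeffs, 16)  # large step: small coefficients collapse to 0
print(fine_levels, max_abs_error(coeffs, 4))
print(coarse_levels, max_abs_error(coeffs, 16))
```

With the larger step, more coefficients map to the same level (here, zero), so fewer distinct symbols remain to be coded, at the price of a larger reconstruction error.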
As the above description shows, at the decoder, for each CU, the decoder obtains the compressed bitstream and performs entropy decoding to obtain the mode information and the quantized transform coefficients; the transform coefficients then pass through inverse quantization and inverse transform to yield the residual signal. In addition, the prediction signal of the CU is obtained from the known coding mode information. Adding the residual signal of the CU to its prediction signal gives the reconstructed signal of the CU. The reconstructed values of the decoded picture then undergo loop filtering to produce the final output signal.
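The reconstruction step at the decoder (reconstructed signal = prediction signal + residual signal) can be sketched as follows. The function name, the sample values, and the clipping of the sum to the range of an 8-bit sample are assumptions made for illustration; the text above states only the addition.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the residual to the prediction sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[100, 250], [3, 128]]  # made-up prediction samples
residual = [[5, 10], [-7, 0]]        # made-up decoded residual
reconstructed = reconstruct_block(prediction, residual)
print(reconstructed)
```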
Please refer to FIG. 2, which shows a schematic diagram of a block division process provided by an embodiment of the present application. The block division process can be applied to AVS (for example AVS3) or to a next-generation video codec standard, which is not limited in the embodiments of the present application. As FIG. 2 shows, the block division process involves three division modes: QT (Quad Tree), BT (Binary Tree), and EQT (Extended Quad Tree). BT and EQT each distinguish between horizontal and vertical division. BT is therefore divided into HBT (Horizontal Binary Tree) and VBT (Vertical Binary Tree), and EQT into HEQT (Horizontal Extended Quad Tree) and VEQT (Vertical Extended Quad Tree). These three division modes are described below with reference to FIG. 3.
For a coded image block CTU, the CTU can be taken as the root node and divided into several leaf nodes. A node corresponds to an image region. If a node is not divided further, it is called a leaf node, and its image region forms one CU. If the node is divided further, one of the above division modes, or a combination of them, divides the node into multiple sub-regions, each corresponding to a child node, and it must then be determined for each child node whether it is divided further. For example, if the division level of the root node is 0, the division level of a child node may be the division level of its parent node plus 1. During video encoding, the encoder is usually configured with a minimum CU block size; if the block size of a node equals this minimum block size during division, the node is by default not divided further. For ease of description, "the image region corresponding to a node" is referred to below simply as "the node".
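The recursive structure described above (root node, child nodes, leaf nodes forming CUs, division level, minimum block size) can be sketched as follows. For brevity the sketch uses only an equal four-way split, which corresponds to the QT division; the function names, the decision callback, and the minimum size of 8 are illustrative assumptions rather than values taken from the application.

```python
def partition(x, y, width, height, level, wants_split, min_size=8):
    """Return the leaf nodes (CUs) as (x, y, width, height, level) tuples.

    `wants_split` is a hypothetical callback standing in for the encoder's
    decision; `min_size` is an assumed minimum CU block size.
    """
    at_min_size = width <= min_size or height <= min_size
    if at_min_size or not wants_split(x, y, width, height, level):
        return [(x, y, width, height, level)]  # leaf node: the region forms one CU
    half_w, half_h = width // 2, height // 2
    leaves = []
    for dy in (0, half_h):
        for dx in (0, half_w):
            # Each child node is one division level below its parent.
            leaves += partition(x + dx, y + dy, half_w, half_h,
                                level + 1, wants_split, min_size)
    return leaves


def demo_decision(x, y, width, height, level):
    """Split the root, then split only its top-left child."""
    return level == 0 or (level == 1 and x == 0 and y == 0)


cus = partition(0, 0, 64, 64, 0, demo_decision)
for cu in cus:
    print(cu)
```

In this run the 64×64 root yields four 16×16 CUs at level 2 and three 32×32 CUs at level 1, which together cover the root exactly.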
1. QT division.
A node can be divided into 4 child nodes using the QT division mode. As shown in FIG. 3(a), QT division splits the node into four child nodes of the same block size (all child nodes have the same width and the same height; the width is half the width of the node before division, and the height is half its height).
For example, for a node with a block size of 64×64, if the node is not divided further, it directly becomes one CU with a block size of 64×64. If the node is divided further, QT division can split it into 4 nodes with a block size of 32×32. If one of these four 32×32 nodes is again divided using QT division, 4 nodes with a block size of 16×16 are produced.
2. BT division.
A node can be divided into 2 child nodes using the BT division mode. Optionally, the BT division mode includes two types: HBT division and VBT division. As shown in FIG. 3(b), HBT division splits the node into an upper and a lower child node of the same block size (both child nodes have the same width and the same height; the width equals the width of the node before division, and the height is half its height). As shown in FIG. 3(c), VBT division splits the node into a left and a right child node of the same size (both child nodes have the same width and the same height; the width is half the width of the node before division, and the height equals its height).
For example, for a node with a block size of 64×64, if the node is not divided further, it directly becomes one CU with a block size of 64×64. If the node is divided further, HBT division can split it into 2 nodes with a block size of 64×32, or VBT division can split it into 2 nodes with a block size of 32×64.
3. EQT division.
A node can be divided into 4 child nodes using the EQT division mode. Optionally, the EQT division mode includes two types: HEQT division and VEQT division. As shown in FIG. 3(d), HEQT division splits the node into an upper, a middle, and a lower sub-region, and splits the middle sub-region horizontally into a middle-left and a middle-right child node (the upper and lower child nodes have the width of the node before division and one quarter of its height; the middle-left and middle-right child nodes have half its width and half its height). As shown in FIG. 3(e), VEQT division splits the node into a left, a middle, and a right sub-region, and splits the middle sub-region vertically into a middle-upper and a middle-lower child node (the left and right child nodes have one quarter of the width of the node before division and its full height; the middle-upper and middle-lower child nodes have half its width and half its height).
For example, for a node with a block size of 64×64, if the node is not divided further, it directly becomes one CU with a block size of 64×64. If the node is divided further, HEQT division can split it into 4 child nodes with block sizes of 64×16, 32×32, 32×32, and 64×16, or VEQT division can split it into 4 child nodes with block sizes of 16×64, 32×32, 32×32, and 16×64.
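The child block sizes produced by the five division modes can be summarized in a short sketch. The function name and the string labels for the modes are illustrative assumptions; the size rules follow the descriptions of FIG. 3(a) to FIG. 3(e) above.

```python
def split_children(width, height, mode):
    """Return the (width, height) of each child node for a division mode."""
    if mode == "QT":    # four equal quadrants
        return [(width // 2, height // 2)] * 4
    if mode == "HBT":   # upper and lower halves
        return [(width, height // 2)] * 2
    if mode == "VBT":   # left and right halves
        return [(width // 2, height)] * 2
    if mode == "HEQT":  # upper, middle-left, middle-right, lower
        return [(width, height // 4), (width // 2, height // 2),
                (width // 2, height // 2), (width, height // 4)]
    if mode == "VEQT":  # left, middle-upper, middle-lower, right
        return [(width // 4, height), (width // 2, height // 2),
                (width // 2, height // 2), (width // 4, height)]
    raise ValueError(f"unknown division mode: {mode}")


for mode in ("QT", "HBT", "VBT", "HEQT", "VEQT"):
    print(mode, split_children(64, 64, mode))
```

In every mode the child areas sum to the area of the node before division, which is a quick sanity check on the size rules.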
In one example, to compress the video further, the block division structure of the video is indicated by syntax elements. Corresponding to the three division modes shown in FIG. 3, the following four syntax elements are defined:
1. qt_split_flag (QT division flag): a value of "1" indicates that the division process should use QT division; a value of "0" indicates that the division process should not use QT division.
2. bet_split_flag (BT/EQT division flag): a value of "1" indicates that the division process should use BT/EQT division; a value of "0" indicates that the division process should not use BT/EQT division.
3. bet_split_type_flag (BT/EQT division type flag): a value of "0" indicates that BT division should be used when BT/EQT division is performed; a value of "1" indicates that EQT division should be used.
4. bet_split_dir_flag (BT/EQT division direction flag): a value of "1" indicates that vertical division should be used when BT/EQT division is performed; a value of "0" indicates that horizontal division should be used.
It should be noted that the names of the above syntax elements, and the meanings of their values, are only an example. Having understood the technical solutions of the present application, those skilled in the art will readily conceive of other implementations, and it should be understood that these all fall within the protection scope of the present application. For example, qt_split_flag could be defined so that a value of "1" indicates that the division process should not use QT division and a value of "0" indicates that it should.
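How the four flags jointly identify a division mode can be sketched as follows. The function name and the returned labels are illustrative, and the order in which the flags are consulted (qt_split_flag first, then bet_split_flag, then the type and direction flags) is an assumption of this sketch; the text above defines only the meaning of each value.

```python
def split_mode(qt_split_flag, bet_split_flag=0,
               bet_split_type_flag=0, bet_split_dir_flag=0):
    """Map flag values to a division mode label (evaluation order is assumed)."""
    if qt_split_flag == 1:
        return "QT"
    if bet_split_flag == 0:
        return "NO_SPLIT"  # neither QT nor BT/EQT: the node becomes a CU
    kind = "EQT" if bet_split_type_flag == 1 else "BT"
    direction = "V" if bet_split_dir_flag == 1 else "H"
    return direction + kind


print(split_mode(1))           # QT
print(split_mode(0, 0))        # NO_SPLIT
print(split_mode(0, 1, 0, 1))  # VBT
print(split_mode(0, 1, 1, 0))  # HEQT
```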
Because different syntax elements have different probability distributions, multiple context models are defined for each syntax element in the entropy coding process to compress the video further; a context model provides the probability estimate for a syntax element. Table 1 below shows the correspondence between syntax elements and context models provided by an embodiment of the present application.
Table 1. Correspondence between syntax elements and context models

| Syntax element | ctxIdxInc | ctxIdxStart | Number of ctx |
| --- | --- | --- | --- |
| qt_split_flag | 0, 1, 2, 3 | 10 | 4 |
| bet_split_flag | 0, 1, 2, 3, 4, 5, 6, 7, 8 | 14 | 9 |
| bet_split_type_flag | 0, 1, 2 | 23 | 3 |
| bet_split_dir_flag | 0, 1, 2, 3, 4 | 26 | 5 |
The context model of a syntax element is located through ctxIdxInc (context index increment) and ctxIdxStart (context start index): the index of the context model is ctxIdxStart plus ctxIdxInc. For example, for qt_split_flag, ctxIdxStart is 10, so when ctxIdxInc is 1 the index of the context model is 11 (10+1=11), and when ctxIdxInc is 3 the index is 13 (10+3=13). Similarly, for bet_split_type_flag, ctxIdxStart is 23, so when ctxIdxInc is 1 the index of the context model is 24 (23+1=24), and when ctxIdxInc is 2 the index is 25 (23+2=25).
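The lookup described above can be sketched as follows, using the ctxIdxStart values and context counts of Table 1. The dictionary and function names are illustrative assumptions.

```python
# (ctxIdxStart, number of contexts) per syntax element, taken from Table 1.
CTX_TABLE = {
    "qt_split_flag": (10, 4),
    "bet_split_flag": (14, 9),
    "bet_split_type_flag": (23, 3),
    "bet_split_dir_flag": (26, 5),
}


def context_index(syntax_element, ctx_idx_inc):
    """Return ctxIdxStart + ctxIdxInc after checking the increment's range."""
    start, count = CTX_TABLE[syntax_element]
    if not 0 <= ctx_idx_inc < count:
        raise ValueError(f"ctxIdxInc {ctx_idx_inc} out of range for {syntax_element}")
    return start + ctx_idx_inc


print(context_index("qt_split_flag", 1))        # 11
print(context_index("qt_split_flag", 3))        # 13
print(context_index("bet_split_type_flag", 2))  # 25
```

Note that the index ranges of the four syntax elements in Table 1 are contiguous and do not overlap (10 to 13, 14 to 22, 23 to 25, 26 to 30).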
In one example, the context model can be selected according to various kinds of local information, such as the size of the current coding unit, the sizes of adjacent coding units, and the division depth. FIG. 4 shows the spatial positional relationship between the current coding unit (E) and its adjacent coding units provided by an embodiment of the present application. In one example, with reference to FIG. 4, the ctxIdxInc of each syntax element is determined as follows:
1. ctxIdxInc of qt_split_flag.
The ctxIdxInc of qt_split_flag is determined as follows:
(1) If the current picture is an intra-predicted picture and the width of E is 128, ctxIdxInc equals 3;
(2) Otherwise (that is, the condition in (1) does not hold: "the current picture is an intra-predicted picture" and "the width of E is 128" are not both true), if A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc equals 2;
(3) Otherwise (that is, the condition in (2) does not hold either: "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" are not both true), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc equals 1;
(4) Otherwise (that is, the condition in (3) does not hold either: neither "A 'exists' and the height of A is less than the height of E" nor "B 'exists' and the width of B is less than the width of E" is true), ctxIdxInc equals 0.
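Rules (1) to (4) for qt_split_flag can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and each adjacent coding unit is represented as a (width, height) tuple, with None meaning that the unit does not "exist".

```python
def qt_split_ctx_inc(is_intra_picture, e_width, e_height, a=None, b=None):
    """ctxIdxInc of qt_split_flag for current coding unit E.

    `a` and `b` are (width, height) tuples for adjacent units A and B,
    or None when the unit does not "exist" (representation assumed here).
    """
    if is_intra_picture and e_width == 128:
        return 3
    a_smaller = a is not None and a[1] < e_height  # A exists and is shorter than E
    b_smaller = b is not None and b[0] < e_width   # B exists and is narrower than E
    if a_smaller and b_smaller:
        return 2
    if a_smaller or b_smaller:
        return 1
    return 0


print(qt_split_ctx_inc(True, 128, 128))                          # rule (1)
print(qt_split_ctx_inc(False, 64, 64, a=(32, 32), b=(32, 32)))   # rule (2)
print(qt_split_ctx_inc(False, 64, 64, a=(64, 32), b=None))       # rule (3)
print(qt_split_ctx_inc(False, 64, 64, a=(64, 64), b=(64, 64)))   # rule (4)
```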
2. ctxIdxInc of bet_split_flag.
The ctxIdxInc of bet_split_flag is determined as follows:
First:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc equals 2;
(2) Otherwise (that is, the condition in (1) does not hold: "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" are not both true), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc equals 1;
(3) Otherwise (that is, the condition in (2) does not hold either: neither of the two conditions is true), ctxIdxInc equals 0.
Next (starting from the result of whichever of (1), (2), or (3) applied):
(4) If the product of the width of E and the height of E is greater than 1024, ctxIdxInc is unchanged;
(5) Otherwise (that is, the product of the width of E and the height of E is less than or equal to 1024), if the product is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise (that is, the product of the width of E and the height of E is less than or equal to 256), ctxIdxInc is increased by 6.
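The two-stage rule for bet_split_flag (a neighbour-based value of 0 to 2, then an area-based offset of 0, 3, or 6) can be sketched as follows. The function name and the (width, height)-or-None representation of the adjacent units are assumptions of the sketch.

```python
def bet_split_flag_ctx_inc(e_width, e_height, a=None, b=None):
    """ctxIdxInc of bet_split_flag for current coding unit E.

    `a` and `b` are (width, height) tuples for adjacent units A and B,
    or None when the unit does not "exist" (representation assumed here).
    """
    # Stage 1: rules (1) to (3), based on the adjacent coding units.
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width
    if a_smaller and b_smaller:
        inc = 2
    elif a_smaller or b_smaller:
        inc = 1
    else:
        inc = 0
    # Stage 2: rules (4) to (6), based on the area of E.
    area = e_width * e_height
    if area > 1024:
        pass        # unchanged
    elif area > 256:
        inc += 3
    else:
        inc += 6
    return inc


print(bet_split_flag_ctx_inc(64, 64, a=(32, 32), b=(32, 32)))  # 2
print(bet_split_flag_ctx_inc(32, 16))                          # 3
print(bet_split_flag_ctx_inc(16, 16, a=(8, 8), b=(8, 8)))      # 8
```

The result always lies in the range 0 to 8, which matches the nine contexts listed for bet_split_flag in Table 1.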
3. ctxIdxInc of bet_split_type_flag.
The ctxIdxInc of bet_split_type_flag is determined as follows:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc equals 2;
(2) Otherwise (that is, the condition in (1) does not hold: "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" are not both true), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc equals 1;
(3) Otherwise (that is, the condition in (2) does not hold either: neither of the two conditions is true), ctxIdxInc equals 0.
4. ctxIdxInc of bet_split_dir_flag.
The ctxIdxInc of bet_split_dir_flag is determined as follows:
(1) If the width of E is 128 and the height of E is 64, ctxIdxInc equals 4;
(2) Otherwise (that is, "the width of E is 128 and the height is 64" does not hold), if the width of E is 64 and the height of E is 128, ctxIdxInc equals 3;
(3) Otherwise (that is, neither "the width of E is 128 and the height is 64" nor "the width of E is 64 and the height is 128" holds), if the height of E is greater than the width of E, ctxIdxInc equals 2;
(4) Otherwise (that is, none of the conditions in (1) to (3) holds), if the width of E is greater than the height of E, ctxIdxInc equals 1;
(5) Otherwise (that is, none of the conditions in (1) to (4) holds, so the height of E equals the width of E), ctxIdxInc equals 0.
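Rules (1) to (5) for bet_split_dir_flag depend only on the shape of E and can be sketched as follows; the function name is an illustrative assumption.

```python
def bet_split_dir_ctx_inc(e_width, e_height):
    """ctxIdxInc of bet_split_dir_flag, derived from the shape of E."""
    if e_width == 128 and e_height == 64:
        return 4
    if e_width == 64 and e_height == 128:
        return 3
    if e_height > e_width:
        return 2
    if e_width > e_height:
        return 1
    return 0  # E is square


for size in [(128, 64), (64, 128), (16, 32), (32, 16), (32, 32)]:
    print(size, bet_split_dir_ctx_inc(*size))
```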
In general, the more precise the selection conditions of the context model of a syntax element, the better the video compression. On this basis, an embodiment of the present application provides a context model selection method. The block division structure of a coding unit is predicted from the reference coding units of that coding unit, and the prediction result is then added to the selection process of the context model of a syntax element. This adds to, or refines, the selection conditions of the context model, which improves the efficiency of entropy coding and reduces the number of bits in the bitstream. Moreover, because the embodiments of the present application obtain a more accurate probability estimate by adding only the prediction result of the block division structure, they help improve video compression efficiency.
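Why a more precise selection condition can reduce the number of bits is illustrated by the following sketch. It is not the method of the application: it uses made-up flag values and static empirical probabilities instead of an adaptive arithmetic coder, and it simply compares the ideal code length of the same bins when they share one context model and when they are separated into two.

```python
import math


def ideal_bits(bins):
    """Ideal code length, in bits, of a 0/1 sequence under its own empirical probability."""
    ones = sum(bins)
    zeros = len(bins) - ones
    total = 0.0
    for count in (zeros, ones):
        if count:
            total -= count * math.log2(count / len(bins))
    return total


# Made-up bins: under condition A the flag is mostly 0, under condition B mostly 1.
bins_a = [0] * 9 + [1]
bins_b = [1] * 9 + [0]

shared = ideal_bits(bins_a + bins_b)             # one context model for all bins
separate = ideal_bits(bins_a) + ideal_bits(bins_b)  # one context model per condition
print(f"shared context:    {shared:.2f} bits")
print(f"separate contexts: {separate:.2f} bits")
```

When the selection condition separates bins with different statistics, each context model sees a skewed distribution and the total ideal code length drops, here to less than half.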
It should be noted that the context model selection method provided by the embodiments of the present application can be applied to AVS (for example AVS3) or to a next-generation video codec standard, which is not limited in the embodiments of the present application.
Please refer to FIG. 5, which shows a simplified block diagram of a communication system provided by an embodiment of the present application. The communication system 200 includes a plurality of devices that can communicate with each other through, for example, a network 250. For example, the communication system 200 includes a first device 210 and a second device 220 interconnected through the network 250. In the embodiment of FIG. 5, the first device 210 and the second device 220 perform unidirectional data transmission. For example, the first device 210 may encode video data, such as a stream of video pictures captured by the first device 210, for transmission to the second device 220 through the network 250. The encoded video data is transmitted in the form of one or more encoded video bitstreams. The second device 220 may receive the encoded video data from the network 250, decode it to recover the video data, and display video pictures according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third device 230 and the fourth device 240 may encode video data (e.g., a stream of video pictures captured by that device) for transmission over the network 250 to the other of the third device 230 and the fourth device 240. Each of the third device 230 and the fourth device 240 may also receive the encoded video data transmitted by the other device, decode it to recover the video data, and display video pictures on an accessible display device according to the recovered video data.
In the embodiment of FIG. 5, the first device 210, the second device 220, the third device 230 and the fourth device 240 may be computer devices such as servers, personal computers and smartphones, but the principles disclosed in this application are not limited thereto. The embodiments of the present application are applicable to PCs (personal computers), mobile phones, tablet computers, media players and/or dedicated video conferencing equipment. The network 250 represents any number of networks that convey encoded video data among the first device 210, the second device 220, the third device 230 and the fourth device 240, including, for example, wired and/or wireless communication networks. The communication network 250 may exchange data over circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network and/or the Internet. For the purposes of this application, unless explained below, the architecture and topology of the network 250 may be immaterial to the operations disclosed herein.
As an example, FIG. 6 shows how a video encoder and a video decoder are placed in a streaming environment. The subject matter disclosed in this application is equally applicable to other video-enabled applications, including, for example, video conferencing, digital TV (television), and the storage of compressed video on digital media such as CDs (Compact Discs), DVDs (Digital Versatile Discs) and memory sticks.
The streaming system may include a capture subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In an embodiment, the video picture stream 302 includes samples captured by the digital camera. The video picture stream 302 is depicted as a thick line to emphasize its high data volume compared with the encoded video data 304 (or encoded video bitstream). The video picture stream 302 can be processed by an electronic device 320, which includes a video encoder 303 coupled to the video source 301. The video encoder 303 may include hardware, software, or a combination of the two to implement aspects of the disclosed subject matter as described in more detail below. The video encoder 303 may be a computer device, that is, an electronic device with data computing, processing and storage capabilities, such as a PC, a mobile phone, a tablet computer, a media player, a dedicated video conferencing device or a server. A video encoder 303 based on the method provided in this application may be implemented by one or more processors or by one or more integrated circuits.
The encoded video data 304 (or encoded video bitstream 304) is depicted as a thin line to emphasize its lower data volume compared with the video picture stream 302, and it can be stored on a streaming server 305 for future use. One or more streaming client subsystems, such as the client subsystem 306 and the client subsystem 308 in FIG. 6, may access the streaming server 305 to retrieve a copy 307 and a copy 309 of the encoded video data 304. The client subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. The video decoder 310 decodes the incoming copy 307 of the encoded video data and produces an output video picture stream 311 that can be rendered on a display 312 (e.g., a display screen) or another rendering device (not depicted). In some streaming systems, the encoded video data 304, the copy 307 and the copy 309 (e.g., video bitstreams) may be encoded according to certain video coding/compression standards.
It should be noted that the electronic device 320 and the electronic device 330 may include other components (not shown). For example, the electronic device 320 may include a video decoder (not shown), and the electronic device 330 may also include a video encoder (not shown). The video decoder is used to decode received encoded video data, and the video encoder is used to encode video data.
The technical solutions of the present application are described below through several embodiments.
Please refer to FIG. 7, which shows a flowchart of a context model selection method provided by an embodiment of the present application. The method can be applied in a device that encodes a video sequence, such as the first device 210 in the communication system shown in FIG. 5; it can also be applied in a device that decodes encoded video data to recover the video sequence, such as the second device 220 in the communication system shown in FIG. 5. The method may include the following steps (steps 710 to 730).
Step 710: Determine a reference coding unit of a target coding unit.
The target coding unit is an image unit to be processed in the video encoding and decoding process. It may be the image unit currently being processed, or an image unit to be processed after the current one. As explained in the description of the "block division structure" above, during video compression a frame of the video signal can be divided into non-overlapping CTUs; a CTU can usually be regarded as an image unit to be processed, that is, as a target coding unit. The embodiments of the present application do not limit the shape of the target coding unit. Optionally, the target coding unit is square, that is, its width and height are equal; or the target coding unit is rectangular, that is, its width and height are not equal. The embodiments of the present application also do not limit the block size of the target coding unit. Optionally, the size of the target coding unit is 64×64, 128×128 or 128×64. In practice, the size of the target coding unit can be determined in combination with the block size of the largest coding unit allowed by the video encoder. For example, if the largest coding unit allowed by the video encoder is 128×128, the block size of the target coding unit is less than or equal to 128×128.
The reference coding unit provides a reference for the block division of the target coding unit, where block division means dividing the structure of a coding unit. The embodiments of the present application do not limit the block size relationship between the reference coding unit and the target coding unit. Optionally, the block size of the reference coding unit may be equal to that of the target coding unit, or may differ from it, for example by being smaller. The embodiments of the present application also do not limit the positional relationship between the reference coding unit and the target coding unit. Optionally, the reference coding unit and the target coding unit are adjacent coding units in the same frame; or they are non-adjacent coding units in the same frame; or they are coding units in different frames, in which case the relative position of the reference coding unit in its frame may be the same as, or different from, the relative position of the target coding unit in its frame; or the reference coding unit is a coding unit stored in a cache; and so on. For further description of the reference coding unit, such as its position, please refer to the following embodiments; details are not repeated here.
As described above, during video compression a frame of the video signal can be divided into non-overlapping CTUs, and each CTU can be further divided more finely into one or more basic coding units, on which video coding is then performed. In the embodiments of the present application, the process of dividing a CTU into one or more basic coding units is called the structure division process of the coding unit, that is, block division. When estimating the probabilities of the syntax elements of the target coding unit, that is, when selecting a context model, taking the block division structure of the target coding unit into account improves the accuracy of the probability estimate while keeping the number of added context model selection conditions small. However, while parsing the video bitstream, the video decoder cannot obtain the content information of the current coding unit or of the coding units after it, so it cannot obtain the block division structure of the target coding unit. Therefore, in the embodiments of the present application, the block division structure of the target coding unit is predicted from the block division structure of a coding unit that has already completed the encoding process or the reconstruction process (the reference coding unit), and the predicted block division structure is used to select the context model.
It should be noted that when the technical solution of the present application is applied to video encoding, the reference coding unit is a coding unit that has completed the encoding process; when it is applied to video decoding, the reference coding unit is a coding unit that has completed the reconstruction process. To ensure that the video decoder and the video encoder arrive at the same prediction of the block division structure of the target coding unit, the reference coding units they use must also be consistent; that is, the video decoder and the video encoder need to use reference coding units with the same position information. Optionally, the video encoder writes the position information of the reference coding unit into the video bitstream, and the video decoder determines the reference coding unit from the position information it parses from the bitstream; or the position information of the reference coding unit is predefined, and both the video encoder and the video decoder determine the reference coding unit from the predefined position information; or the video encoder and the video decoder use the same determination condition to determine the reference coding unit.
In one example, the number of reference coding units is a positive integer greater than or equal to 2. After step 710, the method further includes: determining the rate-distortion cost of each reference coding unit; and selecting a preferred coding unit from the at least two reference coding units according to their rate-distortion costs, where the block division structure of the preferred coding unit is used to predict the block division structure of the target coding unit.
When the target coding unit is divided into blocks, the rate-distortion (RD) costs of the target coding unit under the various division modes are also compared. In lossy video coding, bit rate and distortion are usually negatively correlated: a higher compression ratio lowers the bit rate but increases distortion, and vice versa. The rate-distortion cost is a trade-off between the two. Generally, the target coding unit is divided according to the block division structure with the smallest rate-distortion cost. Therefore, to predict the block division structure of the target coding unit more accurately, in the embodiments of the present application, after the reference coding units are determined, their rate-distortion costs are compared and a reference coding unit with a smaller rate-distortion cost is selected as the preferred coding unit, whose block division structure is then used to predict the block division structure of the target coding unit.
Optionally, there is one preferred coding unit, or there are multiple preferred coding units. The embodiments of the present application do not limit how the preferred coding unit is selected. Optionally, after the rate-distortion cost of each reference coding unit is determined, the reference coding units are sorted in ascending order of rate-distortion cost and the first S reference coding units are taken as preferred coding units, where S is a positive integer; or, after the rate-distortion cost of each reference coding unit is determined, the reference coding units whose rate-distortion cost is less than a preset threshold are taken as preferred coding units.
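The two selection rules above can be sketched as follows. This is only an illustrative sketch: the `RefCU` record, its field names and the numeric costs are hypothetical, and the rate-distortion costs are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefCU:
    name: str       # hypothetical identifier of a reference coding unit
    rd_cost: float  # rate-distortion cost, assumed to be already known


def select_top_s(refs, s):
    """Sort by ascending RD cost and keep the first S reference CUs."""
    return sorted(refs, key=lambda cu: cu.rd_cost)[:s]


def select_below_threshold(refs, threshold):
    """Keep every reference CU whose RD cost is below the preset threshold."""
    return [cu for cu in refs if cu.rd_cost < threshold]


refs = [RefCU("A", 120.0), RefCU("B", 95.5), RefCU("C", 140.2)]
top = select_top_s(refs, 2)
below = select_below_threshold(refs, 125.0)
```

With S = 1 the first rule yields a single preferred coding unit; larger S, or the threshold rule, may yield several.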
Because the reference coding unit has already completed the encoding process or the reconstruction process, its content information is available to both the video encoder and the video decoder. After the reference coding unit is determined, its content information, such as the Y/U/V components of each of its pixels, is obtained to provide a reference for predicting the block division structure of the target coding unit.
The embodiments of the present application do not limit how the content information of the reference coding unit is obtained. Optionally, a certain amount of memory is allocated for coding units to store their content information; when the content information of a coding unit is needed, the information stored at the memory location corresponding to that coding unit is read. Optionally, to identify the memory location of each coding unit, an index value is determined for each coding unit and mapped to its memory location, so that the content information of a coding unit can later be read from the corresponding memory location according to its index value.
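The index-to-memory-location mapping described above might look like the following minimal sketch. The class name `CUContentStore`, the list used as a stand-in for the allocated memory, and the sample values are all assumptions made for illustration.

```python
class CUContentStore:
    """Toy store mapping a coding-unit index value to a memory slot."""

    def __init__(self):
        self._slots = []          # stand-in for the allocated memory space
        self._index_to_slot = {}  # CU index value -> position in _slots

    def put(self, cu_index, content):
        # Record where this CU's content lives, then store the content.
        self._index_to_slot[cu_index] = len(self._slots)
        self._slots.append(content)

    def get(self, cu_index):
        # Resolve the index value to its slot and read the stored content.
        return self._slots[self._index_to_slot[cu_index]]


store = CUContentStore()
store.put(7, {"Y": [16, 17], "U": [128], "V": [128]})
content = store.get(7)
```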
Step 720: Predict the block division structure of the target coding unit according to the reference coding unit, to obtain a block division prediction structure of the target coding unit.
After the reference coding unit is obtained, the block division structure of the target coding unit can be predicted from it to obtain the block division prediction structure of the target coding unit. The embodiments of the present application do not limit how the block division structure of the target coding unit is predicted. Optionally, the reference coding unit is processed by a trained deep learning model to obtain the block division prediction structure of the target coding unit; using a trained deep learning model allows the block division structure to be predicted quickly, improves the compatibility of the prediction, and avoids running a complex prediction process for every image unit to be processed. Alternatively, similarity matching is performed between the reference coding units and the target coding unit, and the block division structure of the reference coding unit with the highest similarity is used as the block division prediction structure. Alternatively, the rate-distortion costs of the reference coding units are compared, and the block division structure of the reference coding unit with the smallest rate-distortion cost is used as the block division prediction structure. For the process of predicting the block division structure of the target coding unit with a deep learning model, please refer to the following method embodiments; details are not repeated here.
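Of the three prediction options, the rate-distortion one is the simplest to illustrate. The sketch below assumes each reference coding unit is represented by a hypothetical `(rd_cost, block_division_structure)` pair, with the structure written as a plain string label; neither representation comes from the application.

```python
def predict_block_division(references):
    """Return the block division structure of the reference CU with the
    smallest rate-distortion cost, or None if there is no reference.

    references: list of (rd_cost, block_division_structure) pairs.
    """
    if not references:
        return None  # nothing to predict from
    _, structure = min(references, key=lambda ref: ref[0])
    return structure


# Hypothetical reference CUs with made-up costs and structure labels.
refs = [(120.0, "QT"), (95.5, "BT_VER"), (140.2, "NONE")]
predicted = predict_block_division(refs)
```

Because the encoder and the decoder are assumed to see the same reference coding units and costs, both would arrive at the same predicted structure.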
Step 730: Based on the block division prediction structure of the target coding unit, determine the context model used by each of at least one syntax element involved in the block division of the target coding unit.
As described above, syntax elements indicate the block division structure of a coding unit, and a context model is used to estimate the probability of a syntax element. Taking the block division structure of the target coding unit into account when selecting context models for its syntax elements keeps the number of added selection conditions small while saving bitstream bits needed to transmit the syntax elements, which improves coding efficiency. Therefore, in the embodiments of the present application, once the block division prediction structure of the target coding unit has been predicted, it is added to the context model selection process for the syntax elements, to improve entropy coding efficiency and reduce the number of bits in the bitstream.
The embodiments of the present application do not limit how the block division prediction structure is added to the context model selection process. Optionally, the block division prediction structure serves as a further selection condition added to the context model selection process, or it is merged with the original selection conditions of the context model to refine them. For a description of how the block division prediction structure is added to the context model selection process, please refer to the following method embodiments; details are not repeated here.
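The difference between the two options can be made concrete with a small sketch. Everything numeric here is an assumption for illustration: the baseline rule (counting how many of the left and above neighbours were split), the offset of 3, and the clamp to 2 are hypothetical and are not taken from the application or from any standard.

```python
def base_ctx(left_split, above_split):
    """Hypothetical original condition: count of split neighbours (0..2)."""
    return int(left_split) + int(above_split)


def ctx_with_extra_condition(left_split, above_split, predicted_split):
    """Prediction as an additional condition: the context set doubles,
    indices 0..2 when no split is predicted, 3..5 when a split is predicted."""
    return base_ctx(left_split, above_split) + 3 * int(predicted_split)


def ctx_fused(left_split, above_split, predicted_split):
    """Prediction merged into the original condition: it is counted like a
    third neighbour and the sum is clamped, so the set size stays at 3."""
    total = int(left_split) + int(above_split) + int(predicted_split)
    return min(total, 2)


extra = ctx_with_extra_condition(True, False, True)
fused = ctx_fused(True, True, True)
```

The first variant enlarges the set of context models; the second keeps the set size unchanged and only changes which model is picked.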
In one example, the at least one syntax element involved in the block division of the target coding unit includes: a first syntax element, used to indicate whether the first division mode is used to divide the target coding unit, such as "qt_split_flag" in the above embodiment; a second syntax element, used to indicate whether the second division mode and/or the third division mode is used to divide the target coding unit, such as "bet_split_flag" in the above embodiment; a third syntax element, used, when the second syntax element indicates that the second or third division mode is used, to indicate which of the two is used to divide the target coding unit, such as "bet_split_type_flag" in the above embodiment; and a fourth syntax element, used, when the second or third division mode is used to divide the target coding unit, to indicate the division direction of that mode, such as "bet_split_dir_flag" in the above embodiment. For the meaning of the values of each syntax element, refer to the description of the above embodiment; details are not repeated here. Optionally, the first division mode includes QT division; and/or the second division mode includes BT division and/or TT (Ternary Tree) division; and/or the third division mode includes EQT division. It should be noted that these three conditions need not all hold at the same time. For example, when the first division mode includes QT and the second division mode includes BT, the third division mode need not include EQT. It should be understood that these technical solutions fall within the protection scope of the present application.
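As a rough illustration of how the four syntax elements relate to one another, the sketch below maps a division mode to the flags that would be signalled for it. It assumes the first, second and third division modes are QT, BT and EQT respectively; the mode labels and the specific 0/1 value assignments are hypothetical, since the value semantics are defined in an earlier part of the application that is not reproduced here.

```python
def split_syntax_elements(mode):
    """Map a division mode label to the flags signalled for it.

    mode is one of "NONE", "QT", "BT_HOR", "BT_VER", "EQT_HOR", "EQT_VER"
    (labels invented for this sketch).
    """
    if mode == "QT":
        return {"qt_split_flag": 1}
    elements = {"qt_split_flag": 0}
    if mode == "NONE":
        elements["bet_split_flag"] = 0
        return elements
    kind, direction = mode.split("_")
    elements["bet_split_flag"] = 1
    # Assumed values: 0 selects BT, 1 selects EQT.
    elements["bet_split_type_flag"] = 1 if kind == "EQT" else 0
    # Assumed values: 0 means horizontal, 1 means vertical.
    elements["bet_split_dir_flag"] = 1 if direction == "VER" else 0
    return elements


flags = split_syntax_elements("EQT_VER")
```

The type and direction flags only appear when "bet_split_flag" indicates a BT or EQT division, mirroring the conditional wording of the third and fourth syntax elements.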
It should be noted that a selection mode flag, such as spf_flag, may be predefined in the video encoding or decoding process to indicate whether the context model selection method described in the embodiments of the present application is used. Taking spf_flag as the selection mode flag: if spf_flag=1, the context model selection method described in the embodiments of the present application is executed; if spf_flag=0, it is not executed and, for example, the original context model selection method or one from the related art is used instead. Defining the selection mode flag improves the compatibility of the technical solution provided by the embodiments of the present application: it avoids context model selection failures caused by video encoders or video decoders that do not support the technical solution, and thus avoids degrading video compression efficiency.
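The gating behaviour of the flag can be sketched as a simple dispatch. The function `select_context` and the two selector callables are hypothetical placeholders; only the meaning of spf_flag=1 and spf_flag=0 comes from the text.

```python
def select_context(spf_flag, proposed_selector, legacy_selector, cu):
    """Use the prediction-based selector only when spf_flag is 1;
    otherwise fall back to the original selection method."""
    selector = proposed_selector if spf_flag == 1 else legacy_selector
    return selector(cu)


# Placeholder selectors standing in for the two selection methods.
legacy = lambda cu: cu["neighbour_ctx"]
proposed = lambda cu: cu["neighbour_ctx"] + 3 * cu["predicted_split"]

cu = {"neighbour_ctx": 1, "predicted_split": 1}
ctx_on = select_context(1, proposed, legacy, cu)
ctx_off = select_context(0, proposed, legacy, cu)
```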
To sum up, in the technical solution provided by the embodiments of the present application, the block division structure of a coding unit is predicted from that coding unit's reference coding unit, and the prediction result is then added to the context model selection process for the syntax elements. This adds to, or refines, the selection conditions of the context model, which improves the efficiency of entropy coding and reduces the number of bits in the bitstream. Moreover, adding only the prediction result of the block division structure is enough to obtain a more accurate probability estimate, which helps improve video compression efficiency.
The embodiments of the present application provide several ways of determining the reference coding unit. These are described below.
In one example, step 710 includes: obtaining the coding units in the target video frame that satisfy a target condition, where the target video frame is the video frame in which the target coding unit is located; and selecting, from the coding units that satisfy the target condition, the coding units adjacent to the target coding unit as reference coding units.
Coding units that are spatially adjacent to the target coding unit can be used as reference coding units. Optionally, the spatially adjacent coding units are located in the same video frame as the target coding unit, so the video frame in which the target coding unit is located, that is, the target video frame, is determined first. Then the coding units that satisfy the target condition are selected from the coding units in the target video frame: for video encoding, the target condition includes having completed the encoding process; for video decoding, the target condition includes having completed the reconstruction process. Finally, the coding units adjacent to the target coding unit are selected from those that satisfy the target condition and used as reference coding units. Optionally, the coding units adjacent to the target coding unit include the coding unit to its left, the coding unit above it, and the coding unit to its upper left. For example, as shown in FIG. 4, if the target coding unit is coding unit E, the coding units adjacent to it include coding unit A and coding unit B.
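The spatial selection can be sketched as below. For simplicity the sketch assumes coding units sit on a uniform grid and are identified by `(column, row)` positions, and that the set of units that have finished encoding or reconstruction is given; both are simplifying assumptions, as real coding units can differ in size.

```python
def spatial_reference_cus(target_pos, finished):
    """Return the left, above and above-left neighbours of the target CU
    that already satisfy the target condition.

    target_pos: (column, row) of the target CU on an assumed uniform grid.
    finished:   set of positions that completed encoding/reconstruction.
    """
    col, row = target_pos
    candidates = [
        (col - 1, row),      # left
        (col, row - 1),      # above
        (col - 1, row - 1),  # above-left
    ]
    return [pos for pos in candidates if pos in finished]


finished = {(0, 0), (1, 0), (0, 1)}
neighbours = spatial_reference_cus((1, 1), finished)
```

At a frame corner, such as position (0, 0), no candidate is available and the list is empty.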
In another example, step 710 includes: determining the position information of the target coding unit in the target video frame, where the target video frame is the video frame in which the target coding unit is located; obtaining at least one adjacent video frame of the target video frame; and determining the coding unit in the at least one adjacent video frame that matches the position information as the reference coding unit.
Coding units that are temporally adjacent to the target coding unit can be used as reference coding units. Optionally, the temporally adjacent coding units are located in different video frames from the target coding unit, but their position information within their own video frames is consistent with the position information of the target coding unit within its video frame. Therefore, the video frame in which the target coding unit is located, that is, the target video frame, is determined first. Then the position information of the target coding unit in the target video frame is determined, and at least one video frame adjacent to the target video frame, that is, at least one adjacent video frame, is obtained. Finally, the coding unit in the at least one adjacent video frame whose position is consistent with that position information is determined as the reference coding unit; in other words, the coding unit located at the position in the adjacent video frame that corresponds to the position of the target coding unit is used as the reference coding unit.
可选地,针对视频编码过程,相邻视频帧包括已经完成编码过程的视频帧;针对视频解码过程,相邻视频帧包括已经完成重建过程的视频帧。通过采用已经完成编码过程的视频帧和已经完成重建过程的视频帧作为目标视频帧的相邻视频帧,可以确保相邻视频帧中存在与目标编码单元的位置信息对应的编码单元。本申请实施例中,相邻视频帧既可以是目标视频帧之前的视频帧,也可以是目标视频帧之后的视频帧,实际应用中,相邻视频帧与目标视频帧之间的时间先后关系,可以参考帧间预测的参考方式,在帧间预测采用前向参考的情况下,相邻视频帧为目标视频帧之前的视频帧;在帧间预测采用后向参考的情况下,相邻视频帧为目标视频帧之后的视频帧。本申请实施例对相邻视频帧的数量不作限定,可选地,相邻视频帧的数量与帧间预测参考的视频帧的帧数相同;或者,相邻视频帧的数量为预设数量。Optionally, for the video encoding process, the adjacent video frames include video frames for which the encoding process has been completed; for the video decoding process, the adjacent video frames include video frames for which the reconstruction process has been completed. By using the video frame that has completed the encoding process and the video frame that has completed the reconstruction process as adjacent video frames of the target video frame, it can be ensured that there are coding units corresponding to the position information of the target coding unit in the adjacent video frames. In this embodiment of the present application, the adjacent video frame may be either a video frame before the target video frame or a video frame after the target video frame. In practical applications, the time sequence relationship between adjacent video frames and the target video frame , you can refer to the reference method of inter-frame prediction. In the case of inter-frame prediction using forward reference, the adjacent video frame is the video frame before the target video frame; in the case of inter-frame prediction using backward reference, the adjacent video frame frame is the video frame following the target video frame. This embodiment of the present application does not limit the number of adjacent video frames. Optionally, the number of adjacent video frames is the same as the number of video frames referenced by inter-frame prediction; or, the number of adjacent video frames is a preset number.
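The temporal selection described above can be sketched as follows. This is only an illustrative sketch: the `CodingUnit` class, the `colocated_reference_units` function, and the choice to treat "consistent position information" as an identical top-left corner are assumptions made for the example and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingUnit:
    """Hypothetical coding-unit record; field names are assumptions."""
    frame_index: int  # index of the video frame containing this unit
    x: int            # top-left x position inside the frame
    y: int            # top-left y position inside the frame
    width: int
    height: int


def colocated_reference_units(target, coded_frames, adjacent_indices):
    """Collect units in adjacent frames located at the target's position.

    coded_frames maps a frame index to the coding units of a frame that has
    already been encoded (encoder side) or reconstructed (decoder side).
    Position consistency is assumed to mean an equal top-left corner.
    """
    refs = []
    for idx in adjacent_indices:
        for cu in coded_frames.get(idx, []):
            if (cu.x, cu.y) == (target.x, target.y):
                refs.append(cu)
    return refs
```

With forward reference, `adjacent_indices` would contain indices of frames before the target frame; with backward reference, indices of frames after it.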
In yet another example, step 710 above includes: obtaining at least one coding unit stored in a cache; and determining the obtained at least one coding unit as a reference coding unit.
For the video encoding process, the cache of the video encoder usually stores at least one coding unit that has recently completed the encoding process; for the video decoding process, the cache of the video decoder usually stores at least one decoding unit that has recently completed the reconstruction process. Because a coding unit that has recently completed encoding or reconstruction is usually spatially adjacent to the target coding unit, the coding units stored in the cache can also serve as reference coding units of the target coding unit. Accordingly, at least one coding unit stored in the cache is obtained first, and the obtained at least one coding unit is determined as a reference coding unit.
It should be noted that the ways of determining the reference coding unit are described separately above only for ease of description. In practical applications, the reference coding unit may also be determined by combining at least two of these ways. For example, the reference coding unit may be determined by combining coding units that are spatially adjacent to the target coding unit with coding units that are temporally adjacent to the target coding unit; alternatively, it may be determined by combining coding units that are spatially adjacent to the target coding unit with coding units stored in the cache. It should be understood that all of these fall within the protection scope of the present application.
To sum up, the technical solutions provided by the embodiments of the present application obtain coding units that have reference value for the block division structure of the target coding unit and determine the reference coding unit from them, which can improve the accuracy of predicting the block division structure of the target coding unit. In addition, in the embodiments of the present application, the reference coding unit is a coding unit that has completed the encoding process or the reconstruction process, so that known information and prior knowledge can be fully used when predicting the block division structure of the target coding unit, which avoids wasting these information resources and improves their utilization. Furthermore, the embodiments of the present application provide multiple ways of determining the reference coding unit, which makes the determination more flexible.
The following describes the process of predicting the block division structure of the target coding unit by using a deep learning model.
In an example, step 720 above includes: invoking a structure prediction model to process the reference coding unit to obtain the block division prediction structure of the target coding unit.
The structure prediction model is used to predict the block division structure of a coding unit. In the embodiments of the present application, the structure prediction model is a deep learning model, such as a convolutional neural network model. Refer to FIG. 8, which shows a schematic diagram of a structure prediction model provided by an embodiment of the present application. As can be seen from FIG. 8, the structure prediction model converts the prediction of the block division structure of the target coding unit into multiple binary classification problems: determining whether the value of a given syntax element at a given division level is 1 or 0 can be regarded as one binary classification problem.
In the embodiments of the present application, the structure prediction model may predict the block division structure of the target coding unit either directly or indirectly. Direct prediction means that the elements of the output vector of the structure prediction model represent the values of the respective syntax elements. For example, the structure prediction model shown in FIG. 8 can, from the input reference coding unit, directly predict the values of the syntax elements involved in the division of the target coding unit at each level (that is, at each division depth). Indirect prediction means that the output vector of the structure prediction model has some other physical meaning and must be further converted to determine the values of the syntax elements involved in the division at each level. As shown in FIG. 9, the structure prediction model can, from the input reference coding unit, predict the probability that each CU edge in the target coding unit is divided, and the block division prediction structure of the target coding unit can then be inferred from these probabilities.
It should be noted that the embodiments of the present application are described only by taking as an example the case in which the input of the structure prediction model includes the block division structure of the reference coding unit. In practical applications, the input of the structure prediction model may also include other coding information, such as QP (Quantization Parameter) information and prediction mode (intra-frame prediction or inter-frame prediction) information. It should be understood that all of these fall within the protection scope of the present application.
The following describes the training process of the structure prediction model, taking as an example the case in which the input of the structure prediction model includes the reference coding unit (for example, the Y/U/V components of each pixel of the reference coding unit).
In an example, the training process of the structure prediction model includes the following steps:
(1) Obtain at least one training sample.
The training samples are used in the training process of the structure prediction model. The embodiments of the present application do not limit the number of training samples; in practical applications, the number may be determined by considering factors such as the processing capability of the device that trains the structure prediction model and the required prediction accuracy of the structure prediction model. In the embodiments of the present application, each training sample includes the block division structure of a first coding unit and a second coding unit, where the second coding unit is a reference coding unit of the first coding unit. The content information of the second coding unit (for example, the Y/U/V components of each pixel of the second coding unit) and the block division structure of the first coding unit are both information that can be obtained; that is, the block division process of the first coding unit and the second coding unit has been completed before the training samples are obtained.
Optionally, when the training samples are obtained, the reference coding unit of the first coding unit is determined in the same way as the reference coding unit of the target coding unit is determined later, when the structure prediction model is used to predict the block division structure of the target coding unit. Determining the reference coding unit in the same way during training and during use generally improves the prediction accuracy of the structure prediction model.
(2) Invoke the structure prediction model to process the second coding unit to obtain the block division prediction structure of the first coding unit.
In the training process of the structure prediction model, the parameters of the structure prediction model may first be preset in order to construct the model. The structure prediction model is then invoked to process the second coding unit (for example, the Y/U/V components of each pixel of the second coding unit) so as to predict the block division structure of the first coding unit, yielding the block division prediction structure of the first coding unit.
The ground truth required for training the structure prediction model, that is, the block division structure of each coding unit, is obtained by having each coding unit traverse all division modes and comparing the rate-distortion costs of the division modes to determine the final block division structure. The format of this block division structure must match the output format of the structure prediction model. Taking as an example a structure prediction model that directly predicts the block division structure of the target coding unit, the construction of the ground truth for the model is shown in FIG. 10. Because the number of output channels of the structure prediction model must be fixed, the division depth predicted by the model must be defined in advance, and the value of every syntax element must be padded. As shown in FIG. 10, the value 100 is the invalid syntax element value added when the syntax element values are padded. The output of the structure prediction model includes these invalid syntax element values; they can be removed according to the coding unit division process shown in FIG. 2, so that a bitstream conforming to the preset format can be inferred for use in the subsequent context model selection process.
(3) Calculate the prediction loss value of the structure prediction model according to the block division prediction structure of the first coding unit and the block division structure of the first coding unit.
After the structure prediction model outputs the block division prediction structure of the first coding unit, the prediction loss value of the structure prediction model can be calculated from the block division prediction structure of the first coding unit and the block division structure of the first coding unit. The prediction loss value indicates the error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit. Optionally, the prediction loss value can be obtained with a classification network loss function, such as cross entropy.
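As an illustration of a cross-entropy loss over binary syntax-element decisions, the following minimal sketch averages the binary cross entropy between predicted probabilities and ground-truth values of 0 or 1. The function name `binary_cross_entropy` and the flat-list representation of the block division structure are assumptions made for this example; the application does not specify the exact form of the loss.

```python
import math


def binary_cross_entropy(predicted_probs, ground_truth, eps=1e-12):
    """Mean binary cross entropy between predictions and 0/1 targets.

    predicted_probs: predicted probability that each syntax element is 1.
    ground_truth:    true syntax element values (0 or 1), same length.
    eps:             clamp to avoid log(0); an implementation assumption.
    """
    total = 0.0
    for p, t in zip(predicted_probs, ground_truth):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(ground_truth)
```

A prediction that assigns high probability to the true value yields a small loss, and a confident wrong prediction yields a large one, which is the signal used in step (4) to adjust the model parameters.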
(4) Adjust the parameters of the structure prediction model according to the prediction loss value.
As described above, the parameters of the structure prediction model are predefined in the training process. After the prediction loss value is calculated, the parameters of the structure prediction model can be adjusted according to it until the prediction loss value falls within the convergence range, which completes the training process and yields the trained structure prediction model.
To sum up, in the technical solutions provided by the embodiments of the present application, a deep learning model processes the reference coding unit to predict the block division structure of the coding unit to be processed. Because the block division structure of the coding unit to be processed is obtained simply by feeding the reference coding unit into the deep learning model, the prediction of the block division structure can be sped up effectively. Moreover, because the deep learning model can be used repeatedly and continuously to predict the block division structures of various coding units, the compatibility of the technical solutions provided by the embodiments of the present application is improved.
The following describes how the block division prediction structure is incorporated into the context model selection process.
In an example, step 730 above includes: for a target syntax element among the at least one syntax element, determining a predicted value of the target syntax element according to the block division prediction structure of the target coding unit; and determining, according to the predicted value of the target syntax element, the index increment value of the context model used by the target syntax element, where the index increment value is used to indicate the context model.
From the block division prediction structure of the target coding unit, the video encoder and the video decoder can infer the predicted value (allowSplitRef) of the target syntax element. For example, when the structure prediction model directly predicts the block division structure of the target coding unit, the elements of the model's output vector represent the values of the respective syntax elements, so the predicted value of the target syntax element can be determined from the output of the structure prediction model. As another example, when the structure prediction model indirectly predicts the block division structure of the target coding unit, the model's output vector has some other physical meaning; further converting the output vector determines the values of the syntax elements involved in the division at each level, so the predicted value of the target syntax element can likewise be determined from the output of the structure prediction model.
Based on the predicted value (allowSplitRef) of the target syntax element, the index increment value (ctxIdxInc) of the context model used by the target syntax element can be determined; combining this index increment value with the index start value (ctxIdxStart) determines the index of the context model. The embodiments of the present application do not limit the way in which the index increment value is determined. The block division prediction structure may serve as a further selection condition added to the context model selection process (that is, a further selection condition added to the process of determining the index increment value), or it may be fused with the original context model selection conditions (that is, fused with the original process of determining the index increment value) to optimize those conditions.
First, the case in which the block division prediction structure serves as a further selection condition added to the context model selection process is described.
In an example, determining, according to the predicted value of the target syntax element, the index increment value of the context model used by the target syntax element, where the index increment value is used to indicate the context model, includes: determining the index increment value of the context model used by the target syntax element according to the predicted value of the target syntax element and the initial index increment value of the context model used by the target syntax element. In other words, step 730 above includes: for a target syntax element among the at least one syntax element, determining the predicted value of the target syntax element according to the block division prediction structure of the target coding unit; obtaining the initial index increment value of the context model used by the target syntax element; and determining the index increment value of the context model used by the target syntax element according to the predicted value of the target syntax element and the initial index increment value.
On the one hand, the predicted value (allowSplitRef) of the target syntax element can be inferred from the block division prediction structure of the target coding unit. On the other hand, the initial index increment value of the context model used by the target syntax element can be obtained through the process, described in the foregoing embodiments, of determining the index increment value of the context model used by a syntax element. The index increment value (ctxIdxInc) of the context model finally used by the target syntax element is then determined from the predicted value of the target syntax element and the initial index increment value. The embodiments of the present application do not limit the value range of the predicted value (allowSplitRef) of the target syntax element; optionally, the predicted value (allowSplitRef) is 0 or 1.
With reference to FIG. 4 and the context model selection method described in the embodiments of the present application, the following describes by way of example the ctxIdxInc of qt_split_flag, the ctxIdxInc of bet_split_flag, the ctxIdxInc of bet_split_type_flag, and the ctxIdxInc of bet_split_dir_flag.
1. ctxIdxInc of qt_split_flag.
The ctxIdxInc of qt_split_flag is determined as follows:
First:
(1) If the current picture is an intra-predicted picture and the width of E is 128, ctxIdxInc is equal to 3;
(2) Otherwise (that is, the "if" condition in (1) does not hold; in other words, the two conditions "the current picture is an intra-predicted picture" and "the width of E is 128" do not both hold), if A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(3) Otherwise (that is, the "if" condition in (2) does not hold; in other words, the two conditions "the current picture is an intra-predicted picture" and "the width of E is 128" do not both hold, and the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" do not both hold), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(4) Otherwise (that is, the "if" condition in (3) does not hold; in other words, the two conditions "the current picture is an intra-predicted picture" and "the width of E is 128" do not both hold, and neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds), ctxIdxInc is equal to 0.
Next (on the basis of any one of (1), (2), (3), and (4), further):
ctxIdxInc is increased by 4*allowSplitRef.
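The qt_split_flag rules above can be sketched in code as follows. The function name, the parameter names, and the representation of neighbors A and B as `(width, height)` tuples (or `None` when the block does not "exist") are assumptions made for this illustration.

```python
def qt_split_flag_ctx_idx_inc(is_intra_picture, e_width, e_height,
                              a, b, allow_split_ref):
    """Sketch of the ctxIdxInc derivation for qt_split_flag.

    a, b: hypothetical (width, height) tuples for neighbors A and B,
          or None when the neighbor does not "exist".
    allow_split_ref: predicted value of the syntax element (0 or 1).
    """
    a_smaller = a is not None and a[1] < e_height  # A exists, shorter than E
    b_smaller = b is not None and b[0] < e_width   # B exists, narrower than E

    if is_intra_picture and e_width == 128:
        inc = 3                                    # rule (1)
    elif a_smaller and b_smaller:
        inc = 2                                    # rule (2)
    elif a_smaller or b_smaller:
        inc = 1                                    # rule (3)
    else:
        inc = 0                                    # rule (4)

    # Further selection condition from the block division prediction structure.
    return inc + 4 * allow_split_ref
```

With allowSplitRef equal to 0 the result stays in the range 0 to 3; with allowSplitRef equal to 1 it falls in the range 4 to 7.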
2. ctxIdxInc of bet_split_flag.
The ctxIdxInc of bet_split_flag is determined as follows:
First:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(2) Otherwise (that is, the "if" condition in (1) does not hold; in other words, the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" do not both hold), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise (that is, the "if" condition in (2) does not hold; in other words, neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds), ctxIdxInc is equal to 0.
Next (on the basis of any one of (1), (2), and (3), further):
(4) If the product of the width of E and the height of E is greater than 1024, ctxIdxInc is unchanged;
(5) Otherwise (that is, the "if" condition in (4) does not hold; in other words, the product of the width of E and the height of E is less than or equal to 1024), if the product of the width of E and the height of E is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise (that is, the "if" condition in (5) does not hold; in other words, the product of the width of E and the height of E is less than or equal to 256), ctxIdxInc is increased by 6.
Finally (on the basis of any one of (4), (5), and (6), further):
ctxIdxInc is increased by 9*allowSplitRef.
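The three stages of the bet_split_flag derivation can be sketched as follows. As before, the function name, parameter names, and the `(width, height)` tuple representation of neighbors A and B are assumptions made for this illustration.

```python
def bet_split_flag_ctx_idx_inc(e_width, e_height, a, b, allow_split_ref):
    """Sketch of the ctxIdxInc derivation for bet_split_flag.

    a, b: hypothetical (width, height) tuples for neighbors A and B,
          or None when the neighbor does not "exist".
    allow_split_ref: predicted value of the syntax element (0 or 1).
    """
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width

    # Stage 1: neighbor-based rules (1) to (3).
    if a_smaller and b_smaller:
        inc = 2
    elif a_smaller or b_smaller:
        inc = 1
    else:
        inc = 0

    # Stage 2: area-based rules (4) to (6).
    area = e_width * e_height
    if area > 1024:
        pass          # rule (4): unchanged
    elif area > 256:
        inc += 3      # rule (5)
    else:
        inc += 6      # rule (6)

    # Stage 3: offset contributed by the predicted value.
    return inc + 9 * allow_split_ref
```

Stages 1 and 2 together produce values from 0 to 8, so an offset of 9 keeps the values for allowSplitRef equal to 1 (9 to 17) disjoint from those for allowSplitRef equal to 0.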
3. ctxIdxInc of bet_split_type_flag.
The ctxIdxInc of bet_split_type_flag is determined as follows:
First:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(2) Otherwise (that is, the "if" condition in (1) does not hold; in other words, the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" do not both hold), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise (that is, the "if" condition in (2) does not hold; in other words, neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds), ctxIdxInc is equal to 0.
Next (on the basis of any one of (1), (2), and (3), further):
ctxIdxInc is increased by 3*allowSplitRef.
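For bet_split_type_flag, the derivation reduces to the neighbor-based rules plus the predicted-value offset. The sketch below uses the same illustrative conventions (hypothetical function and parameter names, neighbors given as `(width, height)` tuples or `None`).

```python
def bet_split_type_flag_ctx_idx_inc(e_width, e_height, a, b, allow_split_ref):
    """Sketch of the ctxIdxInc derivation for bet_split_type_flag.

    a, b: hypothetical (width, height) tuples for neighbors A and B,
          or None when the neighbor does not "exist".
    allow_split_ref: predicted value of the syntax element (0 or 1).
    """
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width

    if a_smaller and b_smaller:
        inc = 2       # rule (1)
    elif a_smaller or b_smaller:
        inc = 1       # rule (2)
    else:
        inc = 0       # rule (3)

    return inc + 3 * allow_split_ref
```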
4. ctxIdxInc of bet_split_dir_flag.
The ctxIdxInc of bet_split_dir_flag is determined as follows:
First:
(1) If the width of E is 128 and the height of E is 64, ctxIdxInc is equal to 4;
(2) Otherwise (that is, the "if" condition in (1) does not hold; in other words, "the width of E is 128 and the height is 64" does not hold), if the width of E is 64 and the height of E is 128, ctxIdxInc is equal to 3;
(3) Otherwise (that is, the "if" condition in (2) does not hold; in other words, "the width of E is 128 and the height is 64" does not hold, and "the width of E is 64 and the height is 128" does not hold), if the height of E is greater than the width of E, ctxIdxInc is equal to 2;
(4) Otherwise (that is, the "if" condition in (3) does not hold; in other words, "the width of E is 128 and the height is 64" does not hold, "the width of E is 64 and the height is 128" does not hold, and "the height of E is greater than the width of E" does not hold), if the width of E is greater than the height of E, ctxIdxInc is equal to 1;
(5) Otherwise (that is, the "if" condition in (4) does not hold; in other words, none of the conditions in (1) to (4) holds, so the height of E is equal to the width of E), ctxIdxInc is equal to 0.
Next (on the basis of any one of (1), (2), (3), (4), and (5), further):
ctxIdxInc is increased by 5*allowSplitRef.
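The bet_split_dir_flag rules depend only on the dimensions of E and the predicted value, and can be sketched as follows. The function and parameter names are assumptions made for this illustration.

```python
def bet_split_dir_flag_ctx_idx_inc(e_width, e_height, allow_split_ref):
    """Sketch of the ctxIdxInc derivation for bet_split_dir_flag.

    allow_split_ref: predicted value of the syntax element (0 or 1).
    """
    if e_width == 128 and e_height == 64:
        inc = 4       # rule (1)
    elif e_width == 64 and e_height == 128:
        inc = 3       # rule (2)
    elif e_height > e_width:
        inc = 2       # rule (3)
    elif e_width > e_height:
        inc = 1       # rule (4)
    else:
        inc = 0       # rule (5): E is square

    return inc + 5 * allow_split_ref
```

Since the first stage yields values from 0 to 4, the offset of 5 maps allowSplitRef equal to 1 onto the values 5 to 9.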
As the above examples show, using the block division prediction structure as a further selection condition in the context model selection process expands the set of context models available to each syntax element. For example, for the syntax element qt_split_flag, when allowSplitRef (the predicted value) is 1, four context models are added: those with ctxIdxStart equal to 10 and ctxIdxInc equal to 4, 5, 6, and 7 (that is, the four context models with indexes 14, 15, 16, and 17). As another example, for the syntax element bet_split_dir_flag, when allowSplitRef (the predicted value) is 1, five context models are added: those with ctxIdxStart equal to 26 and ctxIdxInc equal to 5, 6, 7, 8, and 9 (that is, the five context models with indexes 31, 32, 33, 34, and 35). Expanding the number of context models makes the probability estimates of the syntax elements more accurate, thereby reducing the number of bits in the bitstream.
Second, the following describes how the block division prediction structure is fused with the original context model selection conditions in order to optimize those conditions.
In another example, determining, according to the predicted value of the target syntax element, the index increment value of the context model adopted by the target syntax element (the index increment value being used to indicate the context model) includes: determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition of that index increment value. In other words, step 730 above includes: for a target syntax element among the at least one syntax element, determining the predicted value of the target syntax element according to the block division prediction structure of the target coding unit; obtaining the determination condition of the index increment value of the context model adopted by the target syntax element; and determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition.
The predicted value (allowSplitRef) of the target syntax element can be inferred from the block division prediction structure of the target coding unit and is then fused with the original determination condition of the index increment value to determine the index increment value (ctxIdxInc) of the context model adopted by the target syntax element. Fusing the predicted value of the syntax element into the original context model selection conditions optimizes those conditions without introducing too many new ones. This improves entropy coding efficiency while keeping the complexity of the video compression process from becoming too high, which helps improve video compression efficiency. The embodiments of the present application do not limit the value range of the predicted value (allowSplitRef) of the target syntax element. Optionally, the predicted value (allowSplitRef) takes the value 0 or 1.
Below, with reference to FIG. 4 and the context model selection method described in the embodiments of the present application, the ctxIdxInc of qt_split_flag, the ctxIdxInc of bet_split_flag, the ctxIdxInc of bet_split_type_flag and the ctxIdxInc of bet_split_dir_flag are described in turn by way of example.
1. ctxIdxInc of qt_split_flag.
The ctxIdxInc of qt_split_flag is determined as follows:
(1) If the current picture is an intra-predicted picture and the width of E is 128, or allowSplitRef is 1, ctxIdxInc is equal to 3;
(2) Otherwise (that is, the "if" in (1) does not hold; in other words, neither of the two conditions "the current picture is an intra-predicted picture and the width of E is 128" and "allowSplitRef is 1" holds), if A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(3) Otherwise (that is, the "if" in (2) does not hold; in other words, neither of the two conditions "the current picture is an intra-predicted picture and the width of E is 128" and "allowSplitRef is 1" holds, and the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" do not hold simultaneously), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(4) Otherwise (that is, the "if" in (3) does not hold; in other words, neither of the two conditions "the current picture is an intra-predicted picture and the width of E is 128" and "allowSplitRef is 1" holds, and neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds), ctxIdxInc is equal to 0.
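The four rules for qt_split_flag can be sketched as a single function. This is only an illustrative reading of the rules, not a normative implementation: the function name, the parameter names, and the representation of the neighbouring blocks A and B as (width, height) tuples (with None meaning the block does not "exist") are assumptions.

```python
from typing import Optional, Tuple

Block = Tuple[int, int]  # (width, height); hypothetical representation


def qt_split_flag_ctx_idx_inc(is_intra_picture: bool,
                              e_width: int,
                              e_height: int,
                              a: Optional[Block],
                              b: Optional[Block],
                              allow_split_ref: int) -> int:
    # Rule (1): the predicted value is fused into the first original condition.
    if (is_intra_picture and e_width == 128) or allow_split_ref == 1:
        return 3
    # A "exists" and is shorter than E; B "exists" and is narrower than E.
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width
    if a_smaller and b_smaller:   # rule (2)
        return 2
    if a_smaller or b_smaller:    # rule (3)
        return 1
    return 0                      # rule (4)


# A 64x64 block E in a non-intra picture with two smaller neighbours and no predicted split.
print(qt_split_flag_ctx_idx_inc(False, 64, 64, (32, 32), (32, 32), 0))  # 2
```

Because allowSplitRef enters through an existing condition, the function still returns only the values 0 to 3, so no extra context models are needed for this syntax element.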
2. ctxIdxInc of bet_split_flag.
The ctxIdxInc of bet_split_flag is determined as follows:
First:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(2) Otherwise (that is, the "if" in (1) does not hold; in other words, the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" do not hold simultaneously), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise (that is, the "if" in (2) does not hold; in other words, neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds), ctxIdxInc is equal to 0.
Next (on top of whichever of (1), (2), (3) applied):
(4) If the product of the width of E and the height of E is greater than 1024, or allowSplitRef is 1, ctxIdxInc is unchanged;
(5) Otherwise (that is, the "if" in (4) does not hold; in other words, neither of the two conditions "the product of the width of E and the height of E is greater than 1024" and "allowSplitRef is 1" holds), if the product of the width of E and the height of E is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise (that is, the "if" in (5) does not hold; in other words, neither of the two conditions "the product of the width of E and the height of E is greater than 1024" and "allowSplitRef is 1" holds, and "the product of the width of E and the height of E is greater than 256" does not hold), ctxIdxInc is increased by 6.
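The two-stage rule for bet_split_flag (a neighbour-based base value followed by a size-based offset) can be sketched as follows. The function and parameter names are hypothetical, and the neighbouring blocks A and B are assumed to be given as (width, height) tuples, with None meaning the block does not "exist".

```python
from typing import Optional, Tuple

Block = Tuple[int, int]  # (width, height); hypothetical representation


def bet_split_flag_ctx_idx_inc(e_width: int,
                               e_height: int,
                               a: Optional[Block],
                               b: Optional[Block],
                               allow_split_ref: int) -> int:
    # Stage 1, rules (1)-(3): base value from the neighbouring blocks.
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width
    if a_smaller and b_smaller:
        ctx_idx_inc = 2
    elif a_smaller or b_smaller:
        ctx_idx_inc = 1
    else:
        ctx_idx_inc = 0

    # Stage 2, rules (4)-(6): offset from the area of E, fused with allowSplitRef.
    area = e_width * e_height
    if area > 1024 or allow_split_ref == 1:
        pass                  # rule (4): unchanged
    elif area > 256:
        ctx_idx_inc += 3      # rule (5)
    else:
        ctx_idx_inc += 6      # rule (6)
    return ctx_idx_inc


# A 16x16 block (area 256) with two smaller neighbours and no predicted split.
print(bet_split_flag_ctx_idx_inc(16, 16, (8, 8), (8, 8), 0))  # 8
```

When allowSplitRef is 1, a small block is treated like a large one in stage 2, so it reuses the context models with increments 0 to 2 instead of the offset ones.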
3. ctxIdxInc of bet_split_type_flag.
The ctxIdxInc of bet_split_type_flag is determined as follows:
(1) If A "exists" and the height of A is less than the height of E, and B "exists" and the width of B is less than the width of E, or allowSplitRef is 1, ctxIdxInc is equal to 2;
(2) Otherwise (that is, the "if" in (1) does not hold; in other words, neither of the two conditions "A 'exists' and the height of A is less than the height of E, and B 'exists' and the width of B is less than the width of E" and "allowSplitRef is 1" holds), if A "exists" and the height of A is less than the height of E, or B "exists" and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise (that is, the "if" in (2) does not hold; in other words, neither of the two conditions "A 'exists' and the height of A is less than the height of E" and "B 'exists' and the width of B is less than the width of E" holds, and "allowSplitRef is 1" does not hold), ctxIdxInc is equal to 0.
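A minimal sketch of the three rules for bet_split_type_flag follows. As before, the function name, the parameter names and the (width, height) tuple representation of A and B (None meaning the block does not "exist") are assumptions made for illustration.

```python
from typing import Optional, Tuple

Block = Tuple[int, int]  # (width, height); hypothetical representation


def bet_split_type_flag_ctx_idx_inc(e_width: int,
                                    e_height: int,
                                    a: Optional[Block],
                                    b: Optional[Block],
                                    allow_split_ref: int) -> int:
    a_smaller = a is not None and a[1] < e_height
    b_smaller = b is not None and b[0] < e_width
    # Rule (1): both neighbour conditions hold, or the predicted value is 1.
    if (a_smaller and b_smaller) or allow_split_ref == 1:
        return 2
    # Rule (2): exactly one neighbour condition holds.
    if a_smaller or b_smaller:
        return 1
    # Rule (3): nothing holds.
    return 0


# Only neighbour A is smaller than E and no split is predicted.
print(bet_split_type_flag_ctx_idx_inc(32, 32, (16, 16), None, 0))  # 1
```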
4. ctxIdxInc of bet_split_dir_flag.
The ctxIdxInc of bet_split_dir_flag is determined as follows:
(1) If the width of E is 128 and the height of E is 64, or allowSplitRef is 1, ctxIdxInc is equal to 4;
(2) Otherwise (that is, the "if" in (1) does not hold; in other words, neither of the two conditions "E has a width of 128 and a height of 64" and "allowSplitRef is 1" holds), if the width of E is 64 and the height of E is 128, ctxIdxInc is equal to 3;
(3) Otherwise (that is, the "if" in (2) does not hold; in other words, none of the three conditions "E has a width of 128 and a height of 64", "E has a width of 64 and a height of 128" and "allowSplitRef is 1" holds), if the height of E is greater than the width of E, ctxIdxInc is equal to 2;
(4) Otherwise (that is, the "if" in (3) does not hold; in other words, none of the four conditions "E has a width of 128 and a height of 64", "E has a width of 64 and a height of 128", "the height of E is greater than the width of E" and "allowSplitRef is 1" holds), if the width of E is greater than the height of E, ctxIdxInc is equal to 1;
(5) Otherwise (that is, the "if" in (4) does not hold; in other words, none of the five conditions "E has a width of 128 and a height of 64", "E has a width of 64 and a height of 128", "the height of E is greater than the width of E", "the width of E is greater than the height of E" and "allowSplitRef is 1" holds), ctxIdxInc is equal to 0.
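The five rules for bet_split_dir_flag depend only on the shape of E and on allowSplitRef, so they reduce to a short cascade. The function and parameter names below are hypothetical; the sketch is one possible reading of the rules above.

```python
def bet_split_dir_flag_ctx_idx_inc(e_width: int,
                                   e_height: int,
                                   allow_split_ref: int) -> int:
    # Rule (1): 128x64 block, or the predicted value is 1.
    if (e_width == 128 and e_height == 64) or allow_split_ref == 1:
        return 4
    # Rule (2): 64x128 block.
    if e_width == 64 and e_height == 128:
        return 3
    # Rule (3): any other tall block.
    if e_height > e_width:
        return 2
    # Rule (4): any other wide block.
    if e_width > e_height:
        return 1
    # Rule (5): square block.
    return 0


for shape in [(128, 64), (64, 128), (16, 32), (32, 16), (32, 32)]:
    print(shape, bet_split_dir_flag_ctx_idx_inc(*shape, allow_split_ref=0))
```

Compared with the additive scheme (ctxIdxInc increased by 5*allowSplitRef), this fused scheme keeps the number of context models at five.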
To sum up, in the technical solutions provided by the embodiments of the present application, using the block division prediction structure as a further selection condition in the context model selection process extends the set of context models available to each original syntax element. Because there are more context models to choose from, the selection of the context model is more precise, the probability estimation of the syntax elements is more accurate, and the number of bits in the bitstream is reduced. In addition, fusing the predicted value of a syntax element into the original context model selection conditions optimizes those conditions without introducing too many new ones. This saves the bitstream bits needed to transmit the syntax elements, keeps the complexity of the video compression process from becoming too high, and helps improve video compression efficiency.
Please refer to FIG. 11, which shows a schematic diagram of a context model selection process provided by an embodiment of the present application. The method can be applied in a device that encodes a video sequence, such as the first device 210 in the communication system shown in FIG. 5; it can also be applied in a device that decodes encoded video data to restore the video sequence, such as the second device 220 in the communication system shown in FIG. 5.
As shown in FIG. 11, before context models are selected for the syntax elements involved in the block division of the target coding unit, the reference coding unit of the target coding unit is determined. The reference coding unit of the target coding unit may include the following kinds of coding units: coding units spatially adjacent to the target coding unit (coding units adjacent to the target coding unit in the video frame where the target coding unit is located), coding units temporally adjacent to the target coding unit (coding units corresponding to the target coding unit in video frames adjacent to the video frame where the target coding unit is located), and coding units stored in a cache.
All of the above reference coding units have already completed the encoding process or the reconstruction process, so their content information is available to the video encoder or the video decoder. Once the reference coding unit is determined, the Y/U/V components of each of its pixels are obtained from its content information.
In the embodiments of the present application, a deep learning model (the structure prediction model) is used to predict the block division structure of the target coding unit. After the reference coding unit is obtained, the structure prediction model is invoked to process the Y/U/V components of each pixel of the reference coding unit, yielding the block division prediction structure of the target coding unit.
The block division prediction structure of the target coding unit is then added to the process of selecting the context models adopted by the syntax elements involved in the block division of the target coding unit, so that a more precise context model is determined for each syntax element.
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Please refer to FIG. 12, which shows a block diagram of an apparatus for selecting a context model provided by an embodiment of the present application. The apparatus has the function of implementing the above example of the context model selection method; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be a device that encodes a video sequence or a device that decodes encoded video data to restore the video sequence, or it may be arranged in such a device. The apparatus 1200 may include: a unit determination module 1210, a structure prediction module 1220 and a model determination module 1230.
The unit determination module 1210 is configured to determine a reference coding unit of a target coding unit.
The structure prediction module 1220 is configured to predict the block division structure of the target coding unit according to the reference coding unit, to obtain the block division prediction structure of the target coding unit.
The model determination module 1230 is configured to determine, based on the block division prediction structure of the target coding unit, the context model adopted by each of at least one syntax element involved in the block division of the target coding unit; wherein the syntax element is used to indicate the block division structure of a coding unit, and the context model is used to perform probability estimation on the syntax element.
In an example, the model determination module 1230 is configured to: for a target syntax element among the at least one syntax element, determine the predicted value of the target syntax element according to the block division prediction structure of the target coding unit; and determine, according to the predicted value of the target syntax element, the index increment value of the context model adopted by the target syntax element, the index increment value being used to indicate the context model.
In an example, the model determination module 1230 is configured to: determine the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the initial index increment value of the context model adopted by the target syntax element.
In an example, the model determination module 1230 is configured to: determine the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition of the index increment value of the context model adopted by the target syntax element.
In an example, the structure prediction module 1220 is configured to: invoke a structure prediction model to process the reference coding unit to obtain the block division prediction structure of the target coding unit, the structure prediction model being used to predict the block division structure of a coding unit.
In an example, as shown in FIG. 13, the apparatus 1200 further includes: a sample acquisition module 1232, configured to acquire at least one training sample, each training sample including the block division structure of a first coding unit and a second coding unit, the second coding unit being a reference coding unit of the first coding unit; a structure processing module 1234, configured to invoke the structure prediction model to process the second coding unit to obtain the block division prediction structure of the first coding unit; a loss value calculation module 1236, configured to calculate a prediction loss value of the structure prediction model according to the block division prediction structure of the first coding unit and the block division structure of the first coding unit, the prediction loss value being used to indicate the error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit; and a parameter adjustment module 1238, configured to adjust the parameters of the structure prediction model according to the prediction loss value.
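The training flow of modules 1232 to 1238 (acquire samples, predict, compute a loss, adjust parameters) can be illustrated with a deliberately simplified sketch. It is not the deep learning model of the application: a tiny logistic model stands in for the structure prediction model, the block division structure is reduced to a single binary "split or not" label, the features of the second coding unit are made-up numbers, and binary cross-entropy with plain gradient descent is an assumed choice of loss and optimizer.

```python
import math


def predict_split_probability(weights, bias, features):
    """Stand-in for the structure prediction model: logistic regression."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def prediction_loss(p, actual_split):
    """Binary cross-entropy between the predicted and the actual structure."""
    eps = 1e-12
    return -(actual_split * math.log(p + eps)
             + (1 - actual_split) * math.log(1 - p + eps))


def training_step(weights, bias, samples, learning_rate=0.5):
    """One pass over the samples; returns updated parameters and the mean loss
    measured before the update."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    total_loss = 0.0
    for features, actual_split in samples:
        p = predict_split_probability(weights, bias, features)
        total_loss += prediction_loss(p, actual_split)
        err = p - actual_split            # gradient of the loss w.r.t. z
        for i, x in enumerate(features):
            grad_w[i] += err * x
        grad_b += err
    n = len(samples)
    new_weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b / n
    return new_weights, new_bias, total_loss / n


# Hypothetical samples: (features of the second coding unit, split label of the first).
samples = [([0.1, 0.9], 1), ([0.8, 0.2], 0), ([0.2, 0.7], 1), ([0.9, 0.1], 0)]
weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(50):
    weights, bias, loss = training_step(weights, bias, samples)
    losses.append(loss)
print(losses[0] > losses[-1])  # the prediction loss shrinks as parameters are adjusted
```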
In an example, the unit determination module 1210 is configured to: obtain the coding units in a target video frame that satisfy a target condition, the target video frame being the video frame where the target coding unit is located; and select, from the coding units that satisfy the target condition, a coding unit adjacent to the target coding unit as the reference coding unit.
In an example, the unit determination module 1210 is configured to: determine position information of the target coding unit in a target video frame, the target video frame being the video frame where the target coding unit is located; obtain at least one video frame adjacent to the target video frame; and determine a coding unit that matches the position information in the at least one adjacent video frame as the reference coding unit.
In an example, the unit determination module 1210 is configured to: obtain at least one coding unit stored in a cache; and determine the obtained at least one coding unit as the reference coding unit.
In an example, the number of reference coding units is a positive integer greater than or equal to 2; as shown in FIG. 13, the apparatus 1200 further includes: a rate-distortion cost determination module 1242, configured to determine the rate-distortion cost of each reference coding unit; and a preferred unit determination module 1244, configured to select a preferred coding unit from the at least two reference coding units according to the rate-distortion cost of each reference coding unit, the block division structure of the preferred coding unit being used to predict the block division structure of the target coding unit.
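Selecting a preferred coding unit by rate-distortion cost can be sketched as below. The text does not state how the cost is computed or which candidate wins, so two assumptions are made here: the cost uses the conventional Lagrangian form J = D + lambda * R, and the candidate with the lowest cost is preferred. The dictionary fields and function names are hypothetical.

```python
def rate_distortion_cost(distortion, bits, lam):
    """Conventional Lagrangian cost J = D + lambda * R (assumed, not from the text)."""
    return distortion + lam * bits


def select_preferred_coding_unit(reference_units, lam=1.0):
    """Return the reference coding unit with the lowest rate-distortion cost."""
    if len(reference_units) < 2:
        raise ValueError("at least two reference coding units are expected")
    return min(
        reference_units,
        key=lambda cu: rate_distortion_cost(cu["distortion"], cu["bits"], lam),
    )


# Hypothetical candidates: a spatial neighbour, another neighbour, a co-located unit.
candidates = [
    {"id": "left", "distortion": 120.0, "bits": 40},       # J = 160
    {"id": "above", "distortion": 100.0, "bits": 70},      # J = 170
    {"id": "colocated", "distortion": 150.0, "bits": 20},  # J = 170
]
preferred = select_preferred_coding_unit(candidates, lam=1.0)
print(preferred["id"])  # left
```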
In an example, the at least one syntax element includes: a first syntax element, used to indicate whether a first division manner is used to divide the target coding unit into blocks; a second syntax element, used to indicate whether a second division manner or a third division manner is used to divide the target coding unit into blocks; a third syntax element, used to indicate, in the case where the second syntax element indicates that the second division manner or the third division manner is used to divide the target coding unit into blocks, whether the second division manner or the third division manner is used; and a fourth syntax element, used to indicate, in the case where the second division manner or the third division manner is used to divide the target coding unit into blocks, the division direction of the second division manner or the third division manner.
In an example, the first division manner includes QT division; and/or the second division manner includes BT division and/or TT division; and/or the third division manner includes EQT division.
To sum up, in the technical solutions provided by the embodiments of the present application, the block division structure of a coding unit is predicted according to the reference coding unit of that coding unit, and the prediction result is then added to the process of selecting context models for the syntax elements. This either adds context model selection conditions or optimizes the existing ones, which improves the efficiency of entropy coding and reduces the number of bits in the bitstream. Moreover, because a more accurate probability estimate is obtained by adding only the prediction result of the block division structure, the embodiments avoid adding too many context model selection conditions, which helps improve video compression efficiency.
It should be noted that the apparatus provided in the above embodiments is described using the above division into functional modules only as an example. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.
Please refer to FIG. 14, which shows a structural block diagram of a computer device provided by an embodiment of the present application. The computer device may be the device for encoding a video sequence described above, such as the first device 210 in the communication system shown in FIG. 5; it may also be the device for decoding encoded video data to restore the video sequence described above, such as the second device 220 in the communication system shown in FIG. 5. The computer device 140 may include: a processor 141, a memory 142, a communication interface 143, an encoder/decoder 144 and a bus 145.
The processor 141 includes one or more processing cores, and executes various functional applications and information processing by running software programs and modules.
The memory 142 can be used to store a computer program, and the processor 141 is configured to execute the computer program to implement the above context model selection method.
The communication interface 143 can be used to communicate with other devices, for example to send and receive audio and video data.
The encoder/decoder 144 can be used to implement encoding and decoding functions, for example encoding and decoding audio and video data.
The memory 142 is connected to the processor 141 through the bus 145.
In addition, the memory 142 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, including but not limited to: a magnetic disk or optical disc, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), SRAM (Static Random-Access Memory), ROM (Read-Only Memory), magnetic memory, flash memory, and PROM (Programmable Read-Only Memory).
Those skilled in the art can understand that the structure shown in FIG. 14 does not limit the computer device 140, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which at least one instruction, at least one program, a code set or an instruction set is stored; the at least one instruction, the at least one program, the code set or the instruction set, when executed by a processor of a computer device, implements the above context model selection method.
Optionally, the computer-readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a solid state drive (SSD), an optical disc, or the like. The random access memory may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM).
In an exemplary embodiment, a computer program product or computer program is also provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above context model selection method.
In an exemplary embodiment, a chip is also provided, which includes a programmable logic circuit and/or program instructions and is used to implement the above context model selection method when the chip runs on a computer device.

Claims (17)

  1. A method for selecting a context model, the method comprising:
    determining a reference coding unit of a target coding unit;
    predicting a block division structure of the target coding unit according to the reference coding unit, to obtain a block division prediction structure of the target coding unit; and
    determining, based on the block division prediction structure of the target coding unit, a context model respectively adopted by each of at least one syntax element involved in block division of the target coding unit;
    wherein the syntax element is used to indicate a block division structure of a coding unit, and the context model is used to perform probability estimation on the syntax element.
  2. The method according to claim 1, wherein the determining, based on the block division prediction structure of the target coding unit, a context model respectively adopted by each of at least one syntax element involved in block division of the target coding unit comprises:
    for a target syntax element among the at least one syntax element, determining a predicted value of the target syntax element according to the block division prediction structure of the target coding unit; and
    determining, according to the predicted value of the target syntax element, an index increment value of the context model adopted by the target syntax element, the index increment value being used to indicate the context model.
  3. The method according to claim 2, wherein the determining, according to the predicted value of the target syntax element, an index increment value of the context model adopted by the target syntax element comprises:
    determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and an initial index increment value of the context model adopted by the target syntax element.
  4. The method according to claim 2, wherein the determining, according to the predicted value of the target syntax element, an index increment value of the context model adopted by the target syntax element comprises:
    determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and a determination condition for the index increment value of the context model adopted by the target syntax element.
  5. The method according to claim 1, wherein the predicting a block division structure of the target coding unit according to the reference coding unit, to obtain a block division prediction structure of the target coding unit comprises:
    invoking a structure prediction model to process the reference coding unit, to obtain the block division prediction structure of the target coding unit, the structure prediction model being used to predict a block division structure of a coding unit.
  6. The method according to claim 5, wherein a training process of the structure prediction model is as follows:
    acquiring at least one training sample, each training sample comprising a block division structure of a first coding unit and a second coding unit, the second coding unit being a reference coding unit of the first coding unit;
    invoking the structure prediction model to process the second coding unit, to obtain a block division prediction structure of the first coding unit;
    calculating a prediction loss value of the structure prediction model according to the block division prediction structure of the first coding unit and the block division structure of the first coding unit, the prediction loss value being used to indicate an error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit; and
    adjusting parameters of the structure prediction model according to the prediction loss value.
  7. The method according to claim 1, wherein the determining a reference coding unit of a target coding unit comprises:
    acquiring coding units that satisfy a target condition in a target video frame, the target video frame being the video frame in which the target coding unit is located; and
    selecting, from the coding units that satisfy the target condition, a coding unit adjacent to the target coding unit as the reference coding unit.
  8. The method according to claim 1, wherein the determining a reference coding unit of a target coding unit comprises:
    determining position information of the target coding unit in a target video frame, the target video frame being the video frame in which the target coding unit is located;
    acquiring at least one adjacent video frame of the target video frame; and
    determining a coding unit that satisfies the position information in the at least one adjacent video frame as the reference coding unit.
  9. The method according to claim 1, wherein the determining a reference coding unit of a target coding unit comprises:
    acquiring at least one coding unit stored in a cache; and
    determining the acquired at least one coding unit as the reference coding unit.
  10. The method according to claim 1, wherein the number of reference coding units is a positive integer greater than or equal to 2, and after the determining a reference coding unit of a target coding unit, the method further comprises:
    determining a rate-distortion cost of each of the reference coding units; and
    selecting a preferred coding unit from the at least two reference coding units according to the rate-distortion cost of each of the reference coding units, a block division structure of the preferred coding unit being used to predict the block division structure of the target coding unit.
  11. The method according to any one of claims 1 to 9, wherein the at least one syntax element comprises:
    a first syntax element, used to indicate whether a first division manner is used to perform block division on the target coding unit;
    a second syntax element, used to indicate whether a second division manner or a third division manner is used to perform block division on the target coding unit;
    a third syntax element, used to indicate, in a case where the second syntax element indicates that the second division manner or the third division manner is used to perform block division on the target coding unit, whether the second division manner or the third division manner is used; and
    a fourth syntax element, used to indicate, in a case where the second division manner or the third division manner is used to perform block division on the target coding unit, a division direction of the second division manner or the third division manner.
  12. The method according to claim 10, wherein the first division manner comprises quadtree (QT) division; and/or the second division manner comprises binary tree (BT) and/or ternary tree (TT) division; and/or the third division manner comprises extended quadtree (EQT) division.
  13. An apparatus for selecting a context model, the apparatus comprising:
    a unit determination module, configured to determine a reference coding unit of a target coding unit;
    a structure prediction module, configured to predict a block division structure of the target coding unit according to the reference coding unit, to obtain a block division prediction structure of the target coding unit; and
    a model determination module, configured to determine, based on the block division prediction structure of the target coding unit, a context model respectively adopted by each of at least one syntax element involved in block division of the target coding unit;
    wherein the syntax element is used to indicate a block division structure of a coding unit, and the context model is used to perform probability estimation on the syntax element.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement the method for selecting a context model according to any one of claims 1 to 12.
  15. A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the method for selecting a context model according to any one of claims 1 to 12.
  16. A computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium, a processor reading the computer instructions from the computer-readable storage medium and executing them to implement the method for selecting a context model according to any one of claims 1 to 12.
  17. A chip, comprising a programmable logic circuit and/or program instructions, the chip being configured to implement, when run on a computer device, the method for selecting a context model according to any one of claims 1 to 12.
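
As an illustration of the three steps recited in claim 1, the following minimal sketch chains reference selection, partition prediction, and context model selection. It is not the claimed implementation: the `ContextModel` update rule, the nearest-neighbour reference choice, the copy-the-reference predictor, and the syntax element name `qt_split_flag` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ContextModel:
    """Adaptive estimate of P(bin == 1) for one context (illustrative, not CABAC-exact)."""
    p_one: float = 0.5
    rate: float = 0.1  # hypothetical adaptation rate

    def update(self, bin_value: int) -> None:
        # Move the estimate toward the value that was just coded.
        self.p_one += self.rate * (bin_value - self.p_one)


@dataclass
class CodingUnit:
    x: int
    y: int
    size: int
    split_flags: dict = field(default_factory=dict)  # syntax element name -> 0/1


def determine_reference_cu(target, coded_cus):
    """Step 1 (assumed rule): pick the already-coded CU nearest to the target."""
    return min(coded_cus, key=lambda cu: abs(cu.x - target.x) + abs(cu.y - target.y))


def predict_partition(reference):
    """Step 2 (assumed rule): predict the target's partition by copying the reference's flags."""
    return dict(reference.split_flags)


def select_context_models(predicted, contexts):
    """Step 3: for each syntax element, choose the context indexed by its predicted value."""
    return {name: contexts[name][value] for name, value in predicted.items()}


# Two contexts per syntax element: index 0 when "no split" is predicted, 1 otherwise.
contexts = {"qt_split_flag": [ContextModel(), ContextModel()]}
coded = [
    CodingUnit(0, 0, 64, {"qt_split_flag": 1}),
    CodingUnit(256, 0, 64, {"qt_split_flag": 0}),
]
target = CodingUnit(64, 0, 64)

reference = determine_reference_cu(target, coded)
chosen = select_context_models(predict_partition(reference), contexts)
chosen["qt_split_flag"].update(1)  # after coding the actual flag, adapt the chosen model
```

Because each predicted value has its own context, the probability estimate for "split predicted" adapts separately from the one for "no split predicted", which is the intuition behind conditioning the context on the predicted structure.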
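
Claims 2 to 4 derive an index increment value from the predicted value of a syntax element, optionally combined with an initial index increment value (claim 3) or with a determination condition (claim 4). The claims do not fix a formula, so the two mappings below are hypothetical examples only: the first gives every initial increment two contexts, one per predicted value, and the second treats the predicted value as one extra satisfied condition.

```python
def ctx_idx_inc_from_initial(predicted_value: int, initial_inc: int, num_initial: int) -> int:
    """Claim 3 style (assumed mapping): offset the initial increment by the predicted value.

    With num_initial initial increments, the result lies in [0, 2 * num_initial).
    """
    if predicted_value not in (0, 1):
        raise ValueError("predicted_value must be 0 or 1")
    if not 0 <= initial_inc < num_initial:
        raise ValueError("initial_inc out of range")
    return predicted_value * num_initial + initial_inc


def ctx_idx_inc_from_conditions(predicted_value: int, conditions) -> int:
    """Claim 4 style (assumed mapping): count satisfied conditions, plus the predicted value."""
    if predicted_value not in (0, 1):
        raise ValueError("predicted_value must be 0 or 1")
    return sum(1 for satisfied in conditions if satisfied) + predicted_value


# Example: three initial increments (0..2); the prediction says "split" (1).
inc_a = ctx_idx_inc_from_initial(predicted_value=1, initial_inc=2, num_initial=3)
# Example: one of two neighbour conditions holds and the prediction says "split".
inc_b = ctx_idx_inc_from_conditions(predicted_value=1, conditions=[True, False])
```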
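
The training process of claim 6 (predict, compute a loss against the actual block division structure, adjust parameters) can be sketched with a deliberately tiny model. The logistic-regression predictor, the two-element feature vector, the cross-entropy loss, and the learning rate are assumptions chosen for brevity; the claim does not prescribe a model type or loss.

```python
import math


def predict(weights, features):
    """Predicted probability that the first coding unit is split."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def prediction_loss(p, label):
    """Cross-entropy between the predicted and the actual split decision."""
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def train(samples, lr=0.5, epochs=200):
    """Stochastic gradient descent over (features of second CU, split label of first CU)."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for features, label in samples:
            p = predict(weights, features)
            # Gradient of the cross-entropy loss for a logistic model is (p - label) * feature.
            weights = [w - lr * (p - label) * f for w, f in zip(weights, features)]
    return weights


# Features of the reference (second) CU: [bias, reference split flag].
# Label: whether the first CU was actually split.
samples = [([1.0, 1.0], 1), ([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 0.0], 0)]
weights = train(samples)
mean_loss = sum(prediction_loss(predict(weights, f), y) for f, y in samples) / len(samples)
```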
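
Claim 7 selects, within the same frame, coding units that satisfy a target condition and are adjacent to the target coding unit. The sketch below assumes that the target condition is "already coded" and that "adjacent" means sharing an edge segment of positive length; both are illustrative interpretations, and the `CU` type is hypothetical.

```python
from typing import NamedTuple


class CU(NamedTuple):
    x: int
    y: int
    w: int
    h: int
    coded: bool


def is_adjacent(a: CU, b: CU) -> bool:
    """True if rectangles a and b share an edge segment of positive length."""
    share_vertical_edge = (a.x + a.w == b.x or b.x + b.w == a.x) and (
        min(a.y + a.h, b.y + b.h) > max(a.y, b.y)
    )
    share_horizontal_edge = (a.y + a.h == b.y or b.y + b.h == a.y) and (
        min(a.x + a.w, b.x + b.w) > max(a.x, b.x)
    )
    return share_vertical_edge or share_horizontal_edge


def spatial_reference_cus(target: CU, frame_cus, condition=lambda cu: cu.coded):
    """Filter by the target condition first, then keep only CUs adjacent to the target."""
    candidates = [cu for cu in frame_cus if condition(cu)]
    return [cu for cu in candidates if is_adjacent(cu, target)]


target = CU(64, 64, 64, 64, coded=False)
left = CU(0, 64, 64, 64, coded=True)
above = CU(64, 0, 64, 64, coded=True)
diagonal = CU(0, 0, 64, 64, coded=True)    # touches only at a corner
right = CU(128, 64, 64, 64, coded=False)   # adjacent but not yet coded
references = spatial_reference_cus(target, [left, above, diagonal, right])
```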
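
Claim 8 takes the reference coding unit from adjacent video frames, at the position of the target coding unit. A minimal sketch follows, assuming that "satisfies the position information" means the coding unit covers the target's top-left sample and that adjacent frames are those at index ±1. The data layout (`frames` as a dict of index to a list of `(x, y, w, h)` tuples) is hypothetical; a real codec would only consult frames that have already been coded.

```python
def colocated_reference_cus(target_pos, target_frame_idx, frames):
    """Return (frame index, CU) pairs from adjacent frames that cover target_pos."""
    x, y = target_pos
    refs = []
    for idx in (target_frame_idx - 1, target_frame_idx + 1):
        for cu in frames.get(idx, []):
            cx, cy, w, h = cu
            if cx <= x < cx + w and cy <= y < cy + h:
                refs.append((idx, cu))
    return refs


frames = {
    4: [(0, 0, 64, 64), (64, 0, 64, 64)],
    5: [(0, 0, 128, 128)],            # the target frame itself is never searched
    6: [(0, 0, 32, 32), (64, 0, 32, 32)],
}
refs = colocated_reference_cus(target_pos=(64, 0), target_frame_idx=5, frames=frames)
```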
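
Claim 10 chooses a preferred coding unit among at least two reference coding units by rate-distortion cost. The sketch below uses the common Lagrangian form J = D + λ·R and picks the candidate with the lowest cost. Choosing the minimum, the dictionary keys, and the example numbers are assumptions; the claim only says that the selection is made according to the cost.

```python
def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def select_preferred(candidates, lam: float):
    """Pick the reference CU with the lowest RD cost (assumed selection rule)."""
    if len(candidates) < 2:
        raise ValueError("claim 10 assumes at least two reference coding units")
    return min(candidates, key=lambda c: rd_cost(c["distortion"], c["bits"], lam))


candidates = [
    {"name": "A", "distortion": 100.0, "bits": 10.0},
    {"name": "B", "distortion": 80.0, "bits": 40.0},
]
preferred_high_lambda = select_preferred(candidates, lam=1.0)   # costs: A = 110, B = 120
preferred_low_lambda = select_preferred(candidates, lam=0.1)    # costs: A = 101, B = 84
```

The example shows why the multiplier matters: a larger λ penalizes bits more heavily and favours the cheaper candidate, while a smaller λ favours the lower-distortion one.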
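
The four syntax elements of claim 11 can be read as a small decision tree. The sketch below maps a partition decision to an ordered list of syntax element values, taking QT as the first division manner, BT as the second, and EQT as the third, in line with claim 12. The element labels, the 0/1 value assignments, and the choice to stop signalling after a QT split are assumptions for illustration.

```python
def partition_syntax_elements(mode: str, direction: str = None):
    """Return the ordered (syntax element, value) pairs for one partition decision."""
    if mode == "NONE":
        # Neither the first nor the second/third division manner is used.
        return [("first", 0), ("second", 0)]
    if mode == "QT":
        # Assumed: once QT is signalled, no further elements are needed.
        return [("first", 1)]
    if mode in ("BT", "EQT"):
        if direction not in ("HOR", "VER"):
            raise ValueError("BT and EQT need a direction: 'HOR' or 'VER'")
        return [
            ("first", 0),
            ("second", 1),
            ("third", 0 if mode == "BT" else 1),           # which of the two manners
            ("fourth", 0 if direction == "HOR" else 1),    # division direction
        ]
    raise ValueError(f"unknown partition mode: {mode!r}")


qt_bins = partition_syntax_elements("QT")
eqt_bins = partition_syntax_elements("EQT", "VER")
```

Each element in the returned list would then be coded with its own context model, selected as described in claims 1 to 4.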
PCT/CN2021/118832 2020-09-23 2021-09-16 Context model selection method and apparatus, device and storage medium WO2022063035A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011009881.5 2020-09-23
CN202011009881.5A CN114257810B (en) 2020-09-23 2020-09-23 Context model selection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022063035A1 true WO2022063035A1 (en) 2022-03-31

Family

ID=80788599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118832 WO2022063035A1 (en) 2020-09-23 2021-09-16 Context model selection method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114257810B (en)
WO (1) WO2022063035A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170594A (en) * 2023-04-19 2023-05-26 中国科学技术大学 Coding method and device based on rate distortion cost prediction

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115883835B (en) * 2023-03-03 2023-04-28 腾讯科技(深圳)有限公司 Video coding method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050179572A1 (en) * 2004-02-09 2005-08-18 Lsi Logic Corporation Method for selection of contexts for arithmetic coding of reference picture and motion vector residual bitstream syntax elements
CN103765887A (en) * 2011-07-01 2014-04-30 三星电子株式会社 Method and apparatus for entropy encoding using hierarchical data unit, and method and apparatus for decoding
WO2018174617A1 (en) * 2017-03-22 2018-09-27 한국전자통신연구원 Block form-based prediction method and device
CN109361920A (en) * 2018-10-31 2019-02-19 南京大学 A kind of interframe quick predict algorithm of the adaptive decision-making tree selection towards more scenes
WO2020012023A1 (en) * 2018-07-13 2020-01-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Partitioned intra coding concept
CN111316642A (en) * 2017-10-27 2020-06-19 华为技术有限公司 Method and apparatus for signaling image coding and decoding partition information
CN111435993A (en) * 2019-01-14 2020-07-21 华为技术有限公司 Video encoder, video decoder and corresponding methods

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190140862A (en) * 2018-06-12 2019-12-20 한국전자통신연구원 Method and apparatus for context adaptive binary arithmetic coding
WO2020141163A1 (en) * 2019-01-02 2020-07-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoding and decoding a picture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARIO SALDANHA ET AL.: "Fast 3D-HEVC Depth Map Encoding.", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY., vol. 30, no. 3, 7 February 2019 (2019-02-07), XP011776286, DOI: 10.1109/TCSVT.2019.2898122 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170594A (en) * 2023-04-19 2023-05-26 中国科学技术大学 Coding method and device based on rate distortion cost prediction
CN116170594B (en) * 2023-04-19 2023-07-14 中国科学技术大学 Coding method and device based on rate distortion cost prediction

Also Published As

Publication number Publication date
CN114257810B (en) 2023-01-06
CN114257810A (en) 2022-03-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21871405; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/08/2023)
122 Ep: pct application non-entry in european phase. Ref document number: 21871405; Country of ref document: EP; Kind code of ref document: A1