CN113473154A - Video encoding method, video decoding method, video encoding device, video decoding device and storage medium

Info

Publication number
CN113473154A
Authority
CN
China
Prior art keywords
indication information
codeword
video
quantization
coefficient matrix
Prior art date
Legal status
Granted
Application number
CN202110735696.2A
Other languages
Chinese (zh)
Other versions
CN113473154B (en)
Inventor
魏亮
陈方栋
王莉
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110735696.2A
Publication of CN113473154A
Application granted
Publication of CN113473154B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94 Vector quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose a video encoding method, a video decoding method, a video encoding device, a video decoding device and a storage medium, belonging to the technical field of multimedia. In the embodiments of the present application, a vector quantization method may be used to quantize the coefficients to be encoded in a plurality of transform coefficient matrices to obtain corresponding first indication information, and a video code stream of a target video is then generated from a plurality of pieces of codeword indication information including the first indication information. The decoding end can subsequently decode the vector-quantized coefficients according to the first indication information. The embodiments of the present application therefore provide a vector-quantization-based video encoding and decoding method, realizing the application of vector quantization in video encoding and decoding; compared with scalar quantization and trellis coding quantization, the quantization performance is improved, which in turn improves the performance of video encoding and decoding.

Description

Video encoding method, video decoding method, video encoding device, video decoding device and storage medium
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to a method, an apparatus, and a storage medium for video encoding and video decoding.
Background
With the development of multimedia technology, video coding technology has gained wide attention. When a video is encoded, a discrete cosine transform needs to be performed on the residual signal of the video to obtain a transform coefficient matrix. The transform coefficients included in the transform coefficient matrix are then quantized, and the video code stream is determined according to the quantization result. At present, quantization technologies mainly include scalar quantization, vector quantization, and trellis coding quantization. Scalar quantization and trellis coding quantization have been widely applied to video coding and decoding, but vector quantization has not, so there is a need to provide a video coding and decoding method based on the vector quantization technology.
Disclosure of Invention
The embodiment of the application provides a video coding method, a video decoding method, a video coding device, a video decoding device and a storage medium, which can acquire a plurality of transformation coefficient matrixes corresponding to a target video, quantize transformation coefficients included in the plurality of transformation coefficient matrixes, and obtain a video code stream according to a quantization result. The technical scheme is as follows:
in one aspect, a video encoding method is provided, where the method includes:
obtaining a plurality of transformation coefficient matrixes corresponding to a target video, wherein each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
quantizing a coefficient to be coded in the multiple transformation coefficient matrixes to obtain multiple code word indication information, wherein the multiple code word indication information is used for indicating a quantization result obtained when the coefficient to be coded is quantized, the multiple code word indication information comprises first indication information, and the first indication information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization;
and generating a video code stream of the target video according to the plurality of code word indication information.
Optionally, the quantizing the coefficient to be encoded in the multiple transform coefficient matrices to obtain multiple codeword indication information includes:
performing vector quantization or hybrid quantization on the plurality of coefficients to be encoded in each of the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each transform coefficient matrix, wherein the hybrid quantization includes vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each transform coefficient matrix includes the first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be encoded in the corresponding transform coefficient matrix.
Optionally, the quantizing the coefficient to be encoded in the multiple transform coefficient matrices includes:
performing vector quantization or hybrid quantization on the plurality of coefficients to be encoded in each target transform coefficient matrix that meets the vector quantization condition among the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each target transform coefficient matrix, wherein the hybrid quantization includes vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each target transform coefficient matrix includes the first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be encoded in the corresponding transform coefficient matrix.
Optionally, the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are all transform coefficients included in the corresponding transform coefficient matrix.
Optionally, the vector quantization condition includes at least one of a number of rows of the transform coefficient matrix reaching a first row number threshold, or a number of columns of the transform coefficient matrix reaching a first column number threshold, or an area of the transform coefficient matrix reaching a first area threshold.
Optionally, the multiple coefficients to be encoded included in each of the multiple transform coefficient matrices are transform coefficients in a first region in the corresponding transform coefficient matrix, where the first region is a region obtained through a scan region-based coefficient coding (SRCC) technique.
Optionally, the vector quantization condition includes at least one of a number of rows of a first region in the transform coefficient matrix reaching a second row number threshold, or a number of columns of the first region in the transform coefficient matrix reaching a second column number threshold, or an area of the first region in the transform coefficient matrix reaching a second area threshold.
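As a minimal sketch (the threshold values and the helper names are hypothetical, since the description only requires that such thresholds exist), the two variants of the vector quantization condition could be checked as follows:

    # Hypothetical thresholds for illustration only.
    FIRST_ROW_THRESHOLD, FIRST_COL_THRESHOLD, FIRST_AREA_THRESHOLD = 8, 8, 64
    SECOND_ROW_THRESHOLD, SECOND_COL_THRESHOLD, SECOND_AREA_THRESHOLD = 4, 4, 16

    def matrix_meets_vq_condition(rows, cols):
        """Condition variant applied to the whole transform coefficient matrix."""
        return (rows >= FIRST_ROW_THRESHOLD
                or cols >= FIRST_COL_THRESHOLD
                or rows * cols >= FIRST_AREA_THRESHOLD)

    def srcc_region_meets_vq_condition(region_rows, region_cols):
        """Condition variant applied to the first region obtained by SRCC scanning."""
        return (region_rows >= SECOND_ROW_THRESHOLD
                or region_cols >= SECOND_COL_THRESHOLD
                or region_rows * region_cols >= SECOND_AREA_THRESHOLD)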
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector quantization switch identifier, where the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes first indication information obtained through vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector count or a vector quantization count, where the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video further corresponds to a vector quantization start position, where the vector quantization start position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
Optionally, the quantizing the coefficient to be encoded in the multiple transform coefficient matrices includes:
if the number of the coefficients to be coded in a first transformation coefficient matrix is N times of the dimension of the code word, generating N input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix, wherein N is a positive integer, the dimension of the input signal vector is equal to the dimension of the code word, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the N input signal vectors;
and taking the determined N pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantizing the coefficient to be encoded in the multiple transform coefficient matrices includes:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the dimension of the code word, generating M input signal vectors according to part of the coefficients to be coded in the first transformation coefficient matrix and the dimension of the code word, wherein the dimension of the input signal vector is equal to the dimension of the code word, the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the M input signal vectors, and performing other non-vector quantization on the residual coefficients to be coded in the first transform coefficient matrix to obtain second indication information corresponding to the residual coefficients to be coded;
and taking the determined first indication information and the second indication information as a plurality of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantizing the coefficient to be encoded in the multiple transform coefficient matrices includes:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the code word dimension, generating R input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix and the code word dimension, wherein the dimension of the input signal vector is equal to the code word dimension, one input signal vector in the R input signal vectors comprises one or more coefficient filling values, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is a transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the R input signal vectors;
and taking the determined R pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the codeword index is a one-dimensional index value, or the codeword index is a multi-dimensional index coordinate.
Optionally, the video code stream of the target video includes an enable identifier of vector quantization, where the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, the video code stream of the target video contains at least one vector dimension, where the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each piece of first indication information in the video code stream of the target video.
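To make the relationship between these stream-level fields concrete, the following sketch gathers them into one structure; the field names and the dictionary layout are purely illustrative assumptions, not the patent's code stream syntax:

    # Illustrative only: the actual code stream syntax is defined by the encoder/standard.
    def build_stream_level_fields(enable_vq, vector_dims, codebook_indices):
        return {
            # whether the video code stream supports vector quantization
            "vq_enable_flag": 1 if enable_vq else 0,
            # dimension(s) of the codewords indicated by the first indication information
            "vector_dimensions": list(vector_dims),
            # codebook(s) used when resolving each piece of first indication information
            "codebook_indices": list(codebook_indices),
        }

    # Example: one 8-dimensional codebook with index 0 is used for the whole stream.
    stream_fields = build_stream_level_fields(True, [8], [0])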
In another aspect, a video decoding method is provided, where the method includes:
the method comprises the steps of obtaining a video code stream of a target video, wherein the video code stream of the target video comprises a plurality of code word indicating information, the code word indicating information is used for indicating a quantization result obtained when a coefficient to be coded in a plurality of transformation coefficient matrixes corresponding to the target video is quantized, the code word indicating information comprises first indicating information, the first indicating information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization, and each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
and acquiring the decoding data of the target video according to the plurality of code word indication information.
Optionally, the multiple pieces of codeword indication information are all first indication information, and the obtaining of the decoded data of the target video according to the multiple pieces of codeword indication information includes:
acquiring a code word corresponding to each piece of first indication information in the plurality of pieces of first indication information;
and generating the decoding data of the target video according to the acquired multiple code words.
Optionally, the generating, according to the obtained multiple code words, the decoded data of the target video includes:
taking the multiple code words as inverse quantization reconstruction values, or processing the multiple code words to obtain inverse quantization reconstruction values;
and generating the decoding data of the target video according to the inverse quantization reconstruction value.
Optionally, the multiple pieces of codeword indication information include multiple pieces of first indication information and multiple pieces of second indication information, where the second indication information is used to indicate quantization results obtained by other quantization that is not vector quantization;
the obtaining of the decoded data of the target video according to the multiple codeword indication information includes:
acquiring a code word corresponding to each first indication information in the plurality of first indication information, and acquiring a quantization result corresponding to each second indication information in the plurality of second indication information;
and generating the decoded data of the target video according to the acquired multiple code words and the quantization result corresponding to each piece of second indication information.
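A minimal decoding-side sketch is given below. It assumes, for illustration only, that the first indication information is a one-dimensional codeword index, that the indication tuples use a hypothetical ("VQ", index) / ("OTHER", level) format, and that non-vector quantization results can be reconstructed by a simple helper; none of these details are fixed by the claims themselves:

    def reconstruct_coefficients(codeword_indications, codebook, dequant_other=None):
        """Map codeword indication information back to inverse-quantization reconstruction values."""
        reconstruction = []
        for kind, value in codeword_indications:
            if kind == "VQ":
                # first indication information: the codeword itself is used as the
                # inverse-quantization reconstruction value (it could also be post-processed)
                reconstruction.extend(codebook[value])
            else:
                # second indication information: reconstructed by non-vector dequantization
                reconstruction.append(dequant_other(value) if dequant_other else value)
        return reconstruction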
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each first indication information in the multiple codeword indication information.
Optionally, the video code stream of the target video further includes an enable identifier of vector quantization, where the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the plurality of pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector quantization switch identifier, where the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes first indication information obtained through vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the plurality of pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector count or a vector quantization count, where the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
Optionally, the at least one piece of codeword indication information corresponding to each transform coefficient matrix further corresponds to a vector quantization start position, where the vector quantization start position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
Optionally, the video code stream of the target video contains at least one vector dimension, where the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
In another aspect, a video encoding apparatus is provided, the apparatus including:
the video coding device comprises an acquisition module, a coding module and a decoding module, wherein the acquisition module is used for acquiring a plurality of transformation coefficient matrixes corresponding to a target video, and each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
the quantization module is configured to quantize coefficients to be encoded in the multiple transform coefficient matrices to obtain multiple codeword indication information, where the multiple codeword indication information is used to indicate a quantization result obtained when the coefficients to be encoded are quantized, the multiple codeword indication information includes first indication information, and the first indication information is used to indicate a codeword obtained when the coefficients to be encoded are vector quantized;
and the generating module is used for generating the video code stream of the target video according to the plurality of code word indication information.
Optionally, the quantization module is mainly configured to:
performing vector quantization or hybrid quantization on the plurality of coefficients to be encoded in each of the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each transform coefficient matrix, wherein the hybrid quantization includes vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each transform coefficient matrix includes the first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be encoded in the corresponding transform coefficient matrix.
Optionally, the quantization module is further configured to:
performing vector quantization or hybrid quantization on the plurality of coefficients to be encoded in each target transform coefficient matrix that meets the vector quantization condition among the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each target transform coefficient matrix, wherein the hybrid quantization includes vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each target transform coefficient matrix includes the first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be encoded in the corresponding transform coefficient matrix.
Optionally, the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are all transform coefficients included in the corresponding transform coefficient matrix.
Optionally, the vector quantization condition includes at least one of a number of rows of the transform coefficient matrix reaching a first row number threshold, or a number of columns of the transform coefficient matrix reaching a first column number threshold, or an area of the transform coefficient matrix reaching a first area threshold.
Optionally, the multiple coefficients to be encoded included in each of the multiple transform coefficient matrices are transform coefficients in a first region in the corresponding transform coefficient matrix, where the first region is a region obtained through a scan region-based coefficient coding (SRCC) technique.
Optionally, the vector quantization condition includes at least one of a number of rows of a first region in the transform coefficient matrix reaching a second row number threshold, or a number of columns of the first region in the transform coefficient matrix reaching a second column number threshold, or an area of the first region in the transform coefficient matrix reaching a second area threshold.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector quantization switch identifier, where the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes first indication information obtained through vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector count or a vector quantization count, where the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video further corresponds to a vector quantization start position, where the vector quantization start position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
Optionally, the quantization module is further configured to:
if the number of the coefficients to be coded in a first transformation coefficient matrix is N times of the dimension of the code word, generating N input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix, wherein N is a positive integer, the dimension of the input signal vector is equal to the dimension of the code word, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the N input signal vectors;
and taking the determined N pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantization module is further configured to:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the dimension of the code word, generating M input signal vectors according to part of the coefficients to be coded in the first transformation coefficient matrix and the dimension of the code word, wherein the dimension of the input signal vector is equal to the dimension of the code word, the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the M input signal vectors, and performing other non-vector quantization on the residual coefficients to be coded in the first transform coefficient matrix to obtain second indication information corresponding to the residual coefficients to be coded;
and taking the determined first indication information and the second indication information as a plurality of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantization module is further configured to:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the code word dimension, generating R input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix and the code word dimension, wherein the dimension of the input signal vector is equal to the code word dimension, one input signal vector in the R input signal vectors comprises one or more coefficient filling values, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is a transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the R input signal vectors;
and taking the determined R pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the codeword index is a one-dimensional index value, or the codeword index is a multi-dimensional index coordinate.
Optionally, the video code stream of the target video includes an enable identifier of vector quantization, where the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, the video code stream of the target video contains at least one vector dimension, where the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each piece of first indication information in the video code stream of the target video.
In another aspect, a video decoding apparatus is provided, the apparatus including:
the video code stream of the target video comprises a plurality of code word indication information, the plurality of code word indication information are used for indicating a quantization result obtained when a coefficient to be coded in a plurality of transformation coefficient matrixes corresponding to the target video is quantized, the plurality of code word indication information comprise first indication information, the first indication information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization, and each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
and the second obtaining module is used for obtaining the decoding data of the target video according to the plurality of code word indication information.
Optionally, the multiple codeword indication information is multiple first indication information, and the second obtaining module is mainly configured to:
acquiring a code word corresponding to each piece of first indication information in the plurality of pieces of first indication information;
and generating the decoding data of the target video according to the acquired multiple code words.
Optionally, the second obtaining module is mainly configured to:
taking the multiple code words as inverse quantization reconstruction values, or processing the multiple code words to obtain inverse quantization reconstruction values;
and generating the decoding data of the target video according to the inverse quantization reconstruction value.
Optionally, the multiple pieces of codeword indication information include multiple pieces of first indication information and multiple pieces of second indication information, where the second indication information is used to indicate quantization results obtained by other quantization that is not vector quantization; the second obtaining module is further mainly configured to:
acquiring a code word corresponding to each first indication information in the plurality of first indication information, and acquiring a quantization result corresponding to each second indication information in the plurality of second indication information;
and generating the decoded data of the target video according to the acquired multiple code words and the quantization result corresponding to each piece of second indication information.
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each first indication information in the multiple codeword indication information.
Optionally, the video code stream of the target video further includes an enable identifier of vector quantization, where the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the plurality of pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector quantization switch identifier, where the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes first indication information obtained through vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the plurality of pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector count or a vector quantization count, where the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
Optionally, the at least one piece of codeword indication information corresponding to each transform coefficient matrix further corresponds to a vector quantization start position, where the vector quantization start position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
Optionally, the video code stream of the target video contains at least one vector dimension, where the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
In another aspect, a video encoding apparatus is provided, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes executable instructions in the memory to perform the above-described video encoding method.
In another aspect, a video decoding apparatus is provided, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes executable instructions in the memory to perform the above-described video decoding method.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which, when executed by a computer, implements the steps of the video encoding method or the video decoding method described above.
In another aspect, a computer program product comprising instructions is provided, which when run on a computer, causes the computer to perform the steps of the above-described video encoding or video decoding method.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
In the embodiments of the present application, a vector quantization method may be used to quantize the coefficients to be encoded in a plurality of transform coefficient matrices to obtain corresponding first indication information, and a video code stream of a target video is then generated from a plurality of pieces of codeword indication information including the first indication information. The decoding end can subsequently decode the vector-quantized coefficients according to the first indication information. The embodiments of the present application therefore provide a vector-quantization-based video encoding and decoding method, realizing the application of vector quantization in video encoding and decoding; compared with scalar quantization and trellis coding quantization, the quantization performance is improved, which in turn improves the performance of video encoding and decoding.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a system architecture diagram related to video encoding and decoding provided by an embodiment of the present application;
fig. 2 is a flowchart of a video encoding method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a scanning area provided in an embodiment of the present application;
fig. 4 is a flowchart of a video decoding method according to an embodiment of the present application;
fig. 5 is a block diagram of a video encoding apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of a video decoding apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server for video encoding or video decoding according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, a system architecture related to the embodiments of the present application will be described.
Fig. 1 is a system architecture diagram for the video encoding and decoding methods provided in the embodiments of the present application. As shown in fig. 1, the system includes a video encoding device 101 and a video decoding device 102, and the video encoding device 101 and the video decoding device 102 can communicate with each other.
The video encoding device 101 is configured to obtain a video signal, obtain a plurality of transform coefficient matrices according to the video signal, and quantize and encode the coefficients to be encoded in the plurality of transform coefficient matrices by using the method provided in the embodiments of the present application, thereby obtaining a video code stream, and then send the generated video code stream to the video decoding device 102.
The video encoding device 101 may include a prediction module 1011, a transform module 1012, a quantization module 1013, and an entropy encoding module 1014.
The prediction module 1011 includes an intra-frame prediction unit and an inter-frame prediction unit, where the intra-frame prediction unit may predict pixels of a current coding block by using reconstructed pixels of coded blocks around the current coding block, so as to obtain a prediction block; the inter prediction unit may predict pixels of a currently not-encoded image frame using reconstructed pixels of a temporally adjacent encoded image frame, thereby obtaining a prediction block. The prediction module 1011 may obtain a prediction block by any of the prediction units described above, and then calculate a residual signal from the current coding block and the prediction block.
The transform module 1012 maps the residual signal of the image to a transform domain to obtain a transform coefficient matrix, so that the energy of the image in the transform domain is more concentrated, and the frequency domain correlation of the signal is removed.
The quantization module 1013 maps the coefficients to be encoded in the transform coefficient matrix obtained by the transform module 1012 into codewords. This mapping is a "many-to-one" process, which is not reversible and causes signal loss, but it greatly reduces the value range of the signal, so that a good approximation of the original signal can be represented with a small number of symbols, improving the compression rate.
The entropy coding module 1014 is a lossless coding method based on the information entropy principle, and converts the codeword index corresponding to the codeword mapped by the quantization module 1013 or other information for indicating the codeword into a binary code stream, thereby generating a video code stream.
The video decoding device 102 receives the video code stream sent by the video encoding device 101, and decodes the video code stream according to the code word indication information contained in the video code stream, thereby obtaining decoded data.
The video decoding apparatus 102 may include an entropy decoding module 1021, an inverse quantization module 1022, an inverse transform module 1023, a reconstruction module 1024, and a filtering module 1025.
The entropy decoding module 1021 is configured to receive a video code stream sent by the video encoding device 101, and since the video code stream is a binary code stream obtained by the entropy encoding module 1014 by converting the codeword indication information, the entropy decoding module 1021 may perform entropy decoding on the video code stream to obtain codeword indication information included in the video code stream.
The inverse quantization module 1022 is configured to perform inverse quantization on the codeword indication information obtained by the entropy decoding module 1021 to obtain an inverse quantization reconstruction value.
The inverse transform module 1023 is configured to perform DCT inverse transform on the inverse quantized reconstructed values obtained by the inverse quantization module 1022, thereby obtaining a plurality of transform coefficient matrices.
The reconstruction module 1024 is configured to generate a reconstructed image according to the plurality of transform coefficient matrices obtained by the inverse transform module 1023.
The filtering module 1025 is used for enhancing the reconstructed image obtained by the reconstruction module 1024, so that the reconstructed image is closer to the original image, the influence of blocking effect and ringing effect is reduced, and the quality of the reconstructed image is improved.
It should be noted that, in the embodiment of the present application, the video encoding apparatus 101 may be a terminal apparatus, such as a smart phone, a tablet computer, a notebook computer, and a desktop computer, and the video encoding apparatus 101 may also be a server or a server cluster. The video decoding device 102 may be a terminal device, for example, a tablet computer, a desktop computer, or the like, and the video decoding device 102 may also be a server or a server cluster, which is not limited in this embodiment.
Next, a video encoding method provided in an embodiment of the present application will be described.
Fig. 2 is a flowchart of a video encoding method according to an embodiment of the present application. The method can be applied to a video encoding device in the aforementioned system architecture, as shown in fig. 2, and includes the following steps:
step 201: the method comprises the steps of obtaining a plurality of transformation coefficient matrixes corresponding to a target video, wherein each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded.
In the embodiment of the present application, a DCT (Discrete Cosine Transform) is performed on the residual signal of the target video. In the transformation process, a DCT is performed on each TB (Transform Block) of each frame of video image in the target video, so as to obtain the transform coefficient matrix corresponding to each TB. In this way, one or more transform coefficient matrices may be obtained for a frame of video image.
In the embodiment of the present application, each of the plurality of transform coefficient matrices includes a plurality of coefficients to be encoded.
In one possible implementation manner, the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices may be all transform coefficients included in the corresponding transform coefficient matrix.
Optionally, in another possible implementation manner, the multiple coefficients to be encoded included in each transform coefficient matrix may be the transform coefficients in a first region in the corresponding transform coefficient matrix, where the first region is the region scanned by the scan region-based coefficient coding (SRCC) technique.
That is, in the embodiment of the present application, the video encoding apparatus may use the SRCC technique to determine the abscissa SRx of the right-most non-zero coefficient and the ordinate SRy of the bottom-most non-zero coefficient in the transform coefficient matrix of the TB, use (SRx, SRy) to determine the scanning area to be scanned in the transform coefficient matrix, that is, the first region, as shown in fig. 3, and then encode the transform coefficients in the first region determined by (SRx, SRy).
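For illustration, a minimal sketch of determining (SRx, SRy) and the first region follows; the function name and the convention for an all-zero matrix are assumptions made only for this sketch:

    def srcc_first_region(coeff_matrix):
        """Return (SRx, SRy) and the coefficients of the first region.

        SRx: column index of the right-most non-zero coefficient.
        SRy: row index of the bottom-most non-zero coefficient.
        """
        srx, sry = -1, -1
        for y, row in enumerate(coeff_matrix):
            for x, c in enumerate(row):
                if c != 0:
                    srx = max(srx, x)
                    sry = max(sry, y)
        if srx < 0:                      # all-zero matrix: empty scan region (assumed convention)
            return (0, 0), []
        region = [coeff_matrix[y][x] for y in range(sry + 1) for x in range(srx + 1)]
        return (srx, sry), region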
Step 202: the method comprises the steps of quantizing a coefficient to be coded in a plurality of transformation coefficient matrixes to obtain a plurality of code word indication information, wherein the plurality of code word indication information are used for indicating a quantization result obtained when the coefficient to be coded is quantized, the plurality of code word indication information comprise first indication information, and the first indication information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization.
As can be seen from the foregoing step 201, each transform coefficient matrix includes a plurality of coefficients to be encoded, based on which the video encoding apparatus may quantize the plurality of coefficients to be encoded in each transform coefficient matrix.
Illustratively, in one possible implementation manner, the video encoding device performs vector quantization or hybrid quantization on a plurality of coefficients to be encoded in each of a plurality of transform coefficient matrices to obtain at least one codeword information corresponding to each transform coefficient matrix, where the hybrid quantization includes vector quantization and quantization other than vector quantization, and the at least one codeword information corresponding to each transform coefficient matrix includes first indication information, and the first indication information is used for indicating a codeword obtained in the process of performing vector quantization on all or part of coefficients to be encoded in the corresponding transform coefficient matrix.
Taking a first transform coefficient matrix among the multiple transform coefficient matrices as an example, the video encoding device may determine whether the number of coefficients to be encoded in the first transform coefficient matrix is N times the codeword dimension. If it is, the video encoding device generates N input signal vectors from all the coefficients to be encoded in the first transform coefficient matrix, where N is a positive integer and the dimension of each input signal vector is equal to the codeword dimension, determines the first indication information corresponding to each of the N input signal vectors, and uses the determined N pieces of first indication information as the at least one piece of codeword indication information corresponding to the first transform coefficient matrix.
It should be noted that the codeword dimension refers to a dimension of a codeword included in a preset reference codebook to be used for vector quantization. The reference codebook comprises a plurality of code words, and each code word is a vector of the dimension of the code word. For example, the codeword dimension may be 8, 16, etc. In the embodiment of the present application, each codeword may correspond to one codeword indication information.
If the number of the coefficients to be coded contained in the first transform coefficient matrix is an integer multiple of the code word dimension, it means that the coefficients to be coded contained in the first transform coefficient matrix can just form N vectors of the code word dimension, and based on this, the video coding apparatus can generate N input signal vectors according to the code word dimension.
For example, assuming that the code word dimension is 8 and the number of coefficients to be encoded contained in the first transform coefficient matrix is 24, the 24 coefficients to be encoded in the first transform coefficient matrix may constitute 3 8-dimensional input signal vectors according to the code word dimension.
After obtaining the N input signal vectors, the video encoding device may search a codeword matching each input signal vector from codewords included in the reference codebook, further obtain indication information of a corresponding codeword according to the searched codeword corresponding to each input signal vector, and use the obtained indication information as first indication information corresponding to the corresponding input signal vector. Thereafter, the video encoding apparatus may use the first indication information corresponding to each input signal vector as a plurality of codeword indication information corresponding to the first transform coefficient matrix.
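A minimal sketch of this matching step is given below, assuming the match criterion is minimum squared Euclidean distance (the description does not mandate a specific criterion, and the toy codebook is invented for the example):

    def nearest_codeword_index(input_vector, reference_codebook):
        """Return the index of the codeword closest to the input signal vector."""
        def sq_dist(codeword):
            return sum((a - b) ** 2 for a, b in zip(input_vector, codeword))
        return min(range(len(reference_codebook)),
                   key=lambda i: sq_dist(reference_codebook[i]))

    # Example: one 8-dimensional input signal vector quantized against a toy codebook.
    codebook = [[0] * 8, [1] * 8, [2, 0, 0, 0, 0, 0, 0, 0]]
    first_indication_info = nearest_codeword_index([2, 1, 0, 0, 0, 0, 0, 0], codebook)  # -> 2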
The first indication information may be a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
When the first indication information is a codeword index, the codeword index may be a one-dimensional index value, or may also be a multidimensional index coordinate. For example, for a poly-type codebook, the codeword index of a codeword is a one-dimensional index value, that is, a specific index value can correspond to a codeword; for a lattice codebook, one codeword corresponds to one multidimensional index coordinate, e.g., index coordinate (i, j) corresponds to one codeword.
When the first indication information is codeword index indication information, the codeword index indication information is information related to the codeword index, that is, information from which the codeword index can be derived. For example, suppose a codebook includes 31 codewords divided into 5 groups numbered 0 to 4: codeword 1 is in the first group, codewords 2 and 3 are in the second group, codewords 4 to 7 are in the third group, codewords 8 to 15 are in the fourth group, and codewords 16 to 31 are in the fifth group. The codeword index indication information corresponding to a codeword is then obtained from the group number of the group in which the codeword is located and the offset within that group, that is, the codeword index indication information includes the group number and the intra-group offset.
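Using the 31-codeword example above, a minimal sketch of deriving the group number and the intra-group offset is shown below (it assumes 1-based codeword indices, which the example implies):

    def index_to_group_and_offset(codeword_index):
        """Split a 1-based codeword index (1..31) into (group number, offset within group).

        Groups follow the example above: group 0 = {1}, group 1 = {2, 3},
        group 2 = {4..7}, group 3 = {8..15}, group 4 = {16..31}.
        """
        group = codeword_index.bit_length() - 1      # 1 -> 0, 2..3 -> 1, 4..7 -> 2, ...
        offset = codeword_index - (1 << group)       # position within the group
        return group, offset

    assert index_to_group_and_offset(1) == (0, 0)
    assert index_to_group_and_offset(5) == (2, 1)
    assert index_to_group_and_offset(31) == (4, 15)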
When the first indication information is indication information of a codeword component included in a codeword, the indication information of the codeword component may be information obtained by directly encoding the codeword component included in the codeword. For example, in some specially constructed codebooks, for some codewords designed artificially, each codeword component in a codeword may be an integer, in which case, each codeword component in a codeword may be directly encoded to obtain indication information of the codeword component.
Alternatively, if the number of coefficients to be encoded in the first transform coefficient matrix is not an integer multiple of the number of dimensions of the codeword, the coefficients to be encoded in the first transform coefficient matrix may be subjected to full vector quantization, or may be subjected to hybrid quantization.
If full vector quantization is performed on the coefficients to be encoded in the first transform coefficient matrix, R input signal vectors are generated from all the coefficients to be encoded in the first transform coefficient matrix and the codeword dimension, and the dimension of each input signal vector is equal to the codeword dimension, where one of the R input signal vectors includes one or more coefficient filling values. Then, the first indication information corresponding to each of the R input signal vectors is determined, and the determined R pieces of first indication information are used as the at least one piece of codeword indication information corresponding to the first transform coefficient matrix.
If the number of coefficients to be encoded contained in the first transform coefficient matrix is not an integer multiple of the codeword dimension, the coefficients to be encoded in the first transform coefficient matrix cannot form an integer number of vectors of the codeword dimension. In this case, if the plurality of coefficients to be encoded are to be subjected to full vector quantization, the video encoding apparatus may first calculate the integer quotient r and the remainder t obtained by dividing the number of coefficients to be encoded by the codeword dimension, and then generate R input signal vectors from the plurality of coefficients to be encoded included in the first transform coefficient matrix, where R = r + 1 and the last input signal vector may include (codeword dimension - t) coefficient filling values.
For example, suppose the first transform coefficient matrix includes 20 coefficients to be encoded and the codeword dimension is 8. Dividing 20 by 8 gives a quotient of 2 and a remainder of 4, so 3 8-dimensional input signal vectors can be generated from the 20 coefficients to be encoded, where the last of the 3 input signal vectors includes 4 coefficient filling values.
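A minimal sketch of this padding step is shown below; the padding value of 0 is an illustrative assumption, since the description only requires that coefficient filling values be appended:

    def split_with_padding(coeffs, codeword_dim, fill_value=0):
        """Split coefficients into input signal vectors, padding the last one if needed."""
        vectors = []
        for start in range(0, len(coeffs), codeword_dim):
            vec = list(coeffs[start:start + codeword_dim])
            if len(vec) < codeword_dim:
                vec += [fill_value] * (codeword_dim - len(vec))   # coefficient filling values
            vectors.append(vec)
        return vectors

    # 20 coefficients, codeword dimension 8 -> 3 vectors, the last with 4 filling values.
    vectors = split_with_padding(list(range(20)), 8)
    assert len(vectors) == 3 and vectors[-1][-4:] == [0, 0, 0, 0]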
After obtaining the input signal vectors, for each input signal vector, the video encoding apparatus may determine, in the manner described above, first indication information corresponding to each input signal vector, and use the determined first indication information corresponding to each input signal vector as at least one codeword indication information corresponding to the first transform coefficient matrix.
Alternatively, in a case where the number of coefficients to be encoded in the first transform coefficient matrix is not an integer multiple of the codeword dimension, if the coefficients to be encoded in the first transform coefficient matrix are subjected to hybrid quantization, the video encoding device may generate M input signal vectors from a part of the coefficients to be encoded in the first transform coefficient matrix and the codeword dimension, where the dimension of each input signal vector equals the codeword dimension. It then determines the first indication information corresponding to each of the M input signal vectors, and quantizes the remaining coefficients to be encoded in the first transform coefficient matrix with other, non-vector quantization to obtain the second indication information corresponding to each of the remaining coefficients to be encoded.
The video encoding device may first compute the integer quotient r and the remainder t of dividing the number of coefficients to be encoded by the codeword dimension, and then generate M input signal vectors of the codeword dimension from the plurality of coefficients to be encoded included in the first transform coefficient matrix, where M = r. The remaining t coefficients to be coded may be quantized with quantization other than vector quantization to obtain the second indication information corresponding to each of them. In this way, some of the coefficients to be coded included in the first transform coefficient matrix are vector quantized to obtain corresponding first indication information, and the others are quantized with non-vector quantization to obtain corresponding second indication information. On this basis, the video encoding device can use the obtained first indication information and second indication information as the plurality of codeword indication information corresponding to the first transform coefficient matrix.
For example, the first transform coefficient matrix contains 20 coefficients to be coded and the codeword dimension is 8. In this case, the first 16 coefficients to be coded in the first transform coefficient matrix are formed into 2 8-dimensional input signal vectors and the first indication information corresponding to each input signal vector is obtained, while the remaining 4 coefficients to be coded are quantized with other, non-vector quantization to obtain the corresponding second indication information. Optionally, the first 4 of the 20 coefficients to be encoded in the first transform coefficient matrix may instead be quantized with other, non-vector quantization to obtain the corresponding second indication information, and the remaining 16 coefficients to be encoded combined into 2 8-dimensional input signal vectors, each of which corresponds to one piece of first indication information. The determined 2 pieces of first indication information and the second indication information are then used as the plurality of codeword indication information corresponding to the first transform coefficient matrix.
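The split used in this hybrid example can be sketched as follows, assuming the vector-quantized part comes first; the actual vector quantizer and the non-vector quantizer are outside the scope of this sketch.

def hybrid_split(coeffs, dim):
    # Return (input signal vectors for vector quantization, leftover coefficients
    # for other, non-vector quantization).
    r, t = divmod(len(coeffs), dim)
    vq_part = coeffs[:r * dim]
    leftover = coeffs[r * dim:]          # t remaining coefficients to be encoded
    vectors = [vq_part[i * dim:(i + 1) * dim] for i in range(r)]
    return vectors, leftover

vectors, leftover = hybrid_split(list(range(20)), dim=8)
assert len(vectors) == 2 and len(leftover) == 4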
It should be noted that, in the above hybrid quantization process, for the coefficients to be encoded remaining after vector quantization, the other, non-vector quantization performed by the video encoding device may be at least one of scalar quantization, trellis coded quantization, or another quantization method, which is not limited in the embodiments of the present application.
Optionally, in another possible implementation manner, the video encoding device may perform vector quantization or hybrid quantization on the plurality of coefficients to be encoded in each target transform coefficient matrix that satisfies a vector quantization condition among the plurality of transform coefficient matrices, to obtain at least one codeword indication information corresponding to each target transform coefficient matrix. The hybrid quantization includes vector quantization and other quantization except vector quantization, the at least one codeword indication information corresponding to each target transform coefficient matrix includes first indication information, and the first indication information is used to indicate a codeword obtained in the process of performing vector quantization on all or part of the coefficients to be coded in the corresponding transform coefficient matrix.
It should be noted that, as can be seen from the foregoing description in step 201, the multiple coefficients to be encoded included in each of the multiple transform coefficient matrices may be all transform coefficients included in the corresponding transform coefficient matrix, or may be transform coefficients in the first region in the corresponding transform coefficient matrix. When the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are all transform coefficients included in the corresponding transform coefficient matrix, the vector quantization condition may include at least one of a number of rows of the transform coefficient matrices reaching a first row number threshold, a number of columns of the transform coefficient matrices reaching a first column number threshold, or an area of the transform coefficient matrices reaching a first area threshold. That is, for any one transform coefficient matrix, if the transform coefficient matrix satisfies one or more of the above conditions, all transform coefficients in the corresponding transform coefficient matrix may be vector-quantized or hybrid-quantized. The area of the transform coefficient matrix is the product of the number of rows and the number of columns of the transform coefficient matrix.
Alternatively, when the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are transform coefficients within a first region of the corresponding transform coefficient matrix, the vector quantization condition may include at least one of a number of rows of the first region reaching a second row number threshold, or a number of columns of the first region in the transform coefficient matrix reaching a second column number threshold, or an area of the first region in the transform coefficient matrix reaching a second area threshold. That is, for any one transform coefficient matrix, if the first region in the transform coefficient matrix satisfies one or more of the above-described conditions, the transform coefficients contained in the first region in the corresponding transform coefficient matrix may be vector-quantized or hybrid-quantized. The area of the first region is the product of the number of rows and the number of columns of the first region.
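As a sketch only, the check below applies the row, column, and area conditions; the threshold values are assumed example values, since the embodiment leaves them open, and the same check can be applied either to the whole transform coefficient matrix or to its first region.

def meets_vq_condition(rows, cols, row_thr=8, col_thr=8, area_thr=64):
    # True if any one of the row, column, or area thresholds is reached.
    return rows >= row_thr or cols >= col_thr or rows * cols >= area_thr

assert meets_vq_condition(16, 4)        # the number of rows reaches the threshold
assert not meets_vq_condition(4, 4)     # none of the three conditions holds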
In the embodiment of the present application, the video encoding device may use the vector quantization condition described above to obtain, from the plurality of transform coefficient matrices, the target transform coefficient matrices satisfying the vector quantization condition. For each target transform coefficient matrix, the video encoding device may perform vector quantization or hybrid quantization on the coefficients to be encoded included in it with reference to the aforementioned quantization method for the coefficients to be encoded in the first transform coefficient matrix. For a transform coefficient matrix that does not satisfy the vector quantization condition, the video encoding device may quantize the coefficients to be encoded in a quantization manner other than vector quantization and hybrid quantization, for example scalar quantization or trellis coded quantization, which is not limited in the embodiments of the present application. In this way, the at least one codeword indication information corresponding to a target transform coefficient matrix satisfying the vector quantization condition may consist entirely of first indication information, or may include both first indication information and second indication information obtained by other quantization methods. The at least one codeword indication information corresponding to a transform coefficient matrix that does not satisfy the vector quantization condition is second indication information obtained by non-vector quantization, that is, it does not include first indication information.
The above are merely two possible vector quantization conditions provided in this embodiment. Optionally, the vector quantization condition may also be a condition determined according to block division information of the TB corresponding to a transform coefficient matrix, a condition set according to a quantization parameter corresponding to a specific quantization manner, or another condition, which is not limited in the embodiments of the present application.
Step 203: and generating a video code stream of the target video according to the plurality of code word indication information.
In the embodiment of the application, after the coefficients to be encoded in the multiple transform coefficient matrices corresponding to the target video are quantized to obtain the multiple codeword indication information, the video encoding device may generate the video code stream of the target video according to the multiple codeword indication information.
For example, the video encoding device may perform binary encoding on at least one codeword indication information corresponding to each transform coefficient matrix according to the sequence of the TBs corresponding to each transform coefficient matrix, so as to obtain a video code stream of the target video.
Alternatively, as described in step 202, the coefficients to be coded in each transform coefficient matrix may be all subjected to vector quantization, may be partially subjected to vector quantization, or may not be subjected to vector quantization at all. In this way, at least one codeword indication information corresponding to each transform coefficient matrix may be the first indication information, may also include the first indication information and indication information corresponding to another quantization method such as the second indication information, or may also not include the first indication information. Based on this, in the embodiment of the present application, after the video encoding device performs binary encoding on at least one codeword indication information corresponding to each transform coefficient matrix, a corresponding vector quantization switch identifier may also be added before at least one codeword indication information corresponding to each transform coefficient matrix, so as to obtain a video code stream of the target video. The vector quantization switch identifier may be configured to identify whether the corresponding at least one codeword indication information includes first indication information obtained through vector quantization.
Optionally, after the video encoding device performs binary encoding on the at least one codeword indication information corresponding to each transform coefficient matrix, the video encoding device may further add a corresponding number of vectors or a corresponding number of vector quantization times before the at least one codeword indication information corresponding to each transform coefficient matrix to generate a video code stream of the target video, where the number of vectors or the number of vector quantization times is used to indicate the number of the first indication information included in the corresponding at least one codeword indication information.
For example, when the coefficients to be encoded in a certain transform coefficient matrix form 4 input signal vectors, 4 codewords are obtained according to the 4 input signal vectors, and 4 pieces of first indication information are obtained according to the 4 codewords, at this time, the number of vectors or the number of vector quantization times is 4.
Optionally, a corresponding vector quantization starting position may be further added before the at least one codeword indication information corresponding to each transform coefficient matrix, where the vector quantization starting position is used to indicate a position of a first indication information in the corresponding at least one codeword indication information. In this way, when at least one codeword indication information corresponding to a certain transform coefficient matrix includes both the first indication information and other indication information, it is possible to know from which position the first indication information obtained by vector quantization starts from the vector quantization start position. On the basis, if the number of vectors is also corresponding, which indication information in at least one code word indication information corresponding to the transformation coefficient matrix is the first indication information can be determined according to the vector quantization initial position and the vector quantization number, so that the subsequent decoding is facilitated.
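A non-normative Python sketch of the per-matrix syntax order suggested above follows: a vector quantization switch identifier and, when vector quantization is used, a vector number and a vector quantization starting position, followed by the codeword indication information itself. The field names and the emit() writer are assumptions for illustration only.

def emit_matrix_syntax(emit, indications):
    # indications: list of {"is_vq": bool, "payload": int}, one per codeword indication.
    num_vq = sum(1 for ind in indications if ind["is_vq"])
    first_vq = next((i for i, ind in enumerate(indications) if ind["is_vq"]), 0)

    emit("vq_switch_flag", 1 if num_vq > 0 else 0)   # does this matrix carry first indication information?
    if num_vq > 0:
        emit("vq_count", num_vq)                     # vector number / number of vector quantization times
        emit("vq_start_pos", first_vq)               # vector quantization starting position
    for ind in indications:
        emit("indication", ind["payload"])           # binary-coded codeword indication information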
Optionally, the video code stream of the target video may further include an enable identifier of vector quantization, where the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization. That is, if a method of vector quantization or hybrid quantization is supported in the process of quantizing a coefficient to be encoded in a plurality of transform coefficient matrices corresponding to a target video, an enable identifier of vector quantization may be added before a plurality of code word indication information included in a video code stream of the target video. Therefore, the subsequent decoding end can know whether the video code stream of the target video supports vector quantization or not through the enabling identification.
Optionally, the video code stream of the target video may further include at least one vector dimension, where the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
If the codeword dimension adopted when vector quantization is performed on a plurality of coefficients to be coded in each transform coefficient matrix is the same, the video code stream of the target video will include a vector dimension, in which case the vector dimension may be added before a plurality of codeword indication information included in the video code stream of the target video.
If the codeword dimensions used when vector quantizing the coefficients to be encoded in different transform coefficient matrices may differ while the codeword dimension used within the same transform coefficient matrix is the same, the video code stream of the target video may include multiple vector dimensions; in this case, a corresponding vector dimension may be added before the at least one codeword indication information corresponding to each transform coefficient matrix.
For example, the target video corresponds to 10 transform coefficient matrices, wherein, the coefficients to be coded in 5 transform coefficient matrices are vector quantized by using 8-dimensional code words, and the coefficients to be coded in the other 5 transform coefficient matrices are vector quantized by using 16-dimensional code words. In this case, two vector dimensions, 8 and 16 respectively, are included in the video code stream of the target video, and the corresponding vector dimensions may be added before the at least one codeword indication information corresponding to each transform coefficient matrix.
If the codeword dimensions used for vector quantization of the coefficients to be encoded within the same transform coefficient matrix may also differ, the video code stream of the target video may include multiple vector dimensions. In this case, a corresponding vector dimension may be bound to each piece of first indication information included in the at least one codeword indication information corresponding to each transform coefficient matrix.
For example, part of coefficients to be encoded in a certain transform coefficient matrix is vector quantized by using 8-dimensional code words, and part of coefficients to be encoded is vector quantized by using 16-dimensional code words, and at this time, for the first indication information obtained by 8-dimensional code word quantization, the vector dimension 8 may be bound, and for the first indication information obtained by 16-dimensional code word quantization, the vector dimension 16 may be bound.
Optionally, the video code stream of the target video may further include at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each piece of first indication information in the video code stream of the target video.
When the same codebook is used for vector quantization of a plurality of coefficients to be coded in each of a plurality of transform coefficient matrixes, the video code stream of the target video comprises a codebook index. At this time, the codebook index may be added before the multiple codeword indication information included in the video code stream of the target video, so that a subsequent decoding end obtains a corresponding codebook according to the codebook index to decode the video code stream of the target video.
When different code books are adopted to carry out vector quantization on the coefficient to be coded in the multiple transformation coefficient matrixes, and the code books adopted by the coefficient to be coded in the same transformation coefficient matrix are the same, the video code stream of the target video comprises multiple code book indexes. In this case, a corresponding codebook index may be added before at least one codeword indication information corresponding to each transform coefficient matrix to indicate by which codebook quantization each first indication information in the corresponding transform coefficient matrix is obtained.
When vector quantization is performed on a plurality of coefficients to be coded included in the transform coefficient matrix by using different codebooks, the video stream of the target video also includes a plurality of codebook indexes. In this case, the first indication information included in the at least one codeword indication information corresponding to each transform coefficient matrix may be bound to a corresponding codebook index to indicate by which codebook quantization the corresponding first indication information is obtained.
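The three codebook-index granularities described above (one index for the whole stream, one per transform coefficient matrix, or one bound to each piece of first indication information) can be pictured with the small container below; this is an assumed helper structure, not part of the bitstream syntax.

class CodebookSignalling:
    def __init__(self, stream_codebook=None):
        self.stream_codebook = stream_codebook   # one codebook index for the whole stream
        self.per_matrix = {}                     # matrix id -> codebook index
        self.per_codeword = {}                   # (matrix id, vector id) -> codebook index

    def codebook_for(self, matrix_id, vector_id):
        # The most specific binding wins: per codeword, then per matrix, then stream level.
        if (matrix_id, vector_id) in self.per_codeword:
            return self.per_codeword[(matrix_id, vector_id)]
        if matrix_id in self.per_matrix:
            return self.per_matrix[matrix_id]
        return self.stream_codebook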
It should be noted that the video code stream of the target video may include one or more of a vector quantization switch identifier, an enable identifier of vector quantization, a number of vectors or a number of vector quantization times, a vector quantization start position, a vector dimension, and a codebook index, which is not limited in this embodiment of the present application.
In the embodiment of the application, a vector quantization method is adopted to quantize the coefficients to be coded in the multiple transform coefficient matrices to obtain the corresponding first indication information, and then the video code stream of the target video is generated according to the multiple codeword indication information containing the first indication information. Subsequently, the decoding end can decode the vector quantized coefficients according to the first indication information. Therefore, the embodiment of the application provides a video coding and decoding method based on vector quantization, realizing the application of vector quantization in video coding and decoding; compared with scalar quantization and trellis coded quantization, the quantization performance is improved, and the performance of video coding and decoding is improved accordingly.
After the target video is encoded by the above method to obtain the video code stream of the target video, the video encoding device can send the video code stream of the target video to the video decoding device. The video decoding device can then decode the video code stream of the target video by the video decoding method provided by the embodiments of the present application, described next.
Fig. 4 is a flowchart of a video decoding method according to an embodiment of the present application. The method can be applied to a video decoding device in the system architecture shown in fig. 1, and as shown in fig. 4, the method comprises the following steps:
step 401: the method comprises the steps of obtaining a video code stream of a target video, wherein the video code stream of the target video comprises a plurality of code word indicating information, the plurality of code word indicating information comprise first indicating information, and the first indicating information is used for indicating code words obtained when vector quantization is carried out on a coefficient to be coded.
The video decoding device receives the video code stream of the target video sent by the video coding device, or the video decoding device can also acquire the video code stream of the target video sent by the video coding device to the storage device from the storage device.
The video code stream of the target video comprises a plurality of code word indicating information, and the plurality of code word indicating information comprise first indicating information. That is, the video code stream of the target video is obtained by performing partial or complete vector quantization on the coefficients to be encoded included in the plurality of transform coefficient matrices corresponding to the target video. For possible implementation manners of the multiple codeword indication information, reference may be made to the description in step 202 in the foregoing embodiment, and details of the embodiment of the present application are not repeated herein.
Optionally, the video code stream of the target video may further include one or more of a vector quantization switch identifier, an enable identifier of vector quantization, a number of vectors or a number of vector quantization times, a vector quantization start position, a vector dimension, and a codebook index, and the detailed description refers to the description in step 203 in the foregoing embodiment, which is not described herein again in this embodiment of the present application.
Step 402: and acquiring the decoded data of the target video according to the plurality of code word indication information.
After the video decoding device acquires the video code stream of the target video, the video decoding device may sequentially perform inverse quantization on each codeword indication information from the first codeword indication information of the video code stream to obtain a quantization result corresponding to each codeword indication information, and then generate decoded data of the target video according to the obtained quantization result. The quantization result corresponding to the codeword indication information may be a codeword corresponding to the first indication information, or may be a quantization result corresponding to the second indication information.
As can be seen from the foregoing description in step 202, the indication information of the multiple code words of the video stream of the target video may be all the first indication information, or may include the first indication information and other indication information. Based on this, in the embodiment of the present application, the decoded data of the target video can be acquired through the following two implementations.
In a first implementation manner, when all of the plurality of codeword indication information is the first indication information, the video decoding device may sequentially obtain a codeword corresponding to each of the plurality of first indication information; and generating the decoding data of the target video according to the acquired multiple code words.
Illustratively, the video decoding device first acquires a codebook for inverse quantization of a plurality of first indication information, and then performs inverse quantization on each first indication information according to the acquired codebook to obtain a codeword corresponding to each first indication information.
In a first possible case, the plurality of first indication information are quantized by using the same codebook, so that the video encoding device and the video decoding device can agree in advance on the adopted target codebook, and at this time, the video decoding device acquires the target codebook. Thereafter, the video decoding apparatus may obtain a codeword corresponding to each first indication information from the target codebook.
In a second possible case, the multiple pieces of first indication information are quantized with different codebooks, and the at least one piece of first indication information corresponding to the same transform coefficient matrix among the multiple codeword indication information will correspond to the same codebook index. In this case, when the video decoding device inverse-quantizes each piece of first indication information in sequence, upon detecting the first codebook index it obtains the target codebook according to that codebook index and inverse-quantizes the first indication information following the codebook index with the target codebook until the next codebook index is detected; it then obtains the target codebook corresponding to the next codebook index and inverse-quantizes the first indication information following that codebook index according to the newly obtained target codebook, and so on.
In a third possible case, the multiple pieces of first indication information are obtained by using different codebook quantizations, where each piece of first indication information is bound with a codebook index for indicating its corresponding codebook. In this case, the video decoding apparatus obtains the codebook corresponding to the corresponding first indication information according to the codebook index bound by each piece of first indication information and used for indicating the codebook corresponding to itself. Then, the video decoding device may obtain a codeword corresponding to each first indication information from a codebook corresponding to each first indication information.
It should be noted that, when performing inverse quantization according to the first indication information corresponding to the obtained target codebook, the video decoding apparatus may obtain, according to the first indication information, a codeword corresponding to the first indication information from the corresponding target codebook.
When the first indication information is a codeword index and the codeword index is a one-dimensional index value, since one index value can correspond to one codeword, a mapping relationship between the index value and the codeword can be stored in the video decoding device, so that the video decoding device can obtain the codeword corresponding to each codeword index by looking up a table, thereby obtaining the codeword corresponding to each first indication information.
When the first indication information is a codeword index and the codeword index is a multi-dimensional index coordinate, the video decoding device may determine, according to the multi-dimensional index coordinate, a codeword corresponding to the codeword index through a correlation formula.
When the first indication information is codeword index indication information, the video decoding device may first obtain a corresponding codeword index according to the codeword index indication information, and then obtain a corresponding codeword according to the codeword index.
Illustratively, taking the case where the codeword index indication information described above includes the codeword group number and the intra-group offset, the video decoding device may calculate the codeword index from the group number and the intra-group offset included in the codeword index indication information by the following formula.
codeword index = (1 << group number) + intra-group offset - 1
After obtaining the codeword index, the video decoding device may obtain a corresponding codeword from the mapping relationship table of the codeword index and the codeword according to the codeword index.
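Putting the formula and the table lookup together, a small decoder-side sketch (with a toy mapping table; real codebooks are not reproduced here) could look like this:

def decode_codeword(group, offset, index_to_codeword):
    # codeword index = (1 << group number) + intra-group offset - 1
    codeword_index = (1 << group) + offset - 1
    return index_to_codeword[codeword_index]

# Toy 8-dimensional entry with placeholder values for codeword index 13.
toy_table = {13: [3, -1, 0, 2, 0, 0, -2, 1]}
assert decode_codeword(group=3, offset=6, index_to_codeword=toy_table) == toy_table[13]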
When the first indication information is indication information of a codeword component contained in a codeword, the video decoding device may obtain the codeword component through the indication information of the codeword component, and further combine the obtained codeword component into the codeword.
For example, as described in the foregoing embodiment, the first indication information is indication information of a codeword component obtained by encoding each codeword component of a codeword, in this case, the video decoding apparatus may directly decode the indication information of each codeword component, so as to obtain a corresponding codeword component, and combine the obtained codeword components into a codeword.
After obtaining the codeword corresponding to each first indication information, the video decoding device may generate decoded data of the target video according to the multiple codewords.
In one implementation, the multiple code words are directly used as inverse quantization reconstruction values, and the video decoding device generates the decoded data of the target video according to the inverse quantization reconstruction values of the multiple code words.
Optionally, in a possible implementation manner, the multiple codewords may each be processed by a related formula to generate the inverse quantization reconstruction values corresponding to the multiple codewords. For example, a codeword may be multiplied by a number, added to a number, or shifted to generate the inverse quantization reconstruction value corresponding to the codeword. The video decoding device then generates the decoded data of the target video from the inverse quantization reconstruction values calculated from the multiple codewords.
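The two reconstruction options just described can be sketched as below; the scale-and-shift post-processing is only an assumed example of the formula-based processing mentioned above, not a prescribed operation.

def reconstruct(codeword, scale=1, shift=0):
    # With scale=1 and shift=0 the codeword itself is the inverse quantization
    # reconstruction value; otherwise each component is scaled and left-shifted.
    return [(c * scale) << shift for c in codeword]

assert reconstruct([3, -1, 0, 2]) == [3, -1, 0, 2]                      # codeword used directly
assert reconstruct([3, -1, 0, 2], scale=2, shift=1) == [12, -4, 0, 8]   # post-processed codeword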
Optionally, in a second implementation manner, when the plurality of codeword indication information includes first indication information and second indication information, the video decoding apparatus may obtain a codeword corresponding to each of the plurality of first indication information, and obtain a quantization result corresponding to each of the plurality of second indication information; and generating the decoded data of the target video according to the acquired multiple code words and the quantization result corresponding to each piece of second indication information.
In this implementation manner, at least one codeword indication information corresponding to each transform coefficient matrix in a plurality of codeword indication information included in a video code stream of the target video may correspond to a vector quantization switch identifier, where the vector quantization switch identifier is used to indicate whether the corresponding at least one codeword information includes first indication information obtained through vector quantization.
Based on this, in a first possible case, the at least one codeword indication information corresponding to each transform coefficient matrix is entirely first indication information or entirely second indication information, so the vector quantization switch identifier corresponding to the at least one codeword indication information can identify whether that at least one codeword indication information is first indication information or second indication information. In this case, when the video decoding device inverse-quantizes the multiple pieces of codeword indication information and detects that a vector quantization switch identifier is the on identifier, it knows that the codeword indication information after the vector quantization switch identifier is first indication information. The video decoding device may then, referring to the method described in the first implementation manner, obtain a codebook for inverse-quantizing each piece of first indication information after the vector quantization switch identifier and obtain the codeword corresponding to each piece of first indication information from the obtained codebook, until the next vector quantization switch identifier is detected. If the next vector quantization switch identifier is still an on identifier, processing continues with reference to the above operation; if it is an off identifier, the video decoding device knows that the codeword indication information after that identifier is second indication information, so it inverse-quantizes the second indication information after that identifier in sequence based on quantization methods other than vector quantization until a vector quantization switch identifier is detected again, and so on.
Optionally, in the first possible case, the at least one codeword indication information corresponding to each transform coefficient matrix may also correspond to a vector number or a number of vector quantization times, which indicates the number of pieces of first indication information included in the corresponding at least one codeword indication information. Thus, when the video decoding device detects that the vector quantization switch identifier is the on identifier, it can obtain the vector number or the number of vector quantization times and, from it, determine that the corresponding number of pieces of codeword indication information after the vector quantization switch identifier are the first indication information corresponding to one transform coefficient matrix.
In a second possible case, the at least one codeword indication information corresponding to each transform coefficient matrix may include both first indication information and second indication information, and in this case it may also correspond to a vector number or a number of vector quantization times. In one possible implementation, the video decoding device and the video encoding device may agree in advance that the first a pieces of codeword indication information after each vector quantization switch identifier are first indication information, where a is the vector number or the number of vector quantization times. On this basis, after detecting that a vector quantization switch identifier is the on identifier, the video decoding device obtains the first a pieces of codeword indication information after the on identifier, which are the first indication information. The video decoding device may then obtain the codebooks corresponding to these a pieces of first indication information in the aforementioned manner of obtaining codebooks, and obtain the codeword corresponding to each piece of first indication information from the obtained codebooks. After the codewords are obtained, if the next vector quantization switch identifier has not yet been detected, the remaining codeword indication information is second indication information; the video decoding device then inverse-quantizes the remaining second indication information based on quantization methods other than vector quantization to obtain the corresponding quantization results, and continues processing with reference to the above operation until the next vector quantization switch identifier is detected.
Optionally, in the second possible case, in another possible implementation, the at least one codeword indication information corresponding to each transform coefficient matrix corresponds not only to a vector number or a number of vector quantization times but also to a vector quantization starting position, which indicates the position of the first piece of first indication information in the corresponding at least one codeword indication information. Based on this, after detecting that a vector quantization switch identifier is the on identifier, the video decoding device can determine from the vector quantization starting position, counting from the first piece of codeword indication information after the on identifier, at which position the first indication information begins. The codeword indication information preceding that position is inverse-quantized based on quantization methods other than vector quantization to obtain the corresponding quantization results. Then, starting from that position, a number of pieces of codeword indication information given by the vector number or the number of vector quantization times are taken as first indication information; the codebooks corresponding to these pieces of first indication information are obtained with reference to the foregoing method, and the codeword corresponding to each piece of first indication information is obtained from the obtained codebooks. After the codewords are obtained, if the next vector quantization switch identifier has not yet been detected, the remaining codeword indication information is second indication information; the video decoding device then inverse-quantizes the remaining second indication information based on quantization methods other than vector quantization to obtain the corresponding quantization results, and continues processing with reference to the above operation until the next vector quantization switch identifier is detected.
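Mirroring the encoder-side sketch given earlier, the parsing logic of these two cases can be outlined as follows; the read() parser, the field names, and the two dequantizer callbacks are assumptions for illustration.

def parse_matrix(read, num_indications, vq_dequant, other_dequant):
    # Read the per-matrix syntax, then dispatch each codeword indication to
    # vector inverse quantization or other, non-vector inverse quantization.
    if read("vq_switch_flag") == 1:
        vq_count = read("vq_count")        # vector number / number of vector quantization times
        vq_start = read("vq_start_pos")    # vector quantization starting position
        vq_positions = set(range(vq_start, vq_start + vq_count))
    else:
        vq_positions = set()

    results = []
    for pos in range(num_indications):
        indication = read("indication")
        if pos in vq_positions:
            results.append(vq_dequant(indication))     # first indication information -> codeword
        else:
            results.append(other_dequant(indication))  # second indication information
    return results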
After obtaining the quantization result corresponding to each codeword indication information in the multiple codeword indication information, the video decoding device may refer to the method described in the first implementation manner, obtain an inverse quantization reconstruction value according to each quantization result, and further generate the decoded data of the target video according to the inverse quantization reconstruction value, which is not described herein again in this embodiment of the present application.
It should be noted that, in the first implementation manner and the second implementation manner, the video stream of the target video may further include at least one vector dimension.
When the vector dimension is included before the multiple pieces of codeword indication information of the video code stream of the target video, the video decoding device can learn the codeword dimension corresponding to the first indication information in the multiple pieces of codeword indication information included in the video code stream after detecting the vector dimension. Under the condition that the video code stream of the target video does not carry the codebook index, the video decoding device can also determine the corresponding codebook according to the vector dimension, which is not limited in the embodiment of the present application.
When a vector dimension corresponds to at least one codeword indication information corresponding to each transform coefficient matrix in a plurality of codeword indication information of a video code stream of a target video, after detecting a vector dimension, a video decoding device can obtain the codeword dimension corresponding to the first indication information in the codeword indication information between the vector dimension and the next vector dimension.
When each first indication information in a plurality of code word indication information of a video code stream of a target video is bound with a vector dimension, when video decoding equipment performs inverse quantization on each first indication information, the code word dimension of the code word corresponding to the first indication information can be obtained through the vector dimension corresponding to the corresponding first indication information. At this time, the vector dimension can also be used to indicate that the codeword indication information of the binding is the first indication information obtained by vector quantization.
Optionally, in this embodiment of the present application, before the multiple code word indication information included in the video code stream of the target video, an enable identifier for vector quantization may also be included, so that when the video decoding device starts decoding the video code stream of the target video, it can be known whether the video code stream of the target video supports vector quantization through the enable identifier.
In the embodiment of the application, a video code stream of a target video contains a plurality of code word indication information, the plurality of code word indication information contains first indication information obtained through vector quantization, and a video decoding device can generate decoding data corresponding to the video code stream according to the first indication information.
Next, a video encoding apparatus provided in an embodiment of the present application will be described.
Referring to fig. 5, an embodiment of the present application provides a video encoding apparatus 500, where the apparatus 500 includes:
an obtaining module 501, configured to obtain multiple transform coefficient matrices corresponding to a target video, where each transform coefficient matrix in the multiple transform coefficient matrices includes multiple coefficients to be encoded;
a quantization module 502, configured to quantize a coefficient to be encoded in a plurality of transform coefficient matrices to obtain a plurality of codeword indication information, where the plurality of codeword indication information are used to indicate a quantization result obtained when the coefficient to be encoded is quantized, and the plurality of codeword indication information include first indication information, where the first indication information is used to indicate a codeword obtained when the coefficient to be encoded is vector quantized;
a generating module 503, configured to generate a video code stream of the target video according to the multiple code word indication information.
Optionally, the quantization module 502 is mainly used for:
performing vector quantization or hybrid quantization on a plurality of coefficients to be coded in each of the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each transform coefficient matrix, wherein the hybrid quantization comprises vector quantization and other quantization except vector quantization, the at least one piece of codeword indication information corresponding to each transform coefficient matrix comprises first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be coded in the corresponding transform coefficient matrix.
Optionally, the quantization module 502 is further configured to:
and performing vector quantization or hybrid quantization on a plurality of coefficients to be coded in each target transform coefficient matrix that satisfies the vector quantization condition among the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each target transform coefficient matrix, wherein the hybrid quantization comprises vector quantization and other quantization except vector quantization, the at least one piece of codeword indication information corresponding to each target transform coefficient matrix comprises first indication information, and the first indication information is used for indicating codewords obtained in the process of performing vector quantization on all or part of the coefficients to be coded in the corresponding transform coefficient matrix.
Optionally, the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are all transform coefficients included in the corresponding transform coefficient matrix.
Optionally, the vector quantization condition includes at least one of a number of rows of the transform coefficient matrix reaching a first row number threshold, or a number of columns of the transform coefficient matrix reaching a first column number threshold, or an area of the transform coefficient matrix reaching a first area threshold.
Optionally, the multiple coefficients to be encoded included in each of the multiple transform coefficient matrices are transform coefficients in a first region in the corresponding transform coefficient matrix, where the first region is a region scanned by a scanning region-based SRCC encoding technique.
Optionally, the vector quantization condition includes at least one of a number of rows of the first region in the transform coefficient matrix reaching a second row number threshold, or a number of columns of the first region in the transform coefficient matrix reaching a second column number threshold, or an area of the first region in the transform coefficient matrix reaching a second area threshold.
Optionally, at least one codeword indication information corresponding to each transform coefficient matrix in a video code stream of the target video corresponds to a vector quantization switch identifier, and the vector quantization switch identifier is used to indicate whether the corresponding at least one codeword information includes first indication information obtained through vector quantization.
Optionally, at least one codeword indication information corresponding to each transform coefficient matrix in a video code stream of the target video corresponds to a vector number or a vector quantization frequency, and the vector number or the vector quantization frequency is used for indicating the number of the first indication information included in the corresponding at least one codeword indication information.
Optionally, at least one codeword indication information corresponding to each transform coefficient matrix in a video code stream of the target video further corresponds to a vector quantization start position, where the vector quantization start position is used to indicate a position of a first indication information in the corresponding at least one codeword indication information.
Optionally, the quantization module 502 is further configured to:
if the number of the coefficients to be coded in the first transformation coefficient matrix is N times of the dimension of the code word, generating N input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix, wherein N is a positive integer, the dimension of the input signal vector is equal to the dimension of the code word, the first transformation coefficient matrix is any one of a plurality of transformation coefficient matrices, or the first transformation coefficient matrix is one of the plurality of transformation coefficient matrices which meets the vector quantization condition;
determining first indication information corresponding to each input signal vector in the N input signal vectors;
and taking the determined N pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantization module 502 is further configured to:
if the number of the coefficients to be coded in the first transformation coefficient matrix is not an integral multiple of the code word dimension, generating M input signal vectors according to part of the coefficients to be coded in the first transformation coefficient matrix and the code word dimension, wherein the dimension of the input signal vector is equal to the code word dimension, and the first transformation coefficient matrix is any one of a plurality of transformation coefficient matrices, or the first transformation coefficient matrix is one of the plurality of transformation coefficient matrices which meets the vector quantization condition;
determining first indication information corresponding to each of the M input signal vectors, and performing other, non-vector quantization on the remaining coefficients to be coded in the first transform coefficient matrix to obtain second indication information corresponding to the remaining coefficients to be coded;
and taking the determined first indication information and the second indication information as a plurality of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the quantization module 502 is further configured to:
if the number of the coefficients to be coded in the first transformation coefficient matrix is not an integer multiple of the code word dimension, generating R input signal vectors according to all the coefficients to be coded and the code word dimension in the first transformation coefficient matrix, wherein the dimension of the input signal vector is equal to the code word dimension, one input signal vector in the R input signal vectors comprises one or more coefficient filling values, the first transformation coefficient matrix is any one of a plurality of transformation coefficient matrices, or the first transformation coefficient matrix is one of the plurality of transformation coefficient matrices which meets the vector quantization condition;
determining first indication information corresponding to each input signal vector in the R input signal vectors;
and taking the determined R pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the codeword index is a one-dimensional index value, or the codeword index is a multi-dimensional index coordinate.
Optionally, the video code stream of the target video includes an enable identifier of vector quantization, and the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, the video code stream of the target video includes at least one vector dimension, and the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each piece of first indication information in the video code stream of the target video.
Next, a video decoding apparatus provided in an embodiment of the present application will be described.
Referring to fig. 6, an embodiment of the present application provides a video decoding apparatus 600, where the apparatus 600 includes:
a first obtaining module 601, configured to obtain a video code stream of a target video, where the video code stream of the target video includes multiple pieces of codeword indication information, the multiple pieces of codeword indication information are used to indicate a quantization result obtained when a coefficient to be coded in multiple transform coefficient matrices corresponding to the target video is quantized, the multiple pieces of codeword indication information include first indication information, the first indication information is used to indicate a codeword obtained when a coefficient to be coded is subjected to vector quantization, and each transform coefficient matrix in the multiple transform coefficient matrices includes multiple coefficients to be coded;
a second obtaining module 602, configured to obtain decoded data of the target video according to the multiple codeword indication information.
Optionally, the multiple codeword indication information is multiple first indication information, and the second obtaining module 602 is mainly configured to:
acquiring a code word corresponding to each piece of first indication information in a plurality of pieces of first indication information;
and generating the decoding data of the target video according to the acquired multiple code words.
Optionally, the second obtaining module 602 is mainly configured to:
using the multiple code words as inverse quantization reconstruction values, or processing the multiple code words to obtain inverse quantization reconstruction values;
and generating the decoding data of the target video according to the inverse quantization reconstruction value.
Optionally, the plurality of codeword indication information includes a plurality of first indication information and a plurality of second indication information, the second indication information indicating a quantization result obtained by other quantization than vector quantization; the second obtaining module 602 is further configured to:
acquiring a code word corresponding to each first indication information in the plurality of first indication information, and acquiring a quantization result corresponding to each second indication information in the plurality of second indication information;
and generating the decoded data of the target video according to the acquired multiple code words and the quantization result corresponding to each piece of second indication information.
Optionally, the first indication information is a codeword index, codeword index indication information, or indication information of a codeword component included in a codeword.
Optionally, the video code stream of the target video further includes at least one codebook index, where the at least one codebook index is used to indicate a codebook used when determining a codeword corresponding to each first indication information in the multiple codeword indication information.
Optionally, the video code stream of the target video further includes an enable identifier of vector quantization, and the enable identifier of vector quantization is used to indicate whether the video code stream of the target video supports vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the multiple pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector quantization switch identifier, and the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes first indication information obtained through vector quantization.
Optionally, at least one piece of codeword indication information corresponding to each transform coefficient matrix, among the multiple pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector count or a vector quantization count, and the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
Optionally, the at least one piece of codeword indication information corresponding to each transform coefficient matrix further corresponds to a vector quantization starting position, where the vector quantization starting position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
Optionally, the video code stream of the target video includes at least one vector dimension, and the at least one vector dimension is used to indicate a dimension of a codeword indicated by each piece of first indication information in the video code stream of the target video.
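The optional syntax elements above can be grouped into stream-level and per-matrix parameters. The following container is a purely illustrative arrangement with hypothetical field names; the embodiments of the present application do not fix how these elements are laid out in the video code stream.

```python
# Illustrative grouping of the optional syntax elements described above;
# field names and granularity are assumptions, not bitstream syntax.
from dataclasses import dataclass

@dataclass
class VQStreamParams:
    vq_enabled: bool      # enable identifier of vector quantization
    vector_dim: int       # dimension of the codeword indicated by first indication information
    codebook_index: int   # codebook used when resolving first indication information

@dataclass
class VQMatrixParams:
    vq_switch: bool       # whether this matrix carries first indication information
    vq_count: int         # number of pieces of first indication information
    vq_start: int         # position of the first such piece
```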
In summary, in the embodiments of the present application, vector quantization may be used to quantize the coefficients to be encoded in the multiple transform coefficient matrices to obtain corresponding first indication information, and the video code stream of the target video is then generated according to the multiple pieces of codeword indication information including the first indication information. The decoding end can subsequently decode the vector-quantized coefficients according to the first indication information. The embodiments of the present application therefore provide a vector-quantization-based video encoding and decoding method, realizing the application of vector quantization in video encoding and decoding; compared with scalar quantization and trellis-coded quantization, the quantization performance is improved, and the video encoding and decoding performance is improved accordingly.
It should be noted that the video encoding and decoding apparatus provided in the foregoing embodiments is described only with the above division of functional modules as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the video encoding and decoding method embodiments provided above belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
Fig. 7 is a schematic diagram illustrating a server structure according to an example embodiment. The functions of the video encoding or video decoding server in the above embodiments can be implemented by the server shown in Fig. 7. The server may be a server in a background server cluster. Specifically:
the server 700 includes a Central Processing Unit (CPU) 701, a system Memory 704 including a Random Access Memory (RAM) 702 and a Read-Only Memory (ROM) 703, and a system bus 705 connecting the system Memory 704 and the CPU 701. The server 700 also includes a basic Input/Output system (I/O system) 706 that facilitates information transfer between devices within the computer, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse or a keyboard, for a user to input information. The display 708 and the input device 709 are both connected to the central processing unit 701 through an input/output controller 710 connected to the system bus 705. The basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 710 may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the server 700. That is, the mass storage device 707 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage devices, CD-ROM, DVD (Digital Versatile Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 700 may also operate through a remote computer connected to a network such as the Internet. That is, the server 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705, or the network interface unit 711 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU. The one or more programs include instructions for performing the video encoding and decoding methods provided by embodiments of the present application.
Embodiments of the present application further provide a computer-readable storage medium, where instructions, when executed by a processor of a server, enable the server to perform the video encoding and decoding methods provided by the foregoing embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. It is noted that the computer-readable storage medium referred to in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the video encoding or video decoding method provided by the above-described embodiments.
The above description should not be taken as limiting the embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (32)

1. A method of video encoding, the method comprising:
obtaining a plurality of transformation coefficient matrixes corresponding to a target video, wherein each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
quantizing a coefficient to be coded in the multiple transformation coefficient matrixes to obtain multiple code word indication information, wherein the multiple code word indication information is used for indicating a quantization result obtained when the coefficient to be coded is quantized, the multiple code word indication information comprises first indication information, and the first indication information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization;
and generating a video code stream of the target video according to the plurality of code word indication information.
2. The method according to claim 1, wherein the quantizing the coefficients to be encoded in the plurality of transform coefficient matrices to obtain a plurality of codeword indication information includes:
performing vector quantization or mixed quantization on a plurality of coefficients to be coded in each of the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each transform coefficient matrix, wherein the mixed quantization includes vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each transform coefficient matrix includes the first indication information, and the first indication information is used for indicating codewords obtained when vector quantization is performed on all or part of the coefficients to be coded in the corresponding transform coefficient matrix.
3. The method according to claim 1, wherein the quantizing the coefficients to be encoded in the plurality of transform coefficient matrices to obtain a plurality of codeword indication information includes:
performing vector quantization or mixed quantization on a plurality of coefficients to be coded in each target transform coefficient matrix which meets the vector quantization condition in the plurality of transform coefficient matrices to obtain at least one piece of codeword indication information corresponding to each target transform coefficient matrix, wherein the mixed quantization comprises vector quantization and quantization other than vector quantization, the at least one piece of codeword indication information corresponding to each target transform coefficient matrix comprises the first indication information, and the first indication information is used for indicating codewords obtained when vector quantization is performed on all or part of the coefficients to be coded in the corresponding transform coefficient matrix.
4. The method according to claim 3, wherein the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are all transform coefficients included in the respective transform coefficient matrix.
5. The method of claim 4, wherein the vector quantization condition comprises at least one of a number of rows of the matrix of transform coefficients reaching a first row number threshold, a number of columns of the matrix of transform coefficients reaching a first column number threshold, or an area of the matrix of transform coefficients reaching a first area threshold.
6. The method according to claim 3, wherein the plurality of coefficients to be encoded included in each of the plurality of transform coefficient matrices are transform coefficients in a first region of the corresponding transform coefficient matrix, the first region being a scan region determined based on a scan region-based coefficient coding (SRCC) technique.
7. The method of claim 6, wherein the vector quantization condition comprises at least one of a number of rows of a first region in the transform coefficient matrix reaching a second row number threshold, or a number of columns of the first region in the transform coefficient matrix reaching a second column number threshold, or an area of the first region in the transform coefficient matrix reaching a second area threshold.
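As an informal illustration of the conditions recited in claims 5 and 7 (not itself claim language), the check reduces to a simple threshold test; the threshold values below are placeholders rather than values from the embodiments.

```python
# Sketch of the vector quantization condition: a transform coefficient matrix
# (or its SRCC first region) qualifies if its rows, columns, or area reach
# the relevant thresholds. Threshold values are placeholder assumptions.
def meets_vq_condition(rows, cols, row_thr=8, col_thr=8, area_thr=64):
    return rows >= row_thr or cols >= col_thr or rows * cols >= area_thr
```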
8. The method according to claim 3, wherein the at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector quantization switch identifier, and the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes the first indication information obtained by vector quantization.
9. The method according to claim 2 or 3, wherein the at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video corresponds to a vector count or a vector quantization count, and the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
10. The method of claim 9, wherein the at least one piece of codeword indication information corresponding to each transform coefficient matrix in the video code stream of the target video further corresponds to a vector quantization starting position, and the vector quantization starting position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
11. The method of claim 1, wherein quantizing the coefficients to be encoded in the plurality of transform coefficient matrices comprises:
if the number of the coefficients to be coded in a first transformation coefficient matrix is N times of the dimension of the code word, generating N input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix, wherein N is a positive integer, the dimension of the input signal vector is equal to the dimension of the code word, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the N input signal vectors;
and taking the determined N pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
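As an informal illustration of the step recited in claim 11 (not itself claim language), the coefficients to be coded can be reshaped into N input signal vectors of the codeword dimension and each mapped to the index of its nearest codeword; the squared-error criterion and the flat codebook are assumptions.

```python
# Illustrative sketch: the coefficient count is an exact multiple of the
# codeword dimension, so form N input signal vectors and return one piece of
# first indication information (here, the nearest codeword index) per vector.
import numpy as np

def vq_exact_multiple(coeffs, codebook):
    dim = codebook.shape[1]
    assert len(coeffs) % dim == 0
    vectors = np.asarray(coeffs, dtype=np.float64).reshape(-1, dim)      # N input signal vectors
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).tolist()                                 # N codeword indices
```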
12. The method of claim 1, wherein quantizing the coefficients to be encoded in the plurality of transform coefficient matrices comprises:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the dimension of the code word, generating M input signal vectors according to part of the coefficients to be coded in the first transformation coefficient matrix and the dimension of the code word, wherein the dimension of the input signal vector is equal to the dimension of the code word, the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is the transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the M input signal vectors, and performing quantization other than vector quantization on the remaining coefficients to be coded in the first transformation coefficient matrix to obtain second indication information corresponding to the remaining coefficients to be coded;
and taking the determined first indication information and the second indication information as a plurality of code word indication information corresponding to the first transformation coefficient matrix.
13. The method of claim 1, wherein quantizing the coefficients to be encoded in the plurality of transform coefficient matrices comprises:
if the number of the coefficients to be coded in a first transformation coefficient matrix is not an integer multiple of the code word dimension, generating R input signal vectors according to all the coefficients to be coded in the first transformation coefficient matrix and the code word dimension, wherein R is a positive integer, the dimension of each input signal vector is equal to the code word dimension, one of the R input signal vectors comprises one or more coefficient padding values, and the first transformation coefficient matrix is any one of the plurality of transformation coefficient matrices, or the first transformation coefficient matrix is a transformation coefficient matrix which meets the vector quantization condition in the plurality of transformation coefficient matrices;
determining first indication information corresponding to each input signal vector in the R input signal vectors;
and taking the determined R pieces of first indication information as at least one piece of code word indication information corresponding to the first transformation coefficient matrix.
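When the coefficient count is not an integer multiple of the codeword dimension, claims 12 and 13 describe two alternatives: leave the remainder to quantization other than vector quantization, or pad the last input signal vector. The sketch below illustrates both branches; the zero padding value and the split logic are assumptions for illustration only.

```python
# Illustrative handling of the non-multiple case. With pad=True the last
# input signal vector is filled with placeholder padding values (claim 13);
# otherwise the remaining coefficients are returned separately for
# quantization other than vector quantization (claim 12).
import numpy as np

def split_into_vectors(coeffs, dim, pad=False, pad_value=0):
    coeffs = list(coeffs)
    remainder = len(coeffs) % dim
    tail = []
    if remainder and pad:
        coeffs += [pad_value] * (dim - remainder)
    elif remainder:
        tail = coeffs[-remainder:]
        coeffs = coeffs[:-remainder]
    vectors = np.array(coeffs).reshape(-1, dim)
    return vectors, tail
```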
14. The method according to claim 1, wherein the first indication information is a codeword index, codeword index indication information, or indication information of codeword components included in a codeword, and the codeword index is a one-dimensional index value or the codeword index is a multi-dimensional index coordinate.
15. The method of claim 1, wherein the video bitstream of the target video comprises an enable flag of vector quantization, and wherein the enable flag of vector quantization is used to indicate whether the video bitstream of the target video supports vector quantization.
16. The method of claim 1, wherein at least one vector dimension is included in the video code stream of the target video, and the at least one vector dimension is used to indicate a dimension of the codeword indicated by each first indication information in the video code stream of the target video.
17. The method according to claim 1, wherein the video stream of the target video further includes at least one codebook index, and the at least one codebook index is used to indicate a codebook used in determining the codeword corresponding to each first indication information in the video stream of the target video.
18. A method of video decoding, the method comprising:
the method comprises the steps of obtaining a video code stream of a target video, wherein the video code stream of the target video comprises a plurality of code word indicating information, the code word indicating information is used for indicating a quantization result obtained when a coefficient to be coded in a plurality of transformation coefficient matrixes corresponding to the target video is quantized, the code word indicating information comprises first indicating information, the first indicating information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization, and each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
and acquiring the decoded data of the target video according to the plurality of code word indication information.
19. The method of claim 18, wherein the plurality of codeword indication information is a plurality of first indication information, and the obtaining the decoded data of the target video according to the plurality of codeword indication information comprises:
acquiring a code word corresponding to each piece of first indication information in the plurality of pieces of first indication information;
and generating the decoded data of the target video according to the acquired multiple code words.
20. The method according to claim 19, wherein the generating decoded data of the target video according to the obtained multiple code words comprises:
taking the multiple code words as inverse quantization reconstruction values, or processing the multiple code words to obtain inverse quantization reconstruction values;
and generating the decoded data of the target video according to the inverse quantization reconstruction values.
21. The method according to claim 18, wherein the plurality of codeword indication information includes a plurality of first indication information and a plurality of second indication information, the second indication information indicating quantization results obtained by other quantization than vector quantization;
the obtaining of the decoded data of the target video according to the multiple codeword indication information includes:
acquiring a code word corresponding to each first indication information in the plurality of first indication information, and acquiring a quantization result corresponding to each second indication information in the plurality of second indication information;
and generating the decoded data of the target video according to the acquired multiple code words and the quantization result corresponding to each piece of second indication information.
22. The method according to any of claims 18-21, wherein the first indication information is a codeword index, codeword index indication information, or indication information of codeword components included in a codeword, and the codeword index is a one-dimensional index value, or the codeword index is a multi-dimensional index coordinate.
23. The method of claim 18, further comprising at least one codebook index in the video bitstream of the target video, wherein the at least one codebook index is used to indicate a codebook used in determining the codeword corresponding to each first indication information in the plurality of codeword indication information.
24. The method of claim 18, wherein the video bitstream of the target video further comprises an enable flag for vector quantization, and wherein the enable flag for vector quantization is used to indicate whether the video bitstream of the target video supports vector quantization.
25. The method according to claim 18, 20 or 24, wherein at least one piece of codeword indication information corresponding to each transform coefficient matrix, in a plurality of codeword indication information included in the video code stream of the target video, corresponds to a vector quantization switch identifier, and the vector quantization switch identifier is used to indicate whether the corresponding at least one piece of codeword indication information includes the first indication information obtained by vector quantization.
26. The method of claim 18, wherein at least one piece of codeword indication information corresponding to each transform coefficient matrix, in the multiple pieces of codeword indication information included in the video code stream of the target video, corresponds to a vector count or a vector quantization count, and the vector count or the vector quantization count is used to indicate the number of pieces of first indication information included in the corresponding at least one piece of codeword indication information.
27. The method of claim 26, wherein the at least one piece of codeword indication information corresponding to each transform coefficient matrix further corresponds to a vector quantization starting position, and the vector quantization starting position is used to indicate the position of the first piece of first indication information in the corresponding at least one piece of codeword indication information.
28. The method of claim 18, wherein at least one vector dimension is included in the video code stream of the target video, and the at least one vector dimension is used to indicate a dimension of the codeword indicated by each first indication information in the video code stream of the target video.
29. A video encoding apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a plurality of transformation coefficient matrixes corresponding to a target video, wherein each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
the quantization module is configured to quantize coefficients to be encoded in the multiple transform coefficient matrices to obtain multiple codeword indication information, where the multiple codeword indication information is used to indicate a quantization result obtained when the coefficients to be encoded are quantized, the multiple codeword indication information includes first indication information, and the first indication information is used to indicate a codeword obtained when the coefficients to be encoded are vector quantized;
and the generating module is used for generating the video code stream of the target video according to the plurality of code word indication information.
30. A video decoding apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain a video code stream of a target video, wherein the video code stream of the target video comprises a plurality of code word indication information, the plurality of code word indication information are used for indicating a quantization result obtained when a coefficient to be coded in a plurality of transformation coefficient matrixes corresponding to the target video is quantized, the plurality of code word indication information comprise first indication information, the first indication information is used for indicating a code word obtained when the coefficient to be coded is subjected to vector quantization, and each transformation coefficient matrix in the plurality of transformation coefficient matrixes comprises a plurality of coefficients to be coded;
and a second obtaining module, configured to obtain the decoded data of the target video according to the plurality of code word indication information.
31. A computer-readable storage medium, in which a computer program is stored, which, when executed by a computer, implements the video encoding method of any one of claims 1 to 17.
32. A computer-readable storage medium, in which a computer program is stored, which, when executed by a computer, implements the video decoding method of any one of claims 18 to 28.
CN202110735696.2A 2021-06-30 2021-06-30 Video encoding method, video decoding method, video encoding device, video decoding device and storage medium Active CN113473154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110735696.2A CN113473154B (en) 2021-06-30 2021-06-30 Video encoding method, video decoding method, video encoding device, video decoding device and storage medium

Publications (2)

Publication Number Publication Date
CN113473154A true CN113473154A (en) 2021-10-01
CN113473154B CN113473154B (en) 2022-11-22

Family

ID=77876393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110735696.2A Active CN113473154B (en) 2021-06-30 2021-06-30 Video encoding method, video decoding method, video encoding device, video decoding device and storage medium

Country Status (1)

Country Link
CN (1) CN113473154B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1143885A (en) * 1995-03-28 1997-02-26 大宇电子株式会社 Apparatus for encoding image signal using vector quantization technique
CN101430881A (en) * 2008-11-10 2009-05-13 华为技术有限公司 Encoding, decoding and encoding/decoding method, encoding/decoding system and correlated apparatus
CN103546759A (en) * 2013-10-29 2014-01-29 沈阳工业大学 Image compression coding method based on combination of wavelet packets and vector quantization
CN104661035A (en) * 2015-02-11 2015-05-27 北京大学 Compression method and system for local feature descriptor of video and video compression method

Also Published As

Publication number Publication date
CN113473154B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
WO2015120818A1 (en) Picture coding and decoding methods and devices
CN104394418B (en) A kind of video data encoding, decoded method and device
US11949868B2 (en) Method and device for selecting context model of quantization coefficient end flag bit
US11606557B2 (en) Method and apparatus for performing low complexity computation in transform kernel for video compression
JP6276199B2 (en) Significance map coding complexity reduction
US10171841B2 (en) Method and device for encoding/decoding video bitstream
US20210281842A1 (en) Method and apparatus for processing video
CN103188494A (en) Apparatus and method for encoding depth image by skipping discrete cosine transform (DCT), and apparatus and method for decoding depth image by skipping DCT
US20230252273A1 (en) Systems and methods for encoding/decoding a deep neural network
WO2018044897A1 (en) Quantizer with index coding and bit scheduling
US9787985B2 (en) Reduction of spatial predictors in video compression
US8305244B2 (en) Coding data using different coding alphabets
AU2019201683A1 (en) Techniques for high efficiency entropy coding of video data
CN102474274B (en) Methods for arithmetic coding and decoding
CN113473154B (en) Video encoding method, video decoding method, video encoding device, video decoding device and storage medium
US20210021871A1 (en) Method and apparatus for performing low-complexity operation of transform kernel for video compression
CN113728637A (en) Framework for encoding and decoding low rank and shifted rank based layers of deep neural networks
WO2020060832A1 (en) Fast implementation of odd one dimensional transforms
US20190045205A1 (en) Method for encoding and decoding data, device for encoding and decoding data, and corresponding computer programs
WO2021001687A1 (en) Systems and methods for encoding a deep neural network
CN114127746A (en) Compression of convolutional neural networks
CN116600123B (en) Video encoding method and device, video decoding method and device and electronic equipment
CN104661035B (en) Compression method, system and the video-frequency compression method of video local feature description
CN113141505B (en) Video data coding method and device
US20200195969A1 (en) Method for coding a digital image, decoding method, devices, terminal equipment and related computer programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant