CN111641832A - Encoding method, decoding method, device, electronic device and storage medium - Google Patents


Info

Publication number
CN111641832A
CN111641832A (application CN201910157697.6A)
Authority
CN
China
Prior art keywords
transform domain
domain component
image block
image
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910157697.6A
Other languages
Chinese (zh)
Other versions
CN111641832B (en)
Inventor
Yao Jiabao (姚佳宝)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910157697.6A priority Critical patent/CN111641832B/en
Publication of CN111641832A publication Critical patent/CN111641832A/en
Application granted granted Critical
Publication of CN111641832B publication Critical patent/CN111641832B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an encoding method, a decoding method, an apparatus, an electronic device and a storage medium, and belongs to the technical field of data processing. The method comprises the following steps: dividing an image to be encoded into a plurality of image blocks; performing feature transformation on a first image block of the plurality of image blocks through a transform convolutional neural network to obtain a first transform domain component corresponding to the first image block; quantizing the first transform domain component to obtain a quantization result of the first transform domain component; determining a probability distribution of the quantization result of the first transform domain component through a probability estimation network; and encoding the quantization result of the first transform domain component based on the probability distribution of the quantization result. Because the plurality of image blocks may differ in size, encoding image blocks of different sizes on the basis of the feature transformation performed by the transform convolutional neural network further ensures the compression rate and the distortion rate of the encoded image.

Description

Encoding method, decoding method, device, electronic device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an encoding method, a decoding method, an apparatus, an electronic device, and a storage medium.
Background
In data processing, and especially in data transmission and storage, the data to be processed is encoded in a certain manner to achieve data compression, thereby improving data transmission efficiency and reducing the space occupied by data storage. Decoding can then be performed in a manner corresponding to the encoding manner, so as to read the data.
In the related art, a video codec is often used to encode and decode video data. The video coder-decoder mainly comprises a prediction module, a transformation module, a quantization module and an entropy coding module, and the coding and decoding of video data are realized through different combinations of the prediction module, the transformation module, the quantization module and the entropy coding module.
However, in the encoding process of video data, the selection and control of each decision mode in each module need to be decided through RDO (Rate Distortion Optimization), and the final decision result of the RDO is appended to the code stream obtained by encoding the video data, thereby increasing the overhead of the code stream.
Disclosure of Invention
The application provides an encoding method, a decoding method, an apparatus, an electronic device and a storage medium, which can solve the problem in the related art that the compression rate of an image cannot be ensured due to high code stream overhead. The technical scheme is as follows:
in a first aspect, an encoding method is provided, and the method includes:
dividing an image to be coded into a plurality of image blocks;
for a first image block in the plurality of image blocks, performing feature transformation through a transformation convolution neural network based on the first image block to obtain a first transform domain component corresponding to the first image block, wherein the first image block refers to any one of the plurality of image blocks;
quantizing the first transform domain component to obtain a quantization result of the first transform domain component;
inputting the quantization result of the first transform domain component into a probability estimation network, and determining the probability distribution of the quantization result of the first transform domain component through the probability estimation network;
encoding the quantization result of the first transform domain component based on a probability distribution of the quantization result of the first transform domain component.
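The five steps above can be illustrated end to end. The following Python sketch is purely illustrative and is not the patent's implementation: the transform convolutional neural network is replaced by a fixed 2x2 Haar-like transform, the probability estimation network by a fixed Gaussian model, and entropy coding by the ideal arithmetic-code length -log2(p). All function names are hypothetical.

```python
import math

def transform_block(block):
    # Stand-in for the transform CNN: a 2x2 Haar-like transform producing
    # one "transform domain component" per basis function.
    a, b, c, d = block[0][0], block[0][1], block[1][0], block[1][1]
    return [(a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2]

def quantize(components, step):
    # Uniform quantization of each transform domain component with the
    # given quantization step.
    return [round(x / step) for x in components]

def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def bin_probability(q, step, mean, std):
    # Probability mass of the quantization bin centered at q*step under the
    # Gaussian model standing in for the probability estimation network.
    lo = gaussian_cdf(q * step - step / 2, mean, std)
    hi = gaussian_cdf(q * step + step / 2, mean, std)
    return max(hi - lo, 1e-12)

def code_length_bits(quantized, step, mean, std):
    # Ideal entropy-coded length: -log2(p) bits per quantized symbol.
    return sum(-math.log2(bin_probability(q, step, mean, std))
               for q in quantized)

block = [[10, 12], [11, 13]]            # a 2x2 "image block"
components = transform_block(block)     # first transform domain components
quantized = quantize(components, step=2.0)
bits = code_length_bits(quantized, step=2.0, mean=0.0, std=10.0)
```

An actual implementation would train the transform network and probability estimation network jointly and drive a real arithmetic coder with the estimated distribution; only the data flow between the five steps is modeled here.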
In a possible implementation manner, the performing, based on the first image block, feature transformation by transforming a convolutional neural network to obtain a first transform domain component corresponding to the first image block includes:
taking the first image block as an input of the transformed convolutional neural network, and determining a first transform coefficient corresponding to the first image block through the transformed convolutional neural network;
determining a transform domain component corresponding to the first transform coefficient based on an image compression rate of the image to be encoded, a maximum value of a compression rate range in which the image compression rate is located, and the number of channels of the first transform coefficient;
determining a transform domain component corresponding to the first transform coefficient as the first transform domain component.
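The claim does not spell out how the image compression rate, the maximum of its compression-rate range, and the channel count of the first transform coefficient combine. One plausible reading, shown purely as an illustration with hypothetical names, is to retain a number of coefficient channels proportional to the ratio of the compression rate to the maximum of its range:

```python
import math

def select_channel_count(num_channels, compression_rate, max_rate):
    # Hypothetical interpretation: keep a fraction of the transform
    # coefficient's channels proportional to the target compression rate
    # relative to the maximum of its compression-rate range.
    return max(1, math.ceil(num_channels * compression_rate / max_rate))

def to_transform_domain_component(coefficient_channels, compression_rate, max_rate):
    # coefficient_channels: list of per-channel coefficient maps; the
    # retained prefix plays the role of the first transform domain component.
    keep = select_channel_count(len(coefficient_channels), compression_rate, max_rate)
    return coefficient_channels[:keep]
```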
In a possible implementation manner, the quantizing the first transform domain component to obtain a quantization result of the first transform domain component includes:
determining a quantization step corresponding to the first transform domain component;
and inputting the first transform domain component and the quantization step corresponding to the first transform domain component into a quantizer for quantization to obtain a quantization result of the first transform domain component.
In one possible implementation, the determining a quantization step corresponding to the first transform domain component includes:
taking the first transform domain component as an input of a quantization convolutional neural network, and determining a quantization step corresponding to the first transform domain component through the quantization convolutional neural network, wherein the number of convolutional kernels of convolutional layers included in the quantization convolutional neural network is the same as the number of convolutional kernels of convolutional layers included in the transform convolutional neural network;
alternatively,
determining a transform domain component combined by channels based on the image compression ratio of the image to be coded, the maximum value of the compression ratio range of the image compression ratio and the first transform domain component, taking the transform domain component combined by the channels as the input of the quantization convolutional neural network, and determining the quantization step corresponding to the first transform domain component through the quantization convolutional neural network.
In one possible implementation, the probability estimation network comprises a first probability estimation network and a second probability estimation network;
said inputting the quantization result of the first transform domain component to a probability estimation network, determining a probability distribution of the quantization result of the first transform domain component by the probability estimation network, comprising:
inputting the quantized result of the first transform domain component to the first probability estimation network, obtaining a first probability distribution parameter reflecting a probability distribution of the quantized result of the first transform domain component output by the first probability estimation network processing;
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the preprocessed result into the second probability estimation network, and acquiring a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component;
determining a probability distribution of the quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter.
In one possible implementation, the first probability estimation network comprises a first encoder network and a first decoder network, the first encoder network and the first decoder network being connected by a quantizer therebetween;
said inputting the quantized result of the first transform domain component to the first probability estimation network, deriving a first probability distribution parameter reflecting a probability distribution of the quantization result of the first transform domain component output by the first probability estimation network processing, comprising:
inputting the quantization result of the first transform domain component into the first encoder network, and processing and outputting the quantization result by the first encoder network to obtain a first parameter;
quantizing the first parameter to obtain a quantized first parameter;
and inputting the quantized first parameter into the first decoder network, and processing and outputting the quantized first parameter by the first decoder network to obtain the first probability distribution parameter.
In one possible implementation, the second probability estimation network includes a second encoder network and a second decoder network, and the second encoder network and the second decoder network are connected through a quantizer;
the preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the result into the second probability estimation network to obtain a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component, includes:
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component, inputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component into the second encoder network, and processing and outputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component by the second encoder network to obtain a second parameter;
quantizing the second parameter to obtain a quantized second parameter;
and inputting the quantized second parameter into the second decoder network, and processing and outputting the second parameter by the second decoder network to obtain the second probability distribution parameter.
In one possible implementation, the determining a probability distribution of the quantization results of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter includes:
modeling a quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter to obtain a probability model;
and determining the probability distribution of the quantization result of the first transform domain component according to the probability model and the quantization step corresponding to the first transform domain component.
In one possible implementation, the probability distribution is a Gaussian distribution, the first probability distribution parameter is a mean of a corresponding Gaussian distribution model, and the second probability distribution parameter is a variance of the corresponding Gaussian distribution model; alternatively,
the probability distribution is a Laplace distribution, the first probability distribution parameter is a position parameter in a corresponding Laplace distribution model, and the second probability distribution parameter is a scale parameter in the Laplace distribution model.
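Under either parameterization, the probability assigned to a quantization result is the probability mass of its quantization bin under the fitted model, with the bin width equal to the quantization step. A minimal Python sketch of the Laplace case (position and scale parameters as above) might look like:

```python
import math

def laplace_cdf(x, loc, scale):
    # CDF of the Laplace distribution with position `loc` and scale `scale`.
    if x < loc:
        return 0.5 * math.exp((x - loc) / scale)
    return 1.0 - 0.5 * math.exp(-(x - loc) / scale)

def bin_prob_laplace(q, step, loc, scale):
    # Probability mass the Laplace model assigns to quantization index q,
    # i.e. the interval [q*step - step/2, q*step + step/2].
    return (laplace_cdf(q * step + step / 2, loc, scale)
            - laplace_cdf(q * step - step / 2, loc, scale))
```

Summed over all quantization indices, these bin probabilities total one, which is what lets them drive an entropy coder directly.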
In a possible implementation manner, after the encoding the quantization result of the first transform domain component based on the probability distribution of the quantization result of the first transform domain component, the method further includes:
determining the quantized first parameter, the quantized second parameter and the quantization step corresponding to the first transform domain component as side information of the first image block;
encoding the side information to obtain a second code stream;
and determining the coding stream of the first image block by the second code stream and the first code stream obtained by coding the quantization result of the first transform domain component.
In a possible implementation manner, after dividing the image to be encoded into a plurality of image blocks, the method further includes:
determining a predicted image block for each image block of the plurality of image blocks through a predictive convolutional neural network;
determining a residual block for each image block of the plurality of image blocks based on the plurality of image blocks and a predicted image block for each image block;
the obtaining a first transform domain component corresponding to the first image block by performing feature transformation through a transformed convolutional neural network based on the first image block includes:
and performing feature transformation through the transformation convolutional neural network based on the residual block of the first image block to obtain a first transform domain component corresponding to the first image block.
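The residual path above can be sketched as follows; the subtraction is per pixel, and the predictive convolutional neural network itself is not modeled here (the predicted blocks are taken as given):

```python
def residual_blocks(blocks, predicted):
    # Residual of each image block with respect to its predicted image
    # block; the transform CNN is then applied to the residual instead
    # of the raw block.
    return [[[x - p for x, p in zip(row, prow)]
             for row, prow in zip(blk, pblk)]
            for blk, pblk in zip(blocks, predicted)]
```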
In a second aspect, a decoding method is provided, the method comprising:
acquiring an encoded stream of a first image block in a plurality of image blocks, wherein the encoded stream comprises a first code stream and a second code stream, the first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, and the second code stream is encoded data of side information of the first image block;
the image blocks are obtained by dividing an image, and the first image block is any one of the image blocks;
decoding the second code stream to obtain side information of the first image block;
determining probability distribution corresponding to the first image block according to the side information;
decoding the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of a first transform domain component corresponding to the first image block;
determining reconstruction information of the first image block through an inverse transform convolutional neural network based on a quantization result of the first transform domain component;
and reconstructing the first image block based on the reconstruction information of the first image block.
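The decoder side mirrors the encoder: the decoded quantization results are dequantized with the quantization step, and the inverse transform turns them back into reconstruction information. In this illustrative Python sketch the inverse transform convolutional neural network is replaced by the inverse of a fixed 2x2 Haar-like transform, not the patent's network:

```python
def dequantize(quantized, step):
    # Recover approximate transform domain components from the decoded
    # quantization indices and the quantization step.
    return [q * step for q in quantized]

def inverse_transform_block(components):
    # Inverse of a 2x2 Haar-like transform, standing in for the
    # inverse transform CNN that produces the reconstruction information.
    s, h, v, d = components
    return [[(s + h + v + d) / 2, (s - h + v - d) / 2],
            [(s + h - v - d) / 2, (s - h - v + d) / 2]]
```

Because quantization is lossy, the reconstructed block generally differs from the original by up to half a quantization step per component.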
In one possible implementation, the side information includes a quantized first parameter and a quantized second parameter, and the probability estimation network includes a first decoder network and a second decoder network;
the determining the probability distribution corresponding to the first image block according to the side information includes:
inputting the quantized first parameter to the first decoder network, obtaining a first probability distribution parameter reflecting a probability distribution of a quantization result of the first transform domain component, which is processed and output by the first decoder network;
inputting the quantized second parameter into the second decoder network, and acquiring a second probability distribution parameter which is processed and output by the second decoder network and reflects the probability distribution of the quantization result of the first transform domain component;
and determining the probability distribution corresponding to the first image block based on the first probability distribution parameter, the second probability distribution parameter and the second code stream.
In a possible implementation manner, after obtaining the encoded stream of the first image block of the plurality of image blocks, the method further includes:
determining prediction information of the first image block through an inverse prediction convolutional neural network based on the coding stream of the first image block;
reconstructing the first image block based on the reconstruction information of the first image block includes:
and reconstructing the first image block based on the prediction information and reconstruction information of the first image block.
In a third aspect, an encoding apparatus is provided, the apparatus comprising:
the image coding device comprises a dividing module, a coding module and a decoding module, wherein the dividing module is used for dividing an image to be coded into a plurality of image blocks;
a first determining module, configured to perform, on a first image block of the plurality of image blocks, feature transformation on the first image block through a transformed convolutional neural network to obtain a first transform domain component corresponding to the first image block, where the first image block is any one of the plurality of image blocks;
a second determining module, configured to quantize the first transform domain component to obtain a quantization result of the first transform domain component;
a third determining module, configured to input the quantization result of the first transform domain component to a probability estimation network, and determine a probability distribution of the quantization result of the first transform domain component through the probability estimation network;
an encoding module configured to encode the quantization result of the first transform domain component based on a probability distribution of the quantization result of the first transform domain component.
In a possible implementation manner, the first determining module is mainly configured to:
taking the first image block as an input of the transformed convolutional neural network, and determining a first transform coefficient corresponding to the first image block through the transformed convolutional neural network;
determining a transform domain component corresponding to the first transform coefficient based on an image compression rate of the image to be encoded, a maximum value of a compression rate range in which the image compression rate is located, and the number of channels of the first transform coefficient;
determining a transform domain component corresponding to the first transform coefficient as the first transform domain component.
In a possible implementation manner, the second determining module is mainly configured to:
determining a quantization step corresponding to the first transform domain component;
and inputting the first transform domain component and the quantization step corresponding to the first transform domain component into a quantizer for quantization to obtain a quantization result of the first transform domain component.
In one possible implementation manner, the second determining module is further configured to:
taking the first transform domain component as an input of a quantization convolutional neural network, and determining a quantization step corresponding to the first transform domain component through the quantization convolutional neural network, wherein the number of convolutional kernels of convolutional layers included in the quantization convolutional neural network is the same as the number of convolutional kernels of convolutional layers included in the transform convolutional neural network;
alternatively,
determining a transform domain component combined by channels based on the image compression ratio of the image to be coded, the maximum value of the compression ratio range of the image compression ratio and the first transform domain component, taking the transform domain component combined by the channels as the input of the quantization convolutional neural network, and determining the quantization step corresponding to the first transform domain component through the quantization convolutional neural network.
In one possible implementation, the probability estimation network comprises a first probability estimation network and a second probability estimation network;
the third determining module is mainly configured to:
said inputting the quantization result of the first transform domain component to a probability estimation network, determining a probability distribution of the quantization result of the first transform domain component by the probability estimation network, comprising:
inputting the quantized result of the first transform domain component to the first probability estimation network, obtaining a first probability distribution parameter reflecting a probability distribution of the quantized result of the first transform domain component output by the first probability estimation network processing;
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the preprocessed result into the second probability estimation network, and acquiring a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component;
determining a probability distribution of the quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter.
In one possible implementation, the first probability estimation network comprises a first encoder network and a first decoder network, the first encoder network and the first decoder network being connected by a quantizer therebetween;
the third determining module is further configured to:
inputting the quantization result of the first transform domain component into the first encoder network, and processing and outputting the quantization result by the first encoder network to obtain a first parameter;
quantizing the first parameter to obtain a quantized first parameter;
and inputting the quantized first parameter into the first decoder network, and processing and outputting the quantized first parameter by the first decoder network to obtain the first probability distribution parameter.
In one possible implementation, the second probability estimation network includes a second encoder network and a second decoder network, and the second encoder network and the second decoder network are connected through a quantizer;
the third determining module is further configured to:
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component, inputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component into the second encoder network, and processing and outputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component by the second encoder network to obtain a second parameter;
quantizing the second parameter to obtain a quantized second parameter;
and inputting the quantized second parameter into the second decoder network, and processing and outputting the second parameter by the second decoder network to obtain the second probability distribution parameter.
In one possible implementation manner, the third determining module is further configured to:
modeling a quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter to obtain a probability model;
and determining the probability distribution of the quantization result of the first transform domain component according to the probability model and the quantization step corresponding to the first transform domain component.
In one possible implementation, the probability distribution is a Gaussian distribution, the first probability distribution parameter is a mean of a corresponding Gaussian distribution model, and the second probability distribution parameter is a variance of the corresponding Gaussian distribution model; alternatively,
the probability distribution is a Laplace distribution, the first probability distribution parameter is a position parameter in a corresponding Laplace distribution model, and the second probability distribution parameter is a scale parameter in the Laplace distribution model.
In one possible implementation, the apparatus further includes:
an obtaining module, configured to determine a quantized first parameter, a quantized second parameter, and a quantization step corresponding to the first transform domain component as side information of the first image block; coding the side information to obtain a second code stream; and determining the coding stream of the first image block by the second code stream and the first code stream obtained by coding the quantization result of the first transform domain component.
In one possible implementation, the apparatus further includes:
a fourth determining module, configured to determine a predicted image block of each image block of the plurality of image blocks through a predictive convolutional neural network;
a fifth determining module, configured to determine a residual block of each image block of the plurality of image blocks based on the plurality of image blocks and a predicted image block of each image block;
the first determination module is further to:
and performing feature transformation through the transformation convolutional neural network based on the residual block of the first image block to obtain a first transform domain component corresponding to the first image block.
In a fourth aspect, there is provided a decoding apparatus, the apparatus comprising:
the image processing device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an encoded stream of a first image block in a plurality of image blocks, the encoded stream comprises a first code stream and a second code stream, the first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, the second code stream is encoded data of side information of the first image block, the plurality of image blocks are obtained by dividing an image, and the first image block is any one of the plurality of image blocks;
a first decoding module, configured to decode the second code stream to obtain the side information of the first image block;
the first determining module is used for determining the probability distribution corresponding to the first image block according to the side information;
a second decoding module, configured to decode the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of a first transform domain component corresponding to the first image block;
a second determining module, configured to determine reconstruction information of the first image block through an inverse transform convolution neural network based on a quantization result of the first transform domain component;
and the reconstruction module is used for reconstructing the first image block based on the reconstruction information of the first image block.
In one possible implementation, the side information includes a quantized first parameter and a quantized second parameter, and the probability estimation network includes a first decoder network and a second decoder network;
the first determining module is mainly used for:
inputting the quantized first parameter to the first decoder network, obtaining a first probability distribution parameter reflecting a probability distribution of a quantization result of the first transform domain component, which is processed and output by the first decoder network;
inputting the quantized second parameter into the second decoder network, and acquiring a second probability distribution parameter which is processed and output by the second decoder network and reflects the probability distribution of the quantization result of the first transform domain component;
and determining the probability distribution corresponding to the first image block based on the first probability distribution parameter, the second probability distribution parameter and the second code stream.
In one possible implementation, the apparatus further includes:
a third determining module, configured to determine, based on an encoded stream of the first image block, prediction information of the first image block through an inverse prediction convolutional neural network;
the reconstruction module is specifically configured to:
and reconstructing the first image block based on the prediction information and reconstruction information of the first image block.
In a fifth aspect, an electronic device is provided, which includes:
the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory is used for storing computer programs; the processor is configured to execute the program stored in the memory to implement the steps of any of the methods provided in the first aspect or the second aspect.
In a sixth aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the methods provided in the first aspect or the second aspect.
The technical scheme provided by the application can at least bring the following beneficial effects:
in the embodiments of the present application, an image to be encoded is divided into a plurality of image blocks, and after a plurality of transform domain components in one-to-one correspondence with the plurality of image blocks are determined through a transform convolutional neural network on the basis of a determined image compression ratio, the encoded image can be guaranteed to reach that image compression ratio. Then, each transform domain component is quantized, modeling is performed through a probability estimation model based on the obtained quantization results of the plurality of transform domain components to obtain a probability model, and encoding is performed on the basis of the probability distribution determined by the probability model, thereby realizing the encoding of each image block. Since the image blocks are obtained by division according to the texture features of the image, the sizes of the image blocks may differ; therefore, on the basis of the feature transformation of the transform convolutional neural network, after the image blocks of different sizes are encoded, the compression rate and the distortion rate of the encoded image can be further guaranteed.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a coding/decoding system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an encoding method according to an embodiment of the present application;
fig. 3 is an execution structure diagram of an encoding method provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a cascade classifier provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a transformed convolutional neural network provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a quantized convolutional neural network provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a first trellis decoding convolutional neural network according to an embodiment of the present application;
fig. 8 is a flowchart illustrating an encoding method according to an embodiment of the present application;
fig. 9 is a schematic flowchart of an encoding method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a first predictive convolutional neural network provided in an embodiment of the present application;
fig. 11 is a flowchart illustrating a decoding method according to an embodiment of the present application;
fig. 12 is an execution structure diagram of a decoding method provided in an embodiment of the present application;
fig. 13 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a coding and decoding system according to an embodiment of the present application. As shown in fig. 1, the codec system includes an encoder 01, a decoder 02, a storage device 03, and a link 04. The encoder 01 may communicate with the storage 03 and the encoder 01 may also communicate with the decoder 02 via the link 04. The decoder 02 may also communicate with the storage device 03.
The encoder 01 is configured to acquire a data source, encode the data source, and transmit the encoded code stream to the storage device 03 for storage, or directly transmit the encoded code stream to the decoder 02 through the link 04. The decoder 02 may obtain the code stream from the storage device 03 and decode the code stream to obtain the data source, or decode the code stream after receiving the code stream transmitted by the encoder 01 through the link 04 to obtain the data source. The data source may be a captured image or a captured video. Both the encoder 01 and the decoder 02 may be provided as separate electronic devices. Storage 03 may comprise any of a variety of distributed or locally accessed data storage media. Such as a hard disk drive, blu-ray disc, read-only disc, flash memory, or other suitable digital storage medium for storing encoded data. Link 04 may include at least one communication medium, which may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines.
Fig. 2 is a flowchart of an encoding method provided in an embodiment of the present application, where the method is applied to an electronic device. Referring to fig. 2, the method includes the following steps. The execution structure of the method can be as shown in fig. 3.
Step 201: an image to be encoded is divided into a plurality of image blocks.
In order to ensure the compression rate and the distortion rate of the image to be encoded, the image may be divided according to certain texture features to obtain a plurality of image blocks. The image to be encoded may be any image or a video frame image.
The image may be finely divided under manual control, or roughly divided under manual control and then divided by a fine-division network, or divided by a pre-trained neural network decider, or divided by a pre-trained cascade classifier, where the cascade classifier is obtained by cascading a plurality of classifiers.
For example, assuming that the size of the image to be encoded is 256 × 256, when the image is divided by a pre-trained cascade classifier, as shown in fig. 4, the image may be input into classifier 1 of the cascade classifier, and when the output result of classifier 1 is 1, the image is divided into 4 image blocks each having a size of 128 × 128. The four image blocks are sequentially input into classifier 2 of the cascade classifier; for any one of the four image blocks, when the output result of classifier 2 is 1, that image block is further divided into 4 image blocks of 64 × 64, and when the output result of classifier 2 is 0, the division of that image block ends. The same applies to the other image blocks among the four. The 4 image blocks of 64 × 64 obtained after division may be further input into classifier 3 of the cascade classifier, and so on, until every image block has been judged by a classifier of the cascade classifier that outputs 0.
In this way, the image blocks obtained by dividing the image to be encoded may include both 128 × 128 image blocks and 64 × 64 image blocks; that is, the sizes of the plurality of image blocks obtained by dividing the image to be encoded may differ. Of course, after the image to be encoded is divided, the sizes of the obtained plurality of image blocks may also all be the same. The size of an image block refers to the number of pixels of the image block; for example, when the image block is 64 × 64, the number of pixels of the image block is 64 × 64.
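The cascade-classifier division above amounts to a quadtree split driven by one classifier per cascade level. The following is a minimal sketch under stated assumptions: the classifiers here are hypothetical stand-in functions returning 1 (split) or 0 (stop), whereas the patent's classifiers would be trained models.

```python
def divide(block, x, y, size, classifiers, level=0):
    """Recursively split a square block into four quadrants while the
    classifier at the current cascade level outputs 1."""
    if level >= len(classifiers) or classifiers[level](block) == 0:
        return [(x, y, size)]  # classifier outputs 0: keep this block as-is
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            # extract the quadrant and hand it to the next cascade level
            sub = [row[dx:dx + half] for row in block[dy:dy + half]]
            blocks += divide(sub, x + dx, y + dy, half, classifiers, level + 1)
    return blocks

# Toy example mirroring the 256x256 case above: classifier 1 always splits,
# classifier 2 always stops, yielding four 128x128 blocks.
image = [[0] * 256 for _ in range(256)]
classifiers = [lambda b: 1, lambda b: 0]
blocks = divide(image, 0, 0, 256, classifiers)
```

With a splitting classifier 2 as well, individual 128 × 128 blocks would be further divided into 64 × 64 blocks, giving the mixed block sizes described above.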
When dividing an image to be encoded, the image may be divided according to other features in addition to texture features. For example, the image may be divided according to the gray value of each pixel point of the image.
Step 202: and for a first image block in the plurality of image blocks, performing feature transformation on the first image block through a transformation convolutional neural network to obtain a first transform domain component corresponding to the first image block.
In a first possible implementation manner, after an image to be encoded is divided into a plurality of image blocks, an image compression ratio of the image to be encoded may be determined, and a network parameter corresponding to the image compression ratio is determined, then the initial transformed convolutional neural network is initialized by using the network parameter corresponding to the image compression ratio, and then the first image block is input into the initialized transformed convolutional neural network for spatial sparsification, so as to obtain a first transform domain component corresponding to the first image block.
It should be noted that, when feature transformation is performed on the plurality of image blocks through the transformed convolutional neural network, the plurality of image blocks may be sequentially input into the transformed convolutional neural network, that is, processed in the implementation manner described above. Of course, the plurality of image blocks may also be input into the transformed convolutional neural network at the same time.
The network parameters in the transformed convolutional neural network are determined according to the currently determined image compression ratio; that is, the network parameters in the transformed convolutional neural network are related to the image compression ratio. Therefore, after feature transformation is performed through the transformed convolutional neural network initialized with these network parameters, the encoded image can reach the currently determined image compression ratio.
As shown in fig. 5, the transform convolutional neural network may include an input layer, an intermediate layer, and an output layer, where the input layer may be configured to input a plurality of image blocks, the intermediate layer may be configured to perform downsampling and convolution operations on each input image block to determine a transform domain component corresponding to each image block, and the output layer may be configured to output the transform domain component corresponding to each image block. The intermediate layer may include multiple sets of LSTM (Long Short-Term Memory) Layers and CNNL (Convolutional Layers) corresponding to each set of LSTM Layers, each set of LSTM Layers includes the same number of LSTMs, an input end of each set of LSTM Layers is used for being connected with an output end of the input layer, an output end of each set of LSTM Layers is connected with an input end of the corresponding CNNL, and an output end of each CNNL is used for being connected with the output layer. The LSTMs in each group of LSTM layers are sequentially and unidirectionally connected, and the LSTMs at the same position in the plurality of groups of LSTM layers are sequentially and unidirectionally connected.
In a second possible implementation manner, after an image to be encoded is divided into a plurality of image blocks, an image compression rate of the image to be encoded is determined, the first image block is used as an input of a transformed convolutional neural network, and a first transform coefficient corresponding to the first image block is determined through the transformed convolutional neural network. And determining a transform domain component corresponding to the first transform coefficient based on the image compression ratio of the image to be coded, the maximum value of the compression ratio range where the image compression ratio is located and the channel number of the first transform coefficient, and determining the transform domain component corresponding to the first transform coefficient as the first transform domain component.
The transform domain component corresponding to the first transform coefficient may be determined as follows: determine the ratio between the number of channels of the first transform coefficient and the maximum value of the compression ratio range in which the image compression ratio is located, multiply the ratio by the image compression ratio to obtain the number of transform channels, and replace the number of channels of the first transform coefficient with the number of transform channels to obtain the transform domain component corresponding to the first transform coefficient.
It should be noted that, for the second possible implementation, the network parameters in the transformed convolutional neural network are not determined based on the image compression rate, so that after the feature transformation is performed by the transformed convolutional neural network, the number of channels of the first transform coefficient may be reduced based on the image compression rate according to the above method, and then the compression of the image block may be implemented based on the first transform domain component obtained after the number of channels is reduced. Thus, the image after being coded can reach the image compression rate determined currently.
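As a numeric illustration of the channel-reduction rule in the second implementation: the number of transform channels is the coefficient's channel count divided by the maximum of the compression-ratio range, multiplied by the target compression ratio. The function name and the truncation of a fractional result to an integer are assumptions; the patent does not specify how non-integer channel counts are handled.

```python
def reduced_channel_count(num_channels, compression_ratio, range_max):
    """Channels to keep so the coded image meets the target compression
    ratio: (num_channels / range_max) * compression_ratio, truncated."""
    return int(num_channels / range_max * compression_ratio)

# Hypothetical numbers: a transform coefficient with 64 channels and a
# target compression ratio of 8 within a range whose maximum is 16.
kept = reduced_channel_count(64, 8, 16)  # 64/16 * 8 = 32 channels retained
```

The first transform domain component would then consist of the first `kept` channels of the transform coefficient, and encoding proceeds on this reduced component.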
Step 203: and quantizing the first transform domain component to obtain a quantization result of the first transform domain component.
Specifically, a quantization step corresponding to the first transform domain component is determined, and the first transform domain component and the quantization step corresponding to the first transform domain component are input to a quantizer for quantization, so as to obtain a quantization result of the first transform domain component.
The specific operation of quantization by the quantizer may be: and determining the ratio of the first transform domain component to the quantization step corresponding to the first transform domain component, rounding the ratio, and taking the product of the rounded value and the quantization step corresponding to the first transform domain component as the quantization result of the first transform domain component.
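The quantizer operation described above can be sketched directly; this is a minimal divide-round-multiply illustration of uniform quantization, not the patent's actual quantizer implementation.

```python
def quantize(component, step):
    """Uniform quantization as described above: divide each value by the
    quantization step, round to the nearest integer, then multiply the
    rounded value back by the step."""
    return [round(v / step) * step for v in component]

# Hypothetical transform domain component values and a step of 0.5.
values = [0.37, -1.12, 2.5, 0.0]
q = quantize(values, 0.5)  # each result is an integer multiple of 0.5
```

Every quantized value lands on an integer multiple of the step, which is what makes the later probability modeling over discrete quantization intervals possible.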
The quantization step corresponding to the first transform domain component may be set manually, or may be determined as follows. Specifically, the first transform domain component is used as an input of a quantization convolutional neural network, and a quantization step corresponding to the first transform domain component is determined through the quantization convolutional neural network.
The number of convolution kernels of the convolutional layers included in the quantization convolutional neural network is the same as the number of convolution kernels of the convolutional layers included in the transformed convolutional neural network. The network parameters in the quantization convolutional neural network can be obtained by training samples. As shown in fig. 6, the quantization convolutional neural network may include a max pooling layer, an average pooling layer, a plurality of sequentially connected fully connected layers, and an activation function. The input ends of the max pooling layer and the average pooling layer are both used to obtain the first transform domain component, on which a max pooling operation and an average pooling operation are performed respectively; the fully connected layers are used to separately process the result of the max pooling operation and the result of the average pooling operation, and a summation operation is performed on the two obtained results; the activation function may be a sigmoid function, through which threshold calculation is performed on the result of the summation operation to obtain the quantization step corresponding to the first transform domain component. After the max pooling operation or the average pooling operation is performed on the first transform domain component, the resulting pooled component features may have a size of 1 × 1.
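The pooling, fully connected, and sigmoid pipeline of fig. 6 can be sketched in plain Python. The weights, layer sizes, and the ReLU between the fully connected layers are illustrative assumptions, not the patent's trained network.

```python
import math

def quant_step(component, w1, w2):
    """Sketch of the quantization network above: globally max-pool and
    average-pool the (channels x H x W) component to 1x1 per channel, run
    each pooled vector through shared fully connected layers, sum the two
    branch outputs, and apply a sigmoid to produce the quantization step."""
    flat = [[v for row in ch for v in row] for ch in component]
    mx = [max(ch) for ch in flat]            # global max pooling  -> (C,)
    av = [sum(ch) / len(ch) for ch in flat]  # global avg pooling  -> (C,)

    def fc(vec):
        # two fully connected layers with a ReLU in between (assumption)
        hidden = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in w1]
        return sum(w * h for w, h in zip(w2, hidden))

    s = fc(mx) + fc(av)                      # summation of the two branches
    return 1.0 / (1.0 + math.exp(-s))        # sigmoid activation

# Hypothetical 2-channel 2x2 component and small stand-in weights.
component = [[[0.2, -0.4], [0.1, 0.3]],
             [[-0.5, 0.6], [0.0, 0.2]]]
w1 = [[0.3, -0.2], [0.1, 0.4], [-0.3, 0.2]]
w2 = [0.5, -0.1, 0.2]
step = quant_step(component, w1, w2)
```

Because the output passes through a sigmoid, the resulting step always lies in (0, 1); in practice it would presumably be rescaled to the working range of the transform domain components.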
Step 204: and inputting the quantization result of the first transform domain component into a probability estimation network, and determining the probability distribution of the quantization result of the first transform domain component through the probability estimation network.
Wherein, in determining the probability distribution of the quantized result of the first transform domain component by the probability estimation network, the probability estimation network may comprise a first probability estimation network and a second probability estimation network. At this time, the specific implementation process of the above steps can be implemented as the following steps (1) to (3).
(1) And inputting the quantization result of the first transform domain component into a first probability estimation network, and acquiring a first probability distribution parameter reflecting the probability distribution of the quantization result of the first transform domain component, which is output by the first probability estimation network.
The first probability estimation network may comprise a first encoder network and a first decoder network, with a quantizer connected between them. Accordingly, this step may be specifically implemented as follows: inputting the quantization result of the first transform domain component into the first encoder network, which processes it and outputs a first parameter; quantizing the first parameter to obtain a quantized first parameter; and inputting the quantized first parameter into the first decoder network, which processes it and outputs the first probability distribution parameter.
The first encoder network may have the same structure as the transformed convolutional neural network, and the network parameters in the first encoder network may be obtained through sample training. The first decoder network may comprise an input layer, which may be configured to input quantized first parameters, an intermediate layer, which may be configured to upsample and deconvolute the input quantized first parameters to determine first probability distribution parameters, and an output layer, which may be configured to output the first probability distribution parameters, as shown in fig. 7. The number of convolution kernels of CNNL included in the first decoder network is the same as the number of convolution kernels of CNNL included in the transformed convolutional neural network, and network parameters in the first decoder network can be obtained by training samples.
(2) And preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and then inputting the preprocessed result into a second probability estimation network, and acquiring a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component.
The second probability estimation network may comprise a second encoder network and a second decoder network, with a quantizer connected between them. Accordingly, this step may be specifically implemented as follows: preprocessing the first probability distribution parameter and the quantization result of the first transform domain component, inputting the preprocessing result into the second encoder network, which processes it and outputs a second parameter; quantizing the second parameter to obtain a quantized second parameter; and inputting the quantized second parameter into the second decoder network, which processes it and outputs the second probability distribution parameter.
Wherein, the first probability distribution parameter and the quantization result of the first transform domain component may be preprocessed by: a difference between the absolute value of the first probability distribution parameter and the absolute value of the result of quantization of the first transform domain component is determined. That is, the value input into the second encoder network is the difference between the absolute value of the first probability distribution parameter and the absolute value of the result of the quantization of the first transform domain component.
The structure of the second encoder network may be the same as that of the transformed convolutional neural network, the structure of the second decoder network may be the same as that of the first decoder network, and the network parameters in the second encoder network and in the second decoder network may be obtained by training samples.
(3) And determining a probability distribution of the quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter.
Specifically, the quantization result of the first transform domain component may be modeled based on the first probability distribution parameter and the second probability distribution parameter, to obtain a probability model; and determining the probability distribution of the quantization result of the first transform domain component according to the probability model and the quantization step corresponding to the first transform domain component.
The probability distribution may be a Gaussian distribution or a Laplace distribution. When the probability distribution is a Gaussian distribution, the first probability distribution parameter is the mean of the corresponding Gaussian distribution model, and the second probability distribution parameter is the variance of the corresponding Gaussian distribution model. That is, the quantization result of the first transform domain component is modeled with the first probability distribution parameter as the mean of the Gaussian distribution model and the second probability distribution parameter as its variance, to obtain the Gaussian distribution. When the probability distribution is a Laplace distribution, the first probability distribution parameter is the location parameter in the corresponding Laplace distribution model, and the second probability distribution parameter is the scale parameter in the Laplace distribution model. That is, the quantization result of the first transform domain component is modeled with the first probability distribution parameter as the location parameter of the Laplace distribution model and the second probability distribution parameter as its scale parameter, to obtain the Laplace distribution.
Wherein, according to the probability model and the quantization step corresponding to the first transform domain component, the probability distribution of the quantization result of the first transform domain component can be determined according to the following formula (1):

p(ŷ) = (N(m, ν) ∗ μ)(ŷ) = ∫_{ŷ−s/2}^{ŷ+s/2} N(x; m, ν) dx (1)

wherein p refers to the probability distribution of the quantization result of the first transform domain component, ŷ refers to the quantization result of the first transform domain component, m refers to the first probability distribution parameter, ν refers to the second probability distribution parameter, s refers to the quantization step corresponding to the first transform domain component, N refers to the probability model of the quantization result of the first transform domain component, and μ refers to the quantization interval corresponding to the quantization step, over which the probability model is integrated.
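Under the Gaussian case of formula (1), the probability mass of a quantized value is the Gaussian model integrated over the quantization interval of width s centred on that value. A minimal sketch, assuming the second probability distribution parameter is passed as a standard deviation (the patent describes ν as the variance) and using the standard normal CDF via `math.erf`:

```python
from math import erf, sqrt

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian N(mean, std^2)."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))

def quantized_probability(y_hat, m, std, s):
    """Probability mass of quantized value y_hat per formula (1): the
    Gaussian model integrated over [y_hat - s/2, y_hat + s/2]."""
    return gaussian_cdf(y_hat + s / 2, m, std) - gaussian_cdf(y_hat - s / 2, m, std)

# With the model centred on the quantized value, the interval around the
# mean carries the largest probability mass.
p_center = quantized_probability(0.0, 0.0, 1.0, 1.0)
p_offset = quantized_probability(1.0, 0.0, 1.0, 1.0)
```

These per-value probabilities are exactly what the arithmetic coder in the next step consumes: values the model considers likely receive short codes.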
Step 205: the result of quantization of the first transform domain component is encoded based on a probability distribution of the result of quantization of the first transform domain component.
In particular, the quantization result of the first transform domain component may be encoded in an arithmetic coding manner based on a probability distribution of the quantization result of the first transform domain component. Of course, the quantization result of the first transform domain component may be encoded by other encoding methods, such as huffman encoding.
Further, after the quantization result of the first transform domain component is encoded, the quantized first parameter, the quantized second parameter, and the quantization step corresponding to the first transform domain component may also be determined as the side information of the first image block, and the side information is encoded to obtain a second code stream; the encoded stream of the first image block is then determined from the second code stream and the first code stream obtained by encoding the quantization result of the first transform domain component.
It should be noted that after the first code stream corresponding to the first transform domain component is obtained, the quantized first parameter, the quantized second parameter, and the quantization step corresponding to the first transform domain component may also be obtained, and the quantized first parameter, the quantized second parameter, and the quantization step corresponding to the first transform domain component are encoded to obtain a second code stream, so that the first code stream and the second code stream are used as the encoded stream of the first image block. Of course, the first code stream, the quantized first parameter, the quantized second parameter, and the quantization step corresponding to the first transform domain component may also be determined as an encoded stream.
After obtaining the encoded stream of each of the plurality of image blocks, the encoded stream of each of the plurality of image blocks may be sent to a decoding end as the encoded stream of the image.
The encoding method for the quantized first parameter, the quantized second parameter, and the quantization step corresponding to the first transform domain component may be arithmetic encoding. Of course, the encoding may be performed by other encoding methods, such as huffman encoding.
In the embodiments of the present application, an image to be encoded is divided into a plurality of image blocks, and after a plurality of transform domain components in one-to-one correspondence with the plurality of image blocks are determined through a transform convolutional neural network on the basis of a determined image compression ratio, the encoded image can be guaranteed to reach that image compression ratio. Then, each transform domain component is quantized, modeling is performed through a probability estimation model based on the obtained quantization results of the plurality of transform domain components to obtain a probability model, and encoding is performed on the basis of the probability distribution determined by the probability model, thereby realizing the encoding of each image block. Since the image blocks are obtained by division according to the texture features of the image, the sizes of the image blocks may differ; therefore, on the basis of the feature transformation of the transform convolutional neural network, after the image blocks of different sizes are encoded, the compression rate and the distortion rate of the encoded image can be further guaranteed.
Fig. 8 is a flowchart of an encoding method provided in an embodiment of the present application, where the method is applied to an electronic device. Referring to fig. 8, the method includes the following steps.
Step 801: an image to be encoded is divided into a plurality of image blocks.
The specific implementation manner of this step may be the same as or similar to the implementation manner of step 201, and is not described herein again in this embodiment of the application.
Step 802: and for a first image block in the plurality of image blocks, performing feature transformation on the first image block through a transformation convolutional neural network to obtain a first transform domain component corresponding to the first image block.
After the image to be encoded is divided into a plurality of image blocks, the first image block is used as the input of the transformation convolutional neural network, and a first transform coefficient corresponding to the first image block is determined through the transformation convolutional neural network. The first transform coefficient is then determined as the first transform domain component corresponding to the first image block.
It should be noted that, in the embodiment of the present application, different image compression rates correspond to the same network parameters in the transformed convolutional neural network.
Step 803: a quantization step corresponding to the first transform domain component is determined.
Specifically, the image compression rate of the image to be encoded is determined, the transform domain component combined by channels is determined based on the image compression rate of the image to be encoded, the maximum value of the compression rate range where the image compression rate is located and the first transform domain component, the transform domain component combined by the channels is used as the input of the quantization convolutional neural network, and the quantization step corresponding to the first transform domain component is determined through the quantization convolutional neural network.
The specific operation of determining the transform domain component after channel combination may be: determine the ratio between the image compression rate and the maximum value of the compression rate range in which the image compression rate falls; round the ratio up; add the rounded ratio to the number of channels of the first transform domain component to obtain a channel combination value; and replace the number of channels in the first transform domain component with the channel combination value to obtain the transform domain component after channel combination. The transform domain component after channel combination is then input into the quantization convolutional neural network, and the quantization step corresponding to the first transform domain component is obtained after calculation by the quantization convolutional neural network.
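The channel-combination arithmetic described above can be sketched as follows; the function name and the example values are illustrative assumptions, not from the disclosure:

```python
import math

def channel_combination_value(image_compression_rate, range_max, num_channels):
    """Channel combination value as described above: the ratio of the image
    compression rate to the maximum of its compression-rate range, rounded up,
    added to the channel count of the first transform domain component."""
    ratio = image_compression_rate / range_max
    return math.ceil(ratio) + num_channels

# e.g. a compression rate of 0.4 falling in a range whose maximum is 0.5,
# with a 64-channel first transform domain component (illustrative values):
print(channel_combination_value(0.4, 0.5, 64))  # 65
```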
Step 804: and inputting the first transform domain component and the quantization step corresponding to the first transform domain component into a quantizer for quantization to obtain a quantization result of the first transform domain component.
Step 805: and inputting the quantization result of the first transform domain component into a probability estimation network, and determining the probability distribution of the quantization result of the first transform domain component through the probability estimation network.
The specific implementation manner of this step may be the same as or similar to the implementation manner in step 204, and details are not described herein in this embodiment of the application.
Step 806: the result of quantization of the first transform domain component is encoded based on a probability distribution of the result of quantization of the first transform domain component.
The specific implementation manner of this step may be the same as or similar to the implementation manner in step 205, and is not described herein again in this embodiment of the application.
In the embodiment of the application, the image to be encoded is divided to obtain a plurality of image blocks, and a plurality of transform domain components in one-to-one correspondence with the plurality of image blocks are then determined through the transform convolutional neural network. Since the plurality of image blocks are obtained by division according to the texture features of the image, their sizes may differ; on the basis of the feature transformation of the transform convolutional neural network, the compression rate and the distortion rate of the encoded image can be further ensured after image blocks of different sizes are encoded. In addition, in the embodiment of the present application, the quantization step corresponding to the first transform domain component can be determined by the quantization convolutional neural network, that is, different transform domain components correspond to different quantization steps, so that the encoding effect can be further improved.
Fig. 9 is a flowchart of an encoding method provided in an embodiment of the present application, where the method is applied to an electronic device. Referring to fig. 9, the method includes the following steps.
Step 901: an image to be encoded is divided into a plurality of image blocks.
The specific implementation manner of this step may be the same as or similar to the implementation manner of step 201, and is not described herein again in this embodiment of the application.
Further, after obtaining the plurality of image blocks, in order to avoid reducing the compression rate of each image block due to spatial redundancy, temporal redundancy, and the like in each image block, the residual block of each image block in the plurality of image blocks may be determined as follows. Specifically, a predicted image block of each image block of the plurality of image blocks may be determined by a predictive convolutional neural network, and a residual block of each image block of the plurality of image blocks may be determined based on the plurality of image blocks and the predicted image block of each image block.
For any image block, the upper image block and the left image block of the image block can be determined according to the position of the image block in the image to be encoded, and the upper image block and the left image block are then input into the first predictive convolutional neural network to obtain the predicted image block of the image block. The pixel values of the pixel points at the same positions in the image block and its predicted image block are then subtracted to obtain the residual block of the image block. In this way, the residual block of each image block in the plurality of image blocks can be determined.
The network parameters in the first predictive convolutional neural network may be obtained through training of sample data, and as shown in fig. 10, the first predictive convolutional neural network may include a prediction function, a plurality of fully-connected layers, and an activation function. The prediction function is used for acquiring an upper image block and a left image block, the prediction function can be a concat function, the activation function can be a sigmoid function, and the full-connection layers and the activation function are used for performing full-connection operation on a prediction result of the prediction function to obtain and output the prediction image block.
When an image block is a boundary image block, it may have no upper image block or no left image block. In this case, a filler block with the same size as the image block may be used as the missing upper image block or left image block, and the pixel value of each pixel point in the filler block may be 0.
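The residual computation with zero filler blocks for boundary image blocks can be sketched as follows; the element-wise mean used here is a trivial stand-in for the first predictive convolutional neural network (which actually concatenates the neighbours and applies fully-connected layers plus a sigmoid with trained weights), and the names are assumptions:

```python
import numpy as np

def predict_block(above, left):
    """Placeholder for the first predictive convolutional neural network:
    here simply the element-wise mean of the upper and left neighbour blocks."""
    return (above + left) / 2.0

def residual_block(block, above=None, left=None):
    """Residual = block minus its prediction; a missing neighbour of a
    boundary block is replaced by an all-zero filler block of the same size."""
    zero = np.zeros_like(block, dtype=float)
    above = zero if above is None else above
    left = zero if left is None else left
    return block - predict_block(above, left)

block = np.full((2, 2), 10.0)
above = np.full((2, 2), 8.0)
# boundary block: no left neighbour, so a zero filler block is used
print(residual_block(block, above=above))  # [[6. 6.], [6. 6.]]
```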
Step 902: and for a first image block in the plurality of image blocks, performing feature transformation on the first image block by a transformation convolutional neural network based on a residual block of the first image block to obtain a first transform domain component corresponding to the first image block.
The determining manner of the first transform domain component corresponding to the first image block may be the same as or similar to the first possible implementation manner in step 202, and is not repeated herein in this embodiment of the present application.
Step 903: and quantizing the first transform domain component to obtain a quantization result of the first transform domain component.
The specific implementation manner of this step may be the same as or similar to the implementation manner in step 203, and details are not described herein in this embodiment of the application.
Step 904: and inputting the quantization result of the first transform domain component into a probability estimation network, and determining the probability distribution of the quantization result of the first transform domain component through the probability estimation network.
The specific implementation manner of this step may be the same as or similar to the implementation manner in step 204, and details are not described herein in this embodiment of the application.
Step 905: the result of quantization of the first transform domain component is encoded based on a probability distribution of the result of quantization of the first transform domain component.
The specific implementation manner of this step may be the same as or similar to the implementation manner in step 205, and is not described herein again in this embodiment of the application.
In the embodiment of the application, after the image to be encoded is divided into a plurality of image blocks, the residual block of each image block is determined through the first predictive convolutional neural network, thereby removing spatial domain redundancy and time domain redundancy from each image block. Therefore, on the basis of the determined image compression rate, after the plurality of transform domain components corresponding to the plurality of image blocks are determined through the transform convolutional neural network, the encoded image can be ensured to reach the image compression rate. In addition, since the plurality of image blocks are obtained by division according to the texture features of the image, their sizes may differ; on the basis of the feature transformation of the transform convolutional neural network, the compression rate and the distortion rate of the encoded image can be further ensured after image blocks of different sizes are encoded.
Fig. 11 is a flowchart of a decoding method provided in an embodiment of the present application, where the method is applied to an electronic device. Referring to fig. 11, the method includes the following steps. The execution structure of the method can be as shown in fig. 12.
Step 1101: the method comprises the steps of obtaining an encoding stream of a first image block in a plurality of image blocks, wherein the encoding stream comprises a first code stream and a second code stream.
Specifically, an encoded stream of each of a plurality of image blocks sent by an encoder may be obtained, so as to obtain an encoded stream of a first image block, where the first image block is any one of the plurality of image blocks.
The first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, the second code stream is encoded data of side information of the first image block, and the plurality of image blocks are obtained by dividing an image.
Further, after the first code stream of the first image block is obtained, the prediction information of the first image block may be determined through an inverse prediction convolutional neural network based on the first code stream of the first image block.
The specific implementation manner of determining the prediction information of the first image block may be: the upper image block and the left image block of the first image block can be determined according to the position of the first image block in the image, and then the upper image block and the left image block are input to the inverse prediction convolutional neural network to obtain prediction information of the first image block.
When the first image block is a boundary image block, the first image block may not have an upper image block or a left image block, and at this time, a filler block having the same size as the first image block may be used as the upper image block or the left image block, and a pixel value of each pixel point in the filler block may be 0. The network parameters in the inverse prediction convolutional neural network can be obtained by training sample data, the network structure of the inverse prediction convolutional neural network can be the same as that of the prediction convolutional neural network, namely the inverse prediction convolutional neural network can be composed of an activation function and a plurality of full connection layers.
Step 1102: and decoding the second code stream to obtain the side information of the first image block.
In a possible implementation manner, the second code stream may be decoded in an arithmetic decoding manner, and of course, may also be decoded in other decoding manners, such as huffman decoding.
Step 1103: and determining the probability distribution corresponding to the first image block according to the side information.
The side information includes the quantized first parameter and the quantized second parameter, and the probability estimation network includes a first decoder network and a second decoder network. In this case, the specific implementation manner of step 1103 may be: the quantized first parameter is input into the first decoder network, and a first probability distribution parameter that is processed and output by the first decoder network and reflects the probability distribution of the quantization result of the first transform domain component is obtained. The quantized second parameter is input into the second decoder network, and a second probability distribution parameter that is processed and output by the second decoder network and reflects the probability distribution of the quantization result of the first transform domain component is obtained. The probability distribution corresponding to the first image block is then determined based on the first probability distribution parameter, the second probability distribution parameter, and the second code stream.
The specific implementation described above may be implemented as in step 204 above, except that the second code stream needs to be arithmetically decoded to obtain the quantization step size of the first transform domain component.
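In learned-compression schemes of this kind, the probability of each quantized symbol is commonly obtained by integrating the modeled density over the quantization interval. The sketch below assumes the Gaussian distribution option described in these embodiments (mean from the first probability distribution parameter, variance from the second); the function names and example values are not from the disclosure:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu, std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_probability(q, step, mu, sigma):
    """Probability mass of quantized value q: the Gaussian density integrated
    over the quantization interval [q*step - step/2, q*step + step/2]."""
    center = q * step
    return (gaussian_cdf(center + step / 2, mu, sigma)
            - gaussian_cdf(center - step / 2, mu, sigma))

# illustrative probability distribution parameters
p = symbol_probability(0, step=1.0, mu=0.0, sigma=1.0)
print(round(p, 4))  # 0.3829
```

These per-symbol probabilities are what an arithmetic coder or decoder consumes when processing the first code stream.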
Step 1104: and decoding the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of the first transform domain component corresponding to the first image block.
Specifically, the first code stream of the first image block is parsed to obtain the quantization step corresponding to the first image block, and the quantization result of the first transform domain component corresponding to the first image block is then determined in an implementation manner similar to that of step (3) in step 204.
Step 1105: and determining reconstruction information of the first image block through an inverse transform convolutional neural network based on the quantization result of the first transform domain component.
The structure of the inverse transform convolutional neural network can be the same as or similar to that of the transform convolutional neural network, the difference being that the input layer of the inverse transform convolutional neural network is used to obtain the quantization result of the first transform domain component; after upsampling and deconvolution calculation are performed by the intermediate layers, the reconstruction information corresponding to the quantization result of the first transform domain component is output through the output layer, so that the reconstruction information of the first image block can be obtained. The network parameters of the inverse transform convolutional neural network can be obtained by training on samples.
Step 1106: and reconstructing the first image block based on the reconstruction information of the first image block.
It should be noted that, when determining the prediction information of each image block in the plurality of image blocks based on the second prediction convolutional neural network, the implementation of step 1106 may be: and reconstructing the first image block based on the prediction information and the reconstruction information of the first image block.
Specifically, the pixel values of the prediction information and the reconstruction information of the first image block are added pixel by pixel, and the first image block is thereby reconstructed.
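The decoder-side reconstruction can be sketched as dequantization, inverse transformation, and a pixel-wise addition of the prediction information. An identity function stands in for the inverse transform convolutional neural network here, and all values are illustrative assumptions:

```python
import numpy as np

def reconstruct(quantized, step, prediction_info, inverse_transform):
    """Dequantize (multiply by the quantization step), map back to the pixel
    domain with the inverse transform network (a callable here), then add the
    prediction information pixel by pixel."""
    reconstruction_info = inverse_transform(quantized * step)
    return prediction_info + reconstruction_info

# identity stands in for the inverse transform convolutional neural network
quantized = np.array([[12, 11], [13, 12]])
prediction = np.full((2, 2), 4.0)
print(reconstruct(quantized, 0.5, prediction, lambda x: x))
```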
It should be noted that, after the first image block is obtained by reconstruction, the above steps may be repeated to obtain each reconstructed image block, so that a reconstructed image of the image is obtained according to each reconstructed image block.
In the embodiment of the application, after the code stream of each image block in a plurality of image blocks is obtained, the obtained code stream is analyzed to obtain the quantization step of each image block, the quantization result of the first parameter and the quantization result of the second parameter, and then the mean value and the variance of the probability model are determined on the basis of the third decoding convolutional neural network and the fourth decoding convolutional neural network to obtain the corresponding probability model. And then, the reconstruction information of each image block is determined according to the probability of each element in the quantization interval corresponding to the quantization step size, the prediction information of each image block is predicted through the second prediction convolutional neural network, and each image block is reconstructed by adding the determined reconstruction information and the prediction information, so that the accuracy of reconstructing the image block is improved, and the distortion rate of each image block is ensured.
It should be noted that, in the above embodiments, the transform convolutional neural network, the quantization convolutional neural network, the first encoding convolutional neural network, the second encoding convolutional neural network, the first decoding convolutional neural network, the second decoding convolutional neural network, the third decoding convolutional neural network, the fourth decoding convolutional neural network, the first predictive convolutional neural network, and the second predictive convolutional neural network are all convolutional neural networks; to distinguish them, the above embodiments name each convolutional neural network according to its function.
The implementation process of training each convolutional neural network to obtain the corresponding network parameters may be as follows. The plurality of sample images are cropped and normalized to obtain an image training set, where the image training set includes a plurality of normalized image blocks. Forward calculation of the convolutional neural networks is performed on the image blocks included in the image training set to obtain the reconstructed image component of each image block. The mean square error formula is then taken as the loss function, and the loss value of each image block is calculated from its reconstructed image component. The operation is then repeated according to the following formula (2) until the number of iterations reaches the maximum number of iterations, at which point the network parameters of each convolutional neural network can be considered to have converged. In this way, the finally converged network parameters can be determined as the network parameters of each convolutional neural network.
L(θ_i) = (1/M) · Σ_{n=1}^{M} [ ‖I_n − F(I_n; θ_i)‖² − λ · log p(ŷ_n; θ_i) ]      (2)

wherein M represents the number of image blocks included in the image training set; I_n represents an input image block; F(I_n; θ_i) represents the reconstructed image component after feature encoding and feature decoding by the convolutional neural network under the θ_i parameter set; ŷ_n represents the feature component of image block I_n after feature encoding; p(ŷ_n; θ_i) represents the probability after modeling under the θ_i parameter set; λ is a trade-off coefficient between the distortion term and the rate term; and i is the current iteration number.
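The loss described by formula (2) — mean squared reconstruction error plus a term penalizing low modeled probability (i.e. high bit rate) — can be sketched as follows; λ, the example values, and the use of log base 2 are assumptions, not from the disclosure:

```python
import math

def rate_distortion_loss(blocks, reconstructions, probabilities, lam):
    """Mean over the M training blocks of squared reconstruction error plus
    lam times the rate term, where the rate is the negative log probability
    of the encoded features under the entropy model."""
    m = len(blocks)
    total = 0.0
    for block, recon, p in zip(blocks, reconstructions, probabilities):
        distortion = sum((a - b) ** 2 for a, b in zip(block, recon))
        rate = -math.log2(p)          # bits needed under the entropy model
        total += distortion + lam * rate
    return total / m

blocks = [[1.0, 2.0], [3.0, 4.0]]     # illustrative input blocks
recons = [[1.1, 2.0], [2.9, 4.2]]     # their reconstructions F(I_n)
probs = [0.5, 0.25]                   # modeled probabilities p of the features
loss = rate_distortion_loss(blocks, recons, probs, lam=0.01)
print(round(loss, 4))  # 0.045
```

Gradient descent on this loss with respect to the network parameters, repeated up to the maximum iteration count, yields the converged parameters.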
Fig. 13 is a schematic structural diagram of an encoding apparatus provided in an embodiment of the present application, where the encoding apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of the two. As shown in fig. 13, the apparatus includes:
a dividing module 1301, configured to divide an image to be encoded into a plurality of image blocks;
a first determining module 1302, configured to perform, on a first image block of the plurality of image blocks, feature transformation on the first image block through a transformed convolutional neural network to obtain a first transform domain component corresponding to the first image block, where the first image block refers to any one of the plurality of image blocks;
a second determining module 1303, configured to quantize the first transform domain component to obtain a quantization result of the first transform domain component;
a third determining module 1304, configured to input the quantization result of the first transform domain component to a probability estimation network, and determine a probability distribution of the quantization result of the first transform domain component through the probability estimation network;
an encoding module 1305, configured to encode the quantization result of the first transform domain component based on a probability distribution of the quantization result of the first transform domain component.
In a possible implementation manner, the first determining module is mainly configured to:
taking the first image block as an input of the transformed convolutional neural network, and determining a first transform coefficient corresponding to the first image block through the transformed convolutional neural network;
determining a transform domain component corresponding to the first transform coefficient based on an image compression rate of the image to be encoded, a maximum value of a compression rate range in which the image compression rate is located, and the number of channels of the first transform coefficient;
determining a transform domain component corresponding to the first transform coefficient as the first transform domain component.
In a possible implementation manner, the second determining module is mainly configured to:
determining a quantization step corresponding to the first transform domain component;
and inputting the first transform domain component and the quantization step corresponding to the first transform domain component into a quantizer for quantization to obtain a quantization result of the first transform domain component.
In one possible implementation manner, the second determining module is further configured to:
taking the first transform domain component as an input of a quantization convolutional neural network, and determining a quantization step corresponding to the first transform domain component through the quantization convolutional neural network, wherein the number of convolutional kernels of convolutional layers included in the quantization convolutional neural network is the same as the number of convolutional kernels of convolutional layers included in the transform convolutional neural network;
alternatively,
determining a transform domain component combined by channels based on the image compression ratio of the image to be coded, the maximum value of the compression ratio range of the image compression ratio and the first transform domain component, taking the transform domain component combined by the channels as the input of the quantization convolutional neural network, and determining the quantization step corresponding to the first transform domain component through the quantization convolutional neural network.
In one possible implementation, the probability estimation network comprises a first probability estimation network and a second probability estimation network;
the third determining module is mainly configured to:
inputting the quantized result of the first transform domain component to the first probability estimation network, obtaining a first probability distribution parameter reflecting a probability distribution of the quantized result of the first transform domain component output by the first probability estimation network processing;
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the preprocessed result into the second probability estimation network, and acquiring a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component;
determining a probability distribution of the quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter.
In one possible implementation, the first probability estimation network comprises a first encoder network and a first decoder network, the first encoder network and the first decoder network being connected by a quantizer therebetween;
the third determining module is further configured to:
inputting the quantization result of the first transform domain component into the first encoder network, and processing and outputting the quantization result by the first encoder network to obtain a first parameter;
quantizing the first parameter to obtain a quantized first parameter;
and inputting the quantized first parameter into the first decoder network, and processing and outputting the quantized first parameter by the first decoder network to obtain the first probability distribution parameter.
In one possible implementation, the second probability estimation network includes a second encoder network and a second decoder network, and the second encoder network and the second decoder network are connected through a quantizer;
the third determining module is further configured to:
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component, inputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component into the second encoder network, and processing and outputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component by the second encoder network to obtain a second parameter;
quantizing the second parameter to obtain a quantized second parameter;
and inputting the quantized second parameter into the second decoder network, and processing and outputting the second parameter by the second decoder network to obtain the second probability distribution parameter.
In one possible implementation manner, the third determining module is further configured to:
modeling a quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter to obtain a probability model;
and determining the probability distribution of the quantization result of the first transform domain component according to the probability model and the quantization step corresponding to the first transform domain component.
In one possible implementation, the probability distribution is a Gaussian distribution, the first probability distribution parameter is the mean of the corresponding Gaussian distribution model, and the second probability distribution parameter is the variance of the corresponding Gaussian distribution model; alternatively,
the probability distribution is a laplace distribution, the first probability distribution parameter is a position parameter in a corresponding laplace distribution model, and the second probability distribution parameter is a scale parameter in the laplace distribution model.
In one possible implementation, the apparatus further includes:
an obtaining module, configured to determine a quantized first parameter, a quantized second parameter, and a quantization step corresponding to the first transform domain component as side information of the first image block; coding the side information to obtain a second code stream; and determining the coding stream of the first image block by the second code stream and the first code stream obtained by coding the quantization result of the first transform domain component.
In one possible implementation, the apparatus further includes:
a fourth determining module, configured to determine a predicted image block of each image block of the plurality of image blocks through a predictive convolutional neural network;
a fifth determining module, configured to determine a residual block of each image block of the plurality of image blocks based on the plurality of image blocks and a predicted image block of each image block;
the first determination module is further to:
and performing feature transformation through the transformation convolutional neural network based on the residual block of the first image block to obtain a first transform domain component corresponding to the first image block.
In the embodiment of the application, the image to be encoded is divided into a plurality of image blocks, and on the basis of the determined image compression rate, after a plurality of transform domain components in one-to-one correspondence with the plurality of image blocks are determined through the transform convolutional neural network, the encoded image can be ensured to reach the image compression rate. Since the plurality of image blocks are obtained by division according to the texture features of the image, their sizes may differ; on the basis of the feature transformation of the transform convolutional neural network, the compression rate and the distortion rate of the encoded image can be further ensured after image blocks of different sizes are encoded.
It should be noted that: in the encoding apparatus provided in the above embodiment, when encoding image or video data, only the division of the above functional modules is illustrated, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions. In addition, the encoding apparatus and the encoding method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 14 is a schematic structural diagram of a decoding apparatus provided in an embodiment of the present application, where the decoding apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of the two. As shown in fig. 14, the apparatus includes:
an obtaining module 1401, configured to obtain an encoded stream of a first image block in a plurality of image blocks, where the encoded stream includes a first code stream and a second code stream, the first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, the second code stream is encoded data of side information of the first image block, the plurality of image blocks are obtained by dividing an image, and the first image block is any one of the plurality of image blocks;
a first decoding module 1402, configured to decode the second code stream to obtain side information of the first image block;
a first determining module 1403, configured to determine, according to the side information, a probability distribution corresponding to the first image block;
a second decoding module 1404, configured to decode the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of a first transform domain component corresponding to the first image block;
a second determining module 1405, configured to determine reconstruction information of the first image block through an inverse transform convolutional neural network based on a quantization result of the first transform domain component;
a reconstructing module 1406 configured to reconstruct the first image block based on reconstruction information of the first image block.
In one possible implementation, the side information includes a first parameter and a second parameter, and the probability estimation network includes a first probability estimation network and a second probability estimation network;
the first determining module is specifically configured to:
inputting the quantized first parameter into the first decoder network, and acquiring a first probability distribution parameter which is processed and output by the first decoder network and reflects the probability distribution of the quantization result of the first transform domain component;
inputting the quantized second parameter into the second decoder network, and acquiring a second probability distribution parameter which is processed and output by the second decoder network and reflects the probability distribution of the quantization result of the first transform domain component;
and determining the probability distribution corresponding to the first image block based on the first probability distribution parameter, the second probability distribution parameter and the second code stream.
In one possible implementation, the apparatus further includes:
a third determining module, configured to determine, based on an encoded stream of the first image block, prediction information of the first image block through an inverse prediction convolutional neural network;
the reconstruction module is specifically configured to:
and reconstructing the first image block based on the prediction information and reconstruction information of the first image block.
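The final reconstruction step, adding the decoded residual reconstruction to the prediction, can be sketched as follows. The function name and the clipping to the 8-bit pixel range are assumptions for illustration and are not stated in the application:

```python
def reconstruct_block(prediction, residual):
    # Element-wise sum of the predicted block and the decoded residual
    # reconstruction, clipped to the 8-bit pixel range [0, 255]
    # (the clipping is an assumption, not from the application).
    if len(prediction) != len(residual):
        raise ValueError("prediction and residual must be the same size")
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residual)]
```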
In the embodiment of the application, after the encoded stream of each of the plurality of image blocks is obtained, the obtained stream is parsed to obtain the quantization step of each image block, the quantization result of the first parameter, and the quantization result of the second parameter. The mean and variance of the probability model are then determined through the third decoding convolutional neural network and the fourth decoding convolutional neural network to obtain the corresponding probability model. Next, the reconstruction information of each image block is determined according to the probability of each element within the quantization interval corresponding to the quantization step, the prediction information of each image block is predicted through the second prediction convolutional neural network, and each image block is reconstructed by adding the determined reconstruction information to the prediction information. This improves the accuracy of reconstructing each image block and keeps its distortion low.
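The probability of each element within its quantization interval can be illustrated with a Gaussian model: given a mean and variance of the kind produced by the probability model above and a quantization step, the probability mass assigned to a quantized value is the CDF difference over its interval. A minimal sketch, with illustrative function names not taken from the application:

```python
import math

def gaussian_cdf(x, mean, std):
    # CDF of N(mean, std^2), evaluated via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def symbol_probability(q, mean, std, step):
    # Probability mass of the quantization interval [q - step/2, q + step/2]
    # under the Gaussian model -- the quantity an entropy coder would use.
    return (gaussian_cdf(q + step / 2.0, mean, std)
            - gaussian_cdf(q - step / 2.0, mean, std))
```

With step 1 and a standard Gaussian, the masses over all quantized values sum to (essentially) one, which is what makes them usable as a coding distribution.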
It should be noted that: in the decoding apparatus provided in the above embodiment, when decoding a code stream of image or video data, only the division of the above functional modules is used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the decoding apparatus and the decoding method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 15 is a schematic structural diagram of an electronic device 1500 according to an embodiment of the present application. The electronic device 1500 may be an encoder or a decoder, and may vary considerably in configuration and performance. It may include one or more processors (CPUs) 1501 and one or more memories 1502, where a communication interface is provided between the processor 1501 and the memory 1502, and communication between them is implemented through a communication bus. The memory 1502 stores at least one instruction that is loaded and executed by the processor 1501 to implement the methods of Fig. 2, 8, 9, or 11 in the embodiments described above. Of course, the electronic device 1500 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in an electronic device to perform the methods illustrated in Fig. 2, 8, 9, and 11 in the above embodiments is also provided. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
The above-mentioned embodiments are not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application should be included in the protection scope of the present application.

Claims (18)

1. A method of encoding, the method comprising:
dividing an image to be coded into a plurality of image blocks;
for a first image block in the plurality of image blocks, performing feature transformation through a transform convolutional neural network based on the first image block to obtain a first transform domain component corresponding to the first image block, wherein the first image block refers to any one of the plurality of image blocks;
quantizing the first transform domain component to obtain a quantization result of the first transform domain component;
inputting the quantization result of the first transform domain component into a probability estimation network, and determining a probability distribution of the quantization result of the first transform domain component through the probability estimation network;
encoding the quantization result of the first transform domain component based on a probability distribution of the quantization result of the first transform domain component.
2. The method of claim 1, wherein the performing feature transformation through the transform convolutional neural network based on the first image block to obtain the first transform domain component corresponding to the first image block comprises:
taking the first image block as an input of the transform convolutional neural network, and determining a first transform coefficient corresponding to the first image block through the transform convolutional neural network;
determining a transform domain component corresponding to the first transform coefficient based on an image compression rate of the image to be encoded, a maximum value of a compression rate range in which the image compression rate is located, and the number of channels of the first transform coefficient;
determining a transform domain component corresponding to the first transform coefficient as the first transform domain component.
3. The method of claim 1, wherein said quantizing said first transform domain component to obtain a quantized result of said first transform domain component comprises:
determining a quantization step corresponding to the first transform domain component;
and inputting the first transform domain component and the quantization step corresponding to the first transform domain component into a quantizer for quantization to obtain a quantization result of the first transform domain component.
4. The method of claim 3, wherein said determining a quantization step size for said first transform domain component comprises:
taking the first transform domain component as an input of a quantization convolutional neural network, and determining a quantization step corresponding to the first transform domain component through the quantization convolutional neural network, wherein each convolutional layer of the quantization convolutional neural network comprises the same number of convolution kernels;
alternatively,
determining a channel-combined transform domain component based on the image compression rate of the image to be encoded, the maximum value of the compression rate range in which the image compression rate is located, and the first transform domain component, taking the channel-combined transform domain component as the input of the quantization convolutional neural network, and determining the quantization step corresponding to the first transform domain component through the quantization convolutional neural network.
5. The method of claim 1, wherein the probability estimation networks comprise a first probability estimation network and a second probability estimation network;
the inputting the quantization result of the first transform domain component to a probability estimation network, determining a probability distribution of the quantization result of the first transform domain component by the probability estimation network, comprising:
inputting the quantized result of the first transform domain component to the first probability estimation network, obtaining a first probability distribution parameter reflecting a probability distribution of the quantized result of the first transform domain component output by the first probability estimation network processing;
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the preprocessed result into the second probability estimation network, and acquiring a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component;
determining a probability distribution of the quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter.
6. The method of claim 5, wherein the first probability estimation network comprises a first encoder network and a first decoder network, the first encoder network and the first decoder network connected by a quantizer;
said inputting the quantization result of the first transform domain component to the first probability estimation network to obtain a first probability distribution parameter that is processed and output by the first probability estimation network and reflects a probability distribution of the quantization result of the first transform domain component comprises:
inputting the quantization result of the first transform domain component into the first encoder network, and processing and outputting the quantization result by the first encoder network to obtain a first parameter;
quantizing the first parameter to obtain a quantized first parameter;
and inputting the quantized first parameter into the first decoder network, and processing and outputting the quantized first parameter by the first decoder network to obtain the first probability distribution parameter.
7. The method of claim 5, wherein the second probability estimation network comprises a second encoder network and a second decoder network, the second encoder network and the second decoder network connected by a quantizer;
the preprocessing the first probability distribution parameter and the quantization result of the first transform domain component and inputting the result into the second probability estimation network to obtain a second probability distribution parameter which is processed and output by the second probability estimation network and reflects the probability distribution of the quantization result of the first transform domain component, includes:
preprocessing the first probability distribution parameter and the quantization result of the first transform domain component, inputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component into the second encoder network, and processing and outputting the preprocessed first probability distribution parameter and the quantization result of the first transform domain component by the second encoder network to obtain a second parameter;
quantizing the second parameter to obtain a quantized second parameter;
and inputting the quantized second parameter into the second decoder network, and processing and outputting the quantized second parameter by the second decoder network to obtain the second probability distribution parameter.
8. The method of claim 5, wherein said determining a probability distribution of the quantized result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter comprises:
modeling a quantization result of the first transform domain component based on the first probability distribution parameter and the second probability distribution parameter to obtain a probability model;
and determining the probability distribution of the quantization result of the first transform domain component according to the probability model and the quantization step corresponding to the first transform domain component.
9. The method of claim 5, wherein the probability distribution is a Gaussian distribution, the first probability distribution parameter is a mean of a corresponding Gaussian distribution model, and the second probability distribution parameter is a variance of the corresponding Gaussian distribution model; alternatively,
the probability distribution is a Laplace distribution, the first probability distribution parameter is a location parameter in a corresponding Laplace distribution model, and the second probability distribution parameter is a scale parameter in the Laplace distribution model.
10. The method of claim 1, wherein after the encoding the quantization result of the first transform domain component based on the probability distribution of the quantization result of the first transform domain component, the method further comprises:
determining the quantized first parameter, the quantized second parameter and the quantization step corresponding to the first transform domain component as side information of the first image block;
encoding the side information to obtain a second code stream;
and determining the coding stream of the first image block by the second code stream and the first code stream obtained by coding the quantization result of the first transform domain component.
11. The method of claim 1, wherein after dividing the image to be encoded into the plurality of image blocks, further comprising:
determining a predicted image block for each image block of the plurality of image blocks through a predictive convolutional neural network;
determining a residual block for each image block of the plurality of image blocks based on the plurality of image blocks and a predicted image block for each image block;
the performing feature transformation through the transform convolutional neural network based on the first image block to obtain the first transform domain component corresponding to the first image block comprises:
performing feature transformation through the transform convolutional neural network based on the residual block of the first image block to obtain the first transform domain component corresponding to the first image block.
12. A method of decoding, the method comprising:
acquiring an encoded stream of a first image block in a plurality of image blocks, wherein the encoded stream comprises a first code stream and a second code stream, the first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, and the second code stream is encoded data of side information of the first image block;
the image blocks are obtained by dividing an image, and the first image block is any one of the image blocks;
decoding the second code stream to obtain side information of the first image block;
determining a probability distribution corresponding to the first image block according to the side information;
decoding the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of a first transform domain component corresponding to the first image block;
determining reconstruction information of the first image block through an inverse transform convolutional neural network based on a quantization result of the first transform domain component;
and reconstructing the first image block based on the reconstruction information of the first image block.
13. The method of claim 12, wherein the side information comprises a quantized first parameter and a quantized second parameter, the probability estimation network comprises a first decoder network and a second decoder network;
the determining the probability distribution corresponding to the first image block according to the side information includes:
inputting the quantized first parameter into the first decoder network, obtaining a first probability distribution parameter reflecting a probability distribution of a quantization result of the first transform domain component, which is processed and output by the first decoder network;
inputting the quantized second parameter into the second decoder network, and acquiring a second probability distribution parameter which is processed and output by the second decoder network and reflects the probability distribution of the quantization result of the first transform domain component;
and determining the probability distribution corresponding to the first image block based on the first probability distribution parameter, the second probability distribution parameter and the second code stream.
14. The method of claim 12 or 13, wherein after the obtaining the encoded stream of a first image block in the plurality of image blocks, the method further comprises:
determining prediction information of the first image block through an inverse prediction convolutional neural network based on the coding stream of the first image block;
reconstructing the first image block based on the reconstruction information of the first image block includes:
and reconstructing the first image block based on the prediction information and reconstruction information of the first image block.
15. An encoding apparatus, characterized in that the apparatus comprises:
the image coding device comprises a dividing module, a coding module and a decoding module, wherein the dividing module is used for dividing an image to be coded into a plurality of image blocks;
a first determining module, configured to perform, for a first image block of the plurality of image blocks, feature transformation through a transform convolutional neural network based on the first image block to obtain a first transform domain component corresponding to the first image block, where the first image block is any one of the plurality of image blocks;
a second determining module, configured to quantize the first transform domain component to obtain a quantization result of the first transform domain component;
a third determining module, configured to input the quantization result of the first transform domain component to a probability estimation network, and determine a probability distribution of the quantization result of the first transform domain component through the probability estimation network;
an encoding module configured to encode the quantization result of the first transform domain component based on a probability distribution of the quantization result of the first transform domain component.
16. An apparatus for decoding, the apparatus comprising:
the image processing device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an encoded stream of a first image block in a plurality of image blocks, the encoded stream comprises a first code stream and a second code stream, the first code stream is obtained by encoding a quantization result of a first transform domain component corresponding to the first image block, the second code stream is encoded data of side information of the first image block, the plurality of image blocks are obtained by dividing an image, and the first image block is any one of the plurality of image blocks;
a first decoding module, configured to decode the second code stream to obtain side information of the first image block;
a first determining module, configured to determine, according to the side information, a probability distribution corresponding to the first image block;
a second decoding module, configured to decode the first code stream based on the probability distribution corresponding to the first image block to obtain a quantization result of a first transform domain component corresponding to the first image block;
a second determining module, configured to determine reconstruction information of the first image block through an inverse transform convolutional neural network based on a quantization result of the first transform domain component;
and a reconstruction module, configured to reconstruct the first image block based on the reconstruction information of the first image block.
17. An electronic device, characterized in that the electronic device comprises:
the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory is for storing a computer program and the processor is for executing the program stored on the memory to perform the steps of the method of any of claims 1-11 or 12-14.
18. A computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform the steps of the method of any one of claims 1-11 or 12-14.
CN201910157697.6A 2019-03-01 2019-03-01 Encoding method, decoding method, device, electronic device and storage medium Active CN111641832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910157697.6A CN111641832B (en) 2019-03-01 2019-03-01 Encoding method, decoding method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910157697.6A CN111641832B (en) 2019-03-01 2019-03-01 Encoding method, decoding method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN111641832A true CN111641832A (en) 2020-09-08
CN111641832B CN111641832B (en) 2022-03-25

Family

ID=72330508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910157697.6A Active CN111641832B (en) 2019-03-01 2019-03-01 Encoding method, decoding method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111641832B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080084929A1 (en) * 2006-10-05 2008-04-10 Xiang Li Method for video coding a sequence of digitized images
WO2013001013A1 (en) * 2011-06-30 2013-01-03 Canon Kabushiki Kaisha Method for decoding a scalable video bit-stream, and corresponding decoding device
WO2018099579A1 (en) * 2016-12-02 2018-06-07 Huawei Technologies Co., Ltd. Apparatus and method for encoding an image
CN108701249A (en) * 2016-01-25 2018-10-23 渊慧科技有限公司 Image is generated using neural network
CN109074512A (en) * 2016-02-05 2018-12-21 渊慧科技有限公司 Image is compressed using neural network
US10192327B1 (en) * 2016-02-04 2019-01-29 Google Llc Image compression with recurrent neural networks


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114501010A (en) * 2020-10-28 2022-05-13 Oppo广东移动通信有限公司 Image encoding method, image decoding method and related device
WO2022088631A1 (en) * 2020-10-28 2022-05-05 Oppo广东移动通信有限公司 Image encoding method, image decoding method, and related apparatuses
WO2022100140A1 (en) * 2020-11-13 2022-05-19 华为技术有限公司 Compression encoding method and apparatus, and decompression method and apparatus
CN114554225B (en) * 2020-11-26 2023-05-12 珠海格力电器股份有限公司 Image encoding method, apparatus, device and computer readable medium
CN114554225A (en) * 2020-11-26 2022-05-27 珠海格力电器股份有限公司 Image coding method, device, equipment and computer readable medium
CN112767337A (en) * 2021-01-12 2021-05-07 杭州海康威视数字技术股份有限公司 Cigarette case detection method and device
CN112767337B (en) * 2021-01-12 2023-08-04 杭州海康威视数字技术股份有限公司 Cigarette case detection method and device
WO2022253088A1 (en) * 2021-05-29 2022-12-08 华为技术有限公司 Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program and product
WO2023279961A1 (en) * 2021-07-09 2023-01-12 华为技术有限公司 Video image encoding method and apparatus, and video image decoding method and apparatus
WO2023040745A1 (en) * 2021-09-18 2023-03-23 华为技术有限公司 Feature map encoding method and apparatus and feature map decoding method and apparatus
WO2023138686A1 (en) * 2022-01-21 2023-07-27 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for data processing
CN114449345A (en) * 2022-02-08 2022-05-06 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114449345B (en) * 2022-02-08 2023-06-23 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
WO2023169303A1 (en) * 2022-03-10 2023-09-14 华为技术有限公司 Encoding and decoding method and apparatus, device, storage medium, and computer program product
WO2024007753A1 (en) * 2022-07-07 2024-01-11 华为技术有限公司 Encoding method, decoding method, and electronic device

Also Published As

Publication number Publication date
CN111641832B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
CN109889839B (en) Region-of-interest image coding and decoding system and method based on deep learning
CN111818346B (en) Image encoding method and apparatus, image decoding method and apparatus
CN109451308B (en) Video compression processing method and device, electronic equipment and storage medium
CN111641826B (en) Method, device and system for encoding and decoding data
CN113259676B (en) Image compression method and device based on deep learning
WO2020237646A1 (en) Image processing method and device, and computer-readable storage medium
US20220004844A1 (en) Methods and apparatuses for compressing parameters of neural networks
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
CN110753225A (en) Video compression method and device and terminal equipment
CN113079378B (en) Image processing method and device and electronic equipment
US20220360788A1 (en) Image encoding method and image decoding method
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
CN114071141A (en) Image processing method and equipment
CN115988215A (en) Variable bit rate image compression method, system, device, terminal and storage medium
CN115361559A (en) Image encoding method, image decoding method, image encoding device, image decoding device, and storage medium
CN113256744B (en) Image coding and decoding method and system
CN115393452A (en) Point cloud geometric compression method based on asymmetric self-encoder structure
CN115361555A (en) Image encoding method, image encoding device, and computer storage medium
CN114501031B (en) Compression coding and decompression method and device
CN114882133B (en) Image coding and decoding method, system, device and medium
CN110717948A (en) Image post-processing method, system and terminal equipment
CN117915107B (en) Image compression system, image compression method, storage medium and chip
CN117459727B (en) Image processing method, device and system, electronic equipment and storage medium
CN114663536B (en) Image compression method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant