CN113256744B - Image coding and decoding method and system - Google Patents


Info

Publication number
CN113256744B
Authority
CN
China
Prior art keywords
initial, outputting, inputting, feature map, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010085235.0A
Other languages
Chinese (zh)
Other versions
CN113256744A (en)
Inventor
肖云雷
刘阳兴
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202010085235.0A
Publication of CN113256744A
Application granted
Publication of CN113256744B


Classifications

    • G06T9/002 Image coding using neural networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The invention discloses an image encoding and decoding method and system based on a variational auto-encoding network composed of an encoding model and a decoding model.

Description

Image coding and decoding method and system
Technical Field
The invention relates to the technical field of image recognition, in particular to an image coding and decoding method and system.
Background
With the advent of the big data information era, image signals are becoming the main body of information storage and transmission, playing an extremely important role in promoting national economy, guaranteeing social security, transmitting advanced culture and the like.
There is a great deal of redundancy within an image signal, including spatial redundancy, structural redundancy, and visual redundancy, among others. It is precisely this redundancy that makes image data compressible. As the internet has developed, people increasingly obtain information by watching videos and viewing pictures, and the space these occupy is very large, consuming a great deal of network bandwidth and slowing transmission. How to compress image data so as to save space during image transmission is therefore an urgent problem to be solved.
With the traditional image coding algorithm BPG, the amount of computation rises exponentially and the computational complexity increases as the quantization level grows, so coding efficiency is low and compression performance is poor.
Accordingly, there is a need for improvements and developments in the art.
Disclosure of Invention
Therefore, it is necessary to provide an image encoding and decoding method and system that apply different activation functions (Softplus, ReLU, and LeakyReLU) within a variational auto-encoding network formed by an encoding model and a decoding model during image encoding and decoding, so as to fit the variance as closely as possible and reduce calculation error, while further reducing the memory footprint and amount of computation by reducing the number of residual blocks and their channel counts, thereby improving compression performance and compression ratio and achieving the best compression effect.
In order to achieve the purpose, the invention adopts the following technical scheme:
an image encoding method, comprising the steps of:
inputting an original picture as an input image into a lossy coding network, and outputting, through the lossy coding network, an initial feature map corresponding to the original picture;
inputting the initial quantized feature map, obtained by quantizing the initial feature map, into a lossless coding network, and outputting, through the lossless coding network, an initial probability map corresponding to the initial quantized feature map;
and performing arithmetic coding on the initial quantized feature map and the initial probability map to obtain a compressed intermediate file.
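The three steps above can be sketched end to end as follows. This is a hedged toy sketch only: `lossy_encode` (a 2x2 average pooling) and `probability_map` (a fixed unit-variance Gaussian) are hypothetical stand-ins for the trained networks, and only the rounding quantization matches the method literally.

```python
import numpy as np

def lossy_encode(image):
    # Stand-in for the lossy coding network: a toy 2x2 average pooling
    # plays the role of the stacked strided convolutions.
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def quantize(feature_map):
    # Rounding quantization, as in the method (no added noise).
    return np.round(feature_map)

def probability_map(yq):
    # Stand-in for the lossless coding network: a fixed unit-variance
    # Gaussian per element; the real network predicts the variance.
    return np.exp(-0.5 * yq ** 2) / np.sqrt(2.0 * np.pi)

image = np.linspace(0.0, 1.0, 16).reshape(4, 4)
yq = quantize(lossy_encode(image))   # initial quantized feature map
p = probability_map(yq)              # initial probability map
# (yq, p) would then be handed to an arithmetic coder.
```

The probability map is what makes arithmetic coding effective: symbols assigned higher probability cost fewer bits in the compressed intermediate file.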
Optionally, the lossy coding network and the lossless coding network form a coding model, and the lossy coding network includes an up-sampling module, a down-sampling module, and a connection module; the outputting of the initial feature map corresponding to the original picture through the lossy coding network specifically includes:
inputting an original picture into an up-sampling module, and outputting a first feature map corresponding to the original picture through the up-sampling module;
inputting the first feature map into a down-sampling module, and outputting a plurality of second feature maps through the down-sampling module;
inputting the first feature map and the plurality of second feature maps into the connection module, and outputting the initial feature map corresponding to the original picture through the connection module.
Optionally, the upsampling module includes a plurality of first convolution layers, the inputting the original picture into the upsampling module, and the outputting the first feature map corresponding to the original picture by the upsampling module specifically includes:
inputting the original picture into the first convolution layers in sequence, and outputting a plurality of intermediate first feature maps through the successive first convolution layers,
until the first feature map corresponding to the original picture is output through the last first convolution layer.
Optionally, the down-sampling module includes a plurality of second convolution layers arranged in parallel, the inputting of the first feature map into the down-sampling module, and the outputting of the plurality of second feature maps by the down-sampling module specifically includes:
acquiring the intermediate first feature maps output by the first convolution layers;
and inputting each intermediate first feature map into the corresponding second convolution layer, and outputting a plurality of second feature maps through the second convolution layers.
Optionally, the connecting module includes a connecting layer and a third convolution layer, the inputting the first feature map and the plurality of second feature maps into the connecting module, and the outputting the initial feature map corresponding to the original picture through the connecting module specifically includes:
acquiring the first feature map and the plurality of second feature maps corresponding to the original picture;
inputting the first feature map and all the second feature maps into a connecting layer, and outputting a third feature map through the connecting layer;
and inputting the third feature map into the third convolution layer, and outputting an initial feature map corresponding to the original picture through the third convolution layer.
Optionally, the lossless coding network includes a probability coding module and a priori estimation module, where the priori estimation module is used to assist the probability coding module and obtain a pre-estimated variance according to the initial quantization feature map; the probability coding module includes a fourth convolutional layer and a fifth convolutional layer,
the step of inputting the initial quantized feature map obtained by quantizing the initial feature map into the lossless coding network, and the step of outputting the initial probability map corresponding to the initial quantized feature map through the lossless coding network specifically includes:
performing rounding quantization on the initial feature map to obtain an initial quantized feature map;
performing a slicing operation on the initial quantized feature map to obtain a plurality of slice feature maps corresponding to the initial quantized feature map;
inputting each slice feature map into the fourth convolution layer, and outputting a plurality of first slice feature maps through the fourth convolution layer and a LeakyReLU activation function;
and inputting each first slice feature map into the fifth convolution layer, fitting the estimated variance through the fifth convolution layer and a Softplus activation function, and outputting an initial probability map corresponding to the initial quantized feature map.
The present invention also provides an image decoding method, including the steps of:
inputting an intermediate file to be decompressed into a lossless decoding network, and outputting an initial quantized feature map through the lossless decoding network; wherein the intermediate file is the compressed intermediate file obtained by the image encoding method described above;
inputting the initial quantization feature map into a lossy decoding network, and outputting an initial picture through the lossy decoding network;
and inputting the initial picture into a post-enhancement network, and outputting an original picture through the post-enhancement network.
Optionally, the lossless decoding network, the lossy decoding network, and the post-enhancement network form a decoding model, the lossy decoding network includes a plurality of first residual blocks and a plurality of sixth convolution layers that are sequentially connected, the initial quantization feature map is input to the lossy decoding network, and outputting the initial picture through the lossy decoding network specifically includes:
inputting the initial quantized feature map into the first residual blocks, and outputting an intermediate map corresponding to the initial quantized feature map through the first residual blocks;
and inputting the intermediate map into the sixth convolution layers, and outputting an initial picture corresponding to the initial quantized feature map through the sixth convolution layers and an inverse normalization operation.
Optionally, the post-enhancement network includes two seventh convolution layers, a plurality of second residual blocks are disposed between the two seventh convolution layers, the inputting the initial picture into the post-enhancement network, and the outputting the original picture through the post-enhancement network specifically includes:
inputting the obtained initial picture into a seventh convolution layer, and outputting a first enhanced picture corresponding to the initial picture through the seventh convolution layer;
sequentially inputting the first enhanced picture into each second residual block, and outputting a second enhanced picture corresponding to the first enhanced picture through each second residual block;
and inputting the second enhanced picture into the last seventh convolution layer, and outputting, through the last seventh convolution layer, a third enhanced picture corresponding to the initial picture, the third enhanced picture being the restored original picture.
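The conv, residual blocks, conv shape of the post-enhancement network can be sketched as follows. This is a hedged toy sketch: the scalar operations standing in for the convolutions are hypothetical, and only the skip-connection structure reflects the description.

```python
import numpy as np

def second_residual_block(x):
    # Toy residual block: identity skip connection plus a small nonlinear
    # branch (the real block uses two convolutions; cf. fig. 8).
    return x + 0.1 * np.tanh(x)

def post_enhance(x, num_blocks=3):
    # Seventh conv -> N second residual blocks -> last seventh conv,
    # with identity maps standing in for the two seventh convolution layers.
    for _ in range(num_blocks):
        x = second_residual_block(x)
    return x

picture = np.zeros((4, 4))  # hypothetical "initial picture"
enhanced = post_enhance(picture)
```

Because every stage preserves the spatial size, the third enhanced picture has the same shape as the initial picture, which is what lets it serve as the restored original.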
The invention also provides a system comprising a sending terminal and a receiving terminal, each of which includes a processor and a memory connected to the processor. For the sending terminal, the memory stores an image encoding program operable on the processor; when executed by the processor, the image encoding program implements the steps of the image encoding method described above. For the receiving terminal, the memory stores an image decoding program operable on the processor; when executed by the processor, the image decoding program implements the steps of the image decoding method described above.
Advantageous effects:
Compared with the prior art, in the image encoding and decoding method and system provided by the invention, the encoding model and the decoding model form a variational auto-encoding network. By adopting the activation functions Softplus, ReLU, and LeakyReLU in the process of encoding and decoding images through this network, the variance is fitted as closely as possible and calculation error is reduced; meanwhile, the memory footprint and amount of computation are further reduced by reducing the number of residual blocks and their channel counts, improving compression performance and achieving the best compression effect.
Drawings
Fig. 1 is a flowchart of an image encoding method according to the present invention.
Fig. 2 is a block diagram of the overall structure of the coding model and the decoding model provided by the present invention.
Fig. 3 is a structural block diagram of the prior estimation module Z provided in the present invention.
Fig. 4 is a functional schematic block diagram of a probability encoding module AE provided in the present invention.
Fig. 5 is a flowchart of an image decoding method according to the present invention.
Fig. 6 is a block diagram of a first residual block RB according to the present invention.
Fig. 7 is a block diagram of the post-enhancement network according to the present invention.
Fig. 8 is a block diagram of a second residual block in the post-enhancement network according to the present invention.
Fig. 9 is a functional block diagram of a system provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The image encoding method and image decoding method adopted by the invention are based on a variational auto-encoding network. A variational auto-encoder (VAE) is an unsupervised learning network that can automatically learn features from unlabeled data; it is a neural network whose goal is to reconstruct data as close as possible to its input, can give a better feature description than the original data, and has strong feature-learning ability. In deep learning, the features generated by a variational auto-encoding network are commonly used in place of the original data, which reduces interference between bands in the original data and reduces its dimensionality, yielding better results. The variational auto-encoding network therefore provides a good processing means for encoding and decoding images. The invention is applicable to fields such as image processing, video, and multimedia.
Example 1
Referring to fig. 1 and fig. 2, fig. 1 is a flowchart of an image encoding method according to the present invention, and fig. 2 is a block diagram of the encoding and decoding models. It should be noted that image encoding according to the embodiment of the present invention is not limited to the steps and sequence in the flowchart of fig. 1; steps may be added, removed, or reordered according to different requirements.
As shown in fig. 1 and 2, the image encoding method provided by the present invention includes the following steps:
and S10, inputting the original picture serving as an input image into a lossy coding network, and outputting an initial characteristic diagram corresponding to the original picture through the lossy coding network.
As shown in fig. 2, a coding model is built and trained to obtain a trained coding model. The coding model comprises a lossy coding network and a lossless coding network; the lossy coding network YE is a nonlinear coding network comprising an up-sampling module 1, a down-sampling module 2, and a connection module 3. Outputting the initial feature map corresponding to the original picture through the lossy coding network specifically includes: inputting the original picture into the up-sampling module 1, and outputting a first feature map corresponding to the original picture through the up-sampling module 1; inputting the first feature map into the down-sampling module 2, and outputting a plurality of second feature maps through the down-sampling module 2; and inputting the first feature map and the plurality of second feature maps into the connection module 3, and outputting the initial feature map corresponding to the original picture through the connection module 3.
Specifically, the up-sampling module 1 includes a plurality of first convolution layers 110. Inputting the original picture into the up-sampling module 1 and outputting the first feature map corresponding to the original picture through the up-sampling module 1 specifically includes: inputting the original picture into each first convolution layer 110 in sequence, and outputting a plurality of intermediate first feature maps through the successive first convolution layers 110, until the first feature map corresponding to the original picture is output through the last first convolution layer 110.
In this embodiment, as shown in fig. 2, the number of the first convolution layers 110 is 4, the structure and the parameters of each first convolution layer 110 are the same, the convolution kernel size of each first convolution layer 110 is 5 × 5, the convolution shift step size is 2, and 192 channels are used for each first convolution layer.
The original picture is input into the first of the first convolution layers 110 for convolution, and an intermediate first feature map corresponding to the original picture is output. The intermediate first feature map is then normalized (GDN) and input into the second first convolution layer 110, and the same operation is repeated until the fourth first convolution layer 110, which performs only convolution without normalization and outputs the first feature map corresponding to the original picture.
Therefore, the original picture is normalized after each feature-extraction step, unifying the value ranges, which improves the accuracy of the training result and the processing efficiency.
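The GDN step referenced above can be sketched as follows. This is a simplified, hypothetical form with scalar parameters; the trained layer learns a per-channel bias vector and a full weight matrix.

```python
import numpy as np

def gdn(x, beta=1.0, gamma=0.1):
    # Simplified generalized divisive normalization over the channel axis:
    # y_c = x_c / sqrt(beta + gamma * sum_k x_k^2).
    # x has shape (channels, height, width); scalar beta and gamma are
    # stand-ins for the learned parameters.
    norm = np.sqrt(beta + gamma * np.sum(x ** 2, axis=0, keepdims=True))
    return x / norm

features = np.ones((192, 4, 4))  # e.g. a 192-channel intermediate map
normalized = gdn(features)
```

The division by a channel-wide energy term is what keeps the magnitudes of successive intermediate feature maps comparable between convolution layers.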
The down-sampling module 2 includes a plurality of second convolution layers 210 arranged in parallel. Inputting the first feature map into the down-sampling module 2 and outputting the plurality of second feature maps through the down-sampling module 2 specifically includes: acquiring the intermediate first feature maps output by the first convolution layers 110; and inputting each intermediate first feature map into the corresponding second convolution layer 210, and outputting the plurality of second feature maps through the second convolution layers 210.
In the present embodiment, as shown in fig. 2, each second convolution layer 210 is connected between adjacent first convolution layers 110, and the number of second convolution layers 210 is one less than the number of first convolution layers 110, so there are 3 second convolution layers 210, each with a different structure and parameters: the first second convolution layer 210 has a convolution kernel size of 9 × 9 and a convolution shift step size of 8; the second has a 5 × 5 kernel with a shift step size of 4; and the third has a 3 × 3 kernel with a shift step size of 2. All second convolution layers 210 process 192 channels.
That is, the 3 intermediate first feature maps obtained after the 3 normalization operations are respectively input into the corresponding second convolution layers 210, and 3 second feature maps are output through the convolution operations of the 3 second convolution layers 210.
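The shift step sizes above are chosen so that every branch lands on the same spatial grid as the main path, which is what allows the connection layer to concatenate all four maps channel-wise. A quick bookkeeping check (padding details ignored, input height assumed to be a multiple of 16):

```python
H = 256  # hypothetical input height; any multiple of 16 behaves the same

main_path = H // 2 // 2 // 2 // 2   # four 5x5 first convs, step size 2 each
branch_1 = (H // 2) // 8            # 9x9 second conv, step size 8
branch_2 = (H // 2 // 2) // 4       # 5x5 second conv, step size 4
branch_3 = (H // 2 // 2 // 2) // 2  # 3x3 second conv, step size 2
# All four are H // 16, so the maps can be concatenated.
```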
The connection module 3 includes a connection layer 310 and a third convolution layer 311, the inputting the first feature map and the plurality of second feature maps into the connection module 3, and outputting the initial feature map corresponding to the original picture through the connection module 3 specifically includes: acquiring a first characteristic diagram and a plurality of second characteristic diagrams corresponding to an original picture; inputting the first feature map and all the second feature maps into a connection layer 310, and outputting a third feature map through the connection layer 310; inputting the third feature map into the third convolution layer 311, and outputting an initial feature map corresponding to the original picture through the third convolution layer 311.
In the present embodiment, as shown in fig. 2, the convolution kernel size of the third convolution layer 311 is 1 × 1, and there is no convolution shift step, which uses 192 channels.
Namely, the first feature map output by the 4 th first convolution layer 110 and the 3 second feature maps output by the 3 second convolution layers 210 are input to the connection layer 310 for splicing processing to obtain a third feature map, and then the third feature map is input to the third convolution layer 311 for convolution calculation to extract features and output an initial feature map corresponding to an original picture.
Thus, compared with the existing image coding algorithm, changing the size and number of the convolution kernels gives the coded image a small improvement in peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM): PSNR is improved by 0.2 and MS-SSIM by 0.0004.
The number of convolution layers, the convolution kernel size, and the convolution shift step size are not limited.
S20, inputting the initial quantized feature map, obtained by quantizing the initial feature map, into the lossless coding network, and outputting an initial probability map corresponding to the initial quantized feature map through the lossless coding network.
Specifically, the initial feature map obtained in step S10 is quantized by rounding (quantize), that is, the values of the initial feature map are rounded to obtain an initial quantized feature map YQ.
Thus, the method adopts rounding quantization, which differs from the existing approach of adding random noise for quantization; as a result, the peak signal-to-noise ratio and multi-scale structural similarity are further improved by a small margin, with PSNR improved by 0.5 and MS-SSIM by 0.015.
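The PSNR metric quoted throughout these comparisons follows the standard definition; the sketch below is the generic metric, not code from the patent, and the example arrays are hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    err = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(err ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 4.0)  # constant error of 4 -> MSE = 16
value = psnr(a, b)
```

Because PSNR is logarithmic, the quoted gains of 0.2 and 0.5 dB are modest but consistent improvements in reconstruction fidelity.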
As shown in fig. 3, the lossless coding network includes a probability coding module AE and a prior estimation module Z. The prior estimation module Z is used to assist the probability coding module AE and obtains an estimated variance from the initial quantized feature map. The prior estimation module Z includes an encoding estimation unit ZE and a decoding estimation unit ZD; the encoding estimation unit ZE includes a plurality of eighth convolution layers 410 connected in sequence, with a LeakyReLU activation function applied before the input of every two eighth convolution layers 410, which improves efficiency and reduces computational dimensionality.
In a specific implementation, the encoding estimation unit ZE includes 5 eighth convolution layers 410, all with the same kernel size, 3 × 3, and 128 channels; only the third eighth convolution layer 410 (3) and the last eighth convolution layer 410 (5) have a convolution shift step size of 2, and the other eighth convolution layers 410 have no convolution shift step size.
The decoding estimation unit ZD has the same structure and parameters as the encoding estimation unit ZE. The decoding estimation unit ZD includes a plurality of ninth convolution layers 510, with a LeakyReLU activation function applied after the input of every two ninth convolution layers 510; only the output of the last ninth convolution layer 510 passes through a Softplus activation function.
In a specific implementation, the number of ninth convolution layers 510 is 5; all ninth convolution layers 510 have the same kernel size, 3 × 3, and use 128 channels. Only the second ninth convolution layer 510 (2) and the fourth ninth convolution layer 510 (4) have a convolution shift step size of 2; the other ninth convolution layers 510 have no convolution shift step size.
Obtaining the estimated variance from the initial quantized feature map by the prior estimation module Z specifically includes:
1. Obtain the prior estimation feature map Z of the initial quantized feature map YQ.
That is, the absolute value of the initial quantized feature map YQ is taken (ABS), and the result is input into the first eighth convolution layer 410 for convolution. The output passes through the LeakyReLU activation function and is input into the second and third eighth convolution layers 410 for convolution; the output passes through LeakyReLU again and is input into the last two eighth convolution layers 410 for convolution. After this series of feature extractions, the prior estimation feature map Z corresponding to the initial quantized feature map is obtained.
2. Perform rounding quantization on the prior estimation feature map Z to obtain a quantized prior estimation feature map ZQ.
3. To speed up processing of the intermediate file, the quantized prior estimation feature map ZQ is input into the decoding estimation unit ZD to obtain the estimated variance σ.
Specifically, ZQ is input into the first two ninth convolution layers 510 for convolution and passes through the LeakyReLU activation function; it is then input into the next two ninth convolution layers 510, with convolution and LeakyReLU repeated, until it is input into the last ninth convolution layer 510 for convolution followed by the Softplus activation function, which outputs the estimated variance σ.
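A minimal sketch of the two activation functions used in units ZE and ZD; the smooth, strictly positive output of Softplus is what makes it suitable for the final variance-producing layer.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # LeakyReLU: identity for positive inputs, a small slope (alpha is a
    # hypothetical default here) for negative ones.
    return np.where(x > 0, x, alpha * x)

def softplus(x):
    # Softplus = log(1 + exp(x)): smooth and strictly positive, so the
    # fitted output can always be interpreted as a variance.
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
```

By contrast, ReLU outputs exactly zero over the whole negative half-axis, which is one reason a ReLU-fitted value cannot serve directly as a variance.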
Referring to fig. 4, the probability coding module AE includes a fourth convolution layer 420 and a fifth convolution layer 520. Inputting the initial quantized feature map into the lossless coding network and outputting the initial probability map corresponding to the initial quantized feature map through the lossless coding network specifically includes: quantizing the initial feature map to obtain the initial quantized feature map; performing a slicing operation (slice) on the initial quantized feature map to obtain a plurality of slice feature maps corresponding to it; inputting each slice feature map into the fourth convolution layer 420, and outputting a plurality of first slice feature maps through the fourth convolution layer 420 and a LeakyReLU activation function; and inputting each first slice feature map into the fifth convolution layer 520, fitting the estimated variance σ through the fifth convolution layer 520 and a Softplus activation function, and outputting the initial probability map corresponding to each initial quantized feature map.
It should be noted that the Softplus activation function is used for the last-layer output of the probability coding module AE and can smoothly and nonlinearly fit the estimated variance of each initial quantized feature map. This differs from the ReLU function used in the prior art, whose fitted value is not the estimated variance, resulting in a poor image compression effect.
As shown in fig. 4, 4 slice feature maps, namely z_1_00, z_1_11, z_1_01, and z_1_10, are extracted by odd-even (parity) extraction. The slicing operation in this embodiment is performed only twice for each slice feature map, which is simpler and more convenient than the unbounded slicing of the prior art; this reduces computational dimensionality and complexity while improving efficiency and saving time. For example, the first slicing operation acquires z_1_00 and the second acquires z_2_00; after these two slices, no further segmentation is performed, and the result serves as the object parameter for probability calculation.
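The odd-even extraction can be sketched with NumPy strided indexing; reading the z_1_xy suffix as the (row, column) parity of the retained samples is an interpretive assumption based on fig. 4.

```python
import numpy as np

# Parity ("checkerboard") slicing of a quantized feature map into four
# interleaved sub-maps, each a quarter of the original.
yq = np.arange(16).reshape(4, 4)
z_1_00 = yq[0::2, 0::2]  # even rows, even columns
z_1_01 = yq[0::2, 1::2]  # even rows, odd columns
z_1_10 = yq[1::2, 0::2]  # odd rows, even columns
z_1_11 = yq[1::2, 1::2]  # odd rows, odd columns
```

Each sub-map is half the height and half the width of the original, and the four together cover every element exactly once, so the operation is lossless and trivially invertible.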
As shown in fig. 4, in this embodiment the convolution kernels of the fourth convolutional layer 420 and the fifth convolutional layer 520 are both 3 × 3, but the fourth convolutional layer 420 uses 32 channels while the fifth convolutional layer 520 uses 128 channels, and neither layer has a convolution stride.
In some embodiments, when the structure of a slice feature map obtained by the slicing operation cannot reach the expected slice structure (i.e., it is insufficient), it needs to be padded before probability calculation. This padding is performed automatically through the program and code, so that probability calculation is carried out uniformly, improving processing efficiency and data accuracy. Therefore, before a slice feature map is input to the probability coding module AE, it is determined whether any slice feature map fails to satisfy the expected slice structure, so as to decide which probability coding module AE to use. If none fails, the structure of the probability coding module AE is as described above; if some fail, the probability coding module AE selected instead is: the convolution kernels of the fourth convolutional layer 420 and the fifth convolutional layer 520 are both 3 × 3, the fourth convolutional layer 420 uses 32 channels, the fifth convolutional layer 520 uses 256 channels, and neither layer has a convolution stride.
For example, as in fig. 4, z_1_11 and z_2_11 are padded to the expected slice structure via a pad operation, resulting in z_1_11_pad and z_2_11_pad. The preceding slice feature map from the same slicing step is then merged with the current padded slice feature map, and probability calculation is performed to obtain the initial probability map.
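A minimal sketch of the pad step, assuming zero padding and a list-of-lists slice representation (both assumptions; the patent does not specify the fill value):

```python
def pad_to(slice_map, rows, cols, fill=0):
    """Pad an undersized slice feature map to the expected slice
    structure, as the pad operation does for z_1_11 -> z_1_11_pad."""
    padded = [row + [fill] * (cols - len(row)) for row in slice_map]
    padded += [[fill] * cols for _ in range(rows - len(padded))]
    return padded

# On a 3x3 input map the odd-row/odd-column slice is only 1x1 while the
# even/even slice is 2x2, so the smaller slice is padded up to 2x2:
z_1_11 = [[6]]
z_1_11_pad = pad_to(z_1_11, 2, 2)
```

After padding, the two slices have matching shapes and can be merged for a uniform probability calculation.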
The technical scheme for obtaining the initial probability map is described below, taking z_1_00 (slice structure satisfied) and z_1_11 (slice structure insufficient) as examples:
As shown in fig. 4, the sliced slice feature map z_1_00 is sent to the fourth convolutional layer 420 for a convolution operation and fitted with the LeakyReLU activation function; it is then sent to the fifth convolutional layer 520 for a convolution operation and fitted with the Softplus activation function in combination with the estimated variance σ, and the probability (denoted P or PDF) is calculated according to formulas (1) and (2), yielding the probability value P_1_00 of the slice feature map z_1_00.
$$p_z(z)=\prod_j\left(\mathcal{N}(0,\sigma_j^2)*\mathcal{U}(-\tfrac{1}{2},\tfrac{1}{2})\right)(z_j)\tag{1}$$

In formula (1), $p_z(z)$ is the probability corresponding to the quantized prior estimated feature map ZQ; $\mathcal{N}(0,\sigma_j^2)$ is the mathematical expectation term, a normal distribution expression with the estimated variance, convolved with a unit uniform distribution to account for rounding quantization; $j$ denotes the $j$-th quantized prior estimated feature map.
Meanwhile, the initial quantization feature map YQ is directly input to the probability coding module AE, and the probability of the initial quantization feature map YQ is calculated by the formula (2) in combination with the above-mentioned estimated variance σ.
$$p_y(y)=\prod_i\left(\mathcal{N}(0,\sigma_i^2)*\mathcal{U}(-\tfrac{1}{2},\tfrac{1}{2})\right)(y_i)\tag{2}$$

In formula (2), $p_y(y)$ is the initial probability map corresponding to the initial quantization feature map YQ; $\mathcal{N}(0,\sigma_i^2)$ is the mathematical expectation term, a normal distribution expression with the estimated variance, convolved with a unit uniform distribution; $i$ denotes the $i$-th initial quantization feature map.
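Under a zero-mean Gaussian entropy model with fitted variance σ², convolving the Gaussian with a unit uniform distribution (the standard reading of formulas (1) and (2) for rounding quantization) gives each integer-quantized value the Gaussian mass of the unit bin centred on it. A sketch, assuming zero mean:

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_prob(y, sigma):
    """Mass of the unit bin centred on integer y under N(0, sigma^2)
    convolved with U(-1/2, 1/2), i.e. one factor of formula (2)."""
    return gaussian_cdf(y + 0.5, sigma) - gaussian_cdf(y - 0.5, sigma)

sigma = 1.0
probs = {y: quantized_prob(y, sigma) for y in range(-3, 4)}
```

The masses over all integers sum to 1, so the per-element probabilities form a valid distribution for the arithmetic coder.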
The slice feature map z_1_11 is padded via a pad operation to obtain the expected slice feature map z_1_11_pad. After a merging operation (merge) with the adjacent preceding slice feature map z_1_00, the result is sent to the fourth convolutional layer 420 for a convolution operation and fitted with the LeakyReLU activation function, then sent to the fifth convolutional layer 520 for a convolution operation. The last layer uses the Softplus activation function to fit the estimated variance σ, yielding the merged probability value (PDF) of the feature-merged slice feature map. A split operation (split) is then performed, producing two probability values, P_1_10 and P_1_01. For example, two 192-channel maps become 384 channels after merge, and the 384 channels are separated back into two 192-channel maps after split.
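The merge/split channel bookkeeping in the last step can be illustrated as follows; each list element stands in for one channel's feature map, a simplification rather than the patent's tensor layout:

```python
def merge(a, b):
    """Concatenate two feature maps along the channel axis."""
    return a + b

def split(merged, n):
    """Inverse of merge: separate the first n channels from the rest."""
    return merged[:n], merged[n:]

x = [0.0] * 192                  # one 192-channel branch
y = [1.0] * 192                  # another 192-channel branch
m = merge(x, y)                  # 384 channels after merge
p_1_10, p_1_01 = split(m, 192)   # back to two 192-channel outputs
```

Because split is the exact inverse of merge at a known channel boundary, the two probability maps are recovered without loss.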
And S30, carrying out arithmetic coding on the initial quantization characteristic graph and the initial probability graph to obtain a compressed intermediate file.
That is, the initial probability map corresponding to the initial quantization feature map YQ obtained in step S20 and the initial quantization feature map YQ obtained in step S10 are taken, and binary conversion is performed on YQ and the initial probability map to obtain the intermediate file (bits) in binary format.
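Arithmetic coding approaches the entropy bound, so the size of the intermediate file is close to the summed self-information of the quantized symbols under the initial probability map. A sketch of that bound (not the arithmetic coder itself):

```python
import math

def ideal_code_length_bits(probs):
    """Information-theoretic lower bound, in bits, on the arithmetically
    coded size of symbols whose probabilities come from the probability map."""
    return sum(-math.log2(p) for p in probs)

# 100 symbols, each with probability 1/256, need at least 800 bits:
bits = ideal_code_length_bits([1.0 / 256] * 100)
```

The sharper the probability map (mass concentrated on the symbols that actually occur), the shorter the intermediate file, which is why fitting the estimated variance well matters for the compression ratio.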
Therefore, with a trained coding model, the compression ratio of the intermediate file obtained by the invention is improved while ensuring that the image is not distorted, so that the traffic and bandwidth occupied by transmission are greatly reduced and the transmission efficiency is improved.
Embodiment 2
The present invention also provides an image decoding method, as shown in fig. 5, the image decoding method includes the following steps:
The decoding model includes a lossless decoding network, a lossy decoding network and a post-enhancement network. S001, inputting the intermediate file to be decompressed into the lossless decoding network, and outputting an initial quantization feature map through the lossless decoding network; wherein the intermediate file is the intermediate file obtained in step S30.
As shown in fig. 2, a decoding model is built and trained to obtain a trained decoding model, which includes a lossless decoding network, a lossy decoding network, and a post-enhancement network. The structure and parameters of the lossless decoding network are the same as those of the lossless encoding network in embodiment 1. The lossless decoding network includes a probability decoding module AD and a prior estimation module Z; the prior estimation module Z is used to assist the probability decoding module AD, and its structure is the same as that of the prior estimation module in embodiment 1, so it is not described in detail again.
Specifically, the intermediate file is input to the probability decoding module AD for decoding; the quantized prior estimated feature map ZQ is decoded and input to the prior estimation module Z to obtain the estimated variance σ. Next, the initial probability map corresponding to the initial quantization feature map YQ is obtained according to formula (2) of embodiment 1 above, and arithmetic decoding is performed with the initial probability map to obtain the initial quantization feature map YQ.
And S002, inputting the initial quantization feature map into a lossy decoding network, and outputting an initial picture through the lossy decoding network.
As shown in fig. 2, the lossy decoding network adopts a nonlinear decoding network, which includes a plurality of first residual blocks RB and a plurality of sixth convolution layers 610 connected in sequence, the inputting the initial quantization feature map YQ into the lossy decoding network, and outputting an initial picture through the lossy decoding network specifically includes:
inputting the initial quantization feature map into a first residual block RB, and outputting a middle map corresponding to an initial quantization feature map YQ through the first residual block RB;
inputting the intermediate map into a sixth convolution layer 610, and restoring an initial picture corresponding to the initial quantized feature map YQ through the sixth convolution layer 610 and an inverse normalization operation.
In this embodiment, as shown in fig. 2, the number of first residual blocks RB is 3, and each first residual block RB has the same structure and parameters. The specific structure of each first residual block RB is shown in fig. 6: it includes two residual convolutional layers with identical structures and parameters, both with 3 × 3 convolution kernels, 192 channels, and no convolution stride; the data passed between the two residual convolutional layers goes through a ReLU activation function. The invention adds 3 first residual blocks RB at the input of the lossy decoding network, unlike the prior art, which adds no additional residual blocks. This improves the peak signal-to-noise ratio and multi-scale structural similarity indices by a small margin (PSNR improves by about 0.4 and MS-SSIM by about 0.015), so the decompression quality is better.
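The topology of a first residual block RB (convolution, ReLU, convolution, plus the skip connection) can be sketched as below; the convolutions are replaced by an identity stand-in, since the learned 3 × 3, 192-channel weights are not part of the text:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def conv_stub(v):
    # Stand-in for a 3x3, 192-channel residual convolution with no stride;
    # the real weights are learned, identity is used here for illustration.
    return list(v)

def residual_block(v):
    """First residual block RB: two residual convolutions with a ReLU
    between them, added back onto the input (the skip connection)."""
    h = conv_stub(v)
    h = relu(h)
    h = conv_stub(h)
    return [a + b for a, b in zip(v, h)]

out = residual_block([1.0, -2.0, 0.5])
```

With the identity stand-in, positive entries are doubled by the skip addition while negative entries pass through unchanged, which makes the skip path easy to see.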
With reference to fig. 2, the number of sixth convolutional layers 610 is 8, divided into 4 groups; for example, sixth convolutional layers 1 and 2 form group 1, sixth convolutional layers 3 and 4 form group 2, and so on. The convolution kernels of the two sixth convolutional layers in each group are all 3 × 3 with 192 channels, but the first sixth convolutional layer of each group (e.g., sixth convolutional layer 1) has a convolution stride of 2, while the later one (e.g., sixth convolutional layer 2) has no stride.
After YQ is processed by the first residual blocks RB 3 times, it is input to sixth convolutional layers 1 and 2 of group 1 for convolution, and the output is subjected to inverse normalization (IGDN); it is then input to sixth convolutional layers 3 and 4 of group 2, with inverse normalization again after the convolution, and so on, until the last sixth convolutional layer 8 performs its convolution and the initial picture corresponding to the initial quantization feature map YQ is output.
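Assuming a padding of 1 for the 3 × 3 kernels (the patent does not state the padding), the spatial size through the four groups of sixth convolutional layers evolves as follows:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Output spatial size of a convolution (padding of 1 is an assumption)."""
    return (size + 2 * pad - kernel) // stride + 1

size = 16
for group in range(4):               # 4 groups of sixth convolutional layers
    size = conv_out(size, stride=2)  # first layer of the group: stride 2
    size = conv_out(size, stride=1)  # second layer: no stride
# each group halves the spatial size: 16 -> 8 -> 4 -> 2 -> 1
```

Only the strided first layer of each group changes the spatial size; the stride-1 second layer and the IGDN are size-preserving.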
And S003, inputting the initial picture into a post-enhancement network, and outputting the original picture through the post-enhancement network.
In order to make the initial picture output in step S002 closer to the original picture, the initial picture needs to be enhanced to ensure that the enhanced initial picture is restored to the original picture.
Specifically, as shown in fig. 7, the post-enhancement network 700 includes two seventh convolution layers 710, a plurality of second residual blocks 720 are disposed between the two seventh convolution layers 710, the inputting the initial picture into the post-enhancement network 700, and the outputting the original picture through the post-enhancement network specifically includes: inputting the obtained initial picture into a seventh convolution layer 710, and outputting a first enhanced picture corresponding to the initial picture through the seventh convolution layer 710; sequentially inputting the first enhanced picture to each second residual block 720, and outputting a second enhanced picture corresponding to the first enhanced picture through each second residual block 720; the second enhanced picture is input to the seventh convolution layer 710 of the last layer, and a third enhanced picture corresponding to the initial picture is output through the seventh convolution layer 710 of the last layer, that is, the third enhanced picture is the original picture.
In this embodiment, the two seventh convolutional layers 710 have the same structure and parameters: the convolution kernel size is 3 × 3, 32 channels are used, and there is no convolution stride. Unlike the prior art, the present invention disposes 3 second residual blocks 720 between the two seventh convolutional layers 710. Each second residual block 720 has the same structure and parameters, shown in fig. 8: it includes two enhanced residual convolutional layers with identical structures and parameters, both with 3 × 3 convolution kernels, 32 channels, and no convolution stride. The LeakyReLU activation function is applied before the input of each enhanced residual convolutional layer.
Therefore, compared with the existing method that uses 64 channels, the number of channels of the 3 second residual blocks in the post-enhancement network is reduced, and the number of second residual blocks is also reduced, so that the invention further reduces computational complexity, improves processing efficiency, and reduces memory usage, which is convenient for users.
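The claimed complexity reduction can be checked by counting the weights in one residual block; the formula below assumes plain (ungrouped) convolutions with biases, which is an assumption rather than a detail stated in the patent:

```python
def resblock_params(channels, kernel=3):
    """Parameters in a residual block built from two channels->channels
    convolutions with kernel x kernel filters, biases included."""
    per_layer = channels * channels * kernel * kernel + channels
    return 2 * per_layer

prior_art = resblock_params(64)  # hypothetical 64-channel residual block
proposed = resblock_params(32)   # 32-channel second residual block 720
```

Halving the channel count cuts per-block parameters by roughly a factor of four, since the dominant weight term is quadratic in the channel count.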
In the training process, a coding model and a decoding model are built, and the parameter learning of the lossy coding network, lossless coding network, lossless decoding network, lossy decoding network and post-enhancement network is guided according to the loss value loss obtained from the loss function, so that the coding model and the decoding model are trained to obtain the trained coding model and the trained decoding model.
Specifically, a plurality of preset weights λ are obtained; the loss value loss_mse between the original picture and the third enhanced picture obtained in step S003 is calculated using the mean squared error (MSE), and the entropy coding loss loss_entropy is calculated using the entropy coding loss function. Then, the total loss value loss is calculated according to formula (3). The weight λ is used to balance the compression ratio and the image quality: the larger λ is, the smaller the compression ratio, but the better the quality of the recovered image.
loss = λ * loss_mse + loss_entropy    (3)
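Formula (3) can be exercised with a toy distortion and rate; here loss_entropy is stood in by a fixed rate term, an illustrative simplification:

```python
def loss_mse(orig, recon):
    """Mean squared error between the original and reconstructed picture."""
    return sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)

def total_loss(orig, recon, loss_entropy, lam):
    """Formula (3): loss = lambda * loss_mse + loss_entropy."""
    return lam * loss_mse(orig, recon) + loss_entropy

# a larger lambda penalises distortion more heavily, trading compression
# ratio for recovered image quality, as described above
low = total_loss([0.0, 1.0], [0.1, 0.9], 0.5, 640)
high = total_loss([0.0, 1.0], [0.1, 0.9], 0.5, 7680)
```

With identical distortion, raising λ from 640 to 7680 raises the total loss, so training at the larger weight pushes the networks toward lower distortion at the cost of rate.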
Thus, the original picture to be compressed is input to the trained variational self-coding network, and the obtained target picture is the original picture.
The coding model and the decoding model form a variational auto-encoding network. By using the activation functions Softplus, ReLU and LeakyReLU in the process of encoding and decoding pictures, the best fit of the estimated variance is achieved and the calculation error is reduced; meanwhile, by reducing the number of residual blocks and their channel counts, the memory and computation used are further reduced, improving the compression performance and achieving the best compression effect.
The effectiveness of the method adopted by the invention is verified through experiments:
The machine test platform is: an E5-2680 processor, 128 GB of memory, a GTX 1080 Ti 11 GB graphics card, and a Linux system. The experiment was run on 50 pictures of size 1920 × 1280 and the results were averaged. The experimental results for λ = 640, 2560 and 7680 are shown in Table 1 below; the rows labeled with these weights correspond to the image codec algorithm employed by the present invention.
Name/type    PSNR     MS-SSIM  bpp
λ = 640      36.2363  0.9808   0.2393
BPG          36.520   0.980    0.250
λ = 2560     39.1225  0.9896   0.443
BPG          39.779   0.989    0.464
λ = 7680     41.2722  0.9935   0.6842
BPG          41.747   0.993    0.691

TABLE 1
Table 1 compares the average indices of different image algorithms, where PSNR refers to the peak signal-to-noise ratio, MS-SSIM refers to the multi-scale structural similarity, and bpp (bits per pixel) refers to the number of bits used per pixel. BPG is an existing image coding algorithm.
As can be seen from Table 1, under the same index conditions (at the same bpp, higher PSNR and MS-SSIM are better), the method adopted by the invention achieves higher quality and a higher compression ratio than the prior art, while reducing computational complexity and the space occupied by transmission traffic.
Embodiment 3
Based on the above image encoding and decoding methods, the present invention also provides a system. As shown in fig. 9, the system includes a sending terminal 800 and a receiving terminal 900. The sending terminal 800 includes a sending terminal processor 12 and a sending terminal memory 11 connected to the sending terminal processor 12; the sending terminal memory 11 stores an image encoding program that can run on the sending terminal processor 12. The receiving terminal 900 includes a receiving terminal processor 112 and a receiving terminal memory 111 connected to the receiving terminal processor 112; the receiving terminal memory 111 stores an image decoding program executable on the receiving terminal processor 112. Fig. 9 shows only some components of the system; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.
The sending terminal storage 11 and the receiving terminal storage 111 may in some embodiments be internal storage units of the respective terminals, such as the internal memory of the terminals. In other embodiments, the sending terminal memory 11 and the receiving terminal memory 111 may also be external storage devices of the corresponding terminal, such as a plug-in usb disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the transmitting terminal memory 11 and the receiving terminal memory 111 may also include both an internal storage unit and an external storage device of the terminal. The transmitting terminal memory 11 and the receiving terminal memory 111 are used to store application software installed in the terminal and various data, for example, the transmitting terminal memory 11 stores an image encoding program, and the receiving terminal memory 111 stores an image decoding program. The transmitting terminal memory 11 and the receiving terminal memory 111 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the sending terminal memory 11 and the receiving terminal memory 111 have stored thereon image encoding and decoding programs, respectively, that can be executed by the processor of the corresponding terminal to implement the image encoding and decoding method, as described above.
The sending terminal processor 12 and the receiving terminal processor 112 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor, a mobile phone baseband processor or other data Processing chip, and are configured to run program codes or process data stored in the sending terminal memory 11 and the receiving terminal memory 111, for example, to execute the image encoding and decoding methods, and the like, as described in the above method.
In summary, the present invention provides an image encoding and decoding method and system in which the encoding model and the decoding model form a variational auto-encoding network. By using the activation functions Softplus, ReLU and LeakyReLU in the process of encoding and decoding images through the variational auto-encoding network, the best fit of the estimated variance is achieved and the calculation error is reduced; meanwhile, by reducing the number of channels and the number of residual blocks, the memory and computation used are further reduced, improving the compression performance and achieving the best compression effect.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium, and when executed, the program may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (9)

1. An image encoding method, characterized by comprising the steps of:
the method comprises the steps that an original picture is used as an input image and is input into a lossy coding network, and an initial characteristic diagram corresponding to the original picture is output through the lossy coding network;
inputting the initial quantization characteristic diagram obtained after the initial characteristic diagram is quantized into a lossless coding network, and outputting an initial probability diagram corresponding to the initial quantization characteristic diagram through the lossless coding network;
carrying out arithmetic coding on the initial quantization characteristic graph and the initial probability graph to obtain a compressed intermediate file;
the lossy coding network and the lossless coding network form a coding model, and the lossy coding network comprises an up-sampling module, a down-sampling module and a connecting module;
the lossless coding network comprises a probability coding module and a priori estimation module, wherein the priori estimation module is used for assisting the probability coding module and acquiring a pre-estimated variance according to the initial quantization characteristic diagram; the probability coding module comprises a fourth convolutional layer and a fifth convolutional layer;
the step of inputting the initial quantized feature map obtained by quantizing the initial feature map into the lossless coding network, and the step of outputting the initial probability map corresponding to the initial quantized feature map through the lossless coding network specifically includes:
carrying out rounding quantization processing on the initial characteristic graph to obtain an initial quantization characteristic graph;
carrying out slicing operation on the initial quantization characteristic diagrams to obtain a plurality of slice characteristic diagrams corresponding to the initial quantization characteristic diagrams;
inputting each slice characteristic diagram into a fourth convolution layer respectively, and outputting a plurality of first slice characteristic diagrams through the fourth convolution layer and a LeakyReLU activation function;
and inputting each first slice characteristic diagram into a fifth convolutional layer, fitting the pre-estimated variance through the fifth convolutional layer and a Softplus activation function, and outputting an initial probability diagram corresponding to each initial quantization characteristic diagram.
2. The image encoding method according to claim 1, wherein the outputting the initial feature map corresponding to the original picture via the lossy encoding network specifically includes:
inputting an original picture into an up-sampling module, and outputting a first feature map corresponding to the original picture through the up-sampling module;
inputting the first feature map into a down-sampling module, and outputting a plurality of second feature maps through the down-sampling module;
inputting the first feature map and the plurality of second feature maps into the connection module, and outputting the initial feature map corresponding to the original picture through the connection module.
3. The image encoding method of claim 2, wherein the upsampling module includes a plurality of first convolution layers, the inputting the original picture into the upsampling module, and the outputting the first feature map corresponding to the original picture by the upsampling module specifically includes:
sequentially inputting the original pictures into each first convolution layer, and sequentially outputting a plurality of intermediate first characteristic diagrams through each first convolution layer;
until the first characteristic diagram corresponding to the original picture is output through the last first convolution layer.
4. The image encoding method of claim 3, wherein the downsampling module includes a plurality of second convolution layers arranged in parallel, and the step of inputting the first feature map into the downsampling module and outputting the plurality of second feature maps through the downsampling module specifically includes:
acquiring a plurality of intermediate first characteristic diagrams output by each first convolution layer;
and correspondingly inputting the intermediate first characteristic diagrams to the second convolution layers respectively, and outputting a plurality of second characteristic diagrams through the second convolution layers.
5. The image encoding method according to claim 4, wherein the connection module includes a connection layer and a third convolution layer, the inputting the first feature map and the plurality of second feature maps into the connection module, and the outputting the initial feature map corresponding to the original picture by the connection module specifically includes:
acquiring a first characteristic diagram and a plurality of second characteristic diagrams corresponding to an original picture;
inputting the first feature map and all the second feature maps into a connecting layer, and outputting a third feature map through the connecting layer;
and inputting the third feature map into the third convolution layer, and outputting an initial feature map corresponding to the original picture through the third convolution layer.
6. An image decoding method, characterized by comprising the steps of:
inputting an intermediate file to be decompressed into a lossless decoding network, and outputting an initial quantization characteristic diagram through the lossless decoding network; wherein the intermediate file is the intermediate file obtained by the image encoding method according to claim 1;
inputting the initial quantization feature map into a lossy decoding network, and outputting an initial picture through the lossy decoding network;
and inputting the initial picture into a post-enhancement network, and outputting an original picture through the post-enhancement network.
7. The image decoding method according to claim 6, wherein the lossless decoding network, the lossy decoding network, and the post-enhancement network form a decoding model, the lossy decoding network includes a plurality of first residual blocks and a plurality of sixth convolutional layers, which are sequentially connected, the inputting the initial quantization feature map into the lossy decoding network, and the outputting the initial picture through the lossy decoding network specifically includes:
inputting the initial quantization feature map into a first residual block, and outputting a middle map corresponding to the initial quantization feature map through the first residual block;
and inputting the intermediate graph into a sixth convolution layer, and outputting an initial picture corresponding to the initial quantized feature graph through the sixth convolution layer and an inverse normalization operation.
8. The image decoding method of claim 7, wherein the post-enhancement network includes two seventh convolutional layers, a plurality of second residual blocks are disposed between the two seventh convolutional layers, the inputting the initial picture to the post-enhancement network, and the outputting the original picture through the post-enhancement network specifically includes:
inputting the obtained initial picture into a seventh convolution layer, and outputting a first enhanced picture corresponding to the initial picture through the seventh convolution layer;
sequentially inputting the first enhanced picture into each second residual block, and outputting a second enhanced picture corresponding to the first enhanced picture through each second residual block;
and inputting the second enhanced picture to a seventh convolution layer of the last layer, and outputting a third enhanced picture corresponding to the initial picture through the seventh convolution layer of the last layer, wherein the third enhanced picture is the original picture.
9. A system comprising a transmitting terminal and a receiving terminal, each comprising a processor and a memory connected to the processor, the memory storing an image encoding program executable on the processor for the transmitting terminal, the image encoding program when executed by the processor implementing the image encoding method of any one of claims 1 to 5; for a receiving terminal, the memory stores an image decoding program executable on the processor, the image decoding program implementing the steps of the image decoding method according to any one of claims 6 to 8 when executed by the processor.
CN202010085235.0A 2020-02-10 2020-02-10 Image coding and decoding method and system Active CN113256744B (en)

CN115361559A (en) Image encoding method, image decoding method, image encoding device, image decoding device, and storage medium
Thakker et al. Lossy Image Compression: A Comparison Between Wavelet Transform, Principal Component Analysis, K-Means and Autoencoders
CN113727050A (en) Video super-resolution processing method and device for mobile equipment and storage medium
CN113810058A (en) Data compression method, data decompression method, device and electronic equipment
CN113949868B (en) Entropy coding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant