CN111683250A - Generation type remote sensing image compression method based on deep learning - Google Patents

Generation type remote sensing image compression method based on deep learning

Info

Publication number
CN111683250A
CN111683250A (application CN202010404524.2A)
Authority
CN
China
Prior art keywords
image
network
remote sensing
encoder
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010404524.2A
Other languages
Chinese (zh)
Other versions
CN111683250B (en)
Inventor
种衍文
翟亮
潘少明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202010404524.2A
Publication of CN111683250A
Application granted
Publication of CN111683250B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 Data rate or code amount at the encoder output
    • H04N 19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The technical scheme of the invention provides a generative remote sensing image compression method based on deep learning. The invention is trained with the PyTorch deep learning framework and takes an autoencoder (AutoEncoder) combined with a generative adversarial network (GAN) as an example; the network model is divided into three parts: an encoder, a pre-quantization and quantization module, and a decoder (generator) with a discriminator. The framework is suitable for compressing homologous remote sensing images of any spectral dimension and for compressing and transmitting remote sensing images under low-bandwidth, low-bit-rate conditions; it has excellent image reconstruction capability, is optimized for the scale and running speed of the deep neural network, and facilitates deployment and popularization on Internet of Things devices.

Description

Generation type remote sensing image compression method based on deep learning
Technical Field
The invention belongs to the field of remote sensing image compression, and particularly relates to a method for compressing and decompressing a remote sensing image by using a deep learning framework.
Background
Compared with natural images, the spectral dimension of a remote sensing image contains richer information, and remote sensing images come in many types with large data volumes. By exploiting the differences in the spectral curves of different ground objects, remote sensing images are widely applied across many sectors of the national economy. As high-resolution remote sensing imaging technology becomes widespread, the marked increase in spectral and spatial resolution raises the challenge of effectively compressing the data volume for transmission and storage, an urgent problem in the application of remote sensing imagery.
Deep Learning, an emerging image processing methodology, accomplishes a specific task by learning features of a target from a large number of training samples. Deep learning has achieved great success in image processing fields such as image classification, object detection and person re-identification.
At present, existing deep learning techniques are mostly used to compress ordinary visible-light images, and deep-learning-based remote sensing image compression techniques remain scarce. Toderici et al. (Toderici G, Vincent D, Johnston N, et al. Full Resolution Image Compression with Recurrent Neural Networks [J]. arXiv preprint arXiv:1608.05148, 2016.) proposed a variable-rate image compression algorithm based on long short-term memory (LSTM) networks. The algorithm feeds a 32 × 32 image into the network, compresses the image by reducing its scale and adjusting the number of feature maps, and then restores the image information through a decoding network. Ballé et al. (Ballé J, Laparra V, Simoncelli E P. End-to-end Optimized Image Compression [J]. arXiv preprint arXiv:1611.01704, 2016.) used convolutional neural networks to compress images. Their network comprises an analysis transform, a quantization stage and a synthesis transform, built mainly from convolutional layers, image downsampling layers and GDN normalization layers. Li et al. (Li M, Zuo W, Gu S, et al. Learning Convolutional Networks for Content-weighted Image Compression [J]. arXiv preprint arXiv:1703.10553, 2017.) proposed an image compression technique weighted by image content: different image contents are coded at different bit rates, an importance-map concept is added to the traditional autoencoder structure, and bit-rate control over different image contents is achieved through the importance map. However, the methods proposed by these authors target the compression of visible-light images, not remote sensing images.
In addition, with improvements in related computing-power solutions such as supercomputing chips, the conditions for deploying deep learning models in on-satellite environments are increasingly mature, and overcoming the barriers that deep models face in scale and inference time is likewise an important issue.
In summary, current remote sensing image compression algorithms need a relatively universal compression scheme that addresses the huge differences in the number of spectral bands across remote sensing images, so as to adapt automatically to compression under different band counts. Meanwhile, to handle the rapid compression of massive remote sensing imagery, higher rate-distortion performance must be achieved. Furthermore, to meet the requirement of deploying remote sensing image compression on small Internet of Things facilities such as satellites, the proposed algorithm and model must satisfy the deployment platform's constraints of limited resource scale and short inference time.
Disclosure of Invention
In order to solve the above problems, the invention provides a deep-learning generative remote sensing image compression method that adopts the "autoencoder (Auto-Encoder) + generative adversarial network (GAN)" paradigm and, through the processing of three parts, namely an encoder, a quantizer, and a decoder (generator) with a discriminator, completes adaptive remote sensing image compression meeting the requirements of small-scale Internet of Things deployment environments.
The deep-learning generative remote sensing image compression method of the invention adopts the following technical scheme: the image tensor is compressed by an encoder network into a hidden representation tensor 1/128 the size of the original image; the hidden representation tensor is fed into a quantizer network, where pre-quantization and quantization yield a binary code stream; the quantized binary code stream is fed into the decoder (generator) to obtain a reconstructed image; the reconstructed image is fed into the discriminator network for discrimination; and through a finite game (training) between the generator (decoder) and the discriminator, a Nash equilibrium state (network convergence) is reached and rate-distortion optimization of the image is achieved.
The encoder network comprises a channel-adaptor module and downsampling block (downblock) modules. The channel-adaptor is a convolutional layer (kernel size 3, padding 1) that preserves the spatial dimensions: an image tensor (B, C, H, W) passes through the channel-adaptor with its spatial dimensions (H, W) unchanged while its channel number becomes 4 × max{8, C}, where B is the batch size, C is the number of image channels, H is the image height, W is the image width, and C in the formula is the concrete channel count. The encoder contains m (usually 3, 4 or 5) downblocks constructed on the basis of a dense network (densenet); each downblock is a combination of a dense module (d-denseblock) and a downsampling module (downsample). A d-denseblock is formed by 4 dense units (d-dense-units), and its output is the concatenation of the outputs of all its d-dense-units along the C dimension. A d-dense-unit consists, in order, of GDN (Generalized Divisive Normalization), LeakyReLU activation and a convolutional layer. Through the encoder, the original image achieves a compression ratio of n/(m × 2^10).
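As a minimal sketch of the shape bookkeeping described above (the function names are illustrative, and treating each downblock as halving the spatial dimensions once is an assumption based on the embodiment, which maps 64 × 64 inputs to 2 × 2 with 5 downblocks):

```python
def channel_adaptor_channels(c: int) -> int:
    # The channel-adaptor keeps (H, W) unchanged and maps C input
    # channels to 4 * max(8, C) output channels.
    return 4 * max(8, c)

def spatial_after_downblocks(h: int, w: int, m: int) -> tuple:
    # Assumption: each of the m downblocks halves H and W once.
    return h // 2 ** m, w // 2 ** m

# For a 3-band 64 x 64 image with m = 5 downblocks (as in the embodiment):
print(channel_adaptor_channels(3))          # 32
print(spatial_after_downblocks(64, 64, 5))  # (2, 2)
```

For a 16-band image the same rule would give 4 × max(8, 16) = 64 channels, which is how the adaptor absorbs inputs of any spectral dimension.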
The quantizer network comprises a pre-quantization module and a quantization processing module. In this scheme, a pre-quantization module based on learned discretization is introduced into the quantizer network: at the bottleneck layer, the code stream (B × C × H × W) is mapped into an embedding manifold space (C × (B × H × W)); a loss function constructed from the KL divergence is used, the learnable parameters being the class distributions over the B × H × W dimension for each of the C channels, thereby achieving structural clustering. The pre-quantization module is implemented by matrix-vector operations. The quantization processing module applies {-1, 1} binarization to the pre-quantized feature maps to obtain the code stream.
The decoder (generator) and the discriminator together form a generative adversarial network (GAN). The decoder (generator) consists of m (usually 3, 4 or 5) upsampling blocks (upblocks) constructed on a densenet basis; each upblock consists of a u-denseblock and an upsampling (pixel-shuffle) module; the u-denseblock consists of 4 u-dense-units, and its output is the concatenation of the outputs of all its u-dense-units along the C dimension; a u-dense-unit consists, in order, of IGDN (inverse GDN) normalization, LeakyReLU activation and a convolutional layer with output channel number m. The basic structure of the discriminator network is 4 stacked convolutional layers, and the feature distance at the last convolutional layer is taken as the distance metric.
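The up-sampling step can be illustrated with a plain-Python sketch of pixel-shuffle rearrangement (interpreting the document's "pixel-buffer" as the standard pixel-shuffle operation is an assumption; the function name is illustrative):

```python
def pixel_shuffle(x, r):
    # Rearrange a (C*r*r, H, W) nested-list tensor into (C, H*r, W*r):
    # channel c*r*r + dy*r + dx contributes the pixel at (y*r+dy, col*r+dx).
    cr2, h, w = len(x), len(x[0]), len(x[0][0])
    c = cr2 // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(cr2):
        base, rem = divmod(ch, r * r)
        dy, dx = divmod(rem, r)
        for y in range(h):
            for col in range(w):
                out[base][y * r + dy][col * r + dx] = x[ch][y][col]
    return out

# Four 1x1 feature maps become one 2x2 map:
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

This trades channel depth for spatial resolution without interpolation, which is why the decoder can gradually move inter-band information back into the spatial dimensions.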
In the deep-learning generative remote sensing image compression method, the loss function L used for model training is:
L = (1 - MSSSIM) + MSE + 0.01 × PSNR + Pre_Q_diff + GAN_loss
where MSSSIM denotes the multi-scale structural similarity of the image, MSE is the mean square error, PSNR is the peak signal-to-noise ratio of the image signal, Pre_Q_diff is the loss of the pre-quantization module, and GAN_loss is the adversarial (GAN) loss.
(1) The scheme performs well on high-resolution remote sensing images and even on hyperspectral images whose number of bands approaches that of natural images. For an image tensor (C × H × W) with spectral dimension C = n, a 3 × 3 convolutional layer is applied before encoding for nonlinear processing, outputting a tensor of height H, width W and 32 channels. The encoder consists of m (usually 3, 4 or 5) downblocks constructed on a densenet basis; each downblock consists of a denseblock module and a downsample module; the denseblock consists of 4 dense-units, and its output is the concatenation of the outputs of all its dense-units along the C dimension; a dense-unit consists, in order, of GDN (Generalized Divisive Normalization), LeakyReLU activation and a convolutional layer with output channel number m. The corresponding decoder consists of m upblocks constructed on a densenet basis; each upblock consists of a denseblock module and an upsample module; the structure below the denseblock level is consistent with the encoder section.
(2) The invention designs a low-bit-rate compression scheme targeted at the huge data volume of remote sensing images. The scheme adopts the "autoencoder (Auto-Encoder) + generative adversarial network (GAN)" paradigm. Image features (feature maps) are extracted from image X by the encoder and mapped to a hidden space Z; after pre-quantization, quantization and entropy coding, the decoder (generator) reconstructs the image and tries to fool the discriminator, while the discriminator tries to identify the reconstructed image as fake. This discriminator-generator image compression framework is well suited to extremely low bit rates. In the scheme, the encoder compresses the original image by a ratio of n/(m × 2^10), after which pre-quantization, quantization and entropy coding yield the final code stream.
(3) The invention designs a compression scheme that fuses the inter-band correlation, spatial correlation and texture characteristics of remote sensing images. The encoder of the framework is designed on a densenet basis: while reducing dimensionality it re-integrates the spatial-spectral information sequence to extract feature maps, and uses a self-attention mechanism to decouple the context of the feature maps so as to suppress noise and eliminate redundant information. Meanwhile, a pre-quantization module based on learned discretization is introduced into the quantization module: at the bottleneck layer the code stream (B × C × H × W) is mapped into the embedding manifold space (C × (B × H × W)); a loss function constructed from the KL divergence learns the class distributions over the B × H × W dimension with parameter C, and structural clustering is achieved on the basis of the attention mechanism.
(4) Addressing the difficulty of deploying deep neural network models on small Internet of Things devices, the network model is optimized: the encoder-decoder is built from densenet units, and the cross-layer connections between encoder and decoder achieve a high degree of information fusion, greatly improving the utilization of model parameters. Compared with a commonly used residual network (resnet) unit structure of comparable performance, the model parameter scale is halved.
The invention therefore has the following advantages. It is suitable for compressing homologous remote sensing images of any spectral dimension: the network can directly process the remote sensing images in a homologous data set, achieving end-to-end remote sensing image compression without preprocessing for the spectral dimension of the images. The framework is well suited to remote sensing image compression and transmission under low-bandwidth, low-bit-rate conditions, and has excellent image reconstruction capability. Considering the constraints of small Internet of Things devices such as satellites, the framework is optimized for the scale and running speed of the deep neural network, facilitating deployment and popularization on Internet of Things devices.
Drawings
Fig. 1 is a schematic diagram of the "autoencoder (Auto-Encoder) + generative adversarial network (GAN)" paradigm network in an embodiment of the present invention.
Fig. 2 is a schematic diagram of an encoder-decoder-pre-quantization-discriminator module according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a dense-unit in the embodiment of the invention.
Fig. 4 is a schematic diagram of a denseblock structure in the embodiment of the present invention.
FIG. 5 is a schematic diagram of the densenet encoder according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the densenet decoder according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the pre-quantization principle in the embodiment of the present invention.
FIG. 8 shows the reconstruction of a high-resolution remote sensing image at a compression rate of 0.104 bpp in an embodiment of the present invention, where (a) and (c) are original images and (b) and (d) are the corresponding reconstructions.
Detailed Description
The following explains a specific compression flow with reference to examples and drawings.
The method takes 3 × 64 × 64 images as training images and 3 × 512 × 512 images as test images, and mainly comprises the following steps:
1. data set preparation and neural network hyper-parameters:
1.1 About 8000 high-resolution remote sensing images are randomly cropped into image blocks of size 64 × 64 × 3.
1.2 The cropped image blocks are converted into tensors of specification 8 × 64 × 64 × 3 with a batch size of 8 and fed into the network model for training; all data are iterated over 100 times. The loss function L used in training is:
L=(1-MSSSIM)+MSE+0.01×PSNR+Pre_Q_diff+GAN_loss
where MSSSIM denotes the multi-scale structural similarity of the image, MSE is the mean square error, and PSNR is the peak signal-to-noise ratio of the image signal (MSSSIM, MSE and PSNR serve as the loss of the encoder network); Pre_Q_diff is the loss of the pre-quantization module (the loss of the quantization processing module is small and negligible); GAN_loss is the adversarial loss (the decoder (generator) and the discriminator together form the generative adversarial network (GAN)).
2. And (3) encoding:
the original 8 × 3 × 64 × 64 image tensor enters an encoder network composed of 5 downblocks, and the downblocks are connected with each other through recursive skip layers, as shown in fig. 2; the down block consists of a d-denseblock and a down sampling module (down sample), as shown in figure 5; the d-denseblock consists of 4 d-dense-units, the d-dense-units are connected in a sequential recursive layer, and the d-denseblock splices and fuses the outputs of all the previous d-dense-units in the channel dimension through layer-skipping connection (coordination), as shown in the attached figure 4; consisting of GDN (GENERALIZEDNORMALIZATION TRANSFORMATION) normalization, LeakyRelu activation, and convolutional layer in this order, as shown in FIG. 3. The encoder gradually transfers the spatial information of the original image tensor to the dimensionality between spectrums in the down-sampling process, and the front and back context information is connected in series and integrated by utilizing the characteristics of a densenert network structure. Aiming at the characteristics of 'same-spectrum foreign matter, same-object different spectrum' of the remote sensing image, the design effectively and jointly refines the information of the empty spectrum, removes redundancy and realizes high-efficiency compression of data. The hidden token tensor of 8 × 24 × 2 × 2 obtained by the encoder processing realizes 1/128-magnification compression compared with the original image tensor of 8 × 3 × 64 × 64.
3. Pre-quantization and quantization:
3.1 Pre-quantization: the hidden representation tensor z_e(x) of scale 8 × 24 × 2 × 2 output by the encoder network is mapped into a hidden embedding space e ∈ R^{k×d} (k = 24, d = 8 × 2 × 2), where d is the dimension of the embedding vectors e_j ∈ R^d and k is the number of vectors (categories). z_e(x) follows the posterior class distribution q(z = k | x) and is one-hot encoded as:
q(z = k | x) = 1 if k = argmin_j ||z_e(x) - e_j||_2, and 0 otherwise.
Through network learning, z_e(x) is mapped to its nearest neighbour in the embedding space e, realizing a discretized clustering representation in the space R^{k×d} and yielding z_q(x), as shown in the following equation:
z_q(x) = e_k, where k = argmin_j ||z_e(x) - e_j||_2
where e_k denotes the corresponding vector in the embedding space e.
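The nearest-neighbour mapping z_q(x) = e_k can be sketched in plain Python (an illustrative sketch operating on one vector at a time; the actual module works on whole tensors via matrix-vector operations):

```python
def pre_quantize(z_e, embeddings):
    # z_q(x) = e_k, where k = argmin_j ||z_e(x) - e_j||_2.
    def sq_dist(a, b):
        # Squared Euclidean distance (same argmin as the L2 norm).
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    k = min(range(len(embeddings)), key=lambda j: sq_dist(z_e, embeddings[j]))
    return k, embeddings[k]

# A vector near e_1 snaps to e_1:
e = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(pre_quantize([0.9, 1.2], e))  # (1, [1.0, 1.0])
```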
3.2 quantification: output z of pre-quantizationq(x) And then, the quantization calculation is carried out, in order to reduce the storage space and the transmission bandwidth,
the above-described type data needs to be subjected to { -1,1} binarization processing.
4. And (3) decoding:
the code stream obtained by quantization is input into a decoder, and the 8 multiplied by 24 multiplied by 2 tensor enters a decoder network consisting of 5 upblocks, as shown in the attached figure 1; the upblock is composed of a u-denseblock and an up-sampling module (pixel-buffer), as shown in FIG. 6; the u-denseblock consists of 4 u-denseblocks, the u-denseblocks are connected with each other in a sequential recursive layer, and the u-denseblocks splice and fuse the outputs of all the u-denseblocks in the channel dimension through layer-skipping connection (concatenation), as shown in the attached figure 4; each u-dense-unit consists of batchnormal, GDN-activated, and convolutional layers, as shown in FIG. 3. The decoder gradually transfers the inter-spectrum information of the code stream to the spatial dimension in the up-sampling process to obtain a tensor (tensor) with the scale of 8 multiplied by 3 multiplied by 64, and the reconstruction of the image is realized.
5. The decoder (generator) and arbiter process:
the original image and the reconstructed image are input to a discriminator which attempts to verify the image generated by the pseudo generator (decoder), and the GAN _ loss is output as part of the overall loss function. And realizing image rate distortion optimization in the two continuous iterative games.
The network model of the invention is divided into three parts, namely the encoder, the pre-quantization and quantization module, and the decoder (generator) with the discriminator, so training of the model is likewise divided into three stages. In the inference stage, images (data) are fed in sequence into the converged model, achieving a compression rate of 0.104 bpp with a reconstructed-image MS-SSIM of 0.976 and a PSNR of 29.01.

Claims (4)

1. A generation type remote sensing image compression method based on deep learning is characterized by comprising the following steps:
in the network model training stage, a training image is input into the constructed network model for training until convergence, and the method specifically comprises the following steps:
step 1, after the image tensor is subjected to network compression processing of an encoder, a hidden representation tensor of 1/128 scale of an original image is obtained;
step 2, inputting the hidden representation tensor into a quantizer network, and obtaining a binary code stream through pre-quantization processing and quantization processing;
step 3, inputting the quantized binary code stream into a decoder to obtain a reconstructed image, inputting the reconstructed image into a discriminator for discrimination, reaching a Nash equilibrium state after a finite game between the decoder and the discriminator, and realizing rate-distortion optimization of the image, wherein the decoder and the discriminator together form a generative adversarial network GAN;
the encoder network comprises a channel-adaptor and downblocks; the channel-adaptor is a convolutional layer that preserves the spatial dimensions: an image tensor (B, C, H, W) passes through the channel-adaptor with its spatial dimensions (H, W) unchanged while its channel number becomes 4 × max{8, C}, wherein B is the batch size, C is the image channel count, H is the image height, W is the image width, and C in the formula is the concrete channel count; the encoder network comprises m downblock modules constructed on a densenet basis, each downblock consisting of a dense module d-denseblock and a downsampling module downsample; the d-denseblock consists of 4 dense units d-denseunit, and its output is the concatenation of all d-denseunit outputs along the C dimension; the d-denseunit consists, in order, of GDN normalization, LeakyReLU activation and a convolutional layer with output channel number m; through the encoder, the original image achieves a compression ratio of n/(m × 2^10);
the decoder and arbiter: the decoder consists of m uplink blocks constructed based on densenet; each upblock consists of a u-denseblock and an up-sampling module upsamplle; the u-densenblock consists of 4 u-densenites, the output of which is the sum of the splices in the C dimension of all of them; the u-densunit is composed of convolution layers with IGDN reverse normalization, LeakyRelu activation and output C being m in sequence; the basic structure of the discriminator network is 4 stack type connection convolution layers;
and a network model testing stage, namely inputting the image into the trained network model to obtain a compressed image.
2. The deep learning-based generative remote sensing image compression method as claimed in claim 1, wherein: the specific process of the quantizer network is as follows,
the quantizer network comprises a pre-quantization and quantization processing module, the pre-quantization processing module maps the code stream B C H W into the embedded popular space C (B H W) at the bottleneck layer bottleneck, a loss function constructed by KL divergence is used for learning the class distribution of the dimension B H W with the parameter C, and the clustering of the structure is realized on the basis of the attention mechanism; and the quantization processing module is used for carrying out { -1,1} binarization processing on the feature maps after pre-quantization to obtain a code stream.
3. The deep learning-based generative remote sensing image compression method as claimed in claim 1, wherein: the process of the pre-quantization module is as follows,
the hidden representation tensor z_e(x) output by the encoder network is mapped into the hidden embedding space e ∈ R^{k×d}, where d is the dimension of the embedding vectors e_j ∈ R^d and k is the number of categories; z_e(x) follows the posterior class distribution q(z = k | x) and is one-hot encoded as:
q(z = k | x) = 1 if k = argmin_j ||z_e(x) - e_j||_2, and 0 otherwise;
through network learning, z_e(x) is mapped to its nearest neighbour in the embedding space e, realizing a discretized merged representation in the space R^{k×d}, as shown in the following formula:
z_q(x) = e_k, where k = argmin_j ||z_e(x) - e_j||_2
where e_k denotes the corresponding vector in the embedding space e.
4. The deep learning-based generative remote sensing image compression method as claimed in claim 1, wherein: the loss function used for the training of the network model is as follows,
L = (1 - MSSSIM) + MSE + 0.01 × PSNR + Pre_Q_diff + GAN_loss
where MSSSIM denotes the multi-scale structural similarity of the image, MSE is the mean square error, and PSNR is the peak signal-to-noise ratio of the image signal, MSSSIM, MSE and PSNR serving as the loss of the encoder network; Pre_Q_diff is the loss of the pre-quantization module; GAN_loss is the adversarial loss; and the decoder and the discriminator together form the generative adversarial network GAN.
CN202010404524.2A 2020-05-13 2020-05-13 Generation type remote sensing image compression method based on deep learning Expired - Fee Related CN111683250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010404524.2A CN111683250B (en) 2020-05-13 2020-05-13 Generation type remote sensing image compression method based on deep learning

Publications (2)

Publication Number Publication Date
CN111683250A true CN111683250A (en) 2020-09-18
CN111683250B CN111683250B (en) 2021-03-16

Family

ID=72433490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010404524.2A Expired - Fee Related CN111683250B (en) 2020-05-13 2020-05-13 Generation type remote sensing image compression method based on deep learning

Country Status (1)

Country Link
CN (1) CN111683250B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200275A (en) * 2020-12-09 2021-01-08 上海齐感电子信息科技有限公司 Artificial neural network quantification method and device
CN112929663A (en) * 2021-04-08 2021-06-08 中国科学技术大学 Knowledge distillation-based image compression quality enhancement method
CN113393543A (en) * 2021-06-15 2021-09-14 武汉大学 Hyperspectral image compression method, device and equipment and readable storage medium
CN113450421A (en) * 2021-07-16 2021-09-28 中国电子科技集团公司第二十八研究所 Unmanned aerial vehicle reconnaissance image compression and decompression method based on enhanced deep learning
CN113596471A (en) * 2021-07-26 2021-11-02 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN113706641A (en) * 2021-08-11 2021-11-26 武汉大学 Hyperspectral image compression method based on space and spectral content importance
CN113709455A (en) * 2021-09-27 2021-11-26 北京交通大学 Multilevel image compression method using Transformer
CN114463340A (en) * 2022-01-10 2022-05-10 武汉大学 Edge information guided agile remote sensing image semantic segmentation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170214930A1 (en) * 2016-01-26 2017-07-27 Sandia Corporation Gpu-assisted lossless data compression
US20170347110A1 (en) * 2015-02-19 2017-11-30 Magic Pony Technology Limited Online Training of Hierarchical Algorithms
CN108495132A (en) * 2018-02-05 2018-09-04 西安电子科技大学 The big multiplying power compression method of remote sensing image based on lightweight depth convolutional network
CN110929080A (en) * 2019-11-26 2020-03-27 西安电子科技大学 Optical remote sensing image retrieval method based on attention and generation countermeasure network

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200275A (en) * 2020-12-09 2021-01-08 上海齐感电子信息科技有限公司 Artificial neural network quantification method and device
CN112929663A (en) * 2021-04-08 2021-06-08 中国科学技术大学 Knowledge distillation-based image compression quality enhancement method
CN113393543B (en) * 2021-06-15 2022-07-01 武汉大学 Hyperspectral image compression method, device and equipment and readable storage medium
CN113393543A (en) * 2021-06-15 2021-09-14 武汉大学 Hyperspectral image compression method, device and equipment and readable storage medium
CN113450421A (en) * 2021-07-16 2021-09-28 中国电子科技集团公司第二十八研究所 Unmanned aerial vehicle reconnaissance image compression and decompression method based on enhanced deep learning
CN113596471A (en) * 2021-07-26 2021-11-02 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN113596471B (en) * 2021-07-26 2023-09-12 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN113706641B (en) * 2021-08-11 2023-08-15 武汉大学 Hyperspectral image compression method based on space and spectral content importance
CN113706641A (en) * 2021-08-11 2021-11-26 武汉大学 Hyperspectral image compression method based on space and spectral content importance
CN113709455A (en) * 2021-09-27 2021-11-26 北京交通大学 Multilevel image compression method using Transformer
CN113709455B (en) * 2021-09-27 2023-10-24 北京交通大学 Multi-level image compression method using transducer
CN114463340A (en) * 2022-01-10 2022-05-10 武汉大学 Edge information guided agile remote sensing image semantic segmentation method
CN114463340B (en) * 2022-01-10 2024-04-26 武汉大学 Agile remote sensing image semantic segmentation method guided by edge information

Also Published As

Publication number Publication date
CN111683250B (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111683250B (en) Generation type remote sensing image compression method based on deep learning
CN110348487B (en) Hyperspectral image compression method and device based on deep learning
CN113962893B (en) Face image restoration method based on multiscale local self-attention generation countermeasure network
CN110517329B (en) Deep learning image compression method based on semantic analysis
CN108960333B (en) Hyperspectral image lossless compression method based on deep learning
Qian et al. Fast three-dimensional data compression of hyperspectral imagery using vector quantization with spectral-feature-based binary coding
CN113554720A (en) Multispectral image compression method and system based on multidirectional convolutional neural network
CN116469100A (en) Dual-band image semantic segmentation method based on Transformer
Wang et al. Sparse tensor-based point cloud attribute compression
CN111340901B (en) Compression method of power transmission network picture under complex environment based on generation type countermeasure network
CN113450421B (en) Unmanned aerial vehicle reconnaissance image compression and decompression method based on enhanced deep learning
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN115564721A (en) Hyperspectral image change detection method based on local information enhancement
CN117475216A (en) Hyperspectral and laser radar data fusion classification method based on AGLT network
Mei et al. Learn a compression for objection detection-vae with a bridge
CN111898671B (en) Target identification method and system based on fusion of laser imager and color camera codes
CN113706641A (en) Hyperspectral image compression method based on space and spectral content importance
CN112862655A (en) JPEG image steganalysis method based on channel space attention mechanism
CN117541505A (en) Defogging method based on cross-layer attention feature interaction and multi-scale channel attention
Kong et al. End-to-end multispectral image compression framework based on adaptive multiscale feature extraction
CN112750175A (en) Image compression method and system based on octave convolution and semantic segmentation
CN117372686A (en) Semantic segmentation method and system for complex scene of remote sensing image
CN115171029B (en) Unmanned-driving-based method and system for segmenting instances in urban scene
CN115239563A (en) Point cloud attribute lossy compression device and method based on neural network
CN115147317A (en) Point cloud color quality enhancement method and system based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210316