CN114972076A - Image defogging method based on layered multi-block convolutional neural network - Google Patents

Image defogging method based on layered multi-block convolutional neural network

Info

Publication number
CN114972076A
Authority
CN
China
Prior art keywords
image
block
encoder
decoder
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210484696.4A
Other languages
Chinese (zh)
Other versions
CN114972076B (en)
Inventor
李渝舟
王帆
熊琪龙
黄金雕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202210484696.4A priority Critical patent/CN114972076B/en
Publication of CN114972076A publication Critical patent/CN114972076A/en
Application granted granted Critical
Publication of CN114972076B publication Critical patent/CN114972076B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • G06T5/75Unsharp masking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the field of computer-vision image defogging and discloses an image defogging method based on a hierarchical multi-block convolutional neural network, comprising the following steps: (1) constructing a training data set; (2) constructing the hierarchical multi-block convolutional neural network model; (3) selecting a padding mode; (4) designing the loss function and selecting an optimizer; (5) training the model; (6) testing the model. The invention defogs a hazy image using a hierarchical multi-block convolutional neural network model with a specific structure and corresponding data processing: the whole network model comprises a plurality of levels, each level contains a paired encoder and decoder, adjacent levels are connected through multi-path (channel-direction) connections, and skip connections are arranged between each encoder and decoder.

Description

Image defogging method based on layered multi-block convolutional neural network
Technical Field
The invention belongs to the field of computer vision image defogging, and particularly relates to an image defogging method based on a hierarchical multi-block convolutional neural network.
Background
Fog is a common natural weather phenomenon in which airborne water droplets, dust, or other particles scatter light and reduce visibility. The term is now used broadly to cover low-visibility weather such as fog and haze. Images captured in foggy scenes suffer from blurring, color distortion, and low contrast, which degrades subsequent high-level vision tasks such as object detection.
Existing image defogging methods fall mainly into two categories: (1) methods based on the hazy image formation model; (2) methods based on deep learning. The former estimates the transmission and atmospheric light from the formation model together with certain prior knowledge and then recovers a clear image through the same model, but the estimated transmission and atmospheric light are often inaccurate. The latter recovers clear images directly or indirectly with a deep network, but the lack of paired, high-quality hazy and haze-free images leads to poor model generalization. In addition, both categories share the following problems: loss of detail in the recovered image, halo artifacts, and unnatural colors. How to defog effectively while retaining more detail has therefore become the focus of image defogging research.
Disclosure of Invention
In view of the above drawbacks and needs of the prior art, the present invention provides an image defogging method based on a hierarchical multi-block convolutional neural network, in which a hazy image is defogged by a hierarchical multi-block convolutional neural network model with a specific structure and corresponding data processing. The whole network model comprises a plurality of levels; each level contains a paired encoder and decoder, adjacent levels are connected by multi-path (channel-direction) connections, and skip connections are arranged between encoder and decoder. The invention trains the model end to end, and the trained hierarchical multi-block convolutional neural network model achieves a good defogging effect.
In order to achieve the above object, the present invention provides an image defogging method based on a hierarchical multi-block convolutional neural network, comprising the following steps:
(1) Construction of a training data set: based on a pre-selected database of clear images, a number of original clear images in the database (the number being selected in advance) are processed with the hazy image formation model to synthesize fog, yielding one-to-one corresponding pairs of clear haze-free images and synthetic hazy images that serve as the training data set;
(2) Construction of the hierarchical multi-block convolutional neural network model: the model comprises n levels, where n is a preselected natural number greater than or equal to 2;
each level comprises an encoder followed by a decoder; the encoder is composed of several convolutional layers, the decoder of several convolutional layers and deconvolution layers, the two being mutually symmetric, with skip connections further arranged between encoder and decoder;
for a hazy image to be processed, at the i-th level (1 ≤ i < n) the image input is the complete hazy image correspondingly divided into 2^(n−i) patches B_i; at the n-th level the image input B_n is the complete hazy image itself; and the 2^(n−i) patches input at the i-th level, merged pairwise, correspond to the 2^(n−i−1) patches input at the (i+1)-th level (a tensor-level sketch of this patch bookkeeping follows the step list below);
for the i-th level (1 ≤ i ≤ n), the feature map E_i^0 output by the encoder is first merged pairwise and then input to the decoder; the pairwise-merged feature map is denoted E_i^1, and the feature map output by the decoder is denoted D_i, which consists of 2^(n−i−1) patches (for i < n; D_n is a single complete image);
for the i-th level (2 ≤ i ≤ n), the encoder input of that level is the feature map obtained by concatenating, in the channel direction, the image input B_i with the output D_{i−1} of the (i−1)-th level; and its decoder input is the feature map obtained by concatenating, in the channel direction, the encoder output E_i^0 of that level with the intermediate feature map E_{i−1}^1 of the (i−1)-th level;
the final output of the model is D_n, namely the clear image corresponding to the defogged result;
(3) Selection of the padding mode: edge-repeat padding is used;
(4) Loss function design and optimizer selection: the overall loss is a weighted sum of per-level loss functions, each per-level loss using mean square error (MSE); the optimizer is RMSprop or Adam;
(5) Model training: the hierarchical multi-block convolutional neural network model constructed in step (2) is trained end to end with the training data set constructed in step (1), the synthetic hazy images in the training data set serving as the hazy images to be processed;
(6) Model testing: the hazy image to be defogged is input into the hierarchical multi-block convolutional neural network model trained in step (5), and the final output image D_n is the defogged image.
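For illustration only (this is not code from the patent), the patch bookkeeping described in step (2) could be sketched as follows in Python/PyTorch; the choice to split by alternately halving the height and width is an assumption, since the text does not fix the split axes:

```python
import torch

def split_into_patches(img: torch.Tensor, k: int) -> list:
    """Split a (C, H, W) image into 2**k patches B_i by repeated halving,
    alternating the height and width axes (assumed scheme)."""
    patches = [img]
    for level in range(k):
        axis = 1 if level % 2 == 0 else 2   # alternate H and W
        halved = []
        for p in patches:
            a, b = p.chunk(2, dim=axis)     # one patch -> two half-size patches
            halved.extend([a, b])
        patches = halved
    return patches

def merge_pairwise(patches: list, axis: int) -> list:
    """Merge 2**(n-i) patches two by two into the 2**(n-i-1) patches of the
    next level, concatenating spatially along the given axis."""
    return [torch.cat(patches[j:j + 2], dim=axis)
            for j in range(0, len(patches), 2)]
```

For n = 4 levels, level 1 would call split_into_patches(img, 3) to obtain 8 patches, matching B1 in Fig. 1.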
As a further preference of the present invention, in step (2), for any one level: the encoder consists of 7 residual blocks and two convolutional layers, the residual blocks being connected in sequence; the decoder consists of 7 residual blocks and two deconvolution layers, the residual blocks being connected in sequence; and,
for the encoder, its two convolutional layers are located between the third and fourth residual blocks and between the fifth and sixth residual blocks, respectively;
for the decoder, its two deconvolution layers are located between the second and third residual blocks and between the fourth and fifth residual blocks, respectively.
As a further preference of the present invention, the convolutional layers in the encoder and the deconvolution layers in the decoder all use 3×3 kernels with stride 2; the convolutional layers contained in the residual blocks all use stride 1.
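The layout above can be sketched in PyTorch as follows (the embodiment reports a Keras implementation, so this is illustrative only; the channel widths and the input/output convolutions that map between RGB and feature space are assumptions added so the residual blocks see matching channel counts):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 stride-1 convolutions with a Mish activation."""
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, 1, 1, padding_mode="replicate")
        self.c2 = nn.Conv2d(ch, ch, 3, 1, 1, padding_mode="replicate")
        self.act = nn.Mish()
    def forward(self, x):
        return x + self.c2(self.act(self.c1(x)))

class Encoder(nn.Module):
    """R1 R2 R3 | Conv(stride 2) | R4 R5 | Conv(stride 2) | R6 R7."""
    def __init__(self, in_ch=3, ch=32):                 # widths are assumptions
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, 1, 1, padding_mode="replicate")
        self.body = nn.Sequential(
            ResBlock(ch), ResBlock(ch), ResBlock(ch),
            nn.Conv2d(ch, ch * 2, 3, 2, 1),             # between R3 and R4
            ResBlock(ch * 2), ResBlock(ch * 2),
            nn.Conv2d(ch * 2, ch * 4, 3, 2, 1),         # between R5 and R6
            ResBlock(ch * 4), ResBlock(ch * 4),
        )
    def forward(self, x):
        return self.body(self.stem(x))

class Decoder(nn.Module):
    """R1 R2 | Deconv(stride 2) | R3 R4 | Deconv(stride 2) | R5 R6 R7."""
    def __init__(self, ch=32, out_ch=3):
        super().__init__()
        self.body = nn.Sequential(
            ResBlock(ch * 4), ResBlock(ch * 4),
            nn.ConvTranspose2d(ch * 4, ch * 2, 3, 2, 1, output_padding=1),
            ResBlock(ch * 2), ResBlock(ch * 2),
            nn.ConvTranspose2d(ch * 2, ch, 3, 2, 1, output_padding=1),
            ResBlock(ch), ResBlock(ch), ResBlock(ch),
        )
        self.head = nn.Conv2d(ch, out_ch, 3, 1, 1, padding_mode="replicate")
    def forward(self, x):
        return self.head(self.body(x))
```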
As a further preference of the present invention, in step (4), the loss function of a single level uses MSE, and the weight of the loss function of the i-th level is 1/2 of the weight of the loss function of the (i+1)-th level.
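An illustrative sketch of this weighting (not the authors' code), assuming each level's output D_i has been reassembled into a full image of the same shape as the ground truth:

```python
import torch.nn.functional as F

def hierarchical_loss(outputs, target):
    """outputs: [D_1, ..., D_n], per-level restored images (coarse to final);
    target: the clear ground-truth image. Level i carries weight 2**(i - n),
    doubling at each level so the final level D_n is supervised most strongly."""
    n = len(outputs)
    return sum(2.0 ** (i + 1 - n) * F.mse_loss(d, target)
               for i, d in enumerate(outputs))
```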
As a further preference of the present invention, in step (4), the optimizer is the RMSprop optimizer;
in step (5), the batch size is set to 2, the initial learning rate is set to 0.0001, and the learning rate is multiplied by 0.1 every 20 epochs, for 100 epochs of training.
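In PyTorch terms (the embodiment reports Keras, so the names here are illustrative; `model` and `loader` are assumed to exist, and `hierarchical_loss` is the sketch above), these settings correspond to:

```python
import torch

optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(100):
    for hazy, clear in loader:            # batches of (hazy, ground-truth) pairs
        optimizer.zero_grad()
        outputs = model(hazy)             # [D_1, ..., D_n]
        loss = hierarchical_loss(outputs, clear)
        loss.backward()
        optimizer.step()
    scheduler.step()                      # lr x 0.1 after every 20 epochs
```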
As a further preference of the present invention, in step (6), the evaluation indexes used in testing are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
As a further preference of the present invention, in step (1), the fog synthesis uses the hazy image formation model, with the atmospheric light set to 1 and the scattering coefficient drawn at random from the interval [0, 0.1].
In a further preferred embodiment of the present invention, in the step (2), n is 2.
Generally, compared with the prior art, the above technical solution conceived by the present invention can achieve the following beneficial effects:
(1) The hierarchical multi-block convolutional neural network model of the invention uses skip connections between encoder and decoder, which enhances the feature-fusion capability of the network and helps recover the image's detail information. In the model, two adjacent levels communicate through two channel-direction connections (① B_i + D_{i−1}, ② E_i^0 + E_{i−1}^1, where + denotes channel-direction concatenation), which further strengthens the network's feature fusion.
(2) The padding method preferably uses edge-repeat padding (all-zero padding is no longer used), which effectively reduces edge vignetting caused by the imbalance between information at the image edge and elsewhere, so the restored image has richer edge detail.
(3) More preferably, a progressive loss function (i.e., a multi-path fusion loss function) is used, strengthening supervision level by level, which helps enhance the defogging effect.
Compared with common defogging methods, the invention connects encoder and decoder through skip connections, which aids the decoder module's feature fusion, and preferably designs a multi-path fusion loss function for progressive, level-by-level supervision, which helps enhance the defogging effect. The method of the invention achieves a good defogging effect on both synthetic and natural hazy images, while retaining more detail information and reducing halos at the edges of the defogged image.
Drawings
Fig. 1 is a schematic diagram of the defogging network structure adopted by the image defogging method based on a hierarchical multi-block convolutional neural network (shown with a 4-level structure, where the level-1 input B1 is the original image divided into 8 patches, the level-2 input B2 is 4 patches, the level-3 input B3 is 2 patches, and the level-4 input B4 is the undivided original image).
Fig. 2 is a schematic diagram of a defogging network according to an embodiment of the invention.
Fig. 3 is a schematic diagram of an encoder structure according to an embodiment of the present invention.
FIG. 4 is a block diagram of a decoder according to an embodiment of the present invention.
Fig. 5 compares original hazy images with the results of the defogging method of the embodiment of the invention and an existing defogging method: column (a) of Fig. 5 shows the original hazy images, column (b) the results of the existing AOD-Net method, and column (c) the results of the method of the embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an image defogging method based on a hierarchical multi-block convolutional neural network whose structure is shown in Figs. 1 and 2. The whole defogging network comprises a plurality of levels (the total number of levels is denoted n, a preset natural number greater than or equal to 2), and each level contains a paired encoder and decoder. For the i-th level, the encoder extracts features from the level's input image or feature map B_i (when i ≥ 2, the encoder input of that level is the feature map obtained by concatenating the patch image with the previous level's output in the channel direction), and the decoder recovers the i-th level's defogged image D_i from the extracted features. The number of patches is halved level by level, and levels are connected in the channel direction ("+" on the left of Fig. 2), realizing defogging level by level. Compared with common defogging methods, the invention connects encoder and decoder through skip connections, which aids the decoder module's feature fusion, and designs a multi-path fusion loss function for level-by-level supervision, which helps enhance the defogging effect. In general, the invention achieves efficient image defogging, retains rich detail information, reduces edge vignetting, and provides clearer input images for subsequent high-level vision tasks.
The image defogging method based on the hierarchical multi-block convolutional neural network may comprise the following steps:
step 1: and (4) data set construction, namely constructing a pair of clear defogging image and a synthetic fog image according to the fog image forming model based on a preselected target detection database.
For example, using the Pascal VOC 2007 data set and the hazy image formation model, the atmospheric light can be set to 1, the scattering coefficient drawn at random from the interval [0, 0.1], the depth estimated, and a hazy data set thereby synthesized.
Step 2: construction of the hierarchical multi-block convolutional neural network model: the model comprises a plurality of levels, each containing a paired encoder and decoder; levels are linked through connection structures, and several skip connections are added between encoder and decoder. The encoder is composed of a series of convolutional layers; the decoder is composed of convolutional and deconvolution layers, mirroring the encoder's structure.
Specifically, the encoder and decoder may be constructed first, then the single-level multi-block convolutional neural network, and finally the encoder and decoder are linked through skip connections. The construction process is as follows: (1) encoder and decoder: the encoder comprises a series of convolutional layers, and the decoder comprises a series of convolutional and deconvolution layers, mirroring the encoder. (2) Single-level multi-block convolutional neural network: the input image is first divided into patches, a feature map is obtained through the encoder, a single-level output image is then obtained through the decoder's upsampling, and the encoder and decoder are connected by skip connections. (3) Hierarchical multi-block convolutional neural network: several single-level multi-block convolutional neural networks are stacked; the number of patches at the next level is half that of the previous level, the previous level's output image is concatenated in the channel direction with the next level's input image, and the previous level's encoder outputs, after horizontal concatenation, are concatenated in the channel direction with the next level's encoder output.
In addition, considering both defogging effect and efficiency, the hierarchical multi-block convolutional neural network may comprise two levels: the first level's input is divided into two patches, the second level needs no division, and the two levels communicate through two channel-direction connections. The encoder consists of 7 residual blocks and two convolutional layers, and the decoder of 7 residual blocks and two deconvolution layers, the two structures mirroring each other. The encoder's two convolutional layers are located after the third and fifth residual blocks, respectively, and the decoder's two deconvolution layers after the second and fourth residual blocks, respectively. Because of the residual structure, both encoder and decoder use 3×3 convolution kernels; the stride of convolutions inside the residual blocks is set to 1, and the stride of the two standalone convolutional and deconvolution layers is set to 2. A minimal forward-pass sketch of this two-level design follows.
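This sketch reuses the Encoder and Decoder modules sketched earlier; the vertical split axis, the 1×1 fusion convolution after the channel concatenation, and the omission of the encoder-decoder skip connections are simplifying assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class TwoLevelDehazeNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc1, self.dec1 = Encoder(in_ch=3, ch=ch), Decoder(ch=ch)
        self.enc2 = Encoder(in_ch=6, ch=ch)            # takes B_2 concat D_1
        self.fuse = nn.Conv2d(ch * 8, ch * 4, 1)       # illustrative reduction
        self.dec2 = Decoder(ch=ch)

    def forward(self, x):                  # x: (B, 3, H, W), H and W divisible by 8
        top, bot = x.chunk(2, dim=2)       # B_1: two vertical halves (assumed axis)
        e1 = torch.cat([self.enc1(top), self.enc1(bot)], dim=2)   # E_1^1
        d1 = self.dec1(e1)                                        # D_1
        e2 = self.enc2(torch.cat([x, d1], dim=1))                 # B_2 + D_1
        d2 = self.dec2(self.fuse(torch.cat([e1, e2], dim=1)))     # E_1^1 + E_2^0
        return [d1, d2]                    # D_2 is the final defogged image
```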
Step 3: selection of the padding mode: edge-repeat padding is used instead of the usual all-zero padding.
Specifically, the convolutional and deconvolution layers of the hierarchical multi-block convolutional neural network are padded with edge repetition: the padding size is determined first, followed by edge-repeat padding. The snippet below illustrates the difference between the two padding modes.
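A minimal illustration (PyTorch exposes this as padding_mode="replicate" on Conv2d, or mode="replicate" in F.pad):

```python
import torch
import torch.nn.functional as F

x = torch.arange(9.0).reshape(1, 1, 3, 3)                 # toy 3x3 "image"
zeros = F.pad(x, (1, 1, 1, 1), mode="constant", value=0)  # all-0 padding
edge = F.pad(x, (1, 1, 1, 1), mode="replicate")           # edge-repeat padding
# `zeros` surrounds the image with 0s, creating an artificial dark border;
# `edge` repeats the outermost pixels, keeping border statistics consistent.
```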
Step 4: loss function design and optimizer selection: the loss function is a weighted sum of per-level losses, each using mean square error (MSE); the optimizer is RMSprop or Adam.
For example, the single-level loss uses MSE, and the loss of the whole network is designed as the weighted sum of the losses of all levels; the higher the level, the larger its loss weight, so supervision is strengthened level by level. The weight of the i-th level's loss function is 1/2 of the weight of the (i+1)-th level's loss function.
Step 5: model training: the hierarchical multi-block convolutional neural network is trained end to end with the data set constructed in step 1.
For example, the hierarchical multi-block convolutional neural network may be trained end to end on the synthetic hazy images with the RMSprop optimizer; the batch size is set to 2, the initial learning rate to 0.0001, and the learning rate is multiplied by 0.1 every 20 epochs, for a total of 100 epochs.
Step 6: testing and evaluation of results: the defogging effect is tested on the test split of the data set constructed in step 1 and on natural hazy images.
For example, the defogging effect may first be tested on synthetic hazy images and then further verified on natural hazy images. The evaluation indexes are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as sketched below.
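For example, using scikit-image's reference implementations (function names as in skimage.metrics; the channel_axis argument requires scikit-image >= 0.19, older versions use multichannel=True):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed: np.ndarray, clear: np.ndarray):
    """Both images as float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(clear, dehazed, data_range=1.0)
    ssim = structural_similarity(clear, dehazed, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```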
The embodiment is as follows:
the specific processing steps in this example are as follows:
step 1: and (5) constructing an integral defogging network. The whole defogging network is based on a layered multi-block convolutional neural network, defogging is realized progressively layer by layer in a plurality of layers, the structures of all the layers are similar, a pair of encoder and decoder structures with opposite structures are included, and the number of input blocks is halved layer by layer.
In particular, adjacent levels are connected: the output of the lower level (D_i) is concatenated in the channel direction with the input image of the higher level (B_{i+1}) as the input to the higher level's encoder. To facilitate information fusion, the encoder and decoder are connected by multiple skip connections. In addition, the output feature maps of the lower level's encoder are concatenated horizontally ("cat" in Fig. 2; with the lower level denoted the i-th level, the encoder output before horizontal concatenation is denoted E_i^0 and the feature map after horizontal concatenation E_i^1), and the result is then concatenated in the channel direction with the higher level's encoder output (E_{i+1}^0) ("+" in the middle of Fig. 2, i.e., E_i^1 + E_{i+1}^0, where + denotes channel-direction concatenation), further facilitating feature fusion. In general, these various connections fuse features from the same level and from different levels, improving the network's feature-fusion capability.
Each level's network structure includes a paired encoder and decoder with mirrored structures. The encoder, shown in Fig. 3, is composed of several convolutional layers (Conv), with residual structures (Res) used to deepen the network; it extracts features from the input image to obtain a feature map. The activation function is Mish (see the prior art document: Misra, Diganta, "Mish: A Self Regularized Non-Monotonic Activation Function," arXiv preprint arXiv:1908.08681, 2019), a smooth, non-monotonic neural activation function that allows slightly negative values, letting more information flow into the network and yielding better accuracy and generalization. It is defined as:
Mish(x) = x · tanh(ln(1 + e^x))   (1)
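Since ln(1 + e^x) is the softplus function, Eq. (1) can be implemented directly (recent PyTorch versions also ship this activation as torch.nn.Mish):

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    return x * torch.tanh(F.softplus(x))   # softplus(x) = ln(1 + e^x)
```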
Starting from the second level, the number of input channels of the encoder's first convolutional layer is 6. The decoder's structure mirrors the encoder's, comprising several convolutional layers and two deconvolution layers (Deconv) responsible for upsampling; the activation function is also Mish, and its structure is shown in Fig. 4. The decoders at all levels share the same structure as the first level's. For better fusion of feature information, skip connections link the encoder and decoder, with the skipped data flowing from encoder to decoder.
Step 2: data set construction. Based on the hazy image formation model known in the prior art (see K. He, J. Sun, and X. Tang, "Single Image Haze Removal Using Dark Channel Prior," in Proc. IEEE CVPR, Miami, FL, 20-25 Jun. 2009, pp. 1956-1963), the Pascal VOC 2007 data set is used (other image databases may of course be used), the atmospheric light is set to 1, the scattering coefficient is drawn at random from the interval [0, 0.1], and a hazy data set can be synthesized together with common depth estimation. The widely accepted conversion formulas of the hazy image formation model are
I(x) = J(x)t(x) + A(1 − t(x))   (2)
t(x) = e^(−βd(x))   (3)
where I is the hazy image, J is the scene radiance (i.e., the clear image to be restored), A is the atmospheric light value, t the transmission, β the scattering coefficient, and d the scene depth. J(x)t(x) is the direct attenuation term, representing direct degradation of the original scene; A(1 − t(x)) represents the superimposed atmospheric light. t decays exponentially with d: the greater the depth, the lower the transmission. A synthesis sketch follows.
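A minimal synthesis sketch under the stated settings (A = 1, β drawn uniformly from [0, 0.1]); the depth map d is assumed to be supplied by an external monocular depth estimator, which the description does not fix:

```python
import numpy as np

def synthesize_fog(clear: np.ndarray, depth: np.ndarray, rng=None) -> np.ndarray:
    """clear: (H, W, 3) float image in [0, 1]; depth: (H, W) scene depth d(x)."""
    rng = rng or np.random.default_rng()
    A = 1.0                                 # atmospheric light
    beta = rng.uniform(0.0, 0.1)            # scattering coefficient
    t = np.exp(-beta * depth)[..., None]    # transmission t(x), Eq. (3)
    return clear * t + A * (1.0 - t)        # hazy image I(x), Eq. (2)
```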
Step 3: image restoration process. At each level, the input image or feature map is divided into a certain number of patches; features are extracted by the encoder to obtain a feature map, which then passes through the decoder's convolutional and deconvolution layers; whenever a skip connection is encountered, the corresponding encoder and decoder feature maps are connected, and the data then continue through the decoder's remaining layers. After the decoder, that level ends and the data pass to the next level: the level's output is concatenated in the channel direction with the next level's input image, and the resulting feature map serves as the next level's input. The next level's input feature map is divided into half as many patches as the previous level's; its encoder extracts features into a new feature map, which is concatenated in the channel direction with the previous level's horizontally concatenated encoder output (i.e., E_{i−1}^1 + E_i^0, where + denotes channel-direction concatenation); the concatenated feature map continues through the decoder, an output is obtained again, and the process moves to the next level, looping until the restored image is obtained ("D2" in Fig. 2).
Step 4: loss function design. The loss of the whole network is the weighted sum of the losses of all levels, the weight of a higher level's loss being 2 times that of a lower level's, so supervision progresses level by level; the single-level loss uses MSE.
Step 5: training settings and defogging performance. The hierarchical multi-block convolutional neural network is trained end to end on the constructed data set with the RMSprop optimizer. The batch size is set to 2, the initial learning rate to 0.0001, and the learning rate is multiplied by 0.1 every 20 epochs, for 100 epochs of training. The programming language is Python and the deep learning framework Keras; all experiments were run on an Nvidia RTX 2080 Ti (12 GB video memory). Compared with the AOD-Net image defogging algorithm known in the prior art (see B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, "AOD-Net: All-in-One Dehazing Network," in Proc. IEEE ICCV, Venice, Italy, Oct. 2017, pp. 4780-4788), the image defogging algorithm designed by the invention achieves higher PSNR and SSIM: AOD-Net reaches a PSNR of 17.23 and an SSIM of 0.63, while the two-level model of the invention reaches 17.58 and 0.67, respectively. In addition, AOD-Net's defogged images are dark overall, whereas the defogged images obtained by the proposed algorithm are brighter. The defogging effects of the two are compared in Fig. 5.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. An image defogging method based on a hierarchical multi-block convolutional neural network is characterized by comprising the following steps:
(1) construction of a training data set: based on a pre-selected database of clear images, a number of original clear images in the database (the number being selected in advance) are processed with the hazy image formation model to synthesize fog, yielding one-to-one corresponding pairs of clear haze-free images and synthetic hazy images that serve as the training data set;
(2) construction of a hierarchical multi-block convolutional neural network model: the model comprises n levels, where n is a preselected natural number greater than or equal to 2;
each level comprises an encoder followed by a decoder, the encoder being composed of several convolutional layers and the decoder of several convolutional layers and deconvolution layers, the encoder and decoder being mutually symmetric, with skip connections further arranged between encoder and decoder;
for a hazy image to be processed, at the i-th level, 1 ≤ i < n, the image input is the complete hazy image correspondingly divided into 2^(n−i) patches B_i; at the n-th level the image input B_n is the complete hazy image itself; and the 2^(n−i) patches input at the i-th level, merged pairwise, correspond to the 2^(n−i−1) patches input at the (i+1)-th level;
for the i-th level, 1 ≤ i ≤ n, the feature map E_i^0 output by the encoder is first merged pairwise and then input to the decoder; the pairwise-merged feature map is denoted E_i^1; the feature map output by the decoder is denoted D_i, and D_i consists of 2^(n−i−1) patches;
for the i-th level, 2 ≤ i ≤ n, the encoder input of that level is the feature map obtained by concatenating, in the channel direction, the image input B_i with the output D_{i−1} of the (i−1)-th level; and its decoder input is the feature map obtained by concatenating, in the channel direction, the encoder output E_i^0 of that level with the intermediate feature map E_{i−1}^1 of the (i−1)-th level;
the final output of the model is D_n, namely the clear image corresponding to the defogged result;
(3) selection of a padding mode: edge-repeat padding is used;
(4) loss function design and optimizer selection: the overall loss is a weighted sum of per-level loss functions, each per-level loss using mean square error (MSE); the optimizer is RMSprop or Adam;
(5) model training: the hierarchical multi-block convolutional neural network model constructed in step (2) is trained end to end with the training data set constructed in step (1), the synthetic hazy images in the training data set serving as the hazy images to be processed;
(6) model testing: the hazy image to be defogged is input into the hierarchical multi-block convolutional neural network model trained in step (5), and the final output image D_n is the defogged image.
2. The image defogging method based on the hierarchical multi-block convolutional neural network according to claim 1, wherein in the step (2), for any one level: the encoder consists of 7 residual blocks and two convolutional layers, the residual blocks being connected in sequence; the decoder consists of 7 residual blocks and two deconvolution layers, the residual blocks being connected in sequence; and,
for the encoder, its two convolutional layers are located between the third and fourth residual blocks and between the fifth and sixth residual blocks, respectively;
for the decoder, its two deconvolution layers are located between the second and third residual blocks and between the fourth and fifth residual blocks, respectively.
3. The method of claim 2, wherein the convolutional layers in the encoder and the deconvolution layers in the decoder all use 3×3 convolution kernels, the stride of the convolutional and deconvolution layers is set to 2, and the strides of the convolutional layers contained in the residual blocks are all set to 1.
4. The image defogging method based on the hierarchical multi-block convolutional neural network according to claim 1, wherein in the step (4), regarding the loss function, the loss of a single level uses MSE, and the weight of the loss function of the i-th level is 1/2 of the weight of the loss function of the (i+1)-th level.
5. The image defogging method based on the hierarchical multi-block convolutional neural network according to claim 1, wherein in the step (4), the optimizer is the RMSprop optimizer;
in the step (5), the batch size is set to 2, the initial learning rate is set to 0.0001, and the learning rate is multiplied by 0.1 every 20 epochs, for 100 epochs of training.
6. The image defogging method according to claim 1, wherein in the step (6), the evaluation indexes used in the test are Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM).
7. The image defogging method based on the hierarchical multi-block convolutional neural network according to claim 1, wherein in the step (1), the fog synthesis uses the hazy image formation model, the atmospheric light is set to 1, and the scattering coefficient is drawn at random from the interval [0, 0.1].
8. The image defogging method according to claim 1, wherein in the step (2), n is 2.
CN202210484696.4A 2022-05-06 2022-05-06 Image defogging method based on layered multi-block convolutional neural network Active CN114972076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210484696.4A CN114972076B (en) 2022-05-06 2022-05-06 Image defogging method based on layered multi-block convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210484696.4A CN114972076B (en) 2022-05-06 2022-05-06 Image defogging method based on layered multi-block convolutional neural network

Publications (2)

Publication Number Publication Date
CN114972076A true CN114972076A (en) 2022-08-30
CN114972076B CN114972076B (en) 2024-04-26

Family

ID=82980991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210484696.4A Active CN114972076B (en) 2022-05-06 2022-05-06 Image defogging method based on layered multi-block convolutional neural network

Country Status (1)

Country Link
CN (1) CN114972076B (en)

Citations

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015192115A1 (en) * 2014-06-13 2015-12-17 Board Of Regents Of The University Of Texas System Systems and methods for automated hierarchical image representation and haze removal
WO2018155777A1 (en) * 2017-02-22 2018-08-30 한국과학기술원 Apparatus and method for estimating distance on basis of thermal image, and neural network learning method therefor
EP3582142A1 (en) * 2018-06-15 2019-12-18 Université de Liège Image classification using neural networks
CN110738622A (en) * 2019-10-17 2020-01-31 温州大学 Lightweight neural network single image defogging method based on multi-scale convolution
CN110807743A (en) * 2019-10-24 2020-02-18 华中科技大学 Image defogging method based on convolutional neural network
CN111539887A (en) * 2020-04-21 2020-08-14 温州大学 Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning
WO2021248938A1 (en) * 2020-06-10 2021-12-16 南京邮电大学 Image defogging method based on generative adversarial network fused with feature pyramid
CN113450273A (en) * 2021-06-18 2021-09-28 暨南大学 Image defogging method and system based on multi-scale multi-stage neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任敏敏: "图像融合的循环神经网络去雾算法" [A recurrent neural network defogging algorithm with image fusion], 小型微型计算机系统 (Journal of Chinese Computer Systems), no. 07, 10 July 2020 (2020-07-10) *

Also Published As

Publication number Publication date
CN114972076B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN111784602B (en) Method for generating countermeasure network for image restoration
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN109360155A (en) Single-frame images rain removing method based on multi-scale feature fusion
Hu et al. Underwater image restoration based on convolutional neural network
CN108269244B (en) Image defogging system based on deep learning and prior constraint
CN111915530A (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN111553851B (en) Video rain removing method based on time domain rain line decomposition and spatial structure guidance
CN110288550B (en) Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN112241939B (en) Multi-scale and non-local-based light rain removal method
CN114494821B (en) Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation
CN114842216A (en) Indoor RGB-D image semantic segmentation method based on wavelet transformation
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN115330620A (en) Image defogging method based on cyclic generation countermeasure network
CN113052776A (en) Unsupervised image defogging method based on multi-scale depth image prior
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN113160286A (en) Near-infrared and visible light image fusion method based on convolutional neural network
CN115546046A (en) Single image defogging method fusing frequency and content characteristics
CN112950521B (en) Image defogging method and generator network
CN112785517B (en) Image defogging method and device based on high-resolution representation
CN110189262A (en) Image defogging algorithm based on neural network and Histogram Matching
CN116721033A (en) Single image defogging method based on random mask convolution and attention mechanism
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
CN107301625A (en) Image defogging algorithm based on brightness UNE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant