CN115063304A - End-to-end multi-size fusion-based pyramid neural network image defogging method and system - Google Patents


Info

Publication number
CN115063304A
CN115063304A (application CN202210557615.9A)
Authority
CN
China
Prior art keywords
feature
image
network
fusion
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210557615.9A
Other languages
Chinese (zh)
Other versions
CN115063304B (en)
Inventor
王胜春
陈培棋
蔡荣辉
叶成志
刘炼烨
黄金贵
田斌
葛晶晶
罗颖光
计君伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Normal University
Original Assignee
Hunan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Normal University
Priority to CN202210557615.9A
Publication of CN115063304A
Application granted
Publication of CN115063304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an end-to-end multi-size fusion-based pyramid neural network image defogging method and system. A backbone network in the image defogging model extracts five groups of feature maps of the foggy image at different sizes and sub-regions; a feature pyramid network structure in the model performs feature enhancement on the five groups of feature maps to obtain five feature-enhanced groups; the five groups are fused by a spatial multi-size feature superposition fusion method to obtain fused features; a decoder in the model further fuses and decodes the fused features to obtain the intermediate estimation parameter of the network; and a physical recovery module reconstructs the intermediate estimation parameter together with the original foggy image input to the network to obtain a fogless image. The method fuses local and global features, combines low-level and high-level semantic features, avoids the loss of features during downsampling, makes full use of the feature information of the foggy image during convolution, and achieves a better defogging effect.

Description

End-to-end multi-size fusion-based pyramid neural network image defogging method and system
Technical Field
The invention relates to the technical field of image processing, in particular to an end-to-end multi-size fusion-based pyramid neural network image defogging method and system.
Background
With the development of technology, computer vision tasks such as object detection, object tracking, behavior analysis and face recognition have made great breakthroughs. However, high-level vision tasks such as detection and tracking rely on clear video and image data, and their performance often degrades substantially in real scenes such as heavy fog or heavy rain. As a preprocessing step for such high-level vision tasks, image defogging has received attention from many researchers in recent years.
Earlier single-image defogging methods that did not use deep learning tend to have significant drawbacks. These non-deep-learning methods are prior-based: some rely on a physical scattering model and remove fog by estimating the atmospheric light and the transmission map, but estimating these quantities from a single foggy image is an ill-posed problem, and the estimates are often inaccurate. Other methods remove fog by exploiting statistical properties of images; although they have some effect, they do not apply to many real-world scenes. In addition, existing data-driven deep learning methods also have notable disadvantages: they learn a one-to-one mapping between a given foggy image and the corresponding clear image, which conflicts with the ill-posed nature of the defogging problem, and they produce only a single fixed output for a given foggy image, lacking diversity.
Therefore, there is a strong need in the field for a defogging method that can effectively defog a foggy image and obtain a clear image.
Disclosure of Invention
The invention provides an end-to-end multi-size fusion-based pyramid neural network image defogging method and system, which solve the technical problem of effectively defogging an actually captured foggy image.
To solve this technical problem, the invention provides the following technical scheme: an end-to-end multi-size fusion-based pyramid neural network image defogging method, comprising the following steps:
extracting five groups of feature maps of the foggy image at different sizes and sub-regions by using a backbone network in the image defogging model;
performing feature enhancement on the five groups of feature maps by using a feature pyramid network structure in the image defogging model to obtain five groups of feature-enhanced feature maps;
fusing the five groups of feature-enhanced feature maps by a spatial multi-size feature superposition fusion method to obtain fused features;
further fusing and decoding the fused features by using a decoder in the image defogging model to obtain the intermediate estimation parameter of the network;
reconstructing the intermediate estimation parameter of the network and the original foggy image input to the network by using a physical recovery module to obtain a fogless image;
the image defogging model comprises the backbone network, the feature pyramid network structure, the decoder and the physical recovery module.
Preferably, constructing the image defogging model comprises:
acquiring a training set, wherein the training set comprises foggy images and the clear images corresponding to them;
initializing the image defogging model;
inputting the foggy image into the backbone network and outputting five groups of feature maps of the foggy image at different sizes and sub-regions;
performing feature enhancement on the five groups of feature maps through the feature pyramid network structure to obtain five groups of feature-enhanced feature maps;
fusing the five groups of feature-enhanced feature maps by a spatial multi-size feature superposition fusion method to obtain fused features;
inputting the fused features into the decoder to obtain the intermediate estimation parameter of the network;
reconstructing the intermediate estimation parameter of the network and the original foggy image input to the network by using the physical recovery module to obtain a fogless image;
and training with the mean squared error between the fogless image reconstructed from the foggy image and the clear image corresponding to the foggy image as the loss function, to obtain a converged image defogging model.
Preferably, the loss function satisfies the following equation:
$$L_{mse}=\frac{1}{N}\sum_{i=1}^{N}\left(Y_{i}-X_{i}\right)^{2}$$
where L_mse is the network loss, N is the number of pixels of the foggy images used in building the image defogging model, Y is the fogless image reconstructed from the foggy image, and X is the clear image corresponding to the foggy image.
Preferably, the backbone network consists of eight convolution modules. The first convolution module consists of a 3 × 3 convolution layer and a batch normalization layer; the second, third, fifth and eighth convolution modules are mobile inverted bottleneck convolution (MBConv) blocks with a 3 × 3 convolution kernel; the fourth, sixth and seventh convolution modules are MBConv blocks with a 5 × 5 convolution kernel. The 2nd to 8th convolution modules all use a residual network structure, and the numbers of network layers in the 2nd to 8th convolution modules are 1, 2, 3, 4 and 1, respectively.
Preferably, the decoder consists of three decoding modules, each composed of a 3 × 3 convolution layer and an upsampling layer; the fused features pass through the three decoding modules to produce the intermediate estimation parameter of the network.
Preferably, performing feature enhancement on the five groups of feature maps with the feature pyramid network structure in the image defogging model to obtain five groups of feature-enhanced feature maps comprises: using a multilayer feature pyramid network structure to enhance the five groups of feature maps containing different sizes and sub-regions.
Preferably, the five groups of feature-enhanced feature maps are fused by a spatial multi-size feature superposition fusion method: taking the size of the largest feature map among the five feature-enhanced groups as the reference, the other feature maps are resized to match the largest one using a mixed interpolation mode, and the five groups of feature-enhanced feature maps are spatially superposed and fused at multiple sizes to obtain the fused features.
Preferably, the physical recovery module reconstructs the intermediate estimation parameter of the network and the original foggy image input to the network to obtain a fogless image satisfying an atmospheric scattering model, where the atmospheric scattering model is written as:
I(x)=J(x)t(x)+A(1-t(x))
where I(x) is the foggy image, J(x) is the fogless image, A is the global atmospheric light value, and t(x) is the transmittance;
combining the global atmospheric light value A and the transmittance t(x) in the atmospheric scattering model yields the physical recovery model on which the physical recovery module depends, as shown in the following equation:
J(x)=k(x)I(x)-k(x)+b
where

$$k(x)=\frac{\frac{1}{t(x)}\left(I(x)-A\right)+(A-b)}{I(x)-1}$$

b is the constant 1, and k(x) is the intermediate estimation parameter of the network;
and the physical recovery module reconstructs the intermediate estimation parameter of the network and the foggy image input to the network to obtain the fogless image.
An embodiment of the invention also provides an end-to-end multi-size fusion-based pyramid neural network image defogging system, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the steps of any one of the above methods when executing the computer program.
The invention has the following beneficial effects:
the method comprises the steps of extracting five groups of characteristic diagrams of the foggy image under different sizes and subregions by utilizing a main network in an image defogging model; carrying out feature enhancement on the five groups of feature graphs by using a feature pyramid network structure in the image defogging model to obtain five groups of feature graphs after feature enhancement; fusing the five groups of feature maps by a space multi-size feature superposition fusion method to obtain fusion features; further fusing and decoding the fusion characteristics by using a decoder in the image defogging model to obtain an intermediate estimation parameter of the network; and reconstructing the network intermediate estimation parameters and the original foggy image input by the network by using a physical recovery module to obtain a fogless image. The feature pyramid network structure can further integrate local features and global features in the foggy image, and combines low-level semantic features and high-level semantic features, so that the problem of feature loss caused in the feature extraction process is avoided, and feature information obtained in the convolution process of the foggy image is fully utilized. The image defogging model of the invention reduces the requirement on hardware and reduces the time required in the defogging process while achieving better defogging effect. In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. Compared with other defogging methods, the fog-free image obtained by the invention has better performance on the specific details of the image, clearer and more complete image texture details and more gorgeous and more real image color. 
Compared with other deep-learning-based defogging methods, the method places no specific requirement on the resolution of the input image: it can accept an input image of any resolution and produce a fogless image at the same resolution. During defogging, the global and local fog density are considered together, which avoids over-defogging or under-defogging in the resulting fogless image and gives it better fidelity. The present invention is described in further detail below with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of a preferred embodiment of an end-to-end multi-size fusion-based pyramid neural network image defogging method according to the invention;
fig. 2 is a schematic structural diagram of an image defogging model according to a preferred embodiment of the present invention.
Detailed Description
Embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Example 1:
Referring to fig. 1 and 2, an end-to-end multi-size fusion-based pyramid neural network image defogging method includes:
S1, extracting five groups of feature maps of the foggy image at different sizes and sub-regions by using a backbone network in the image defogging model;
S2, performing feature enhancement on the five groups of feature maps by using a feature pyramid network structure in the image defogging model to obtain five groups of feature-enhanced feature maps;
S3, fusing the five groups of feature-enhanced feature maps by a spatial multi-size feature superposition fusion method to obtain fused features;
S4, further fusing and decoding the fused features by using a decoder in the image defogging model to obtain the intermediate estimation parameter of the network;
S5, reconstructing the intermediate estimation parameter of the network and the original foggy image input to the network by using a physical recovery module to obtain a fogless image;
the image defogging model comprises a backbone network, a characteristic pyramid network structure, a decoder and a physical recovery module.
Optionally, constructing the image defogging model includes:
acquiring a training set, wherein the training set comprises foggy images and the clear images corresponding to them;
initializing the image defogging model;
inputting the foggy image into the backbone network and outputting five groups of feature maps of the foggy image at different sizes and sub-regions; performing feature enhancement on the five groups of feature maps through the feature pyramid network structure to obtain five groups of feature-enhanced feature maps; fusing the five groups of feature-enhanced feature maps by a spatial multi-size feature superposition fusion method to obtain fused features; inputting the fused features into the decoder and further fusing and decoding to obtain the intermediate estimation parameter of the network;
reconstructing the intermediate estimation parameter of the network and the original foggy image input to the network by using the physical recovery module to obtain a fogless image;
and training with the mean squared error between the fogless image reconstructed from the foggy image and the corresponding clear image as the loss function, to obtain a converged image defogging model.
In this alternative embodiment, the loss function satisfies the following equation:
$$L_{mse}=\frac{1}{N}\sum_{i=1}^{N}\left(Y_{i}-X_{i}\right)^{2}$$
where L_mse is the network loss, N is the number of pixels of the foggy images used in building the image defogging model, Y is the fogless image reconstructed from the foggy image, and X is the clear image corresponding to the foggy image.
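As a minimal sketch of the loss above (using NumPy rather than any particular deep learning framework, which the text does not name), the per-pixel mean squared error between the reconstructed fogless image Y and the clear image X can be computed as:

```python
import numpy as np

def mse_loss(restored, clear):
    """Mean squared error between the reconstructed fogless image Y and
    the ground-truth clear image X, averaged over all N pixels."""
    restored = np.asarray(restored, dtype=np.float64)
    clear = np.asarray(clear, dtype=np.float64)
    return float(np.mean((restored - clear) ** 2))
```

In practice the same quantity would be computed on batched tensors inside the training framework; the sketch only fixes the definition.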
In an optional implementation, to reduce the visual difference between the defogged image produced by the image defogging model and the corresponding clear image, the mean squared error between the fogless image reconstructed from the foggy image and its corresponding clear image is used as the loss function. The loss function is iteratively optimized through backpropagation with an optimization algorithm, and training stops when the loss no longer decreases within the set number of iteration rounds, yielding a converged image defogging model.
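The stopping rule described above (stop once the loss no longer decreases within the set number of iteration rounds) can be sketched as a simple patience check; the `patience` and `min_delta` parameters are illustrative assumptions, not values given in the text:

```python
def should_stop(loss_history, patience=10, min_delta=0.0):
    """Return True when the most recent `patience` epochs have not
    improved on the best loss seen before them by at least `min_delta`."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    recent = loss_history[-patience:]
    return min(recent) >= best_before - min_delta
```

The training loop would append each epoch's loss to `loss_history` and break once `should_stop` returns True.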
Optionally, the backbone network is composed of eight convolution modules. The first convolution module is composed of a 3 × 3 convolution layer and a batch normalization layer; the second, third, fifth and eighth convolution modules are mobile inverted bottleneck convolution (MBConv) blocks with a 3 × 3 convolution kernel; the fourth, sixth and seventh convolution modules are MBConv blocks with a 5 × 5 convolution kernel. The 2nd to 8th convolution modules all use a residual network structure, and the numbers of network layers in the 2nd to 8th convolution modules are 1, 2, 3, 4 and 1, respectively.
In this optional embodiment, EfficientNet is used as the network structure of the backbone network. The feature maps produced by its stages are 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the foggy image size, with 16, 24, 40, 80, 112, 192 and 320 channels, respectively. Five of these groups of feature maps are taken, their channel counts are uniformly changed to 64 using convolution layers while the original spatial sizes are kept unchanged, yielding five groups of feature maps of the foggy image at different sizes and sub-regions.
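The scale and channel bookkeeping in this paragraph can be made concrete with a small sketch. Which five of the seven EfficientNet stages are taken is not stated in the text, so the `picked` indices below are an assumption for illustration:

```python
# Stage outputs of an EfficientNet-style backbone: spatial size is the
# input size divided by SCALES[i], with CHANNELS[i] output channels.
SCALES = [2, 4, 8, 16, 32, 64, 128]
CHANNELS = [16, 24, 40, 80, 112, 192, 320]

def feature_shapes(h, w, picked=(1, 2, 3, 4, 5)):
    """Return (channels, height, width) for each selected feature group
    after the channel counts are uniformly projected to 64 while the
    spatial size is kept unchanged."""
    return [(64, h // SCALES[i], w // SCALES[i]) for i in picked]
```

For a 256 × 256 input, the five selected groups would all carry 64 channels at sizes 64, 32, 16, 8 and 4 under this assumed selection.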
Optionally, the decoder includes three decoding modules, each composed of a 3 × 3 convolution layer and an upsampling layer; the fused features pass through the three decoding modules to produce the intermediate estimation parameter of the network.
Optionally, performing feature enhancement on the five groups of feature maps with the feature pyramid network structure in the image defogging model to obtain five groups of feature-enhanced feature maps includes using a multilayer feature pyramid network structure to enhance the five groups of feature maps containing different sizes and sub-regions.
It should be noted that, to better capture both the local and the global features of the image, a multilayer feature pyramid network structure is used to enhance the five groups of feature maps containing different sizes and sub-regions. To better fuse local and global features, the fusion process uses adaptive weighting, so the image defogging model can better obtain effective features.
Optionally, the five groups of feature-enhanced feature maps are fused by a spatial multi-size feature superposition fusion method: taking the size of the largest feature map among the five feature-enhanced groups as the reference, the other feature maps are resized to match the largest one using a mixed interpolation mode, and the five groups of feature-enhanced feature maps are spatially superposed and fused at multiple sizes to obtain the fused features.
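A minimal sketch of the spatial multi-size superposition fusion described above: every feature-enhanced map is resized to the size of the largest map, then the maps are summed. Plain nearest-neighbour resizing stands in for the "mixed interpolation mode" mentioned in the text, whose exact form is not specified:

```python
import numpy as np

def fuse_multiscale(features):
    """Resize each (C, H, W) feature map to the largest spatial size
    among the inputs and superpose (sum) them into one fused map."""
    target_h = max(f.shape[-2] for f in features)
    target_w = max(f.shape[-1] for f in features)
    fused = np.zeros(features[0].shape[:-2] + (target_h, target_w))
    for f in features:
        rh = target_h // f.shape[-2]
        rw = target_w // f.shape[-1]
        # nearest-neighbour upsampling by integer factors
        fused += np.repeat(np.repeat(f, rh, axis=-2), rw, axis=-1)
    return fused
```

This sketch assumes the largest map's size is an integer multiple of each smaller map's size, which holds for the power-of-two pyramid of scales described earlier.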
Preferably, the physical recovery module reconstructs the intermediate estimation parameter of the network and the original foggy image input to the network to obtain a fogless image satisfying an atmospheric scattering model, where the atmospheric scattering model is written as:
I(x)=J(x)t(x)+A(1-t(x))
where I(x) is the foggy image, J(x) is the fogless image, A is the global atmospheric light value, and t(x) is the transmittance;
and combining the global atmospheric light value A and the transmittance t(x) in the atmospheric scattering model yields the physical recovery model on which the physical recovery module depends, as shown in the following equation:
J(x)=k(x)I(x)-k(x)+b
where

$$k(x)=\frac{\frac{1}{t(x)}\left(I(x)-A\right)+(A-b)}{I(x)-1}$$

b is the constant 1, and k(x) is the intermediate estimation parameter of the network;
the physical recovery module then reconstructs the intermediate estimation parameter of the network and the foggy image input to the network to obtain the fogless image.
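The reconstruction step amounts to a single elementwise expression. A sketch, assuming image intensities normalized to [0, 1] (the clipping range is an assumption, not stated in the text):

```python
import numpy as np

def physical_recovery(hazy, k, b=1.0):
    """Reconstruct the fogless image J(x) = k(x) * I(x) - k(x) + b from
    the foggy input I(x) and the network's intermediate estimate k(x),
    clipping to the valid intensity range."""
    hazy = np.asarray(hazy, dtype=np.float64)
    k = np.asarray(k, dtype=np.float64)
    return np.clip(k * hazy - k + b, 0.0, 1.0)
```

With k(x) = 1 everywhere the module returns the input unchanged, which is a convenient sanity check.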
Example 2:
An end-to-end multi-size fusion-based pyramid neural network image defogging system comprises a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the steps of the method of Embodiment 1 when executing the computer program.
In conclusion, a backbone network in the image defogging model extracts five groups of feature maps of the foggy image at different sizes and sub-regions; a feature pyramid network structure in the model performs feature enhancement on the five groups of feature maps; the five feature-enhanced groups are fused by a spatial multi-size feature superposition fusion method to obtain fused features; a decoder in the model further fuses and decodes the fused features to obtain the intermediate estimation parameter of the network; and a physical recovery module reconstructs the intermediate estimation parameter together with the original foggy image input to the network to obtain a fogless image. The feature pyramid network structure further fuses local and global features, combines low-level and high-level semantic features, avoids the loss of features during downsampling, and makes full use of the feature information of the foggy image during convolution; while achieving a better defogging effect, the method reduces the hardware requirement and the time required for defogging.
In the description of the embodiments of the present invention, "multilayer" means two or more layers unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of embodiments of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. An end-to-end multi-size fusion-based pyramid neural network image defogging method, characterized by comprising the following steps:
extracting five groups of feature maps of the foggy image at different sizes and sub-regions by using a backbone network in the image defogging model;
performing feature enhancement on the five groups of feature maps by using a feature pyramid network structure in the image defogging model to obtain five groups of feature-enhanced feature maps;
fusing the five groups of feature-enhanced feature maps by a spatial multi-size feature superposition fusion method to obtain fused features;
further fusing and decoding the fused features by using a decoder in the image defogging model to obtain the intermediate estimation parameter of the network;
reconstructing the intermediate estimation parameter of the network and the original foggy image input to the network by using a physical recovery module to obtain a fogless image;
the image defogging model comprises the backbone network, the feature pyramid network structure, the decoder and the physical recovery module.
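For orientation, the five claimed stages compose end to end as in the toy sketch below. Every function here is a hypothetical numpy stand-in chosen only to make the data flow concrete (strided sub-sampling for the backbone, identity pass-through for the pyramid, nearest-neighbor superposition for fusion, a max-normalizing placeholder decoder); none of them is the patented network.

```python
import numpy as np

def backbone(image):
    # five groups of feature maps at different sizes (strided sub-sampling as a proxy)
    return [image[::2 ** i, ::2 ** i].copy() for i in range(5)]

def feature_pyramid(groups):
    # placeholder "feature enhancement": identity pass-through per group
    return groups

def superpose(groups):
    # bring every group to the largest size (nearest neighbor) and sum them
    th, tw = groups[0].shape
    fused = np.zeros((th, tw))
    for g in groups:
        r = np.arange(th) * g.shape[0] // th
        c = np.arange(tw) * g.shape[1] // tw
        fused += g[r][:, c]
    return fused

def decode(fused):
    # placeholder decoder: squash the fused features into k(x) in (0, 1]
    return fused / (np.abs(fused).max() + 1e-8)

def dehaze(image, b=1.0):
    k = decode(superpose(feature_pyramid(backbone(image))))
    return k * image - k + b  # physical recovery: J(x) = k(x)I(x) - k(x) + b
```

Only the composition order and the final physical-recovery formula come from the claim; the stage internals are assumptions.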
2. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 1, wherein building the image defogging model comprises:
acquiring a training set, wherein the training set comprises foggy images and the clear images corresponding to the foggy images;
initializing the image defogging model;
inputting the foggy image into the backbone network, and outputting five groups of feature maps of the foggy image at different sizes and sub-regions;
performing feature enhancement on the five groups of feature maps through the feature pyramid network structure to obtain five groups of feature-enhanced feature maps;
fusing the five groups of feature-enhanced feature maps by the spatial multi-size feature superposition fusion method to obtain a fused feature;
inputting the fused feature into the decoder for further fusion and decoding to obtain an intermediate estimation parameter of the network;
reconstructing a fog-free image from the intermediate estimation parameter of the network and the original foggy image input to the network by the physical recovery module;
and training with the mean square error between the fog-free image reconstructed from the foggy image and the clear image corresponding to the foggy image as the loss function, to obtain a converged image defogging model.
3. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 2, wherein said loss function satisfies the following formula:
L_mse = (1/N) Σ_{i=1}^{N} (Y_i - X_i)^2
wherein L_mse is the network loss, N is the number of pixels of the foggy images participating in building the image defogging model, Y is the fog-free image reconstructed from the foggy image, and X is the clear image corresponding to the foggy image.
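The per-pixel mean square error of claim 3 can be computed directly; `mse_loss` is an illustrative name, and the sketch assumes Y and X are arrays of equal shape.

```python
import numpy as np

def mse_loss(Y, X):
    # L_mse = (1/N) * sum over all N pixels of (Y - X)^2
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    return float(np.mean((Y - X) ** 2))
```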
4. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 1, wherein the backbone network is composed of eight convolution modules, wherein the first convolution module is composed of a 3 x 3 convolution layer and a batch normalization layer; the second, third, fifth and eighth convolution modules are mobile inverted convolution blocks with a convolution kernel size of 3 x 3; the fourth, sixth and seventh convolution modules are mobile inverted convolution blocks with a convolution kernel size of 5 x 5; the 2nd to 8th convolution modules all use a residual network structure, and the numbers of network layers in the 2nd to 8th convolution modules are 1, 2, 3, 4 and 1 respectively.
5. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 4, wherein the decoder is composed of three decoding modules, each composed of a 3 x 3 convolution layer and an upsampling layer, and the fused feature passes through the three decoding modules to obtain the intermediate estimation parameter of the network.
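A minimal single-channel sketch of a three-module decoder of the shape claim 5 describes (3 x 3 convolution followed by upsampling). The function names are illustrative assumptions, and nearest-neighbor 2x upsampling stands in for whichever upsampling layer the patent actually uses.

```python
import numpy as np

def conv3x3(x, kernel):
    # 'same' 3 x 3 convolution of a single-channel map via zero padding
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def upsample2x(x):
    # nearest-neighbor 2x upsampling layer
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def decoder(feat, kernels):
    # three decoding modules, each a 3 x 3 convolution followed by upsampling
    for k in kernels:
        feat = upsample2x(conv3x3(feat, k))
    return feat
```

Three such modules turn a small fused map into an estimate at 8x the input resolution, which is consistent with decoding a downsampled fused feature back toward image size.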
6. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 1, wherein performing feature enhancement on the five groups of feature maps by the feature pyramid network structure in the image defogging model to obtain the five groups of feature-enhanced feature maps comprises performing feature enhancement on the five groups of feature maps containing different sizes and sub-regions by a multi-layer feature pyramid network structure.
7. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 1, wherein fusing the five groups of feature-enhanced feature maps to obtain the fused feature uses the spatial multi-size feature superposition fusion method: taking the size of the largest feature map among the five groups of feature-enhanced feature maps as the reference, the other feature maps are brought to the size of the largest feature map by a mixed interpolation mode, and the five groups of feature-enhanced feature maps are spatially superposed and fused at multiple sizes to obtain the fused feature.
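The superposition fusion of claim 7 can be sketched as follows. The claim does not define the "mixed interpolation mode", so an average of nearest-neighbor and bilinear resampling is assumed here purely for illustration; all function names are hypothetical.

```python
import numpy as np

def nearest(f, th, tw):
    # nearest-neighbor resampling of a 2-D map to (th, tw)
    r = np.arange(th) * f.shape[0] // th
    c = np.arange(tw) * f.shape[1] // tw
    return f[r][:, c]

def bilinear(f, th, tw):
    # bilinear resampling with pixel-center alignment
    h, w = f.shape
    ys = np.clip((np.arange(th) + 0.5) * h / th - 0.5, 0, h - 1)
    xs = np.clip((np.arange(tw) + 0.5) * w / tw - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bot = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def mixed_resize(f, th, tw):
    # assumed "mixed interpolation": average of nearest and bilinear resampling
    return 0.5 * nearest(f, th, tw) + 0.5 * bilinear(f, th, tw)

def superpose_fuse(maps):
    # resize every group to the size of the largest map, then add them up
    th, tw = max((m.shape for m in maps), key=lambda s: s[0] * s[1])
    return sum(mixed_resize(m, th, tw) for m in maps)
```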
8. The end-to-end multi-size fusion-based pyramid neural network image defogging method according to claim 1, wherein the fog-free image obtained by the physical recovery module from the intermediate estimation parameter of the network and the original foggy image input to the network satisfies an atmospheric scattering model, and the atmospheric scattering model is rewritten; the atmospheric scattering model is given by the following formula:
I(x)=J(x)t(x)+A(1-t(x))
wherein I(x) is the foggy image, J(x) is the fog-free image, A is the global atmospheric light value, and t(x) is the transmittance;
and combining the global atmospheric light value A and the transmittance t(x) in the atmospheric scattering model yields the physical recovery model relied on by the physical recovery module, as shown in the following formula:
J(x)=k(x)I(x)-k(x)+b
wherein
k(x) = ((I(x) - A)/t(x) + (A - b))/(I(x) - 1)
b is a constant equal to 1, and k(x) is the intermediate estimation parameter of the network;
and reconstructing the fog-free image from the intermediate estimation parameter of the network and the foggy image input to the network by the physical recovery module.
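The physical recovery of claim 8 reduces to two one-line formulas, sketched below with illustrative function names. Note that recovering with k(x) reproduces the direct inversion of the atmospheric scattering model, J(x) = (I(x) - A)/t(x) + A, since k(x)(I(x) - 1) = (I(x) - A)/t(x) + A - b.

```python
def physical_recover(I, k, b=1.0):
    # J(x) = k(x) * I(x) - k(x) + b
    return k * I - k + b

def k_from_physics(I, A, t, b=1.0):
    # k(x) = ((I(x) - A) / t(x) + (A - b)) / (I(x) - 1)
    return ((I - A) / t + (A - b)) / (I - 1.0)
```

Both functions work elementwise, so they apply unchanged to scalars or numpy arrays of pixel intensities (assumed normalized so that I(x) != 1).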
9. An end-to-end multi-size fusion-based pyramid neural network image defogging system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
CN202210557615.9A 2022-05-19 2022-05-19 Multi-size fused pyramid neural network image defogging method and system Active CN115063304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210557615.9A CN115063304B (en) 2022-05-19 2022-05-19 Multi-size fused pyramid neural network image defogging method and system

Publications (2)

Publication Number Publication Date
CN115063304A true CN115063304A (en) 2022-09-16
CN115063304B CN115063304B (en) 2023-08-25

Family

ID=83197927

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180050832A (en) * 2016-11-07 2018-05-16 한국과학기술원 Method and system for dehazing image using convolutional neural network
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning
CN110210354A (en) * 2019-05-23 2019-09-06 南京邮电大学 A kind of detection of haze weather traffic mark with know method for distinguishing
CN110930320A (en) * 2019-11-06 2020-03-27 南京邮电大学 Image defogging method based on lightweight convolutional neural network
CN111192219A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image defogging method based on improved inverse atmospheric scattering model convolution network
CN111461291A (en) * 2020-03-13 2020-07-28 西安科技大学 Long-distance pipeline inspection method based on YO L Ov3 pruning network and deep learning defogging model
CN112381723A (en) * 2020-09-21 2021-02-19 清华大学 Light-weight and high-efficiency single image smog removing method
CN112767283A (en) * 2021-02-03 2021-05-07 西安理工大学 Non-uniform image defogging method based on multi-image block division
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
CN113673534A (en) * 2021-04-22 2021-11-19 江苏大学 RGB-D image fruit detection method based on fast RCNN
CN114155572A (en) * 2021-11-04 2022-03-08 华中师范大学 Facial expression recognition method and system

Non-Patent Citations (3)

Title
Boyi Li et al., "An All-in-One Network for Dehazing and Beyond", arXiv *
Lü Jianwei et al., "Image dehazing combining sky segmentation and haze density estimation", Optics and Precision Engineering, vol. 30, no. 4 *
Li Yongfu et al., "An aerial image dehazing algorithm based on improved AOD-Net", Acta Automatica Sinica, vol. 48, no. 6 *

Similar Documents

Publication Publication Date Title
CN106910175B (en) Single image defogging algorithm based on deep learning
CN109859120B (en) Image defogging method based on multi-scale residual error network
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN112001914A (en) Depth image completion method and device
CN107103285B (en) Face depth prediction method based on convolutional neural network
CN112184585B (en) Image completion method and system based on semantic edge fusion
CN109509156B (en) Image defogging processing method based on generation countermeasure model
CN110148088B (en) Image processing method, image rain removing method, device, terminal and medium
WO2021258959A1 (en) Image restoration method and apparatus, and electronic device
CN112508960A (en) Low-precision image semantic segmentation method based on improved attention mechanism
CN112241939B (en) Multi-scale and non-local-based light rain removal method
CA3137297A1 (en) Adaptive convolutions in neural networks
CN112184573A (en) Context aggregation residual single image rain removing method based on convolutional neural network
CN114140346A (en) Image processing method and device
CN116777764A (en) Diffusion model-based cloud and mist removing method and system for optical remote sensing image
CN113160286A (en) Near-infrared and visible light image fusion method based on convolutional neural network
CN116665156A (en) Multi-scale attention-fused traffic helmet small target detection system and method
CN116977531A (en) Three-dimensional texture image generation method, three-dimensional texture image generation device, computer equipment and storage medium
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN112669431B (en) Image processing method, apparatus, device, storage medium, and program product
CN115810112A (en) Image processing method, image processing device, storage medium and electronic equipment
CN115760641B (en) Remote sensing image cloud and fog removing method and equipment based on multiscale characteristic attention network
CN115063304B (en) Multi-size fused pyramid neural network image defogging method and system
CN115631108A (en) RGBD-based image defogging method and related equipment
CN113450267B (en) Transfer learning method capable of rapidly acquiring multiple natural degradation image restoration models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant