CN112767269B - Panoramic image defogging method and device - Google Patents


Info

Publication number
CN112767269B
CN112767269B
Authority
CN
China
Prior art keywords
feature
feature map
panoramic image
convolution
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110061876.7A
Other languages
Chinese (zh)
Other versions
CN112767269A (en)
Inventor
李甲
赵栋
李红雨
赵沁平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110061876.7A priority Critical patent/CN112767269B/en
Publication of CN112767269A publication Critical patent/CN112767269A/en
Application granted granted Critical
Publication of CN112767269B publication Critical patent/CN112767269B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The embodiments of the disclosure provide a panoramic image defogging method and device. One embodiment of the method comprises: given a panoramic image whose light intensity is smaller than a preset threshold, performing convolution on the panoramic image with a strip-sensitive convolution method to generate a first feature map sequence set; adding the feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, and obtaining a third feature vector corresponding to each feature map sequence in the first feature map sequence set; weighting and summing the first feature map sequence set based on the third feature vector set to generate a third feature map sequence; inputting the third feature map sequence into a depth estimation module to obtain a depth map; and inputting the panoramic image, the depth map and the first feature map sequence set into a defogging module to obtain a defogged panoramic image. This embodiment defogs the panoramic image, effectively improving the accuracy of the defogging result and producing more accurate outputs.

Description

Panoramic image defogging method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for defogging a panoramic image.
Background
Panoramic image defogging takes a fogged panoramic input image, removes the fog, and restores a fog-free version of the image; the task can be regarded as a sub-field of image enhancement. Unlike traditional planar images, the images targeted here are panoramic, so the defogging algorithm of this disclosure is tailored to panoramic images rather than being a generic defogging algorithm. Defogging panoramic images matters for many downstream vision tasks, such as object detection and semantic segmentation in foggy weather, and is also significant for scenarios such as autonomous driving and everyday photography.
For the image defogging task, existing defogging algorithms are almost all designed around planar images. Many deep-learning-based methods achieve good results on traditional planar images, but their defogging performance on panoramic images remains unsatisfactory.
However, applying the above approaches to defogging of panoramic images often runs into the following technical problems:
Previous convolution methods have obvious shortcomings for panoramic image processing: existing convolution schemes are inflexible and rely too heavily on hand-crafted prior knowledge, and compared with the method proposed here they require more computation and are less efficient. One existing convolution method for panoramic images can dynamically select the receptive field of the convolution kernel, but it ultimately fuses features at the channel level; this overly restricts the flexibility of feature selection, and accuracy drops because features can only be fused at the channel level.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a panoramic image defogging method and apparatus to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a panoramic image defogging method, the method including: giving a panoramic image with light intensity smaller than a preset threshold, and performing convolution processing on the panoramic image through a strip-sensitive convolution method to generate a first feature map sequence set; adding the feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, performing global average pooling on the second feature map sequence to generate a first feature vector, and inputting the first feature vector into at least one fully connected layer to obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set; weighting and summing the first feature map sequence set based on the third feature vector set to generate a third feature map sequence; inputting the third feature map sequence into a depth estimation module to obtain a depth map; and inputting the panoramic image, the depth map and the first feature map sequence set into a defogging module to obtain a defogged panoramic image.
In a second aspect, some embodiments of the present disclosure provide a panoramic image defogging device including: a convolution processing unit configured to give a panoramic image with light intensity smaller than a preset threshold and perform convolution processing on the panoramic image through a strip-sensitive convolution method to generate a first feature map sequence set; a first input unit configured to add the feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, perform global average pooling on the second feature map sequence to generate a first feature vector, and input the first feature vector into at least one fully connected layer to obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set; a summation processing unit configured to weight and sum the first feature map sequence set based on the third feature vector set to generate a third feature map sequence; a second input unit configured to input the third feature map sequence into the depth estimation module to obtain a depth map; and a third input unit configured to input the panoramic image, the depth map and the first feature map sequence set into the defogging module to obtain a defogged panoramic image.
The above embodiments of the present disclosure have the following beneficial effects. Compared with traditional networks for image defogging and panoramic image processing, the panoramic image defogging method and device of this disclosure have three beneficial characteristics: 1) features are fused at the strip level, which adds more semantic information and effectively improves the accuracy of the defogging result; 2) whereas prior work can only defog planar images, this method defogs panoramic images, producing more accurate results and improving the precision of downstream tasks; 3) because the improvements concern the convolution kernel and the feature fusion scheme, the method can in principle be applied to other panoramic image processing tasks to further improve their performance.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic diagram of one application scenario of a panoramic image defogging method according to some embodiments of the present disclosure;
fig. 2 is a schematic diagram of yet another application scenario of a panoramic image defogging method according to some embodiments of the present disclosure;
fig. 3 is a flow diagram of some embodiments of a panoramic image defogging method according to the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, there is shown a schematic diagram of one application scenario of a panoramic image defogging method according to some embodiments of the present disclosure.
Fig. 1 is a flow chart of the overall network and describes the complete process of obtaining a clear panoramic image from a fogged panoramic image. First, the fogged panoramic image passes through the strip-sensitive convolution module to generate a group of feature maps. This group of feature maps is fed into the depth estimation module, which produces an estimate of the depth map through several encoding blocks, a Residual in Residual Block module, and several decoding blocks; the estimate is constrained by two loss functions. The depth estimate and the group of feature maps are then fed into the defogging module; after several encoding blocks, a Residual in Residual Block module, and several decoding blocks, the result is added to the input image, passed through pyramid pooling, and output as the final fog-free clear image, and the generated result is constrained by four loss functions. The defogging module can be a neural network that fuses the various features to defog the fogged panoramic image and generate a clear image corresponding to the input image. The Residual in Residual Block is a manually designed group of neural network layers that includes residual modules, long-skip connections, and the like. Strip-sensitive convolution applies convolution kernels of different sizes to the panoramic image so as to account for the different degrees of distortion at different image positions; the specific flow is given in the claims. An encoding block can be a multilayer neural network that encodes the input features into features of a certain dimension for subsequent computation. A minimal sketch of this overall data flow is given below.
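The following is a minimal, hypothetical PyTorch sketch of the data flow just described; the module name PanoramicDehazeNet and the interfaces of strip_conv, depth_estimator and dehaze_generator are illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch of the Fig. 1 data flow; names and interfaces are assumed.
import torch.nn as nn

class PanoramicDehazeNet(nn.Module):
    def __init__(self, strip_conv, depth_estimator, dehaze_generator):
        super().__init__()
        self.strip_conv = strip_conv              # strip-sensitive convolution module
        self.depth_estimator = depth_estimator    # U-Net style generator for the depth map
        self.dehaze_generator = dehaze_generator  # U-Net style generator for the clear image

    def forward(self, hazy_pano):
        # 1) strip-sensitive convolution produces a group of feature maps
        feats = self.strip_conv(hazy_pano)
        # 2) depth estimation from those features (two losses constrain it during training)
        depth = self.depth_estimator(feats)
        # 3) defogging conditioned on the hazy image, the depth map and the feature maps
        clear = self.dehaze_generator(hazy_pano, depth, feats)
        return clear, depth
```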
With continued reference to fig. 2, a schematic illustration of yet another application scenario of a panoramic image defogging method is shown in accordance with some embodiments of the present disclosure.
Fig. 2 describes the detailed process of the strip-sensitive convolution. First, rectangular convolution kernels of different sizes are applied to the input feature map (or the input image) to generate K feature maps. The K feature maps are summed to obtain a second feature map, and global average pooling of the second feature map yields a first feature vector. The first feature vector is passed through a 1×1 convolution and a PReLU activation function, and the result is then passed through K separate 1×1 convolutions to generate K third feature vectors. After normalization and bilinear interpolation of these weight vectors, they are used to weight the feature map group obtained at the beginning; during the weighting operation, the weight vectors act on the feature strips of the feature maps, finally producing the third feature map sequence. A feature strip is a horizontal band of a feature map, that is, a horizontal slice taken from each feature map.
With continued reference to fig. 3, a flow of some embodiments of a panoramic image defogging method according to the present disclosure is illustrated. The defogging method for the panoramic image comprises the following steps:
step S100, a panoramic image with light intensity smaller than a preset threshold value is given, and the panoramic image is subjected to convolution processing through a strip sensitive convolution method to generate a first feature map sequence set.
In some embodiments, the execution subject of the panoramic image defogging method may obtain, through a wired or wireless connection, a panoramic image whose light intensity is smaller than a certain preset threshold. The panoramic image can be captured by a shooting device such as a panoramic camera: it is a spherical image that is unwrapped into a planar image by an algorithm before defogging is performed. The strip-sensitive convolution method first divides the image horizontally into several strips during the convolution operation and then applies rectangular convolutions of different sizes to the strips, so that the panoramic image is convolved by convolutions of different sizes. A predetermined number of convolutions of different sizes may be used; "different sizes" refers to the size of the convolution kernels of the convolutional neural network, and a plurality of kernels having the same width but different lengths are applied to one input to generate the feature maps corresponding to each kernel. A feature map may be a three-dimensional tensor; it is the result of the convolution operation and an intermediate result of the network's computation, obtained by convolving the panoramic image with kernels of different sizes, and is essentially intermediate data produced while the code runs. The first feature map sequence set includes a predetermined number of first feature map sequences, and a first feature map may be the output of one convolution filter. A panoramic image whose light intensity is smaller than a certain preset threshold may be a fogged panoramic image. The shooting device may also be a mobile phone.
As an example, in order to adapt to the stretching of the panoramic image at different latitudes, four convolution kernels of different sizes (1×1, 1×3, 1×5 and 1×7) are used, and 4 corresponding feature maps of size H × W × C are generated for each input image for subsequent calculation; a minimal sketch of this step follows.
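The sketch below shows, under the assumption of a PyTorch implementation, how the K = 4 first feature maps could be generated with strip-shaped kernels of widths 1, 3, 5 and 7; the output channel count, the padding scheme, and the class name StripKernels are illustrative assumptions.

```python
# Sketch of generating the K first feature maps with strip-shaped kernels.
import torch
import torch.nn as nn

class StripKernels(nn.Module):
    def __init__(self, in_ch=3, out_ch=64, widths=(1, 3, 5, 7)):
        super().__init__()
        # one 1 x w branch per kernel width; the padding keeps the H x W resolution
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=(1, w), padding=(0, w // 2))
            for w in widths
        )

    def forward(self, x):                               # x: (N, in_ch, H, W)
        return [branch(x) for branch in self.branches]  # K maps, each (N, out_ch, H, W)

feature_maps = StripKernels()(torch.randn(1, 3, 256, 512))  # K = 4 feature maps
```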
And S200, adding feature graphs with the same sequence number in the first feature graph sequence set to generate a second feature graph sequence, performing global average pooling on the second feature graph sequence to generate a first feature vector, and inputting the first feature vector into at least one full-connection layer to obtain a third feature vector corresponding to each feature graph sequence in the first feature graph sequence set.
In some embodiments, the execution body may add the feature maps with the same sequence number in the first feature map sequence set to generate the second feature map sequence. Global average pooling of the second feature map sequence yields the first feature vector. The first feature vector is then convolved with a 1×1 convolution to obtain an intermediate vector, and a predetermined number of 1×1 convolutions are applied to the intermediate vector to generate, for each of the predetermined number of convolution kernels, a second vector in the corresponding convolution process; the second vectors serve as weights. Global average pooling averages all pixel values of a feature map to obtain a single value, so that the feature map is summarized by that value. The fully connected layer performs classification based on the features: each of its nodes is connected to all nodes of the previous layer in order to integrate the features extracted by the previous layer, and it maps the learned "distributed feature representation" to the sample label space. A fully connected layer whose previous layer is also fully connected can be converted into a convolution with a 1×1 kernel, while a fully connected layer whose previous layer is a convolutional layer can be converted into a global convolution whose kernel size equals the height H and width W of the previous layer's convolution output. The second feature map sequence includes a predetermined number of second feature maps.
In an optional implementation manner of some embodiments, the executing body may add feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, perform global average pooling on the second feature map sequence to generate a first feature vector, input the first feature vector into at least one full-connection layer, and obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set, and may include the following steps:
In a first step, a convolution is performed on the given panoramic image to generate an initial feature map $F_0$. K-1 rectangular convolution kernels and one square convolution kernel are applied to the initial feature map, respectively, to generate K first feature map sequences; the dimension of a first feature map is H × W × C. The feature maps with the same sequence number in the first feature map sequence set are summed to obtain the second feature map sequence. For each second feature map $F_{add}$ in the second feature map sequence, global average pooling over the W × C dimensions yields the first feature vector $s_{fv}$. The specific operations are:

$$F_{add} = \sum_{k=1}^{K} F_k, \qquad s_{fv}(h) = \frac{1}{W \cdot C}\sum_{w=1}^{W}\sum_{c=1}^{C} F_{add}(h, w, c)$$

where $F_{add}$ denotes a second feature map, k denotes a sequence number, K denotes the number of convolution kernels, F denotes a feature map, and $F_k$ denotes the first feature map in the convolution process corresponding to the k-th convolution kernel, with $F_k, F_{add} \in \mathbb{R}^{H \times W \times C}$. $s_{fv} \in \mathbb{R}^{H \times 1}$ denotes the first feature vector, w and c denote sequence numbers, W denotes the panoramic image width value, C denotes the number of channels of the first feature map group, and H denotes the panoramic image height value. The number of channels of the first feature map group is used to characterize the color channels of the panoramic image.
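A minimal sketch of this summation and per-row pooling, assuming an (N, C, H, W) tensor layout:

```python
# Sum the K feature maps element-wise, then average over the channel and width
# dimensions to obtain one value per image row (s_fv of length H).
import torch

def first_feature_vector(feature_maps):                   # list of K tensors (N, C, H, W)
    f_add = torch.stack(feature_maps, dim=0).sum(dim=0)   # F_add: (N, C, H, W)
    s_fv = f_add.mean(dim=(1, 3))                         # pool over C and W -> (N, H)
    return f_add, s_fv
```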
Secondly, global average pooling yields an overall feature representation, and K feature vectors are then generated through fully connected layers and used to weight different positions of each feature map. The specific operations are:

$$s_{fd}^{k} = W_{ex2}^{k}\,\delta\!\left(W_{ex1}\, s_{fv}\right), \qquad s_{at}^{k}(q) = \frac{\exp\!\left(s_{fd}^{k}(q)\right)}{\sum_{k'=1}^{K} \exp\!\left(s_{fd}^{k'}(q)\right)}$$

where $s_{fd}$ denotes the second feature vector, k denotes a sequence number, and $s_{fd}^{k} \in \mathbb{R}^{\frac{H}{r_d r_e} \times 1}$ denotes the second feature vector in the convolution process corresponding to the k-th convolution kernel. $s_{fv} \in \mathbb{R}^{H \times 1}$ denotes the first feature vector. $W_{ex1} \in \mathbb{R}^{\frac{H}{r_d} \times H}$ denotes the operation of the first 1×1 convolution, $\delta[\cdot]$ denotes the sigmoid function, and $W_{ex2} \in \mathbb{R}^{\frac{H}{r_d r_e} \times \frac{H}{r_d}}$ denotes the operation of the second 1×1 convolution. H denotes the panoramic image height value, $r_d$ denotes a first parameter and $r_e$ denotes a second parameter; the two parameters control the lengths of the vectors. $s_{at}$ denotes the third feature vector, $s_{at}^{k}$ denotes the third feature vector in the convolution process corresponding to the k-th convolution kernel, q denotes the corresponding dimension element of the vector, and K denotes the number of convolution kernels. The operation of the first convolution is applied to the first feature vector, and the operation of the second convolution is applied to the result of the sigmoid function.
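A minimal sketch of this weight-vector branch, in which nn.Linear layers stand in for the 1×1 convolutions applied to a vector; the class name StripAttention and the default values of r_d and r_e are assumptions (the figure description above mentions a PReLU activation, while the formula uses the sigmoid δ, which is what the sketch follows).

```python
# Shared reduction, K separate expansions, softmax across the K branches.
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    def __init__(self, height, K=4, r_d=4, r_e=2):
        super().__init__()
        mid, low = height // r_d, height // (r_d * r_e)
        self.reduce = nn.Linear(height, mid)                                # W_ex1
        self.expand = nn.ModuleList(nn.Linear(mid, low) for _ in range(K))  # W_ex2^k

    def forward(self, s_fv):                                 # s_fv: (N, H)
        hidden = torch.sigmoid(self.reduce(s_fv))            # delta(W_ex1 s_fv)
        s_fd = torch.stack([fc(hidden) for fc in self.expand], dim=1)  # (N, K, H/(r_d*r_e))
        return torch.softmax(s_fd, dim=1)                    # s_at: normalised over K
```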
And step S300, based on the third feature vector set, weighting and summing the first feature map sequence set to generate a third feature map sequence.
In some embodiments, the executing subject may perform weighting and summing processing on the first feature map sequence set to generate the third feature map sequence, and may include the following steps:
and performing weighted calculation based on the first feature map sequence set and the third feature vector corresponding to each first feature map in the first feature map sequence set to obtain a weighted feature map sequence set.
And adding the weighted feature maps with the same sequence number in the weighted feature map sequence set to obtain a third feature map sequence.
In some optional implementations of some embodiments, the executing subject may perform weighting and summing processing on the first feature map sequence set to generate a third feature map sequence, and may include the following steps:
the third feature vector generated in step S200
Figure BDA0002902665260000089
Using a bilinear interpolation method
Figure BDA00029026652600000810
Dimension extension to H dimension, followed by use of a third feature vector
Figure BDA00029026652600000811
For feature map FkAre weighted and summed to form the final third feature map FsccThe method comprises the following specific operations:
Figure BDA00029026652600000812
wherein, FsccA third characteristic diagram is shown. k represents a serial number. K represents the number of convolution kernels. FkAnd showing a first characteristic diagram in the convolution process corresponding to the kth convolution kernel. g () represents a bilinear interpolation algorithm. satRepresenting a third feature vector.
Figure BDA0002902665260000091
And representing a third feature vector in the convolution process corresponding to the kth convolution kernel.
Figure BDA0002902665260000092
Figure BDA0002902665260000093
The dimension of expression is
Figure BDA0002902665260000094
A set of tensors of (a). r isdRepresenting a first parameter. r iseRepresenting the second parameter. H denotes a panoramic image height value.
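A minimal sketch of this fusion step, assuming the (N, C, H, W) layout and one-dimensional linear interpolation as the resampling g(·); the function name fuse_strips is an illustrative assumption.

```python
# Stretch each weight vector back to length H and weight the rows (feature
# strips) of the corresponding feature map before summing over the K branches.
import torch
import torch.nn.functional as F

def fuse_strips(feature_maps, s_at):
    # feature_maps: list of K tensors (N, C, H, W); s_at: (N, K, H')
    H = feature_maps[0].shape[2]
    weights = F.interpolate(s_at, size=H, mode="linear", align_corners=False)  # (N, K, H)
    return sum(
        fmap * weights[:, k].view(-1, 1, H, 1)   # one weight per image row
        for k, fmap in enumerate(feature_maps)
    )                                            # third feature map F_scc
```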
And step S400, inputting the third feature map sequence into a depth estimation module to obtain a depth map.
In some embodiments, the executing entity may input the third feature map sequence into the depth estimation module to obtain the depth map. The third feature map sequence may include a predetermined number of third feature maps. The depth estimation module may be structured as a GAN (Generative Adversarial Network), and its generator may use a U-Net network structure. U-Net is a fully convolutional network architecture for semantic segmentation; the U-Net structure segments the picture as a whole.
In the encoder part, each encoding block downsamples the features of the previous layer to half their original size and doubles the number of channels, and each encoding block is followed by a ResNet bottleneck block. The ResNet bottleneck block is a neural network module composed of convolutional layers and a long-skip connection. Similarly, each decoding block in the decoder part upsamples the features of the previous layer by a factor of 2 and halves the number of channels, and is likewise followed by a ResNet bottleneck block. The encoder and decoder are connected by a Residual in Residual Block, which contains several basic residual blocks and a long-skip connection; the Residual in Residual Block is a manually designed group of neural network layers that includes residual modules, long-skip connections, and the like. The generator receives the feature maps generated by the strip convolution and produces the depth map estimation result. The depth map records the distance of points in the scene relative to the camera; that is, each pixel value in the depth map may represent the distance between a point in the scene and the camera. Techniques by which a machine vision system acquires a scene depth map fall into two categories: passive ranging sensing and active depth sensing.
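A minimal sketch of one such encoding block and decoding block, assuming a stride-2 convolution for downsampling, a transposed convolution for upsampling, and a standard 1×1/3×3/1×1 bottleneck; these layer choices and the squeeze ratio are assumptions.

```python
# Encoding block (halve H and W, double channels), decoding block (double H and W,
# halve channels), each followed by a ResNet bottleneck with a skip connection.
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, ch, squeeze=4):
        super().__init__()
        mid = max(ch // squeeze, 1)
        self.body = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.PReLU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.PReLU(),
            nn.Conv2d(mid, ch, 1),
        )

    def forward(self, x):
        return x + self.body(x)                  # residual (skip) connection

def encode_block(ch):
    return nn.Sequential(
        nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),            # H, W -> H/2, W/2
        nn.PReLU(),
        Bottleneck(ch * 2),
    )

def decode_block(ch):
    return nn.Sequential(
        nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),  # H, W -> 2H, 2W
        nn.PReLU(),
        Bottleneck(ch // 2),
    )
```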
And S500, inputting the panoramic image, the depth map and the first feature map sequence set into a defogging module to obtain a defogged panoramic image.
In some embodiments, the execution subject may input the panoramic image, the depth map, and the first feature map sequence set into the defogging module to obtain the defogged panoramic image. The defogging module may also be structured as a GAN, with a U-Net generator. Its network structure is almost the same as that of the depth estimation module in step S400; the difference is that the depth map estimation result of step S400 is added as an additional feature at each layer of the generator's encoder and decoder.
As an example, the execution body described above may be trained under the constraint of five loss functions: the GAN generation result loss $L_{gan}$, the feature-consistency constraint loss $L_{fm}$, the perceptual loss $L_{vgg}$, the depth estimation result loss $L_{l2}$, and the depth estimation multi-scale smoothing loss $L_{edge}$. The five loss functions are summed, and joint training with the Adam optimization method yields the final trained model. Adam is an extension of stochastic gradient descent that is widely used in deep learning for computer vision and natural language processing. Given a single input image, the trained model outputs the defogged image. A minimal sketch of one joint training step is given below.
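The sketch below sums the five losses and performs one Adam update; the individual loss implementations, their relative weights (taken as 1 here), and the learning rate are assumptions not specified in the text above.

```python
# One joint training step with the five losses summed and optimised with Adam.
import torch

def train_step(model, optimizer, hazy, clear_gt, depth_gt, losses):
    # losses: dict of callables keyed "gan", "fm", "vgg", "l2", "edge"
    clear_pred, depth_pred = model(hazy)
    total = (losses["gan"](clear_pred, clear_gt)
             + losses["fm"](clear_pred, clear_gt)
             + losses["vgg"](clear_pred, clear_gt)
             + losses["l2"](depth_pred, depth_gt)
             + losses["edge"](depth_pred, depth_gt))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()                             # Adam update
    return total.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # learning rate is an assumption
```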
The foregoing description covers only the preferred embodiments of the disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the features described above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example solutions in which the above features are replaced by (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (4)

1. A panoramic image defogging method comprises the following steps:
step S100, a panoramic image with light intensity smaller than a preset threshold value is given, and the given panoramic image is convolved to generate an initial feature map $F_0$; K-1 rectangular convolution kernels and one square convolution kernel are respectively applied to the initial feature map to generate K first feature map sequences as a first feature map sequence set;
step S200, adding feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, performing global average pooling on the second feature map sequence to generate a first feature vector, and inputting the first feature vector to at least one full-connection layer to obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set;
step S300, based on the third feature vector set, weighting and summing the first feature map sequence set to generate a third feature map sequence;
step S400, inputting the third feature map sequence into a depth estimation module to obtain a depth map;
and S500, inputting the panoramic image, the depth map and the first feature map sequence set into a defogging module to obtain a defogged panoramic image.
2. The method according to claim 1, wherein the adding the feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, performing global average pooling on the second feature map sequence to generate a first feature vector, and inputting the first feature vector into at least one fully connected layer to obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set comprises:
summing the feature maps with the same sequence number in the first feature map sequence set to obtain a second feature map sequence, wherein the dimension of a first feature map is H × W × C;
for each second feature map $F_{add}$ in the second feature map sequence, performing global average pooling over the W × C dimensions to obtain a first feature vector $s_{fv}$, the specific operations being

$$F_{add} = \sum_{k=1}^{K} F_k, \qquad s_{fv}(h) = \frac{1}{W \cdot C}\sum_{w=1}^{W}\sum_{c=1}^{C} F_{add}(h, w, c)$$

wherein $F_{add}$ represents a second feature map, k represents a sequence number, K represents the number of convolution kernels, F represents a feature map, $F_k$ represents the first feature map in the convolution process corresponding to the k-th convolution kernel, $F_k, F_{add} \in \mathbb{R}^{H \times W \times C}$, $s_{fv} \in \mathbb{R}^{H \times 1}$ represents the first feature vector, w represents a sequence number, c represents a sequence number, W represents the panoramic image width value, C represents the number of channels of the first feature map group, and H represents the panoramic image height value;
obtaining an overall feature representation through global average pooling, and then generating K feature vectors through fully connected layers for weighting different positions of each feature map, the specific operations being

$$s_{fd}^{k} = W_{ex2}^{k}\,\delta\!\left(W_{ex1}\, s_{fv}\right), \qquad s_{at}^{k}(q) = \frac{\exp\!\left(s_{fd}^{k}(q)\right)}{\sum_{k'=1}^{K} \exp\!\left(s_{fd}^{k'}(q)\right)}$$

wherein $s_{fd}$ represents a second feature vector, k represents a sequence number, $s_{fd}^{k} \in \mathbb{R}^{\frac{H}{r_d r_e} \times 1}$ represents the second feature vector in the convolution process corresponding to the k-th convolution kernel, $s_{fv} \in \mathbb{R}^{H \times 1}$ represents the first feature vector, $W_{ex1} \in \mathbb{R}^{\frac{H}{r_d} \times H}$ denotes the operation of the first 1×1 convolution, $\delta[\cdot]$ denotes the sigmoid function, $W_{ex2} \in \mathbb{R}^{\frac{H}{r_d r_e} \times \frac{H}{r_d}}$ denotes the operation of the second 1×1 convolution, H represents the panoramic image height value, $r_d$ represents a first parameter, $r_e$ represents a second parameter, $s_{at}$ represents the third feature vector, $s_{at}^{k}$ represents the third feature vector in the convolution process corresponding to the k-th convolution kernel, q represents the corresponding dimension element of the vector, and K represents the number of convolution kernels.
3. The method of claim 2, wherein the weighting and summing the first feature map sequence set based on the third feature vector set to generate a third feature map sequence comprises:
extending the third feature vector $s_{at}^{k}$ generated in step S200 to dimension H using a bilinear interpolation method, and then using the third feature vector $s_{at}^{k}$ to weight and sum the feature maps $F_k$ to form the final third feature map $F_{scc}$, the specific operation being

$$F_{scc} = \sum_{k=1}^{K} g\!\left(s_{at}^{k}\right) \odot F_{k}$$

wherein $F_{scc}$ represents the third feature map, k represents a sequence number, K represents the number of convolution kernels, $F_k$ represents the first feature map in the convolution process corresponding to the k-th convolution kernel, g(·) represents a bilinear interpolation algorithm, $s_{at}$ represents the third feature vector, $s_{at}^{k} \in \mathbb{R}^{\frac{H}{r_d r_e} \times 1}$ represents the third feature vector in the convolution process corresponding to the k-th convolution kernel, $r_d$ represents a first parameter, $r_e$ represents a second parameter, and H represents the panoramic image height value.
4. A panoramic image defogging device comprising:
a convolution processing unit configured to give a panoramic image with light intensity smaller than a preset threshold, convolve the given panoramic image to generate an initial feature map $F_0$, and respectively apply K-1 rectangular convolution kernels and one square convolution kernel to the initial feature map to generate K first feature map sequences as a first feature map sequence set;
the first input unit is configured to add feature maps with the same sequence number in the first feature map sequence set to generate a second feature map sequence, perform global average pooling on the second feature map sequence to generate a first feature vector, and input the first feature vector to at least one full-connection layer to obtain a third feature vector corresponding to each feature map sequence in the first feature map sequence set;
a summation processing unit configured to weight and sum the first feature map sequence set based on the third feature vector set to generate a third feature map sequence;
the second input unit is configured to input the third feature map sequence to the depth estimation module to obtain a depth map;
and the third input unit is configured to input the panoramic image, the depth map and the first feature map sequence set into the defogging module to obtain a defogged panoramic image.
CN202110061876.7A 2021-01-18 2021-01-18 Panoramic image defogging method and device Active CN112767269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110061876.7A CN112767269B (en) 2021-01-18 2021-01-18 Panoramic image defogging method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110061876.7A CN112767269B (en) 2021-01-18 2021-01-18 Panoramic image defogging method and device

Publications (2)

Publication Number Publication Date
CN112767269A (en) 2021-05-07
CN112767269B (en) 2022-11-01

Family

ID=75702733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110061876.7A Active CN112767269B (en) 2021-01-18 2021-01-18 Panoramic image defogging method and device

Country Status (1)

Country Link
CN (1) CN112767269B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781363B (en) * 2021-09-29 2024-03-05 Beihang University Image enhancement method with adjustable defogging effect

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830199A (en) * 2018-05-31 2018-11-16 京东方科技集团股份有限公司 Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN109584188A (en) * 2019-01-15 2019-04-05 东北大学 A kind of image defogging method based on convolutional neural networks
CN109918951A (en) * 2019-03-12 2019-06-21 中国科学院信息工程研究所 A kind of artificial intelligence process device side channel system of defense based on interlayer fusion
CN112001923A (en) * 2020-11-02 2020-11-27 中国人民解放军国防科技大学 Retina image segmentation method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830199A (en) * 2018-05-31 2018-11-16 京东方科技集团股份有限公司 Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN109584188A (en) * 2019-01-15 2019-04-05 东北大学 A kind of image defogging method based on convolutional neural networks
CN109918951A (en) * 2019-03-12 2019-06-21 中国科学院信息工程研究所 A kind of artificial intelligence process device side channel system of defense based on interlayer fusion
CN112001923A (en) * 2020-11-02 2020-11-27 中国人民解放军国防科技大学 Retina image segmentation method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Novel Residual Dense Pyramid Network for Image Dehazing; Shibai Yin et al.; Entropy; 2019-11-15; full text *
Pyramid Global Context Network for Image Dehazing; Dong Zhao et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2020-11-09; full text *
Convolutional neural network image dehazing algorithm based on multi-feature fusion; Xu Yan et al.; Laser & Optoelectronics Progress; 2018-03-31; full text *
Multi-scale feature fusion network based on feature pyramid; Guo Qifan et al.; Chinese Journal of Engineering Mathematics; 2020-10-15 (No. 05); full text *

Also Published As

Publication number Publication date
CN112767269A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN109360171B (en) Real-time deblurring method for video image based on neural network
CN108537746B (en) Fuzzy variable image blind restoration method based on deep convolutional network
Cao et al. Underwater image restoration using deep networks to estimate background light and scene depth
EP1026631A2 (en) Method for inferring scenes from test images and training data using probability propagation in a markov network
CN112541877B (en) Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
CN111179196B (en) Multi-resolution depth network image highlight removing method based on divide-and-conquer
CN112149563A (en) Method and system for estimating postures of key points of attention mechanism human body image
CN113781659A (en) Three-dimensional reconstruction method and device, electronic equipment and readable storage medium
CN112767279A (en) Underwater image enhancement method for generating countermeasure network based on discrete wavelet integration
CN112767269B (en) Panoramic image defogging method and device
CN114419392A (en) Hyperspectral snapshot image recovery method, device, equipment and medium
CN115546505A (en) Unsupervised monocular image depth estimation method based on deep learning
CN110942484A (en) Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN106709862B (en) A kind of image processing method and device
CN112085674B (en) Aerial image deblurring algorithm based on neural network
CN111445465B (en) Method and equipment for detecting and removing snow or rain belt of light field image based on deep learning
CN114078149A (en) Image estimation method, electronic equipment and storage medium
CN114565953A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN116645300A (en) Simple lens point spread function estimation method
CN109064430B (en) Cloud removing method and system for aerial region cloud-containing image
CN115937048A (en) Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
CN112767264B (en) Image deblurring method and system based on graph convolution neural network
CN114494065A (en) Image deblurring method, device and equipment and readable storage medium
CN113902933A (en) Ground segmentation network model training method, device, equipment and medium
CN114782980A (en) Light-weight pedestrian detection method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant