CN114863232A - Shadow detection method based on bidirectional multistage feature pyramid - Google Patents


Info

Publication number
CN114863232A
CN114863232A (application CN202210367200.5A)
Authority
CN
China
Prior art keywords
feature
pyramid
features
module
bidirectional
Prior art date
Legal status
Pending
Application number
CN202210367200.5A
Other languages
Chinese (zh)
Inventor
曹忠
陈俊全
尚文利
赵文静
王�锋
邓辉
梅盈
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Priority date
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202210367200.5A
Publication of CN114863232A

Classifications

    • G06V 10/70 — Image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82 — Image or video recognition using neural networks
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods


Abstract

The invention relates to the technical field of shadow detection and discloses a shadow detection method based on a bidirectional multistage feature pyramid, comprising the following steps: constructing a residual refinement module that progressively learns the features of each layer by taking the feature maps of two adjacent layers as input; constructing a weighted feature fusion module that makes full use of features at different scales by learning weight parameters; constructing a bidirectional multistage feature pyramid network; and training the model. The method adds the residual refinement module to the feature pyramid network and performs multistage feature fusion on the input feature maps from two directions. By exploiting both the semantic information of high-level features and the fine-grained detail of low-level features, and by detecting at multiple levels, the method improves the accuracy of model prediction and can be widely applied in the technical field of shadow detection.

Description

Shadow detection method based on bidirectional multistage feature pyramid
Technical Field
The invention relates to the technical field of shadow detection, in particular to a shadow detection method based on a bidirectional multistage feature pyramid.
Background
Shadow detection, a fundamental but difficult problem in computer vision, makes it possible to recover the shape and position of objects in an image and to reconstruct the unilluminated scene. At the same time, shadows hinder further understanding of an image, so shadow detection plays a crucial role in object detection and tracking, semantic segmentation, autonomous driving and related fields.
Existing shadow detection techniques fall into two main categories: conventional methods and deep-learning-based methods. Conventional methods mainly perform shadow detection with hand-crafted cues such as color, gradient and texture. For example, Guo et al. propose training with color and texture in Guo R, Dai Q, Hoiem D, et al., "Paired regions for shadow detection and removal," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2956-.
In recent years, deep-learning-based shadow detection methods have further improved accuracy. Hu et al. propose a detection method based on spatial context features in "X. Hu, C.-W. Fu, L. Zhu, J. Qin and P.-A. Heng, 'Direction-Aware Spatial Context Features for Shadow Detection and Removal,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 11, pp. 2795-2808, Nov. 2020, doi: 10.1109/TPAMI.2019.2919616", analyzing images from context features in different directions with good results. In "Z. Chen, L. Zhu, L. Wan, S. Wang, W. Feng and P.-A. Heng, 'A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection,' 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5610-5619, doi: 10.1109/CVPR42600.2020.00565", Chen et al. propose a multi-task mean-teacher semi-supervised shadow detection method: a multi-task CNN jointly learns shadow region, shadow edge and shadow count detection, and the model serves as both a student network and a teacher network. On all unlabeled data, the outputs of the three tasks of the student and teacher networks are forced to be consistent. This method significantly improves the accuracy of shadow detection.
Although deep-learning shadow detection methods clearly outperform conventional ones in accuracy, they still fall short in some respects: when training data are insufficient, higher-level information in the data must be fully mined to reach better accuracy.
Disclosure of Invention
Technical problem to be solved
Aiming at the shortcomings of the prior art, the invention provides a shadow detection method based on a bidirectional multistage feature pyramid. The method makes full use of the semantic information of high-level features and the fine-grained detail of low-level features and improves the accuracy of model prediction through multistage detection, addressing the remaining weaknesses of deep-learning shadow detection methods, whose accuracy, though clearly improved, still falls short in some cases.
(II) technical scheme
In order to make full use of the semantic information of high-level features and the fine-grained detail of low-level features, and to improve the accuracy of model prediction through multistage detection, the invention provides the following technical scheme:
a shadow detection method based on a bidirectional multistage feature pyramid comprises the following steps:
s1, constructing residual refinement module
Gradually learning the features of each layer by taking the feature maps of two adjacent layers as input;
s2 construction of weighted feature fusion module
The features of different scales are fully utilized by learning the weight parameters;
s3, constructing a bidirectional multilevel characteristic pyramid network
Information from different layers is processed along two paths in opposite directions: one path runs from deep layers to shallow layers and then processes the resulting fused layers from shallow to deep, while the other path runs in the opposite order; the information from the two directions is then fused by weighted feature fusion;
s4, model training
The model parameters are adjusted according to the loss function.
Preferably, in step S1, the feature maps of two adjacent layers are taken as input, and a refined output feature F_r is generated by a module based on residual learning.
Preferably, in step S1, an attention module is constructed and used to generate a weight map; the learned attention map is multiplied by F_r, and the result is added element-wise to F_r to obtain the refined output feature of the residual refinement module.
Preferably, in step S2, the outputs of the two directional feature pyramids are taken as input; the input layers are adjusted to the same feature size and the same number of channels.
Preferably, in step S2, different weights are obtained by applying a 1 × 1 convolution to the feature maps; the features of the different layers are multiplied by the weight parameters and summed as the final output.
Preferably, in step S3, feature maps of different resolutions extracted by a convolutional neural network are used as the input of the bidirectional multistage feature pyramid network.
Preferably, in step S3, the residual refinement module is added to the feature pyramid network, and the input feature maps undergo multistage feature fusion from two directions; the results of the feature pyramid are fed into the weighted fusion module to obtain the final output.
Preferably, the model training comprises computing a loss through a loss function and adjusting the model parameters according to the loss.
Preferably, computing the loss through the loss function comprises taking the 10 outputs of the bidirectional multistage feature pyramid network plus one fused output as supervision terms, computing the binary cross entropy for each term, and summing the results to obtain the loss.
(III) advantageous effects
Compared with the prior art, the invention provides a shadow detection method based on a bidirectional multistage feature pyramid with the following beneficial effects:
1. The method constructs a residual refinement module; constructs a weighted feature fusion module; and constructs a bidirectional multistage feature pyramid network in which the residual refinement module is added to the feature pyramid network and the input feature maps undergo multistage feature fusion from two directions. By exploiting both the semantic information of high-level features and the fine-grained detail of low-level features, and by detecting at multiple levels, it improves the accuracy of model prediction and can be widely applied in the technical field of shadow detection.
2. The results of the feature pyramid are fed into the weighted fusion module to obtain the final output, the model is trained, and the model parameters are adjusted according to the loss. This compensates for the shortage of training data: by fully mining the higher-level information in the data, the accuracy of the results is further improved.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a weighted feature fusion module of the present invention;
FIG. 3 is a block diagram of a bidirectional multi-level feature pyramid network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment is as follows:
a shadow detection method based on a bidirectional multistage feature pyramid comprises the following steps:
s1 construction residual refinement module
Gradually learning the features of each layer by taking the feature maps of two adjacent layers as input;
s2 construction of weighted feature fusion module
The features of different scales are fully utilized by learning the weight parameters;
s3, constructing a bidirectional multilevel characteristic pyramid network
Information from different layers is processed along two paths in opposite directions: one path runs from deep layers to shallow layers and then processes the resulting fused layers from shallow to deep, while the other path runs in the opposite order; the information from the two directions is then fused by weighted feature fusion;
s4, model training
The model parameters are adjusted according to the loss function.
In step S1, the feature maps of two adjacent layers are taken as input and a refined output feature F_r is generated by a module based on residual learning. An attention module is constructed and used to generate a weight map; the learned attention map is multiplied by F_r, and the result is added element-wise to F_r to obtain the refined output feature of the residual refinement module.
The features of each CNN layer are learned progressively with two adjacent feature maps as input. In practice, element-wise addition of two input feature maps merely merges features from different layers; it has limited ability to suppress non-shadow details in high-resolution feature maps, and non-shadow regions are hard to remove. In this step we therefore refine the features with a residual method.
In step S2, the outputs of the two directional feature pyramids are taken as input; the input layers are adjusted to the same feature size and the same number of channels, and different weights are obtained by applying a 1 × 1 convolution to the feature maps; the features of the different layers are multiplied by the weight parameters and summed as the final output.
As shown in fig. 2, the outputs of the two directional feature pyramids serve as input; the feature sizes and channel counts of the different layers are adjusted to match; a 1 × 1 convolution on the feature maps yields the different weights; and the features of the different layers are multiplied by the weight parameters and summed as the final output. Features at different scales are thus fully utilized by learning the weight parameters. To make full use of the semantic information of high-level features and the fine-grained detail of low-level features, the output multi-layer feature maps are fused into the final output.
In step S3, feature maps of different resolutions extracted by the convolutional neural network are used as the input of the bidirectional multistage feature pyramid network; the residual refinement module is added to the feature pyramid network, and the input feature maps undergo multistage feature fusion from two directions; the results of the feature pyramid are fed into the weighted fusion module to obtain the final output.
As shown in fig. 3, the feature maps of different resolutions extracted by the convolutional neural network serve as the input of the bidirectional multistage feature pyramid network; the residual refinement module is added to the feature pyramid network; the input feature maps undergo multistage feature fusion from two directions; and the results of the feature pyramid are fed into the weighted fusion module to obtain the final output. The network processes information from different layers along two paths in opposite directions: one path runs from deep layers to shallow layers and then processes the resulting fused layers from shallow to deep, while the other path runs in the opposite order. The information from the two directions is then fused by weighted feature fusion. In this step, the bidirectional multistage feature pyramid network we design not only exploits complementary feature information along the two-way paths but also mines deeper information through multistage detection.
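The two-direction data flow described above can be sketched as follows. This is a toy Python illustration only: each feature map is collapsed to a single number so the flow of information is visible, and refine() is an invented stand-in for the residual refinement module — the function names and the mixing formula are assumptions for the example, not the patented network.

```python
# Toy sketch of the bidirectional multistage pathway in step S3.
# Real features are tensors; here each "feature map" is one float.

def refine(upper, lower):
    """Stand-in for the residual refinement module: fuse two adjacent layers."""
    return 0.5 * (upper + lower) + lower  # concat + residual add, collapsed to a toy mix

def top_down(feats):
    """Deep-to-shallow pass: each shallower layer is refined with its deeper neighbor."""
    out = [feats[-1]]
    for f in reversed(feats[:-1]):
        out.append(refine(out[-1], f))
    return list(reversed(out))  # return in shallow-to-deep order, like the input

def bottom_up(feats):
    """Shallow-to-deep pass: mirror image of top_down."""
    out = [feats[0]]
    for f in feats[1:]:
        out.append(refine(out[-1], f))
    return out

backbone = [1.0, 2.0, 4.0, 8.0]          # 4 backbone resolutions, cf. F_1..F_4

# Path 1: deep-to-shallow first, then shallow-to-deep over the fused layers.
path1 = bottom_up(top_down(backbone))
# Path 2: the opposite order.
path2 = top_down(bottom_up(backbone))

# Illustrative: six outputs from the two directions, cf. F_p1..F_p6.
pyramid_outputs = path1[1:] + path2[1:]
```

The six outputs would then enter the weighted feature fusion of step S2.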
In step S4, model training comprises computing a loss through a loss function and adjusting the model parameters according to the loss: the 10 outputs of the bidirectional multistage feature pyramid network plus one fused output are taken as supervision terms, binary cross entropy is computed for each term, and the results are summed to obtain the loss.
In this step, the 10 outputs of the bidirectional multistage feature pyramid network and the one fused output serve as supervision terms; binary cross entropy is computed for each, the results are summed as the loss, and the model parameters are adjusted according to the loss.
Example two:
a shadow detection method based on a bidirectional multistage feature pyramid comprises the following steps:
s1 construction residual refinement module
Gradually learning the features of each layer by taking the feature maps of two adjacent layers as input;
s2 construction of weighted feature fusion module
The features of different scales are fully utilized by learning the weight parameters;
s3, constructing a bidirectional multilevel characteristic pyramid network
Information from different layers is processed along two paths in opposite directions: one path runs from deep layers to shallow layers and then processes the resulting fused layers from shallow to deep, while the other path runs in the opposite order; the information from the two directions is then fused by weighted feature fusion;
s4, model training
The model parameters are adjusted according to the loss function.
In step S1, the feature maps of two adjacent layers are taken as input and a refined output feature F_r is generated by a module based on residual learning. An attention module is constructed and used to generate a weight map; the learned attention map is multiplied by F_r, and the result is added element-wise to F_r to obtain the refined output feature of the residual refinement module.
In a further improvement of the present invention, the construction process of the residual refinement module in step S1 is represented as:
refined output features F generated by using a residual learning-based module with feature maps of two adjacent layers as input r
F r =Cat(F .i ,F j )+F j
Wherein, F i ,F j Representing the input characteristics of two adjacent layers, Cat represents the cross-channel cascade operation.
Constructing an attention module, and generating a weight map by using the attention module;
the attention module comprises 3 residual learning blocks, each residual learning block comprises two 1 x 1 convolution layers and a 3 x 3 hollow convolution layer, the weight is calculated by using a Relu function and a Sigmoid function, and in the process, the output of each learning block is used as the input of the next learning block.
Wherein the Relu function is used to compute for the first two learning blocks:
block=Relu(learn(x)+x)
and (3) calculating a third learning block by using a Sigmoid function to obtain a final weight:
ω=Sigmoid(learn3(learn2)+down(learn2))
The learned attention map ω is multiplied by F_r, and the result is added element-wise to F_r, giving the refined output feature F_out of the residual refinement module:
F_out = F_r · ω + F_r
where ω denotes the weight map obtained by the attention mechanism.
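The formulas above can be traced numerically with the following sketch. It is only an illustration of the element-wise arithmetic: feature maps are flattened to short lists, and the Cat operation and the learn/down convolution blocks are replaced by invented toy stand-ins (the 0.5·v factors and the averaging are assumptions for the example, not taken from the patent).

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def relu(v):
    return max(0.0, v)

def residual_refine(f_i, f_j):
    """F_r = Cat(F_i, F_j) + F_j, with Cat+conv collapsed to element-wise averaging."""
    cat = [(a + b) / 2.0 for a, b in zip(f_i, f_j)]  # toy stand-in for concat + conv
    return [c + b for c, b in zip(cat, f_j)]

def attention_weights(f_r):
    """Three stacked learning blocks: ReLU for the first two, Sigmoid for the last."""
    learn1 = [relu(0.5 * v + v) for v in f_r]        # block = ReLU(learn(x) + x)
    learn2 = [relu(0.5 * v + v) for v in learn1]
    return [sigmoid(0.5 * v + v) for v in learn2]    # ω = Sigmoid(learn3 + down)

def refined_output(f_i, f_j):
    """F_out = F_r * ω + F_r, element-wise."""
    f_r = residual_refine(f_i, f_j)
    omega = attention_weights(f_r)
    return [r * w + r for r, w in zip(f_r, omega)]
```

Because ω lies in (0, 1), each positive element of F_out falls between F_r and 2·F_r, which is what the residual form F_r·ω + F_r guarantees.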
In step S2, the outputs of the two directional feature pyramids are taken as input; the input layers are adjusted to the same feature size and the same number of channels, and different weights are obtained by applying a 1 × 1 convolution to the feature maps; the features of the different layers are multiplied by the weight parameters and summed as the final output.
In a further improvement of the present invention, the construction process of the weighted feature fusion module in step S2 is represented as:
The outputs of the two directional feature pyramids are taken as input: after passing through the feature pyramid network, outputs from 6 different layers are obtained and used as the input of the weighted feature fusion;
The input layers are adjusted to the same feature size and the same number of channels; since weighted addition is used, upsampling brings the inputs of the different layers to the same size and channel count;
Different weights α_1–α_6 are obtained by applying a 1 × 1 convolution to the feature maps;
The features of the different layers are multiplied by the weight parameters and summed as the final output:
F_f = Σ_{i=1..6} α_i · F_pi
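The weighted fusion above can be illustrated with the short sketch below, where the six pyramid outputs are reduced to same-sized 1-D lists and the weights α_1–α_6 are fixed numbers standing in for the learned 1 × 1-convolution outputs (the values are invented for the example).

```python
def weighted_fusion(features, alphas):
    """F_f = sum_i α_i · F_pi, element-wise over same-sized feature maps."""
    assert len(features) == len(alphas)
    size = len(features[0])
    fused = [0.0] * size
    for f, a in zip(features, alphas):
        assert len(f) == size          # layers were already resized to match
        for k in range(size):
            fused[k] += a * f[k]
    return fused

# Six "layers" F_p1..F_p6, already upsampled to the same size and channel count.
f_p = [[float(i + k) for k in range(4)] for i in range(1, 7)]
alpha = [0.1, 0.15, 0.2, 0.2, 0.15, 0.2]   # illustrative stand-in for learned weights
f_f = weighted_fusion(f_p, alpha)
```

Here the weights happen to sum to 1, so the fusion is a convex combination; in the method itself the weights are learned and need not be normalized.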
In step S3, feature maps of different resolutions extracted by the convolutional neural network are used as the input of the bidirectional multistage feature pyramid network; the residual refinement module is added to the feature pyramid network, and the input feature maps undergo multistage feature fusion from two directions; the results of the feature pyramid are fed into the weighted fusion module to obtain the final output.
In a further improvement of the present invention, the construction of the bidirectional multistage feature pyramid network in step S3 proceeds as follows:
Feature maps of different resolutions extracted by the convolutional neural network are taken as the input of the bidirectional multistage feature pyramid network;
The convolutional neural network extracts 4 feature maps F_1–F_4 of different resolutions;
The residual refinement module is added to the feature pyramid network, and the input feature maps undergo multistage feature fusion from two directions;
Several residual refinement modules are embedded between each pair of adjacent layers of the network; feature processing and fusion are then carried out from the two different directions, finally yielding 6 outputs F_p1–F_p6 in total;
The results of the feature pyramid are fed into the weighted fusion module to obtain the final output: the 6 outputs F_p1–F_p6 are input to the weighted fusion module, and weighted feature fusion produces the final output F_f.
In step S4, model training comprises computing a loss through a loss function and adjusting the model parameters according to the loss: the 10 outputs of the bidirectional multistage feature pyramid network plus one fused output are taken as supervision terms, binary cross entropy is computed for each term, and the results are summed to obtain the loss.
In a further improvement of the present invention, the model training in step S4 proceeds as follows:
the loss is calculated by the loss function:
taking 10 outputs of the bidirectional multistage feature pyramid network and one fused output as supervision items, carrying out binary cross entropy calculation on each item, and adding results to obtain loss.
The binary cross entropy formula is:
Figure BDA0003587613350000091
where x represents the output of the model and y represents the prediction tag.
The total loss was:
Figure BDA0003587613350000101
wherein i represents the ith item, and 11 items in total.
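The loss computation of this step can be written out as a minimal numeric sketch, in which each per-pixel prediction map is reduced to a single probability for readability — the 0.9/0.95 prediction values are purely illustrative.

```python
import math

def bce(x, y, eps=1e-7):
    """l(x, y) = -[y·log(x) + (1 - y)·log(1 - x)] for prediction x and label y."""
    x = min(max(x, eps), 1.0 - eps)    # clamp to avoid log(0)
    return -(y * math.log(x) + (1.0 - y) * math.log(1.0 - x))

def total_loss(predictions, label):
    """L = sum_i l_i over all supervision terms."""
    return sum(bce(x, label) for x in predictions)

# 10 pyramid outputs plus 1 fused output, all supervised against the same label.
preds = [0.9] * 10 + [0.95]
loss = total_loss(preds, 1.0)
```

In training, each of the 11 terms would be a full-resolution shadow mask compared against the ground-truth mask pixel by pixel, and the summed loss is what drives the parameter updates.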
The model parameters are then adjusted according to the loss.
The shadow detection method based on the bidirectional multistage feature pyramid constructs a residual refinement module; constructs a weighted feature fusion module; and constructs a bidirectional multistage feature pyramid network in which the residual refinement module is added to the feature pyramid network and the input feature maps undergo multistage feature fusion from two directions. It exploits both the semantic information of high-level features and the fine-grained detail of low-level features, improves the accuracy of model prediction through multistage detection, and can be widely applied in the technical field of shadow detection. The results of the feature pyramid are fed into the weighted fusion module to obtain the final output; the model is trained and its parameters are adjusted according to the loss, compensating for the shortage of training data; and by fully mining the higher-level information in the data, the accuracy of the results is further improved.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A shadow detection method based on a bidirectional multistage feature pyramid is characterized by comprising the following steps:
s1 construction residual refinement module
Gradually learning the features of each layer by taking the feature maps of two adjacent layers as input;
s2 construction of weighted feature fusion module
The features of different scales are fully utilized by learning the weight parameters;
s3, constructing a bidirectional multilevel characteristic pyramid network
Information from different layers is processed along two paths in opposite directions: one path runs from deep layers to shallow layers and then processes the resulting fused layers from shallow to deep, while the other path runs in the opposite order; the information from the two directions is then fused by weighted feature fusion;
s4, model training
The model parameters are adjusted according to the loss function.
2. The method as claimed in claim 1, wherein in step S1, the feature maps of two adjacent layers are taken as input, and a refined output feature F_r is generated by a module based on residual learning.
3. The method for detecting shadows based on the bi-directional multilevel feature pyramid of claim 1, wherein in step S1, an attention module is constructed and used to generate a weight map; the learned attention map is multiplied by F_r, and the result is added element-wise to F_r to obtain the refined output feature of the residual refinement module.
4. The method for detecting shadows based on the bi-directional multilevel feature pyramid of claim 1, wherein in step S2, the outputs of two directional feature pyramids are used as inputs; the different layers of the input are adjusted to have the same feature size and the same number of channels.
5. The method for detecting shadows based on the bi-directional multilevel feature pyramid of claim 1, wherein in step S2, the feature map is convolved by 1 × 1 to obtain different weights; the features of the different layers are multiplied by the weighting parameters and added as the final output.
6. The method for detecting shadows based on the bi-directional multilevel feature pyramid of claim 1, wherein in step S3, feature images of different resolutions extracted by the convolutional neural network are used as input of the bi-directional multilevel feature pyramid network.
7. The method for detecting shadows based on the bi-directional multilevel feature pyramid of claim 1, wherein in step S3, a residual refinement module is added to the feature pyramid network to perform multilevel feature fusion on the input feature map from two directions; and inputting the result of the characteristic pyramid into a weighting fusion module to obtain final output.
8. The method of claim 1, wherein the performing model training comprises calculating loss through a loss function and adjusting model parameters according to the loss.
9. The method according to claim 8, wherein the computing the loss through the loss function comprises taking 10 outputs of the bidirectional multilevel feature pyramid network and one fused output as a supervision term, performing binary cross entropy computation on each term, and adding the results to obtain the loss.
CN202210367200.5A — priority date 2022-04-08, filing date 2022-04-08 — Shadow detection method based on bidirectional multistage feature pyramid (Pending)

Priority Applications (1)

CN202210367200.5A — priority date 2022-04-08, filing date 2022-04-08 — Shadow detection method based on bidirectional multistage feature pyramid


Publications (1)

CN114863232A — published 2022-08-05

Family

ID=82629505

Family Applications (1)

CN202210367200.5A — priority date 2022-04-08, filing date 2022-04-08 — Shadow detection method based on bidirectional multistage feature pyramid

Country Status (1)

Country Link
CN (1) CN114863232A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789144A (en) * 2023-12-11 2024-03-29 深圳职业技术大学 Cross network lane line detection method and device based on weight fusion



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination