CN114998132A - Weak supervision shadow detection method for mining complementary features through double networks - Google Patents


Info

Publication number
CN114998132A
CN114998132A (application CN202210605710.1A)
Authority
CN
China
Prior art keywords
shadow
network
region
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210605710.1A
Other languages
Chinese (zh)
Inventor
李煜祥 (Li Yuxiang)
王军 (Wang Jun)
柴晨阳 (Chai Chenyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 20 Research Institute
Original Assignee
CETC 20 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 20 Research Institute filed Critical CETC 20 Research Institute
Priority to CN202210605710.1A
Publication of CN114998132A
Legal status: Pending

Classifications

    • G06T 5/00 Image enhancement or restoration
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T 7/11 Region-based segmentation
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 Segmentation of patterns in the image field; clustering-based techniques; detection of occlusion
    • G06V 10/764 Image or video recognition using classification, e.g. of video objects
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition using neural networks
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The invention provides a weakly supervised shadow detection method that mines complementary features with two networks. First, an initial shadow seed region is obtained with a class activation map method. A region segmentation network learns region-based shadow features from the initial seed region and produces a more accurate estimate of the shadow region; it is in turn trained with the shadow detection results of a pixel segmentation network. Finally, through mutual learning, the region segmentation network and the pixel segmentation network fuse region-based and context-based features, achieving accurate segmentation of shadow regions. The invention completes the shadow detection task using only weak labels, greatly reducing the dependence on high-level annotations of the image data set; it effectively combines the features the classification network extracts within superpixel blocks with the context features the segmentation network obtains directly from the full image, and thus achieves shadow detection under weak supervision.

Description

Weak supervision shadow detection method for mining complementary features through double networks
Technical Field
The invention relates to the field of image processing, and in particular to a shadow detection method suited to shadow detection tasks in many scenes, such as daily-life scenes and battlefield scenes; that is, the pixels of all shadow regions are segmented out as foreground.
Background
Today, computer vision technology is widely used both in daily life and in battlefield scenes. Whether for face recognition and autonomous driving in daily life, or for target tracking and target recognition in military tasks, mature computer vision solutions bring great convenience.
Shadows, however, are a common physical phenomenon that appears in all kinds of optical images. The appearance of a shadow typically causes a discontinuity in the color of an object or background (the pixels in the shadow area are darkened). Moreover, shadows usually have variable outlines and are sometimes highly similar to the target, which complicates tasks such as target tracking (for example, the camouflage paint of common military vehicles imitates, to some extent, scenes in which shadows appear). The presence of shadows therefore creates difficulties for many computer vision tasks and reduces their accuracy and robustness. A shadow detection method can effectively mark shadow boundaries in an image, and this boundary information can supply shadow-related cues to other computer vision tasks, improving the accuracy of their algorithms.
Existing shadow detection methods for a single image fall mainly into two categories: traditional methods based on machine learning and hand-crafted feature selection, and the currently popular methods based on deep learning.
Traditional machine learning methods typically address the problem by building explicit shadow models. However, they are generally suited only to simple scenes; for the variable light sources and backgrounds of natural scenes, they are limited by the quality of hand-crafted feature selection and by model complexity, and cannot achieve good detection accuracy.
The growth of big data, the great increase in computing power, and advances in neural network algorithms have driven the development of deep learning, and end-to-end convolutional neural networks have become the first choice for current image processing algorithms. Most existing deep-learning shadow detection methods are based on generative adversarial networks. Compared with traditional methods, deep learning greatly improves detection accuracy. However, deep-learning shadow detection requires a large number of complete label pairs, i.e., shadow images with corresponding pixel-level shadow detection labels. For the shadow detection task, such labels are extremely difficult to acquire: shadow boundaries are usually gradual in nature, so even careful manual annotation is hard to get right. As a result, existing shadow detection data sets contain relatively little data and their annotations are not sufficiently accurate.
Because deep learning demands large-scale data sets and labeling them is difficult, many researchers have proposed weakly supervised or unsupervised methods to reduce annotation cost. Moreover, data sets with lower-level labels are usually much larger, which lets deep neural networks perform better.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a weakly supervised shadow detection method that mines complementary features with a dual network. A region segmentation network and a pixel segmentation network mine region-based and context-based shadow features, respectively. A class activation map method first yields an initial shadow seed region. From this seed region, the region segmentation network learns region-based shadow characteristics and produces a more accurate estimate of the shadow region. The detection result of the region segmentation network then guides the updating of the pixel segmentation network, which can expand the detection to find more shadow regions and enrich boundary information; its detection result is in turn used to train the region segmentation network. Finally, through mutual learning, the two networks fuse region-based and context-based features and achieve accurate segmentation of shadow regions. The proposed training scheme is compatible with a variety of networks within the framework, and the final result can be improved by adjusting the network settings.
The technical scheme adopted by the invention to solve the technical problem comprises the following steps:
step 1, train a VGG network as a shadow classification network, and use the trained model to obtain a class activation map of the shadow image as the shadow seed region;
step 2, obtain superpixel blocks with the Graph-Cuts method;
step 3, use the superpixel blocks as the input of a region segmentation network, with the shadow seed region as the label; shadow features are extracted from the shadow blocks, effectively expanding the shadow region and removing some erroneous regions;
step 4, train a pixel segmentation network with the shadow detection result generated by the region segmentation network; the pixel segmentation network extracts more context features, and its detection result provides the region segmentation network with more accurate shadow boundaries;
step 5, train the region segmentation network with the shadow detection result generated by the pixel segmentation network; the region segmentation network mines shadow features bottom-up from the seed region, uses the mined features to expand the shadow region, and corrects misclassified regions, yielding the shadow detection result.
The specific steps of the step 1 are as follows:
1.1) training a shadow classification network;
VGG16 is adopted as the shadow classification network. It is initialized with a VGG16 model pre-trained on the ImageNet data set; the Softmax layer of the VGG16 network is then removed, the number of output channels of the fully connected layer is changed to 2, and Sigmoid is used as the activation function of the fully connected layer;
the shadow classification network is trained on the ISTD data set: the input is a shadow image or a shadow-free image, the output is 1 or 0 (shadow or shadow-free), and binary cross entropy is selected as the loss function;
1.2) obtaining a shadow seed area;
the Grad-CAM method is used to produce a class activation map of the shadow; thresholding it yields the shadow seed region that serves as the starting point for subsequent training;
the softmax layer input for which the shadow classification network has been obtained is denoted y c The characteristic diagram of the last convolution layer output of the shadow classification network is represented as A k Then the gradient of class c is calculated
Figure BDA0003670493690000031
After the gradient information in the backward propagation is respectively and globally pooled in the width dimension and the height dimension of the feature map, the weight of the neuron is obtained
Figure BDA0003670493690000032
The definition is as follows:
Figure BDA0003670493690000033
wherein Z represents the number of pixels in the feature map;
Figure BDA0003670493690000034
representing the value of the k characteristic diagram with the abscissa of i and the ordinate of j; the softmax layer input for which the shadow classification network has been obtained is denoted y c
Figure BDA0003670493690000035
Is a derivative symbol; after the weight is obtained, the weighted sum of the weight and the corresponding characteristic graph is the calculated class activation graph,
Figure BDA0003670493690000036
class activation graph representing class c, defined as followsThe following steps:
Figure BDA0003670493690000037
wherein ReLU () is a linear rectification function, A k Representing the k characteristic diagram;
therefore, the trained shadow classification network can be used for obtaining a shadow activation graph; then, threshold operation is carried out on the class activation graph, namely, only the region with the confidence coefficient larger than 0.65 is selected as the shadow seed region.
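The weighting and thresholding above can be sketched as follows; a minimal NumPy illustration, assuming the feature maps and back-propagated gradients have already been extracted from the classification network (the array shapes and function name are illustrative; only the 0.65 threshold comes from the text):

```python
import numpy as np

def grad_cam(feature_maps, gradients, threshold=0.65):
    """Compute a class activation map from the last conv layer's
    feature maps A^k (K, H, W) and the gradients dy^c/dA^k (K, H, W),
    then threshold it to obtain a shadow seed mask."""
    # alpha_k^c: global average pooling of the gradients over H and W
    weights = gradients.mean(axis=(1, 2))            # shape (K,)
    # weighted sum of the feature maps, followed by ReLU
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # normalise to [0, 1] so the 0.65 confidence threshold is meaningful
    if cam.max() > 0:
        cam = cam / cam.max()
    seed = cam > threshold                           # boolean seed mask
    return cam, seed
```

In practice the gradients would come from one backward pass of the binary shadow/no-shadow classifier with respect to the shadow logit.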
The specific steps for obtaining the superpixel blocks are as follows:
the method comprises the steps of processing an image by adopting a Graph Cuts method to obtain a superpixel block of the image, setting the minimum size of a segmentation block to be 200 pixels, 150 pixels and 100 pixels respectively at different training stages when adopting the Graph Cuts method, setting a Gaussian filter parameter to be 1, selecting 4 neighbors to calculate the weight between pixel points, carrying out image segmentation on a shadow image to obtain a superpixel segmentation result, dyeing different colors on the segmented shadow image, and representing a superpixel block in each color region.
The construction steps of the area division network are as follows:
A Fast R-CNN network model is adopted as the region segmentation network framework. The maximum circumscribed rectangle of each superpixel block in the input shadow image is fed to the network as the region of interest; the bounding-box regression branch is removed and only the confidence branch is kept. The number of output elements of its final fully connected layer is changed to 2, which after Softmax represent the probabilities that the superpixel block is and is not a shadow region;
the loss function of the area division network adopts a cross entropy loss function, and the loss L of the area division network is defined as:
Figure BDA0003670493690000041
where m represents the input shadow image superpixelAn m-th superpixel block obtained after the segmentation; g m Indicating whether the m-th block of pixels is a shadow region, i.e. g when the super-pixel block i is a shadow region m Value 1, g when superpixel block i is a non-shaded area m A value of 0; p is a radical of formula m Representing the probability of a superpixel block m being predicted as a shadow; n is the number of pixel blocks used for training; l is a radical of an alcohol m Indicating a loss of the first pixel block;
in the training, in order to balance the number A of the shaded superpixels and the number B of the unshaded superpixels, the minimum value C of the two numbers is selected to be min (A, B) during each training, the number of the superpixels which is the same as that of C is randomly selected from the other one for training, and the training can be stopped when the loss of a trainer tends to be smooth;
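The balancing rule C = min(A, B) can be sketched as follows (a hedged illustration; the function name and seeded random generator are assumptions):

```python
import numpy as np

def balanced_sample(shadow_ids, nonshadow_ids, rng=None):
    """Pick C = min(A, B) superpixel ids from each class so the
    region network sees equal numbers of shadow and non-shadow blocks."""
    rng = rng or np.random.default_rng(0)
    c = min(len(shadow_ids), len(nonshadow_ids))
    shadow = rng.choice(shadow_ids, size=c, replace=False)
    nonshadow = rng.choice(nonshadow_ids, size=c, replace=False)
    return shadow, nonshadow
```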
the pixel division network is built by the following steps:
the construction of the pixel segmentation network adopts an In-Net network structure, In the process of downsampling, a ReLu activation layer and a maximum pooling layer are connected after each convolution, the downsampled feature map can be spliced with the feature map which corresponds to the downsampled feature map and has the same scale after upsampling, the effect of fusing high-level features and low-level features is achieved, and finally a shadow detection result is directly obtained;
the loss function of the pixel segmentation network is a characteristic that the full convolution network pays more attention to the shadow region, the weight of the shadow region in the cross entropy loss function is strengthened by using a coefficient w (i, j), and the weighted pixel segmentation network loss function is as follows:
Figure BDA0003670493690000042
wherein the size of the characteristic diagram is N M; q. q.s l (i, j) represents the value of the position of the pixel point (i, j) of the ith channel characteristic diagram; w (i, j) is the weight value at the position of i, j), and the calculation process is as follows.
Figure BDA0003670493690000051
Wherein S is in the true value imageTotal number of pixel points, S shadow And (4) the sum of the number of pixel points in the shadow area in the true value image.
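A sketch of the weighted loss for a single-channel prediction, under the assumption that w(i, j) takes the ratio S/S_shadow on shadow pixels and 1 elsewhere (the exact form of w is not fully specified in the text):

```python
import numpy as np

def weighted_bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy where shadow pixels are
    up-weighted by w = S / S_shadow (assumed form of the coefficient
    w(i, j)); non-shadow pixels keep weight 1."""
    s_total = target.size                    # S: all pixels
    s_shadow = max(target.sum(), 1)          # S_shadow, guarded against 0
    w = np.where(target > 0, s_total / s_shadow, 1.0)
    pred = np.clip(pred, eps, 1 - eps)       # numerical stability
    loss = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    return float((w * loss).mean())
```

With this weighting, a mistake on a shadow pixel costs more than the same-sized mistake on a background pixel, which is the stated intent of w(i, j).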
The step of training the pixel segmentation network is as follows:
a) for an input shadow Image, R-net is trained at t = 0 using the shadow seed region image corresponding to the Image as the label; once trained, R-net outputs for the input Image the shadow detection map shown at t = 1 in the lower half of the figure;
b) for the input Image, P-net is trained using the shadow detection map generated by R-net at t = 1 as the label; once trained, P-net outputs for the input Image the shadow detection map shown at t = 1 in the upper half of the figure;
c) R-net is then trained with the t = 1 shadow detection result generated by P-net; that is, the procedures of steps a) and b) are repeated to complete training at t = 2 to 4, and finally the result generated by P-net at t = 4 is taken as the final training result.
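The alternating schedule in steps a) to c) can be sketched as a plain training loop; `train_r_net` and `train_p_net` here are stand-ins for the actual network training routines, each taking a label map and returning the trained network's detection map:

```python
def mutual_training(seed_label, train_r_net, train_p_net, rounds=4):
    """Alternate the two networks: R-net is first trained on the
    Grad-CAM seed region (t = 0); each subsequent round trains P-net
    on R-net's detection map, then retrains R-net on P-net's map.
    Returns P-net's detection map from the final round (t = rounds)."""
    r_out = train_r_net(seed_label)          # t = 0: R-net from the seed
    p_out = None
    for t in range(1, rounds + 1):
        p_out = train_p_net(r_out)           # P-net learns from R-net
        r_out = train_r_net(p_out)           # R-net learns from P-net
    return p_out
```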
The beneficial effect of the invention is a network structure that iteratively mines local and context features with a dual network and can complete the shadow detection task with weak labels. The proposed framework does not depend on a specific network structure: the graph segmentation method may be replaced with another superpixel segmentation algorithm; the class activation map method may be replaced with another initial localization method, such as a saliency map method; the region segmentation network may be replaced with a classification network such as VGG or ResNet; and the pixel segmentation network may be replaced with FCN, U-Net, or another target segmentation network. Using only weak labels, the shadow detection task is completed well, and the dependence on high-level labels of the image data set is greatly reduced.
As the result and comparison figures show, the method effectively combines the features extracted by the classification network within superpixel blocks with the context features obtained by the segmentation network directly from the whole image. The effective fusion of these two kinds of features achieves shadow detection under weak supervision.
The invention provides, for the first time, a training method and network framework that solve the shadow detection problem with weak labels, an effective weakly supervised solution to shadow detection.
Drawings
Fig. 1 is a schematic diagram of the network framework and training of the invention.
Fig. 2 is a diagram of the shadow classification network.
Fig. 3 is a schematic diagram of the shadow seed region: fig. 3(a) the shadow image, fig. 3(b) the shadow activation map, fig. 3(c) the result after thresholding.
Fig. 4 shows a superpixel segmentation result: fig. 4(a) the shadow image, fig. 4(b) the segmentation result.
Fig. 5 is a diagram of the region segmentation network structure.
Fig. 6 is a diagram of the pixel segmentation network structure.
Fig. 7 compares shadow detection results.
Fig. 8 shows a shadow detection result.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Shadow regions have variable outlines, and under the influence of diffuse ambient light a shadow can be divided into a penumbra and an umbra. Illumination within the penumbra is non-uniform, so the extent of the shadow region is sometimes hard to determine and its edges are hard to find. Annotating shadow region boundaries is therefore extremely difficult, which is why existing shadow detection data sets contain few images and contain labeling errors. Weak labels lower the requirements on data annotation and have therefore attracted wide attention. Herein a weakly supervised approach is used to accomplish the shadow detection task: training requires only a label stating whether an image contains a shadow, i.e., weakly supervised shadow detection, without any annotation of shadow region boundaries.
First, a VGG network is trained as the shadow classification network, and the trained model is used to obtain a class activation map of the shadow image as the shadow seed region.
Second, superpixel blocks are obtained with the Graph-Cuts method.
Third, the superpixel blocks are used as input to the region segmentation network, with the shadow seed region as the label. Shadow features are extracted from the shadow blocks, effectively expanding the shadow region and removing some erroneous regions.
Fourth, the pixel segmentation network is trained with the shadow detection result generated by the region segmentation network. The pixel segmentation network extracts more context features, and its detection result provides the region segmentation network with more accurate shadow boundaries.
Fifth, the region segmentation network is trained with the shadow detection result generated by the pixel segmentation network. The region segmentation network mines shadow features bottom-up from the seed region, uses them to expand the shadow region, and corrects misclassified regions.
The third and fourth steps are carried out iteratively: the shadow region in the detection result gradually expands, the detected shadow boundary is gradually refined, and incorrectly segmented regions are gradually corrected. Iterative training is used only in the training stage; at inference, the shadow detection result is generated by the last-optimized pixel segmentation network.
The entire network framework is shown in fig. 1. The initial shadow seed region is the class activation map of the shadow image obtained in the first step, and is the label used to train the region segmentation network R-net at t = 0. Thereafter, the result of each network is used in turn as the label for training the other.
The examples are as follows:
step 1: data preparation
1) Training a shadow classification network;
VGG16 is adopted as the shadow classification network, initialized with a VGG16 model pre-trained on the ImageNet data set; the Softmax layer of the VGG16 network is removed, the number of output channels of the fully connected layer is changed to 2, and Sigmoid is used as its activation function. Fig. 2 shows the shadow classification network model used in the invention, with the parameters of each feature map and the settings of the convolutional layers labeled.
The shadow classification network is trained on the ISTD data set; the input is a shadow (or shadow-free) image and the output is 1 or 0 (shadow or shadow-free). Binary cross entropy is selected as the loss function.
2) obtaining shadow seed regions
If the amount of information in the label is too small (for example, image-level labels stating only whether a given image contains a shadow, while the required task is segmentation of the shadow region), the label provides the network with no direct information about the position or shape of the shadow, and training is difficult to complete. Weakly supervised methods therefore usually construct a pseudo label by some means, so that the network can obtain more direct target information from it.
The Class Activation Mapping (CAM) method obtains rough target position information using only image-level labels, and has therefore been widely applied to weakly supervised image segmentation and detection tasks. Here the Grad-CAM method is used to produce a class activation map of the shadow, and the shadow seed region obtained from it serves as the starting point for subsequent training.
Denote the input to the softmax layer of the shadow classification network for class c as y^c, and the k-th feature map output by the last convolutional layer as A^k. Globally average-pooling the back-propagated gradient of class c over the width and height dimensions of the feature map gives the neuron weights \alpha_k^c, defined as:

    \alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}

where Z is the number of pixels in the feature map, A_{ij}^k is the value of the k-th feature map at abscissa i and ordinate j, and \partial is the partial derivative symbol. Given the weights, the class activation map for class c is the ReLU of the weighted sum of the feature maps:

    L_{Grad-CAM}^c = ReLU\left( \sum_k \alpha_k^c A^k \right)

where ReLU(\cdot) is the linear rectification function and A^k is the k-th feature map.
The trained shadow classification network is thus used to obtain a shadow activation map, to which a threshold operation is applied: only regions with confidence greater than 0.65 are kept as the shadow seed region. Fig. 3 shows the obtained shadow activation map and the seed region after thresholding: fig. 3(a) the shadow image, fig. 3(b) the shadow activation map, fig. 3(c) the result after thresholding.
And 2, step: obtaining superpixel blocks
Graph-based segmentation is a traditional image processing method that separates foreground and background using information such as color and texture, and was commonly used to obtain superpixel blocks in early weakly supervised image segmentation tasks.
The image is processed with the Graph-Cuts method to obtain its superpixel blocks: at different training stages the minimum segment size is set to 200, 150 and 100 pixels respectively, the Gaussian filter parameter is set to 1, and 4-neighborhoods are used to compute the weights between pixels. Fig. 4 shows the result of graph segmentation of the input shadow image: fig. 4(a) the shadow image, fig. 4(b) the segmentation result. The segments are tinted with different colors, each color region representing one superpixel block;
and step 3: building a network model;
1) area division network
The region segmentation network is essentially a binary classification network that classifies each superpixel block as either a shadow-containing pixel block or a shadow-free pixel block. The Fast R-CNN network model is adopted as the backbone of the region segmentation network framework to reduce the time consumed by the classification operation.
Unlike the conventional Fast R-CNN network, which obtains Regions of Interest (RoI, Region of Interest) for an input image with a bounding-box search, here the maximum circumscribed rectangle of each superpixel block in the input shadow image is used directly as the RoI input. The bounding-box regression branch is removed and only the confidence branch is retained; the number of output elements of its final fully connected layer is changed to 2, so that after Softmax the two outputs represent the probability that the superpixel block is a shadow region and the probability that it is not. The modified region segmentation network is shown in fig. 5, with the main network layer types annotated.
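The maximum circumscribed rectangle of each superpixel block can be extracted from the segmentation label map as follows; this is a minimal numpy sketch (the function name and the inclusive-bounds convention are assumptions):

```python
import numpy as np

def superpixel_rois(labels):
    """Maximum circumscribed rectangle of each superpixel block.

    labels: 2-D integer array from superpixel segmentation; each value is a
    block id. Returns {id: (row0, col0, row1, col1)} with inclusive bounds,
    used as the RoI input of the region segmentation network.
    """
    rois = {}
    for block_id in np.unique(labels):
        rows, cols = np.nonzero(labels == block_id)
        rois[block_id] = (rows.min(), cols.min(), rows.max(), cols.max())
    return rois
```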
The loss function of the region segmentation network is the cross-entropy loss; the loss L of the region segmentation network is defined as:

L = (1/N) Σ_{m=1..N} L_m,  L_m = -[ g_m · log(p_m) + (1 - g_m) · log(1 - p_m) ]

wherein m indexes the m-th superpixel block obtained after superpixel segmentation of the input shadow image; g_m indicates whether the m-th pixel block is a shadow region, i.e. g_m takes the value 1 when superpixel block m is a shadow region and 0 when it is a non-shadow region; p_m represents the probability that superpixel block m is predicted as shadow; N is the number of pixel blocks used for training; and L_m denotes the loss of the m-th pixel block.
in the training, in order to balance the number A of the shaded superpixels and the number B of the unshaded superpixels, the minimum value C of the two numbers is selected to be min (A, B) during each training, the superpixels with the same number as C are randomly selected from the other one for training, and the training can be stopped when the loss of the trainer tends to be smooth, so that overfitting is prevented;
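The class-balancing step can be sketched as follows; a minimal numpy version, with the function name and the seeded generator being assumptions for illustration:

```python
import numpy as np

def balanced_sample(shadow_ids, nonshadow_ids, rng=None):
    """Balance shadow / non-shadow superpixels before a training round.

    C = min(|shadow|, |non-shadow|); both classes are randomly subsampled
    (without replacement) to C so that they contribute equally to training.
    """
    rng = np.random.default_rng(rng)
    c = min(len(shadow_ids), len(nonshadow_ids))
    shadow = rng.choice(shadow_ids, size=c, replace=False)
    nonshadow = rng.choice(nonshadow_ids, size=c, replace=False)
    return shadow, nonshadow
```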
2) a pixel division network;
The pixel segmentation network is built with the In-Net network structure; the network structure and specific parameter settings are shown in fig. 6, with the size of each feature map and the operation type of each layer annotated. During downsampling, each convolution is followed by a ReLU activation layer and a max pooling layer; after upsampling, each feature map is concatenated with the corresponding downsampled feature map of the same scale, fusing high-level and low-level features, and finally the shadow detection result is obtained directly.
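The downsampling, upsampling, and skip-concatenation just described can be illustrated shape-wise with a minimal numpy sketch (no learned convolutions; the 2x2 pooling factor and nearest-neighbour upsampling are assumptions made for illustration):

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling over a (C, H, W) feature map (H and W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(decoder_feat, encoder_feat):
    """Concatenate an upsampled decoder feature map with the encoder
    feature map of the same spatial scale along the channel axis,
    fusing high-level and low-level features."""
    up = upsample2x(decoder_feat)
    return np.concatenate([up, encoder_feat], axis=0)
```

After such a concatenation, the real network would apply further convolutions to the fused features before producing the shadow map.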
To make the fully convolutional network pay more attention to the shadow region, the weight of the shadow region in the cross-entropy loss is strengthened with a coefficient w(i, j); the weighted pixel segmentation network loss is:

loss = -(1/(N·M)) Σ_{i=1..N} Σ_{j=1..M} w(i, j) · log q_l(i, j)

wherein the feature map size is N × M; q_l(i, j) represents the value at pixel (i, j) of the l-th channel feature map, l being the true class of that pixel; and w(i, j) is the weight at position (i, j), computed as:

w(i, j) = S / S_shadow when pixel (i, j) lies in the shadow region, and w(i, j) = 1 otherwise

wherein S is the total number of pixels in the ground-truth image and S_shadow is the number of pixels in the shadow region of the ground-truth image.
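The shadow-strengthening weight map can be sketched in numpy. The piecewise inverse-frequency form below (shadow pixels weighted S/S_shadow, others weighted 1) is an assumption recovered from the definitions of S and S_shadow; the patent's exact formula is given only as an image:

```python
import numpy as np

def shadow_weight_map(gt_mask):
    """Per-pixel weights that strengthen the shadow region in the loss.

    gt_mask: binary array, 1 for shadow pixels. S is the total number of
    pixels and S_shadow the number of shadow pixels; shadow pixels receive
    the inverse-frequency weight S / S_shadow (an assumed form), all other
    pixels receive weight 1.
    """
    s = gt_mask.size
    s_shadow = int(gt_mask.sum())
    w = np.ones(gt_mask.shape, dtype=np.float64)
    if s_shadow > 0:
        w[gt_mask.astype(bool)] = s / s_shadow
    return w
```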
and 4, step 4: network training and evaluation
In step 3, the two classical network models and their training parameters have been introduced. The invention provides a weakly supervised network framework using dual-network iterative training; the application of this framework is described in detail below.
1) Network training
Fig. 1 is the overall block diagram of the network, where R-net denotes the region segmentation network; P-net denotes the pixel segmentation network; Initial Shadow Seeds denotes the initial shadow seed region; Final Output denotes the final shadow detection result; and t = 1 to t = 4 mark the outputs of R-net and P-net during iterative training. The shadow seed region obtained in step 1 is the Initial Shadow Seeds marked by the red dashed frame at t = 0. During training:
a) For an input shadow Image, R-net is trained using the shadow seed region of the Image at t = 0 as the label; the network thereby initially acquires the ability to detect shadow regions. After R-net is trained, it outputs for the input Image the shadow detection map shown at t = 1 in the lower half of the figure.
b) For the input Image, P-net is trained using the shadow detection map generated by R-net at t = 1 as the label. After P-net is trained, it outputs for the input Image the shadow detection map shown at t = 1 in the upper half of the figure.
c) R-net is then trained using the t = 1 shadow detection result generated by P-net; that is, the training processes in a) and b) are repeated to complete the rounds t = 2 to t = 4. Finally, the result generated by P-net at t = 4 is taken as the final training result.
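The iterative schedule in a)-c) can be sketched as a plain training loop; `train_r` and `train_p` below are hypothetical stand-ins for one training round of each network, each returning that network's shadow detection map:

```python
def dual_network_training(image, seeds, train_r, train_p, iterations=4):
    """Iterative R-net / P-net training schedule (t = 0 .. iterations).

    train_r(image, labels) and train_p(image, labels) each run one training
    round and return the network's shadow detection map, which becomes the
    other network's label for the next round. The final P-net output
    (t = 4 in the text) is returned as the training result.
    """
    labels = seeds                      # t = 0: initial shadow seed region
    result = None
    for t in range(1, iterations + 1):
        r_out = train_r(image, labels)  # R-net trained on current labels
        result = train_p(image, r_out)  # P-net trained on R-net's output
        labels = result                 # P-net output supervises next R-net
    return result
```

The loop makes explicit that the two networks never share weights; they exchange only their output maps as pseudo-labels.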
2) Evaluation of results
FIG. 7 compares the invention with the shadow detection results of the cGAN, scGAN and ST-CGAN networks, where Image is the input shadow image; GT Mask is the manually annotated shadow detection result in the dataset; cGAN, scGAN and ST-CGAN are the result maps of three existing fully supervised methods; and Ours is the result map of the method presented herein. Note that all three compared networks were trained in a fully supervised manner. As the figure shows, the invention finds almost all shadow regions, although the boundaries still lack some detail. FIG. 8 shows further results of the method and the effect of dual-network iterative training, where Image is the input shadow image; CAM is the initial shadow seed region; R-net 1 is the result output after the first training of the region segmentation network; Iters 1-4 are the results output by P-net after 1-4 iterations of training; and GT is the manually annotated shadow detection result in the dataset. The method completes the shadow detection task well using only weak labels, i.e. it is a weakly supervised shadow detection framework.
Therefore, the proposed network structure, which iteratively mines local features and context features with two networks, accomplishes the shadow detection task well under weak labels. It is also a general framework: either of the two network models can be replaced by another model of the same kind.
It can be seen that the method completes the shadow detection task well using only weak labels, greatly reducing the dependence on fine-grained annotations of the image dataset.
As the result maps and comparisons show, the method effectively combines the features extracted by the classification network within superpixel blocks with the context features obtained by the segmentation network directly from the whole image; the fusion of the two achieves shadow detection under weak supervision.
The method provides, for the first time, a training method and network framework that solve the shadow detection problem with weak labels, and is an effective weakly supervised method for shadow detection.

Claims (6)

1. A weak supervision shadow detection method for mining complementary features through double networks is characterized by comprising the following steps:
step 1, training a VGG network as a shadow classification network, and obtaining a class activation image of a shadow image as a shadow seed region by using a trained model;
step 2, obtaining a superpixel block by using a Graph-Cuts method;
step 3, taking the super pixel block as the input of a region segmentation network, taking the shadow seed region as a label, extracting shadow features from the shadow block, effectively expanding the shadow region and removing part of error regions;
step 4, training a pixel segmentation network by utilizing a shadow detection result generated by the region segmentation network, extracting more context characteristics by the pixel segmentation network, and providing a more accurate shadow boundary for the shadow region detection network by using the generated shadow detection result;
and 5, training the region segmentation network by utilizing the shadow detection result generated by the pixel segmentation network, excavating shadow features from the seed region by utilizing the region segmentation network from bottom to top, expanding the shadow region by utilizing the excavated features, and correcting the misclassified region to obtain a shadow detection result.
2. The method of weakly supervised shadow detection by dual network mining of complementary features as recited in claim 1, wherein:
the specific steps of the step 1 are as follows:
1.1) training a shadow classification network;
adopting VGG16 as the shadow classification network; the shadow classification network initializes its parameters with a VGG16 model pre-trained on the ImageNet data set, then removes the Softmax layer of the VGG16 network, changes the number of output channels of the fully connected layer to 2, and uses Sigmoid as the activation function of the fully connected layer;
the shadow classification network is trained by using an ISTD data set, an input image is a shadow image or a shadow-free image, an output result is 1 or 0, namely the shadow image or the shadow-free image exists, and a binary cross entropy function is selected as a loss function;
1.2) obtaining a shadow seed area;
making a class activation graph of the shadow by adopting a Grad-CAM method, and obtaining a shadow seed area as a training starting point of a subsequent task;
the softmax layer input for which the shadow classification network has been obtained is denoted y c The characteristic diagram of the last convolution layer output of the shadow classification network is represented as A k Then the gradient of class c is calculated
Figure FDA0003670493680000011
After the gradient information in the backward propagation is respectively and globally pooled in the width dimension and the height dimension of the feature map, the weight of the neuron is obtained
Figure FDA0003670493680000012
The definition is as follows:
Figure FDA0003670493680000021
wherein Z represents the number of pixels in the feature map;
Figure FDA0003670493680000022
representing the value of the k characteristic diagram with the abscissa as i and the ordinate as j; the softmax layer input for which the shadow classification network has been obtained is denoted y c
Figure FDA0003670493680000023
Is a derivative symbol; after the weight is obtained, the weighted sum of the weight and the corresponding characteristic graph is the calculated class activation graph,
Figure FDA0003670493680000024
the class activation graph representing class c is defined as follows:
Figure FDA0003670493680000025
wherein ReLU () is a linear rectification function, A k Representing the k characteristic diagram;
therefore, the trained shadow classification network can be used for obtaining a shadow activation graph; then, threshold operation is carried out on the class activation graph, namely, only the region with the confidence coefficient larger than 0.65 is selected as the shadow seed region.
3. The method of weakly supervised shadow detection by dual network mining complementary features of claim 1, characterized in that:
the specific steps for obtaining the superpixel blocks are as follows:
the method comprises the steps of processing an image by adopting a Graph Cuts method to obtain a superpixel block of the image, setting the minimum size of a segmentation block to be 200 pixels, 150 pixels and 100 pixels respectively at different training stages when adopting the Graph Cuts method, setting a Gaussian filter parameter to be 1, selecting 4 neighbors to calculate the weight between pixel points, carrying out image segmentation on a shadow image to obtain a superpixel segmentation result, dyeing different colors on the segmented shadow image, and representing a superpixel block in each color region.
4. The method of weakly supervised shadow detection by dual network mining complementary features of claim 1, characterized in that:
the construction steps of the area division network are as follows:
adopting a Fast R-CNN network model as a region segmentation network frame, inputting the maximum external rectangle of a superpixel block in an input shadow image of a network as an interest region, removing a bounding box regression calculation branch, only reserving a confidence calculation branch, changing the number of output elements of a final full connection layer of the confidence calculation branch into 2, and respectively representing the probability that the superpixel block is a shadow region and the probability that the superpixel block is not a shadow region after Softmax calculation;
the loss function of the area division network adopts a cross entropy loss function, and the loss L of the area division network is defined as:
L = (1/N) Σ_{m=1..N} L_m,  L_m = -[ g_m · log(p_m) + (1 - g_m) · log(1 - p_m) ]

wherein m represents the m-th superpixel block obtained after superpixel segmentation of the input shadow image; g_m indicates whether the m-th pixel block is a shadow region, i.e. g_m takes the value 1 when superpixel block m is a shadow region and 0 when it is a non-shadow region; p_m represents the probability that superpixel block m is predicted as shadow; N is the number of pixel blocks used for training; L_m represents the loss of the m-th pixel block;
in training, in order to balance the number A of shadow superpixels and the number B of non-shadow superpixels, the minimum C = min(A, B) of the two is selected before each training round, C superpixels are randomly selected from the larger class for training, and training is stopped when the training loss plateaus.
5. The method of weakly supervised shadow detection by dual network mining of complementary features as recited in claim 1, wherein:
the pixel division network is built by the following steps:
the pixel segmentation network is built with the In-Net network structure; in the downsampling process, each convolution is followed by a ReLU activation layer and a max pooling layer; after upsampling, each feature map is concatenated with the corresponding downsampled feature map of the same scale, achieving fusion of high-level and low-level features, and finally the shadow detection result is obtained directly;
to make the fully convolutional network pay more attention to the shadow region, the weight of the shadow region in the cross-entropy loss is strengthened with a coefficient w(i, j); the weighted pixel segmentation network loss function is:

loss = -(1/(N·M)) Σ_{i=1..N} Σ_{j=1..M} w(i, j) · log q_l(i, j)

wherein the feature map size is N × M; q_l(i, j) represents the value at pixel (i, j) of the l-th channel feature map, l being the true class of that pixel; w(i, j) is the weight at position (i, j), calculated as:

w(i, j) = S / S_shadow when pixel (i, j) lies in the shadow region, and w(i, j) = 1 otherwise

wherein S is the total number of pixels in the ground-truth image and S_shadow is the number of pixels in the shadow region of the ground-truth image.
6. The method of weakly supervised shadow detection by dual network mining of complementary features as recited in claim 1, wherein:
the step of training the pixel segmentation network is as follows:
a) for an input shadow Image, training R-net using the shadow seed region of the Image at t = 0 as the label; after R-net is trained, it outputs for the input Image the shadow detection map shown at t = 1 in the lower half;
b) for the input Image, training P-net using the shadow detection map generated by R-net at t = 1 as the label; after P-net is trained, it outputs for the input Image the shadow detection map shown at t = 1 in the upper half;
c) then training R-net using the t = 1 shadow detection result generated by P-net, that is, repeating the training processes of step a) and step b) to complete the rounds t = 2 to t = 4; finally, the result generated by P-net at t = 4 is taken as the final training result.
CN202210605710.1A 2022-05-30 2022-05-30 Weak supervision shadow detection method for mining complementary features through double networks Pending CN114998132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210605710.1A CN114998132A (en) 2022-05-30 2022-05-30 Weak supervision shadow detection method for mining complementary features through double networks


Publications (1)

Publication Number Publication Date
CN114998132A true CN114998132A (en) 2022-09-02

Family

ID=83031842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210605710.1A Pending CN114998132A (en) 2022-05-30 2022-05-30 Weak supervision shadow detection method for mining complementary features through double networks

Country Status (1)

Country Link
CN (1) CN114998132A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024119824A1 (en) * 2022-12-09 2024-06-13 上海万向区块链股份公司 Image recognition method and system based on inventory counting of biological assets in shadow area



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination