CN111275076B - Image saliency detection method based on feature selection and feature fusion - Google Patents


Info

Publication number
CN111275076B
CN111275076B
Authority
CN
China
Prior art keywords
feature
conv
features
pyramid set
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010030505.8A
Other languages
Chinese (zh)
Other versions
CN111275076A (en)
Inventor
袁夏
居思刚
赵春霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202010030505.8A priority Critical patent/CN111275076B/en
Publication of CN111275076A publication Critical patent/CN111275076A/en
Application granted granted Critical
Publication of CN111275076B publication Critical patent/CN111275076B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image saliency detection method based on feature selection and feature fusion, comprising the following steps: extracting features from the input image and adding them to a feature pyramid set; performing feature selection on the feature pyramid set to obtain a new feature pyramid set; fusing the features in the new feature pyramid set in a bottom-up manner to obtain a mixed feature pyramid set; and training a saliency prediction network model with the features in the mixed feature pyramid set, then performing saliency detection on the image to be detected with the trained model. The invention uses an attention model to select image features, enhancing the features related to the image target and making them more effective, and uses a bottom-up feature fusion structure to effectively fuse the detail features of the lower layers with the semantic features of the higher layers, greatly improving the representational capability of the features; the detection accuracy is higher than that of common saliency detection networks.

Description

Image saliency detection method based on feature selection and feature fusion
Technical Field
The invention belongs to the field of image saliency detection, and particularly relates to an image saliency detection method based on feature selection and feature fusion.
Background
Saliency in an image refers to the objects or regions that draw a viewer's attention; the result of saliency detection in an image or a video is usually the salient object it contains. In neuroscience, saliency detection is described as an attention mechanism that concentrates processing on the most noticeable part of the observed scene, allowing the representation of objects in an image to be handled automatically. Saliency detection can improve the efficiency of downstream algorithms such as object detection and image segmentation.
The most effective saliency detection methods at present are based on fully convolutional neural networks. A fully convolutional network stacks multiple convolutional and pooling layers, gradually enlarging the receptive field and producing high-level semantic information that is crucial for saliency detection; however, the pooling layers also shrink the feature maps and degrade the boundaries of salient objects. Some networks preserve the boundaries of salient objects with hand-crafted features, extracting them to compute super-pixel saliency values or to partition the image into regions. When generating the saliency map, hand-crafted features and the high-level features of a convolutional neural network are complementary, but these methods extract them separately, and such separately extracted complementary features are difficult to fuse effectively. Furthermore, hand-crafted feature extraction is very time consuming.
Beyond hand-crafted features, some studies have found that features from different layers of a network are also complementary and have integrated multi-scale features for saliency detection. More specifically, deep features often contain global context-aware information suitable for correctly locating salient regions, while shallow features contain the spatial structural details suitable for locating boundaries. These methods fuse features at different scales but do not account for their different contributions to saliency, which limits detection performance. To overcome these problems, the prior art has proposed introducing attention models and gate functions into the saliency detection network, but this approach ignores the distinct characteristics of high-level and low-level features, which can affect the extraction of effective features and thus reduce the accuracy of saliency detection.
Disclosure of Invention
The object of the invention is to provide an image saliency detection method based on feature selection and feature fusion that can better characterize image features and predict saliency.
The technical solution for realizing the purpose of the invention is as follows: an image saliency detection method based on feature selection and feature fusion, the method comprising the steps of:
step 1, extracting features from the input image and adding all of them to a feature pyramid set;
step 2, performing feature selection on the feature pyramid set to obtain a new feature pyramid set;
step 3, fusing the features in the new feature pyramid set in a bottom-up manner to obtain a mixed feature pyramid set;
and step 4, training a saliency prediction network model with the features in the mixed feature pyramid set, and performing saliency detection on the image to be detected with the trained saliency prediction network model.
Further, the feature extraction on the input image in step 1 is specifically performed with the convolutional neural network ResNeXt, and the specific process includes:
Assume the five convolution blocks of the convolutional neural network ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5.
Step 1-1, input the image into the five convolution blocks in sequence and iterate forward, where the iteration formula is:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3]
where f_{-1} (the case i = -1) is the image to be detected; for i = -1, 0, 1, 2, 3, f_{i+1} denotes the output of the convolution block conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j denotes the parameters of the convolution block conv_j;
Step 1-2, add the feature map output by each convolution block to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4}.
Further, the feature selection on the feature pyramid set in step 2 is specifically performed with a spatial attention mechanism and a channel attention mechanism, and the specific process includes:
Step 2-1, perform feature selection on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining a new bottom-level feature map f'_0;
Step 2-2, perform feature selection on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining a new mid-level feature map f'_2;
from the above, the new feature pyramid set {f'_0, f_1, f'_2, f_3, f_4} is obtained.
Further, the feature selection in step 2-1 on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining the new bottom-level feature map f'_0, specifically comprises:
Define the bottom-level feature map f_0 as f_l ∈ R^(w×h×c), where w, h and c denote the width, height and number of channels of the feature map; construct a spatial attention module containing two convolution sub-blocks, denoted conv_11 and conv_22.
Step 2-1-1, pass f_l through conv_11 and conv_22 in turn, with the sub-blocks outputting the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f_l, W_11)
C_2 = conv_22(f_l, W_22)
where W_11 and W_22 are the parameters of the sub-blocks conv_11 and conv_22;
Step 2-1-2, add the outputs C_1 and C_2 of the sub-blocks conv_11 and conv_22 element by element, and map the sum to [0,1] with a sigmoid function to obtain the spatial attention weight SA; the specific formula is:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function;
Step 2-1-3, perform feature selection on the bottom-level feature map f_0 with the spatial attention weight SA to obtain the new bottom-level feature map f'_0; the formula used is:
f'_0 = SA ⊗ f_l
where ⊗ denotes element-wise multiplication.
Further, the sub-blocks conv_11 and conv_22 each contain two convolutional layers: one layer with 32 convolution kernels of size 3x3, the other with 64 convolution kernels of size 3x3.
Further, the feature selection in step 2-2 on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining the new mid-level feature map f'_2, specifically comprises:
Define the mid-level feature map f_2 as f_m ∈ R^(w×h×C).
Step 2-2-1, expand f_m into a set:
f_m = {f^m_1, f^m_2, ..., f^m_C}
where f^m_i is the i-th channel slice feature of f_m, f^m_i ∈ R^(w×h), i = 1, 2, ..., C, and C is the number of channels of the feature map f_m;
Step 2-2-2, perform global pooling on each channel slice feature f^m_i to obtain a channel-level vector v_m ∈ R^(C×1);
Step 2-2-3, learn a channel-level attention vector from the channel-level vector with two consecutive fully connected layers and a nonlinear activation layer, and map it to [0,1] with a sigmoid function to obtain the channel attention weight CA; the formula is:
CA = F(v_m, W) = σ(fc_2(δ(fc_1(v_m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a nonlinear activation function, and σ is the sigmoid function;
Step 2-2-4, redistribute the channel weights of the mid-level feature map f_2 with the channel attention weight CA to obtain the new mid-level feature map f'_2; the formula used is:
f'_2 = CA ⊗ f_m
where ⊗ denotes channel-wise multiplication, i.e. each channel slice f^m_i is scaled by the i-th component of CA.
further, step 3, performing feature fusion on the features in the new feature pyramid set in a bottom-up manner to obtain a fused feature pyramid set, specifically including:
step 3-1, removing new bottom layer characteristic diagram
Figure BDA0002364138390000044
Sampling some other characteristic graph as new bottom characteristic graph
Figure BDA0002364138390000045
Then upsampled feature map and
Figure BDA0002364138390000046
or mixed feature cascade to obtain cascade feature f cat The formula used is:
Figure BDA0002364138390000047
in the formula (f) i ×) represents the pair feature f i Up-sampling, [ c ]]Representing channel cascade operation, j = -1,
Figure BDA0002364138390000048
to represent
Figure BDA0002364138390000049
j =0,1,2,
Figure BDA00023641383900000410
representing cascade characteristics f cat Hybrid features learned through the three convolutional layers;
step 3-2, the cascade characteristic f cat Through three layers of convolution layers, learning of feature fusion is carried out to obtain mixed features
Figure BDA00023641383900000411
The formula used is:
Figure BDA00023641383900000412
step 3-3, repeating step 3-1 and step 3-2 in a bottom-up mode, and enabling the features f in the new feature pyramid set to be new 1 ,f 2 ,f 3 ,f 4 I.e. f 1 ,
Figure BDA00023641383900000413
f 3 ,f 4 Fusing layer by layer to obtainObtaining a set of mixed feature pyramids
Figure BDA00023641383900000414
Further, the saliency prediction network model in step 4 comprises three convolutional layers; a batch normalization layer and an activation layer follow each of the first two convolutional layers, and the last convolutional layer outputs a single-channel saliency map with the same resolution as the original input image.
Further, the training in step 4 of the saliency prediction network model with the features in the mixed feature pyramid set includes:
Step 4-1, perform saliency prediction on the features in the mixed feature pyramid set in turn with the saliency prediction network model;
Step 4-2, compute the loss over all prediction results to obtain the gradient, and iteratively update the parameters of the saliency prediction network model with this gradient through the back-propagation algorithm;
repeat step 4-1 to step 4-2 until the number of iterations exceeds a preset threshold, at which point the training of the saliency prediction network model is complete.
Compared with the prior art, the invention has the following notable advantages: 1) an attention model is used to select image features, enhancing the features related to the image target and making the features more effective; 2) a bottom-up feature fusion structure effectively fuses the detail features of the lower layers with the semantic features of the higher layers, greatly improving the representational capability of the features, so that the detection accuracy is higher than that of common saliency detection networks.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
Fig. 1 is a flowchart of the image saliency detection method based on feature selection and feature fusion according to the present invention.
FIG. 2 is a diagram illustrating feature selection performed on a feature map by a spatial attention module according to the present invention.
FIG. 3 is a diagram illustrating feature selection performed on a feature map by a channel attention module according to the present invention.
FIG. 4 is a schematic diagram of bottom-up feature fusion for a feature pyramid in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, in conjunction with fig. 1, the present invention provides an image saliency detection method based on feature selection and feature fusion, the method including the following steps:
step 1, extracting features from the input image and adding all of them to a feature pyramid set;
step 2, performing feature selection on the feature pyramid set to obtain a new feature pyramid set;
step 3, fusing the features in the new feature pyramid set in a bottom-up manner to obtain a mixed feature pyramid set;
and step 4, training the saliency prediction network model with the features in the mixed feature pyramid set, and performing saliency detection on the image to be detected with the trained saliency prediction network model.
Further, in one embodiment, the feature extraction on the input image in step 1 is specifically performed with the convolutional neural network ResNeXt, and the specific process includes:
Assume the five convolution blocks of the convolutional neural network ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5; the higher feature layers carry rich semantic information, while the lower feature layers carry rich low-level information such as texture.
Step 1-1, input the image into the five convolution blocks in sequence and iterate forward, where the iteration formula is:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3]
where f_{-1} (the case i = -1) is the image to be detected; for i = -1, 0, 1, 2, 3, f_{i+1} denotes the output of the convolution block conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j denotes the parameters of the convolution block conv_j;
Step 1-2, add the feature map output by each convolution block to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4}.
Preferably, as a specific example, conv_1 is a single convolutional layer with a 7x7 convolution kernel, and conv_2, conv_3, conv_4, conv_5 contain 3, 4, 6 and 3 blocks respectively, where each block is the structure commonly used in the ResNet series, namely three serially stacked convolutional layers with kernel sizes of 1x1, 3x3 and 1x1.
Illustratively, as a specific example, assume an input image I of size 3×300×300, i.e. an RGB three-channel picture whose height and width are both 300 pixels. The feature pyramid set obtained through the process of step 1 is {f_0, f_1, f_2, f_3, f_4}, where from one level to the next the spatial resolution of the feature maps decreases and the number of channels grows.
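The construction of this feature pyramid can be illustrated with the following PyTorch sketch. It is only an illustrative reading of step 1 under stated assumptions: the ResNeXt-50 (32x4d) variant from torchvision is used as a stand-in for the backbone, and the placement of the max-pooling layer and the omission of pretrained weights are not fixed by the text above.

```python
import torch
import torchvision

class PyramidExtractor(torch.nn.Module):
    """Collect the five stage outputs of a ResNeXt backbone as {f0..f4} (step 1)."""
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnext50_32x4d(weights=None)
        self.conv1 = torch.nn.Sequential(net.conv1, net.bn1, net.relu)  # conv_1 (7x7)
        self.maxpool = net.maxpool
        self.conv2 = net.layer1   # conv_2 (3 blocks)
        self.conv3 = net.layer2   # conv_3 (4 blocks)
        self.conv4 = net.layer3   # conv_4 (6 blocks)
        self.conv5 = net.layer4   # conv_5 (3 blocks)

    def forward(self, x):
        f0 = self.conv1(x)                  # high-resolution, low-level detail
        f1 = self.conv2(self.maxpool(f0))
        f2 = self.conv3(f1)
        f3 = self.conv4(f2)
        f4 = self.conv5(f3)
        return [f0, f1, f2, f3, f4]         # feature pyramid set

if __name__ == "__main__":
    feats = PyramidExtractor()(torch.randn(1, 3, 300, 300))
    print([tuple(f.shape) for f in feats])  # channel count grows, spatial size shrinks
```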
Further, in one embodiment, the feature selection on the feature pyramid set in step 2 is specifically performed with a spatial attention mechanism and a channel attention mechanism, and the specific process includes:
Step 2-1, perform feature selection on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining a new bottom-level feature map f'_0;
Step 2-2, perform feature selection on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining a new mid-level feature map f'_2;
from the above, the new feature pyramid set {f'_0, f_1, f'_2, f_3, f_4} is obtained.
Further, in one embodiment, in conjunction with FIG. 2, the feature selection in step 2-1 on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining the new bottom-level feature map f'_0, specifically comprises:
Define the bottom-level feature map f_0 as f_l ∈ R^(w×h×c), where w, h and c denote the width, height and number of channels of the feature map; for the example above, f_l is the conv_1 output of the 300×300 input image. Construct a spatial attention module containing two convolution sub-blocks, denoted conv_11 and conv_22.
Step 2-1-1, pass f_l through conv_11 and conv_22 in turn, with the sub-blocks outputting the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f_l, W_11)
C_2 = conv_22(f_l, W_22)
where W_11 and W_22 are the parameters of the sub-blocks conv_11 and conv_22;
as a specific example, for the example above, f_l is put through conv_11 and conv_22 in sequence, and the output feature maps C_1 and C_2 have the same spatial size as f_l.
Step 2-1-2, add the outputs C_1 and C_2 of the sub-blocks conv_11 and conv_22 element by element, and map the sum to [0,1] with a sigmoid function to obtain the spatial attention weight SA; the specific formula is:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function; for the example above, SA has the same shape as C_1 and C_2.
Step 2-1-3, perform feature selection on the bottom-level feature map f_0 with the spatial attention weight SA to obtain the new bottom-level feature map f'_0; the formula used is:
f'_0 = SA ⊗ f_l
where ⊗ denotes element-wise multiplication.
further, in one embodiment, the sub-volume block conv 11 、conv 22 Each including two convolutional layers, one of which has 32 convolutional kernels and 3x3 convolutional kernels, and the other has 64 convolutional kernels and 3x3 convolutional kernels.
Further, in one embodiment, in conjunction with FIG. 3, the feature selection in step 2-2 on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining the new mid-level feature map f'_2, specifically comprises:
Define the mid-level feature map f_2 as f_m ∈ R^(w×h×C).
Step 2-2-1, expand f_m into a set:
f_m = {f^m_1, f^m_2, ..., f^m_C}
where f^m_i is the i-th channel slice feature of f_m, f^m_i ∈ R^(w×h), i = 1, 2, ..., C, and C is the number of channels of the feature map f_m;
as a specific example, for the example above, f_m has 512 channels and is expanded into the set f_m = {f^m_1, f^m_2, ..., f^m_512}.
Step 2-2-2, perform global pooling on each channel slice feature f^m_i to obtain a channel-level vector v_m; as a specific example, for the example above, v_m is a 512×1-dimensional channel-level vector.
Step 2-2-3, learn a channel-level attention vector from the channel-level vector with two consecutive fully connected layers and a nonlinear activation layer, and map it to [0,1] with a sigmoid function to obtain the channel attention weight CA; the formula is:
CA = F(v_m, W) = σ(fc_2(δ(fc_1(v_m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a nonlinear activation function, and σ is the sigmoid function; for the example above, CA is likewise a 512-dimensional vector.
Step 2-2-4, redistribute the channel weights of the mid-level feature map f_2 with the channel attention weight CA to obtain the new mid-level feature map f'_2; the formula used is:
f'_2 = CA ⊗ f_m
where ⊗ denotes channel-wise multiplication, i.e. each channel slice f^m_i is scaled by the i-th component of CA.
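The channel attention selection can be sketched in PyTorch as follows, for the 512-channel mid-level feature map of the example. The reduction ratio between the two fully connected layers, ReLU as the nonlinear activation δ, and global average pooling as the global pooling are assumptions, since the text above does not specify them.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention selection of the mid-level feature map (step 2-2)."""
    def __init__(self, channels=512, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # global pooling of each channel slice
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, f_m):
        b, c, _, _ = f_m.shape
        v_m = self.pool(f_m).view(b, c)                      # channel-level vector v_m
        ca = torch.sigmoid(self.fc2(torch.relu(self.fc1(v_m))))  # CA = sigma(fc2(delta(fc1(v_m))))
        return f_m * ca.view(b, c, 1, 1)                     # f'_2: channel weights redistributed

# illustrative shapes only: the 512-channel mid-level map of the example
f2 = torch.randn(1, 512, 38, 38)
f2_new = ChannelAttention(channels=512)(f2)
```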
further, in one embodiment, with reference to fig. 4, in step 3, feature fusion is performed on features in the new feature pyramid set in a bottom-up manner, so as to obtain a fused feature pyramid set, which specifically includes:
step 3-1, removing new bottom layer characteristic diagram
Figure BDA0002364138390000091
Sampling some other characteristic graph as new bottom characteristic graph
Figure BDA0002364138390000092
Then the up-sampled feature map and
Figure BDA0002364138390000093
or mixed feature cascade to obtain cascade feature f cat The formula used is:
Figure BDA0002364138390000094
in the formula (f) i ×) represents the pair feature f i Up-sampling, [ c ]]Indicating channel cascade operation, j = -1,
Figure BDA0002364138390000095
to represent
Figure BDA0002364138390000096
j =0,1,2,
Figure BDA0002364138390000097
representing cascade characteristics f cat Hybrid features learned through the three convolutional layers;
step 3-2, cascading characteristic f cat Through three layers of convolution layers, learning of feature fusion is carried out to obtain mixed features
Figure BDA0002364138390000098
The formula used is:
Figure BDA0002364138390000099
step 3-3, repeating step 3-1 and step 3-2 in a bottom-up mode, and enabling the features f in the new feature pyramid set to be new 1 ,f 2 ,f 3 ,f 4 I.e. f 1 ,
Figure BDA00023641383900000910
f 3 ,f 4 Fusing layer by layer to obtain a mixed characteristic pyramid set
Figure BDA00023641383900000911
Illustratively, in one embodiment, the convolution kernel sizes of the three convolutional layers in step 3-2 are 3x3 and 1x1 in sequence.
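A sketch of one bottom-up fusion step and of the layer-by-layer loop is given below. The channel widths, a 3x3/3x3/1x1 kernel pattern for the three convolutional layers, bilinear interpolation as the upsampling, and the exact pairing of pyramid levels are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    """One bottom-up fusion step (steps 3-1 and 3-2)."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.convs = nn.Sequential(   # three conv layers that learn the mixed feature
            nn.Conv2d(in_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 1),
        )

    def forward(self, f_mix, f_higher):
        up = F.interpolate(f_higher, size=f_mix.shape[-2:], mode="bilinear",
                           align_corners=False)      # up(f_i): match the running feature's size
        f_cat = torch.cat([up, f_mix], dim=1)         # [up(f_i), f_mix]_c, channel concatenation
        return self.convs(f_cat)                      # new mixed feature

def bottom_up_fusion(pyramid, blocks):
    """pyramid = [f0', f1, f2', f3, f4]; returns the mixed feature pyramid set."""
    f_mix, mixed = pyramid[0], []
    for f_higher, block in zip(pyramid[1:], blocks):  # fuse f1, f2', f3, f4 layer by layer
        f_mix = block(f_mix, f_higher)
        mixed.append(f_mix)
    return mixed

# illustrative channel counts for a ResNeXt-50 pyramid with 64-channel mixed features
blocks = nn.ModuleList([FuseBlock(64 + 256), FuseBlock(64 + 512),
                        FuseBlock(64 + 1024), FuseBlock(64 + 2048)])
```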
Further, in one embodiment, the saliency prediction network model in step 4 comprises three convolutional layers; a batch normalization layer and an activation layer follow each of the first two convolutional layers, and the last convolutional layer outputs a single-channel saliency map with the same resolution as the original input image.
In one embodiment, the convolution kernel sizes of the three convolutional layers of the saliency prediction network model are 3x3 and 1x1 in sequence.
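The prediction head can be sketched as follows; the hidden channel width, the 3x3/3x3/1x1 kernel pattern, and bilinear interpolation to bring the single-channel map back to the input resolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyHead(nn.Module):
    """Three-layer saliency prediction head (step 4)."""
    def __init__(self, in_channels=64, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1),                   # single-channel saliency logits
        )

    def forward(self, f_mix, out_size):
        s = self.body(f_mix)
        # resize the single-channel map to the original input resolution
        return F.interpolate(s, size=out_size, mode="bilinear", align_corners=False)

# illustrative call: a 64-channel mixed feature from the 300x300 example
saliency = SaliencyHead()(torch.randn(1, 64, 150, 150), out_size=(300, 300))
```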
Further, in one embodiment, the saliency prediction network model is trained in step 4 with the features in the mixed feature pyramid set, and the specific process includes:
Step 4-1, perform saliency prediction on the features in the mixed feature pyramid set in turn with the saliency prediction network model;
Step 4-2, compute the loss over all prediction results to obtain the gradient, and iteratively update the parameters of the saliency prediction network model with this gradient through the back-propagation algorithm;
repeat step 4-1 to step 4-2 until the number of iterations exceeds a preset threshold, at which point the training of the saliency prediction network model is complete.
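The training procedure of step 4 can be sketched as below. The binary cross-entropy loss, the Adam optimizer, summing the losses of all mixed-feature predictions, and the names extractor / attention / fusion / heads for modules built as in the earlier sketches are all assumptions; the text above only specifies loss computation, back-propagation, and a preset iteration threshold.

```python
import torch
import torch.nn as nn

def train(extractor, attention, fusion, heads, loader, max_iters=10000, lr=1e-4):
    """Deep-supervised training sketch over the mixed feature pyramid (steps 4-1 and 4-2)."""
    modules = nn.ModuleList([extractor, attention, fusion, heads])
    optimizer = torch.optim.Adam(modules.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    it = 0
    while it < max_iters:                              # preset iteration threshold
        for image, mask in loader:                     # mask: ground-truth saliency map
            pyramid = extractor(image)                 # step 1: feature pyramid set
            pyramid = attention(pyramid)               # step 2: feature selection
            mixed = fusion(pyramid)                    # step 3: mixed feature pyramid set
            # step 4-1: predict saliency from every mixed feature; step 4-2: loss and update
            loss = sum(bce(head(f, mask.shape[-2:]), mask)
                       for head, f in zip(heads, mixed))
            optimizer.zero_grad()
            loss.backward()                            # back-propagation
            optimizer.step()
            it += 1
            if it >= max_iters:
                break
```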
The invention uses an attention model to select image features, enhancing the features related to the image target and making the features more effective, and uses a bottom-up feature fusion structure to effectively fuse the detail features of the lower layers with the semantic features of the higher layers, greatly improving the representational capability of the features; the detection accuracy is higher than that of common saliency detection networks.

Claims (6)

1. An image saliency detection method based on feature selection and feature fusion, characterized by comprising the following steps:
step 1, extracting features from the input image and adding all of them to a feature pyramid set; the feature extraction on the input image is specifically performed with the convolutional neural network ResNeXt, and the specific process includes:
assume the five convolution blocks of the convolutional neural network ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5;
step 1-1, input the image into the five convolution blocks in sequence and iterate forward, where the iteration formula is:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3]
where f_{-1} (the case i = -1) is the image to be detected; for i = -1, 0, 1, 2, 3, f_{i+1} denotes the output of the convolution block conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j denotes the parameters of the convolution block conv_j;
step 1-2, add the feature map output by each convolution block to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4};
step 2, performing feature selection on the feature pyramid set to obtain a new feature pyramid set; the feature selection on the feature pyramid set is specifically performed with a spatial attention mechanism and a channel attention mechanism, and the specific process includes:
step 2-1, perform feature selection on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining a new bottom-level feature map f'_0;
step 2-2, perform feature selection on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining a new mid-level feature map f'_2;
from the above, the new feature pyramid set {f'_0, f_1, f'_2, f_3, f_4} is obtained;
Step 3, performing feature fusion on the features in the new feature pyramid set in a bottom-up manner to obtain a mixed feature pyramid set; the feature fusion is performed on the features in the new feature pyramid set in a bottom-up manner to obtain a fused feature pyramid set, and the method specifically includes:
step 3-1, removing new bottom layer characteristic diagram
Figure FDA0003796149800000014
Sampling some other characteristic graph as new bottom characteristic graph
Figure FDA0003796149800000015
Then upsampled feature map and
Figure FDA0003796149800000021
or mixed feature cascade to obtain cascade feature f cat The formula used is:
Figure FDA0003796149800000022
in the formula (f) i ×) represents the pair feature f i Up-sampling, [ c ]]Indicating channel cascade operation, j = -1,
Figure FDA0003796149800000023
represent
Figure FDA0003796149800000024
j =0,1,2,
Figure FDA0003796149800000025
representing cascade characteristics f cat The hybrid features learned through the three convolutional layers;
step 3-2, the cascade characteristic f cat Through three layers of convolution layers, learning of feature fusion is carried out to obtain mixed features
Figure FDA0003796149800000026
The formula used is:
Figure FDA0003796149800000027
step 3-3, repeating step 3-1 and step 3-2 in a bottom-up mode, and enabling the features f in the new feature pyramid set to be in the same shape 1 ,f 2 ,f 3 ,f 4 I.e. f 1 ,
Figure FDA0003796149800000028
f 3 ,f 4 Fusing layer by layer to obtain a pyramid set with mixed features
Figure FDA0003796149800000029
Step 4, training a significance prediction network model by using the features in the mixed feature pyramid set, and performing significance detection on an image to be detected by using the trained significance prediction network model; the significance prediction network model comprises three convolutional layers, wherein a batch regularization layer and an activation layer are added behind the first two convolutional layers, and the last convolutional layer outputs a significance map which is a single channel and has the same resolution as the original input image.
2. The image saliency detection method based on feature selection and feature fusion according to claim 1, characterized in that the feature selection in step 2-1 on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining the new bottom-level feature map f'_0, specifically comprises:
define the bottom-level feature map f_0 as f_l ∈ R^(w×h×c), where w, h and c denote the width, height and number of channels of the feature map; construct a spatial attention module containing two convolution sub-blocks, denoted conv_11 and conv_22;
step 2-1-1, pass f_l through conv_11 and conv_22 in turn, with the sub-blocks outputting the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f_l, W_11)
C_2 = conv_22(f_l, W_22)
where W_11 and W_22 are the parameters of the sub-blocks conv_11 and conv_22;
step 2-1-2, add the outputs C_1 and C_2 of the sub-blocks conv_11 and conv_22 element by element, and map the sum to [0,1] with a sigmoid function to obtain the spatial attention weight SA; the specific formula is:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function;
step 2-1-3, perform feature selection on the bottom-level feature map f_0 with the spatial attention weight SA to obtain the new bottom-level feature map f'_0; the formula used is:
f'_0 = SA ⊗ f_l
where ⊗ denotes element-wise multiplication.
3. The image saliency detection method based on feature selection and feature fusion according to claim 2, characterized in that the sub-blocks conv_11 and conv_22 each contain two convolutional layers, where one layer has 32 convolution kernels of size 3x3 and the other layer has 64 convolution kernels of size 3x3.
4. The image saliency detection method based on feature selection and feature fusion according to claim 1, characterized in that the feature selection in step 2-2 on the mid-level feature map f_2 in the feature pyramid set using channel attention, obtaining the new mid-level feature map f'_2, specifically comprises:
define the mid-level feature map f_2 as f_m ∈ R^(w×h×C);
step 2-2-1, expand f_m into a set:
f_m = {f^m_1, f^m_2, ..., f^m_C}
where f^m_i is the i-th channel slice feature of f_m, f^m_i ∈ R^(w×h), and C is the number of channels of the feature map f_m;
step 2-2-2, perform global pooling on each channel slice feature f^m_i to obtain a channel-level vector v_m ∈ R^(C×1);
step 2-2-3, learn a channel-level attention vector from the channel-level vector with two consecutive fully connected layers and a nonlinear activation layer, and map it to [0,1] with a sigmoid function to obtain the channel attention weight CA; the formula is:
CA = F(v_m, W) = σ(fc_2(δ(fc_1(v_m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a nonlinear activation function, and σ is the sigmoid function;
step 2-2-4, redistribute the channel weights of the mid-level feature map f_2 with the channel attention weight CA to obtain the new mid-level feature map f'_2; the formula used is:
f'_2 = CA ⊗ f_m
where ⊗ denotes channel-wise multiplication.
5. the image significance detection method based on feature selection and feature fusion as claimed in claim 1, wherein the convolution kernel size of the three layers of convolution layers in step 3-2 is 3x3,1x1 in sequence.
6. The image saliency detection method based on feature selection and feature fusion according to claim 1, characterized in that in step 4 the saliency prediction network model is trained with the features in the mixed feature pyramid set, and the specific process includes:
step 4-1, perform saliency prediction on the features in the mixed feature pyramid set in turn with the saliency prediction network model;
step 4-2, compute the loss over all prediction results to obtain the gradient, and iteratively update the parameters of the saliency prediction network model with this gradient through the back-propagation algorithm;
repeat step 4-1 to step 4-2 until the number of iterations exceeds a preset threshold, at which point the training of the saliency prediction network model is complete.
CN202010030505.8A 2020-01-13 2020-01-13 Image saliency detection method based on feature selection and feature fusion Active CN111275076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030505.8A CN111275076B (en) 2020-01-13 2020-01-13 Image saliency detection method based on feature selection and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030505.8A CN111275076B (en) 2020-01-13 2020-01-13 Image saliency detection method based on feature selection and feature fusion

Publications (2)

Publication Number Publication Date
CN111275076A CN111275076A (en) 2020-06-12
CN111275076B true CN111275076B (en) 2022-10-21

Family

ID=70997061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030505.8A Active CN111275076B (en) 2020-01-13 2020-01-13 Image saliency detection method based on feature selection and feature fusion

Country Status (1)

Country Link
CN (1) CN111275076B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931793B (en) * 2020-08-17 2024-04-12 湖南城市学院 Method and system for extracting saliency target
CN112927209B (en) * 2021-03-05 2022-02-11 重庆邮电大学 CNN-based significance detection system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 A kind of obvious object detection method based on convolutional neural networks
CN110619638A (en) * 2019-08-22 2019-12-27 浙江科技学院 Multi-mode fusion significance detection method based on convolution block attention module

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI709107B (en) * 2018-05-21 2020-11-01 國立清華大學 Image feature extraction method and saliency prediction method including the same

Also Published As

Publication number Publication date
CN111275076A (en) 2020-06-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant