CN111028235A - Image segmentation method for enhancing edge and detail information by utilizing feature fusion
- Publication number: CN111028235A (application CN201911094462.3)
- Authority: CN (China)
- Prior art keywords: feature map, feature, fusion, conv, pooling
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T 7/10 - Image analysis: Segmentation; Edge detection
- G06N 3/045 - Neural networks: Combinations of networks
- G06T 5/00 - Image enhancement or restoration
- G06T 2207/10004 - Still image; Photographic image
- G06T 2207/20081 - Training; Learning
- G06T 2207/20192 - Edge enhancement; Edge preservation
- Y02T 10/40 - Engine management systems (auto-assigned)
Abstract
The invention provides an image segmentation method that enhances edge and detail information through feature fusion, in the technical field of computer vision. The method extracts features from an input image with a convolutional neural network; feeds the extracted features into a decoding structure augmented with additional feature fusion, which enriches edge and detail information while restoring the image resolution, yielding a dense feature map; outputs per-class probabilities through a normalization method; and computes a cross-entropy loss function, updating the network weights by stochastic gradient descent. While restoring the resolution of the feature map, the method recovers the position and boundary detail information lost in the encoding stage, enriches the image information, and obtains a dense feature map that compensates for the sparsity introduced by direct upsampling, producing clearer segmentation boundaries and details and improving the segmentation of small, detailed objects.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to an image segmentation method for enhancing edge and detail information by utilizing feature fusion.
Background
With the continuous progress of science and technology and the rapid development of the economy, artificial intelligence has gradually entered public view, plays an increasingly important role in human production and daily life, and is widely applied across many fields. Computer vision is an important research direction of artificial intelligence and a very important means of realizing automatic scene understanding, with applications in many areas such as autonomous driving systems and unmanned vehicles.
Image semantic segmentation is an important branch of computer vision within machine learning: it processes an input image and automatically segments and identifies its content. Before deep learning was applied to computer vision, classifiers for image semantic segmentation were usually built on texton forests or random forests. With the emergence and vigorous development of deep convolutional neural networks, an effective approach to semantic segmentation became available: applying CNNs to semantic segmentation has made good progress, promoted the development of the field, and achieved remarkable results across many application areas.
After deep learning was applied to semantic segmentation, many classical segmentation methods appeared, such as the Fully Convolutional Network (FCN), the SegNet network with its encoder-decoder structure, and DeepLab with dilated (atrous) convolution. However, as the CNN hierarchy deepens, repeated pooling and downsampling discard the position information and boundary detail information of the picture. This process is irreversible, and the discarded information cannot be completely recovered, so the feature maps upsampled in the decoding stage become sparse due to the information loss; these methods therefore have certain limitations.
In the FCN and the traditional SegNet network, position and edge details are lost to downsampling, and the lost information does not reappear during upsampling in the decoding stage, so the resulting feature map is sparse. Although the SegNet network recovers position information through the pooling index and enriches boundary and detail information with convolution operations, a large amount of information is still lost.
Dilated convolution is a convolutional layer that can produce a dense feature map, but it is computationally expensive and occupies a large amount of memory when processing many high-resolution feature maps.
Existing image semantic segmentation methods still need further improvement in the retention of edge detail features and position information, and segmentation accuracy also needs to be improved.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an image segmentation method that enhances edge and detail information through feature fusion, so as to realize the segmentation of images.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an image segmentation method for enhancing edge and detail information by using feature fusion comprises the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cropping the images in the training data set so that the input images have a uniform size;
step 1.2: fixing the resolution of the input image to 360 × 480;
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, with a max pooling index added during pooling to record the maximum pixel values in the image and their positions;
the convolution kernel size of each convolutional layer in the coding structure is 3 × 3, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1, 2, 3, 4, 5; j = 1, 2 when i = 1, 2, and j = 1, 2, 3 when i = 3, 4, 5. Each convolutional layer is followed by Batch Normalization and a ReLU activation function. A max pooling index is added to each pooling layer; downsampling is realized with 2 × 2 non-overlapping max pooling, and the position of the maximum pixel value is kept through the max pooling index. The feature map obtained by each pooling layer is denoted pool_r, where r = 1, 2, 3, 4, 5;
the specific method of recording the maximum pixel values in the image and their positions via the max pooling index during pooling is as follows:
for an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where $h$ and $w$ are the height and width of the feature map and $c$ is the number of channels, 2 × 2 non-overlapping max pooling yields the feature map $Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, where the value at pixel $(i, j)$ is given by:

$$Y_{i,j} = \max_{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}} X_{p,q}$$

The position corresponding to the maximum pixel value, recorded as $(m_i, n_j)$, is given by:

$$(m_i, n_j) = \mathop{\arg\max}_{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}} X_{p,q}$$
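As an illustration of the pooling-with-index bookkeeping described in step 2, the following NumPy sketch implements 2 × 2 non-overlapping max pooling over a single-channel map and records the winning positions; the helper name and the flat-index convention are our own, not the patent's.

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2 non-overlapping max pooling over a (h, w) feature map.

    Returns the pooled map Y and, for each pooled pixel (i, j), the flat
    index of the winning position (m_i, n_j) in the input map, mirroring
    the max pooling index kept by the coding structure.
    """
    h, w = x.shape
    # View the map as (h/2, w/2) blocks of 4 values and reduce each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    pooled = blocks.max(axis=-1)
    argmax = blocks.argmax(axis=-1)          # 0..3 within each 2x2 block
    # Convert the within-block winner to absolute (row, col) coordinates.
    di, dj = argmax // 2, argmax % 2
    rows = 2 * np.arange(h // 2)[:, None] + di
    cols = 2 * np.arange(w // 2)[None, :] + dj
    indices = rows * w + cols                # flat index into the input map
    return pooled, indices

x = np.array([[1., 2., 5., 0.],
              [4., 3., 1., 2.],
              [0., 1., 3., 7.],
              [2., 6., 4., 5.]])
pooled, idx = max_pool_2x2_with_indices(x)
# pooled: [[4., 5.], [6., 7.]]
```

In a full encoder this would run per channel after each conv block; only the 2×2 non-overlapping case used by the patent is sketched here.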
step 3: inputting the pooled feature map pool_5 obtained by the coding structure into a decoding structure augmented with additional feature fusion, releasing the maximum pixel values at their original positions using the max pooling index and filling the remaining positions with 0, realizing 2× upsampling and obtaining the sparse feature map upsampling5;
the decoding structure comprises three three-layer convolution blocks and two two-layer convolution blocks; each convolutional layer in the decoding structure is followed by Batch Normalization and a ReLU activation function;
the value of each pixel in the obtained sparse feature map upsampling5 is given by:

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u, v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

wherein $Z_{u,v}$ is the pixel value of pixel $(u, v)$ in the sparse feature map upsampling5;
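The sparse upsampling of step 3 can be sketched as the inverse of pooling: each pooled maximum is released at its recorded position and every other position is filled with 0. This is an illustrative NumPy sketch, assuming the flat-index convention of a hypothetical pooling helper, not the patent's code.

```python
import numpy as np

def max_unpool_2x2(pooled, indices, out_shape):
    """Invert 2x2 max pooling: place each pooled value at its recorded
    flat index in a zero-filled map of shape out_shape, giving the
    sparse 2x-upsampled feature map (upsampling5 in step 3)."""
    z = np.zeros(out_shape)
    z.flat[indices.ravel()] = pooled.ravel()
    return z

pooled = np.array([[4., 5.], [6., 7.]])
indices = np.array([[4, 2], [13, 11]])   # flat positions of the maxima
z = max_unpool_2x2(pooled, indices, (4, 4))
# z carries 4, 5, 6, 7 at their original positions and 0 elsewhere
```

Every position other than the four maxima is zero, which is exactly why the patent adds feature fusion afterwards to densify the map.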
step 4: performing a feature fusion operation through the decoding structure, fusing the sparse feature map upsampling5 with the convolutional feature maps conv_5_1 and conv_5_2, and fusing the result with the pooled feature map pool_4 of corresponding size to obtain the fused feature map F1;
the fusion process adds the pixel values at corresponding positions of the feature maps;
the fused feature map F1 is input into the first three-layer convolution block for convolution to obtain the dense feature map conv_decode5, compensating for the information loss caused by pooling and downsampling;
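A minimal numeric illustration of the additive fusion in step 4 (toy 2 × 2 single-channel maps; the variable names mirror the document's feature-map names, but the values are invented):

```python
import numpy as np

# Feature fusion as described in step 4: pixel values at corresponding
# positions are added. All participating maps share the same resolution.
upsampling5 = np.array([[0., 5.], [4., 0.]])   # sparse upsampled map
conv_5_1    = np.array([[1., 1.], [1., 1.]])   # encoder conv map
conv_5_2    = np.array([[2., 0.], [0., 2.]])   # encoder conv map
pool_4      = np.array([[0., 3.], [3., 0.]])   # pooled map of same size

# Fuse the sparse map with the conv maps, then with the pooled map.
F1 = upsampling5 + conv_5_1 + conv_5_2 + pool_4
# F1: [[3., 9.], [8., 3.]] -- no longer sparse
```

Element-wise addition keeps the channel count unchanged, which is what lets F1 feed directly into the following three-layer convolution block.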
step 5: performing four more feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size;
step 5.1: performing the second feature fusion through the decoding structure to restore image information;
step 5.1.1: upsampling conv_decode5 by a factor of 2 using the max pooling index stored when the pooled feature map pool_4 was generated, obtaining the sparse feature map upsampling4;
step 5.1.2: fusing the sparse feature map upsampling4 with the convolutional feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooled feature map pool_3, to obtain the fused feature map F2;
step 5.1.3: inputting the fused feature map F2 into the second three-layer convolution block for convolution to obtain the dense feature map conv_decode4;
step 5.2: performing the third feature fusion through the decoding structure to restore image information;
step 5.2.1: upsampling the feature map conv_decode4 by a factor of 2 using the max pooling index stored when the pooled feature map pool_3 was generated, obtaining the sparse feature map upsampling3;
step 5.2.2: fusing the sparse feature map upsampling3 with the convolutional feature maps conv_3_1 and conv_3_2 of the same resolution extracted from the coding structure, and with the pooled feature map pool_2, to obtain the fused feature map F3;
step 5.2.3: inputting the fused feature map F3 into the third three-layer convolution block for convolution to obtain the dense feature map conv_decode3;
step 5.3: performing the fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: upsampling the feature map conv_decode3 by a factor of 2 using the max pooling index stored when the pooled feature map pool_2 was generated, obtaining the sparse feature map upsampling2;
step 5.3.2: fusing the sparse feature map upsampling2 with the convolutional feature map conv_2_1 and the pooled feature map pool_1 to obtain the fused feature map F4;
step 5.3.3: following the symmetry of the SegNet network, inputting the fused feature map F4 into the first two-layer convolution block for convolution to obtain the dense feature map conv_decode2;
step 5.4: performing the fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: upsampling the feature map conv_decode2 by a factor of 2 using the max pooling index stored when the pooled feature map pool_1 was generated, obtaining the sparse feature map upsampling1;
step 5.4.2: fusing the sparse feature map upsampling1 with the convolutional feature map conv_1_1 to obtain the fused feature map F5;
step 5.4.3: inputting the fused feature map F5 into the second two-layer convolution block for convolution to obtain the dense feature map conv_decode1;
step 6: inputting the dense feature map conv_decode1 into a Softmax layer to obtain, for each pixel, the maximum classification probability;
step 7: computing the cross-entropy loss function from the per-pixel classification probabilities, and updating the convolution kernel parameters of each convolutional layer and pooling layer in the coding and decoding structures by stochastic gradient descent, realizing image segmentation.
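Steps 6 and 7 amount to a per-pixel softmax followed by a cross-entropy loss. The NumPy sketch below illustrates only the normalization and loss computation (the gradient-descent update is omitted); it is a hedged illustration, not the patent's training code, and the toy logits are invented.

```python
import numpy as np

def softmax(logits):
    """Per-pixel softmax over the class axis (last axis)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stabilized
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean per-pixel cross-entropy; labels are integer class ids."""
    h, w, _ = probs.shape
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.log(picked).mean()

# Toy 2x2 image with 3 classes; conv_decode1 stands in for the dense map.
conv_decode1 = np.array([[[2., 0., 0.], [0., 2., 0.]],
                         [[0., 0., 2.], [2., 0., 0.]]])
labels = np.array([[0, 1], [2, 0]])
probs = softmax(conv_decode1)
pred = probs.argmax(axis=-1)        # class with maximum probability
loss = cross_entropy(probs, labels)
```

In training, the gradient of this loss with respect to the convolution kernels would drive the stochastic-gradient-descent update of step 7.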
The technical principle of the method is as follows: on the basis of the original SegNet network, the decoding stage is improved so that image position and boundary detail information are restored while the resolution of the feature map is restored, yielding a dense feature map. Features of the image are extracted with the convolutional and pooling layers of the coding structure, and layers at different depths extract information at different scales: the shallow structure extracts global low-level semantic information such as edges, orientation, texture, and chroma, while the deep structure extracts local high-level semantic information such as object shape. The deeper the network level, the more abstract the extracted features; to extract more abstract high-level features, the model uses max pooling rather than average pooling in the coding structure.
The maximum pixel values extracted from the feature map and their positions are both important: pooling loses not only edge detail information but also, through the reduced feature-map resolution, position information. A pooling index is therefore added to the coding structure to remember the position of the maximum pixel value. The decoding structure releases the maximum value at its original position via the pooling index and fills the remaining positions with 0, so 2× upsampling is realized while important position information is recovered and errors are reduced.
However, as the network hierarchy deepens, the extracted features become more and more abstract and much edge detail information is lost, with information at a different scale lost at each layer. In the feature map obtained after upsampling in the decoding structure, every position other than the maxima is 0, so the map is sparse, and the lost information does not reappear. Feature fusion is therefore added to the decoding structure to recover this information: the sparse feature map obtained after each upsampling is superposed with the convolutional and pooled feature maps of corresponding size from the encoding stage. In this way, each upsampled feature map is input into the fusion structure, the information lost in the coding stage is gradually recovered, and the fusion result is then passed through convolutional layers to further enrich the information, so that a denser feature map is obtained, the segmentation effect is better, and the accuracy is higher.
The beneficial effect of the above technical scheme is as follows: the image segmentation method that enhances edge and detail information through feature fusion restores the position and edge detail information lost in the encoding stage while restoring the resolution of the feature map, enriches the image information, and obtains a dense feature map that compensates for the sparsity of direct upsampling. The segmented edges and details become clearer, the segmentation of fine and small objects improves, and average segmentation accuracy and mIoU increase.
Drawings
Fig. 1 is a flowchart of an image segmentation method for enhancing edge and detail information by feature fusion according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, an image segmentation method for enhancing edge and detail information by feature fusion, as shown in fig. 1, includes the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cropping the images in the training data set so that the input images have a uniform size;
step 1.2: fixing the resolution of the input image to 360 × 480;
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, with a max pooling index added during pooling to record the maximum pixel values in the image and their positions;
the convolution kernel size of each convolutional layer in the coding structure is 3 × 3, which ensures the image size is unchanged, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1, 2, 3, 4, 5; j = 1, 2 when i = 1, 2, and j = 1, 2, 3 when i = 3, 4, 5. Each convolutional layer is followed by Batch Normalization and a ReLU activation function. Batch Normalization accelerates model convergence and, to a certain extent, alleviates the gradient-dispersion problem in deep networks, making the deep network model easier and more stable to train; the ReLU activation function mitigates vanishing gradients and alleviates overfitting of the network. A max pooling index is added to each pooling layer; downsampling is realized with 2 × 2 non-overlapping max pooling, and the position of the maximum pixel value is kept through the max pooling index. The feature map obtained by each pooling layer is denoted pool_r, where r = 1, 2, 3, 4, 5;
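The remark that a 3 × 3 kernel keeps the image size unchanged holds when the input is zero-padded by one pixel on each side. A minimal single-channel, single-filter, stride-1 NumPy sketch (illustrative only, not the patent's layer implementation):

```python
import numpy as np

def conv3x3_same(x, kernel):
    """3x3 convolution with zero padding of 1, so output shape == input shape."""
    h, w = x.shape
    padded = np.pad(x, 1)                    # one-pixel zero border
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def relu(x):
    """ReLU activation, applied after each convolution in the patent."""
    return np.maximum(x, 0.0)

x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.zeros((3, 3)); kernel[1, 1] = 1.0   # identity kernel for the demo
y = relu(conv3x3_same(x, kernel))
# y.shape == x.shape == (4, 4): the spatial size is preserved
```

A real layer would also apply batch normalization between the convolution and the ReLU and use many learned filters; those are omitted to keep the spatial-size point isolated.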
the coding structure uses the first 13 layers of VGG-16 to extract picture features, with the convolutional and pooling layers extracting image features at different scales. The first 4 layers can be regarded as a shallow structure yielding low-level semantic information, and the last 9 layers as a deep structure yielding high-level abstract information, so features at different scales are obtained through the coding structure;
for an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where $h$ and $w$ are the height and width of the feature map and $c$ is the number of channels, 2 × 2 non-overlapping max pooling yields the feature map $Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, where the value at pixel $(i, j)$ is given by:

$$Y_{i,j} = \max_{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}} X_{p,q}$$

The position corresponding to the maximum pixel value, recorded as $(m_i, n_j)$, is given by:

$$(m_i, n_j) = \mathop{\arg\max}_{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}} X_{p,q}$$
step 3: inputting the pooled feature map pool_5 obtained by the coding structure into a decoding structure augmented with additional feature fusion, releasing the maximum pixel values at their original positions using the max pooling index and filling the remaining positions with 0, realizing 2× upsampling and obtaining the sparse feature map upsampling5;
the decoding structure comprises three three-layer convolution blocks and two two-layer convolution blocks; each convolutional layer in the decoding structure is followed by Batch Normalization and a ReLU activation function;
the value of each pixel in the obtained sparse feature map upsampling5 is given by:

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u, v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

wherein $Z_{u,v}$ is the pixel value of pixel $(u, v)$ in the sparse feature map upsampling5.
Step 4: since the feature map obtained by upsampling is sparse, a feature fusion operation is performed through the decoding structure. The convolutional feature maps extracted from the coding structure with the same resolution as the sparse feature map upsampling5 are conv_5_1, conv_5_2, and conv_5_3; because pool_5 is obtained by directly pooling conv_5_3, part of that information is already recovered during the 2× upsampling. Therefore, to reduce the model's training parameters, only the sparse feature map upsampling5 is fused with the convolutional feature maps conv_5_1 and conv_5_2, and the fused result is fused with the pooled feature map pool_4 of corresponding size, obtaining the fused feature map F1;
the fusion process adds the pixel values at corresponding positions of the feature maps;
to maintain the symmetry of the original SegNet network, the fused feature map F1 is input into the first three-layer convolution block for convolution, obtaining the dense feature map conv_decode5, further enriching the picture information and compensating for the information loss caused by pooling and downsampling;
Step 4 constitutes the first feature fusion operation; five feature fusions are required in the decoding process of the method. According to the upsampling depth, they fall into three different fusion forms: the first three fusions share the same form, and four more feature fusions follow after this step.
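The five fusion stages and their inputs, as enumerated in steps 4 through 5.4, can be summarized in one place. The dictionary below merely restates the text; the stage and map names are the document's, while the layout is ours.

```python
# Inputs fused at each of the five decoding stages (steps 4 and 5):
# the first three stages share one "deep" fusion form, then two
# progressively shallower forms follow.
fusion_stages = {
    "conv_decode5": ["upsampling5", "conv_5_1", "conv_5_2", "pool_4"],
    "conv_decode4": ["upsampling4", "conv_4_1", "conv_4_2", "pool_3"],
    "conv_decode3": ["upsampling3", "conv_3_1", "conv_3_2", "pool_2"],
    "conv_decode2": ["upsampling2", "conv_2_1", "pool_1"],   # detail recovery
    "conv_decode1": ["upsampling1", "conv_1_1"],             # edge recovery
}
# The first three stages fuse four maps each; the last two fuse fewer
# encoder maps to limit the number of training parameters.
assert len(fusion_stages) == 5
```

Reading the table top to bottom follows the decoder from coarsest to finest resolution.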
Step 5: performing four more feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size, obtaining the dense feature map conv_decode1;
step 5.1: performing the second feature fusion through the decoding structure to restore image information;
step 5.1.1: after step 4, the resolution of the feature map conv_decode5 equals that of the pooled feature map pool_4; conv_decode5 is upsampled by a factor of 2 using the max pooling index stored when pool_4 was generated, obtaining the sparse feature map upsampling4;
step 5.1.2: fusing the sparse feature map upsampling4 with the convolutional feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooled feature map pool_3, to obtain the fused feature map F2;
step 5.1.3: inputting the fused feature map F2 into the second three-layer convolution block for convolution to obtain the dense feature map conv_decode4;
step 5.2: performing the third feature fusion through the decoding structure to restore image information;
step 5.2.1: upsampling the feature map conv_decode4 by a factor of 2 using the max pooling index stored when the pooled feature map pool_3 was generated, obtaining the sparse feature map upsampling3;
step 5.2.2: fusing the sparse feature map upsampling3 with the convolutional feature maps conv_3_1 and conv_3_2 of the same resolution extracted from the coding structure, and with the pooled feature map pool_2, to obtain the fused feature map F3;
step 5.2.3: inputting the fused feature map F3 into the third three-layer convolution block for convolution to obtain the dense feature map conv_decode3;
The first three feature fusions correspond to the coding feature maps of three stages and share the same fusion structure; the feature maps participating in these fusions have lower resolution and carry local abstract features, so the same fusion form is used to recover them.
Step 5.3: performing the fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: upsampling the feature map conv_decode3 by a factor of 2 using the max pooling index stored when the pooled feature map pool_2 was generated, obtaining the sparse feature map upsampling2;
step 5.3.2: after step 5.3.1, the resolution of the feature map is restored to half that of the original image; the corresponding coding feature maps are conv_2_1, conv_2_2, and pool_1. To reduce the parameters of model training, only the sparse feature map upsampling2 is fused with the convolutional feature map conv_2_1 and the pooled feature map pool_1, obtaining the fused feature map F4;
step 5.3.3: following the symmetry of the SegNet network, inputting the fused feature map F4 into the first two-layer convolution block for convolution to obtain the dense feature map conv_decode2;
Unlike the first three feature fusions, this fusion corresponds to the coding feature maps of two stages and is used to recover detail information, so its fusion form differs;
step 5.4: performing the fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: upsampling the feature map conv_decode2 by a factor of 2 using the max pooling index stored when the pooled feature map pool_1 was generated, obtaining the sparse feature map upsampling1;
step 5.4.2: after step 5.4.1, the resolution of the feature map is restored to the original size, and the coding structure provides convolutional feature maps of the same resolution, conv_1_1 and conv_1_2. To reduce the parameters of model training, only the sparse feature map upsampling1 is fused with the convolutional feature map conv_1_1, obtaining the fused feature map F5;
step 5.4.3: inputting the fused feature map F5 into the second two-layer convolution block for convolution to obtain the dense feature map conv_decode1;
In this feature fusion, only the coding feature map of one stage participates; it is used to recover edge information.
Step 6: inputting the dense feature map conv_decode1 into the Softmax layer to obtain, for each pixel, the maximum classification probability.
Step 7: computing the cross-entropy loss function from the per-pixel classification probabilities, and updating the convolution kernel parameters of each convolutional layer and pooling layer in the coding and decoding structures by stochastic gradient descent, realizing image segmentation.
Finally, it should be noted that the above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the described technical solutions may still be modified, or some or all of the technical features equivalently replaced, without departing from the spirit and scope of the corresponding technical solutions as defined in the appended claims.
Claims (6)
1. An image segmentation method for enhancing edge and detail information by using feature fusion is characterized in that: the method comprises the following steps:
Step 1: process the images in the training data set to obtain images of uniform resolution;
Step 2: input the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, and a max-pooling index is added during pooling to remember the maximum pixel values in the image and their positions;
The convolution kernel size of each convolutional layer of the coding structure is 3 × 3, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1, 2, 3, 4, 5; j = 1, 2 when i = 1, 2, and j = 1, 2, 3 when i = 3, 4, 5. Each convolutional layer is followed by batch normalization and a ReLU activation function. A max-pooling index is added to each pooling layer, downsampling is realized by 2 × 2 non-overlapping max pooling, and the position of the maximum pixel value is kept through the max-pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r = 1, 2, 3, 4, 5;
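The 2 × 2 non-overlapping max pooling with a stored max-pooling index can be sketched for a single-channel map as follows. This is an illustrative toy implementation, not the patented code: the function name and the list-of-lists representation are assumptions.

```python
# Toy sketch: 2x2 non-overlapping max pooling that also stores, for each
# window, the position of its maximum pixel (the "max-pooling index").
def max_pool_2x2_with_index(x):
    h, w = len(x), len(x[0])
    pooled = [[0.0] * (w // 2) for _ in range(h // 2)]
    index = [[(0, 0)] * (w // 2) for _ in range(h // 2)]
    for i in range(h // 2):
        for j in range(w // 2):
            # All four positions of the 2x2 window starting at (2i, 2j).
            window = [(2 * i + p, 2 * j + q) for p in (0, 1) for q in (0, 1)]
            m, n = max(window, key=lambda pos: x[pos[0]][pos[1]])
            pooled[i][j] = x[m][n]
            index[i][j] = (m, n)  # remembered position of the maximum
    return pooled, index
```

The stored index is what the decoding structure later uses to place each value back at its original position during upsampling.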
Step 3: input the pooled feature map pool_5 obtained by the coding structure into a decoding structure with additional feature fusion; the max-pooling index releases the maximum pixel values at their original positions and the remaining positions are filled with 0, realizing 2× upsampling and obtaining a sparse feature map upsampling5;
Step 4: perform one feature fusion operation through the decoding structure: fuse the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, then fuse the result with the pooled feature map pool_4 of the corresponding size, obtaining a fusion feature map F1;
Input the fusion feature map F1 into the first three-layer convolution structure and perform the convolution operation, obtaining a dense feature map conv_decode5 that compensates for the information loss caused by pooling and downsampling;
Step 5: perform four feature fusion operations through the decoding structure, repeating upsampling, feature fusion and convolution until the resolution of the feature map is restored to the original size, obtaining a dense feature map conv_decode1;
Step 6: input the dense feature map conv_decode1 into a Softmax layer to obtain the maximum probability of the classification of each pixel in the image;
Step 7: compute the cross-entropy loss function from the maximum pixel-classification probabilities, and update the convolution kernel parameters of each convolutional layer and each pooling layer in the coding and decoding structures by stochastic gradient descent, thereby realizing the image segmentation.
2. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein: the specific method of step 1 is as follows:
Step 1.1: scale and crop the images in the training data set so that the input images have a uniform size;
Step 1.2: fix the resolution of the input images to 360 × 480.
3. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein: the specific method of adding a max-pooling index during pooling in step 2, to remember the maximum pixel values in the image and their positions, is as follows:
For an input feature map X ∈ R^(h×w×c), where h and w are the height and width of the feature map and c is the number of channels, 2 × 2 non-overlapping max pooling yields a feature map Y ∈ R^((h/2)×(w/2)×c) in which the value at pixel (i, j) is given by:

Y(i, j) = max{ X(2i − 1, 2j − 1), X(2i − 1, 2j), X(2i, 2j − 1), X(2i, 2j) }
The position corresponding to the maximum value is recorded as (m_i, n_j), given by:

(m_i, n_j) = argmax_{(u, v) ∈ {2i − 1, 2i} × {2j − 1, 2j}} X(u, v)
4. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 3, wherein: in step 3, the decoding structure comprises three-layer convolution structures and two-layer convolution structures, and each convolutional layer in the decoding structure is followed by batch normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is given by:

Z(u, v) = Y(i, j) if (u, v) = (m_i, n_j), and Z(u, v) = 0 otherwise,

where Z(u, v) is the pixel value at point (u, v) in the sparse feature map upsampling5.
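The unpooling described by this formula can be sketched as follows (illustrative only; the function name and list-of-lists representation are assumptions): the stored max-pooling index releases each pooled value at its original position, and every other position of the sparse map is filled with 0.

```python
# Toy sketch of 2x max-unpooling: place each pooled value back at the
# position recorded in the max-pooling index; all other entries stay 0.
def max_unpool_2x2(pooled, index, h, w):
    z = [[0.0] * w for _ in range(h)]  # sparse output, zero-filled
    for i in range(len(pooled)):
        for j in range(len(pooled[0])):
            m, n = index[i][j]        # original position of the maximum
            z[m][n] = pooled[i][j]    # Z(u, v) = Y(i, j) at (m_i, n_j)
    return z
```

This pairs with the pooling sketch given for step 2: the `index` argument is exactly what was stored when the corresponding pooling layer ran.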
5. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein: the fusion process of step 4 adds the pixel values at corresponding positions in the feature maps.
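Since fusion is element-wise addition of same-resolution maps, it can be sketched in one function (illustrative only; the function name and list-of-lists representation are assumptions):

```python
# Toy sketch of the fusion operation: feature maps of identical resolution
# are fused by summing the pixel values at each corresponding position.
def fuse(*feature_maps):
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(fm[i][j] for fm in feature_maps) for j in range(w)]
            for i in range(h)]
```

The variadic signature mirrors the method's fusions of two maps (step 5.4.2) as well as three or more (steps 4 and 5.1.2).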
6. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 4, wherein: the specific method of step 5 is as follows:
Step 5.1: perform the second feature fusion through the decoding structure to restore image information;
Step 5.1.1: perform 2× upsampling on conv_decode5 using the max-pooling index stored when the pooling feature map pool_4 was generated, obtaining a sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 and the pooled feature map pool_3 of the same resolution extracted from the coding structure, obtaining a fusion feature map F2;
Step 5.1.3: input the fusion feature map F2 into the second three-layer convolution structure and perform the convolution operation, obtaining a dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to restore image information;
Step 5.2.1: perform 2× upsampling on the feature map conv_decode4 using the max-pooling index stored when the pooling feature map pool_3 was generated, obtaining a sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted from the coding structure and the pooled feature map pool_2, obtaining a fusion feature map F3;
Step 5.2.3: input the fusion feature map F3 into the third three-layer convolution structure and perform the convolution operation, obtaining a dense feature map conv_decode3;
Step 5.3: perform the fourth feature fusion through the decoding structure to recover detail information of the image;
Step 5.3.1: perform 2× upsampling on the feature map conv_decode3 using the max-pooling index stored when the pooling feature map pool_2 was generated, obtaining a sparse feature map upsampling2;
Step 5.3.2: fuse the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooled feature map pool_1, obtaining a fusion feature map F4;
Step 5.3.3: according to the symmetry of the SegNet network, input the fusion feature map F4 into the first two-layer convolution structure and perform the convolution operation, obtaining a dense feature map conv_decode2;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover edge information of the image;
Step 5.4.1: perform 2× upsampling on the feature map conv_decode2 using the max-pooling index stored when the pooling feature map pool_1 was generated, obtaining a sparse feature map upsampling1;
Step 5.4.2: fuse the sparse feature map upsampling1 with the convolution feature map conv_1_1, obtaining a fusion feature map F5;
Step 5.4.3: input the fusion feature map F5 into the second two-layer convolution structure and perform the convolution operation, obtaining a dense feature map conv_decode1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911094462.3A CN111028235B (en) | 2019-11-11 | 2019-11-11 | Image segmentation method for enhancing edge and detail information by utilizing feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111028235A true CN111028235A (en) | 2020-04-17 |
CN111028235B CN111028235B (en) | 2023-08-22 |
Family
ID=70205321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911094462.3A Active CN111028235B (en) | 2019-11-11 | 2019-11-11 | Image segmentation method for enhancing edge and detail information by utilizing feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028235B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582111A (en) * | 2020-04-29 | 2020-08-25 | 电子科技大学 | Cell component segmentation method based on semantic segmentation |
CN111666842A (en) * | 2020-05-25 | 2020-09-15 | 东华大学 | Shadow detection method based on double-current-cavity convolution neural network |
CN111784642A (en) * | 2020-06-10 | 2020-10-16 | 中铁四局集团有限公司 | Image processing method, target recognition model training method and target recognition method |
CN113052159A (en) * | 2021-04-14 | 2021-06-29 | ***通信集团陕西有限公司 | Image identification method, device, equipment and computer storage medium |
CN113192200A (en) * | 2021-04-26 | 2021-07-30 | 泰瑞数创科技(北京)有限公司 | Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm |
CN113280820A (en) * | 2021-06-09 | 2021-08-20 | 华南农业大学 | Orchard visual navigation path extraction method and system based on neural network |
CN113496453A (en) * | 2021-06-29 | 2021-10-12 | 上海电力大学 | Anti-network image steganography method based on multi-level feature fusion |
CN113724269A (en) * | 2021-08-12 | 2021-11-30 | 浙江大华技术股份有限公司 | Example segmentation method, training method of example segmentation network and related equipment |
CN115828079A (en) * | 2022-04-20 | 2023-03-21 | 北京爱芯科技有限公司 | Method and device for maximum pooling operation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10304193B1 (en) * | 2018-08-17 | 2019-05-28 | 12 Sigma Technologies | Image segmentation and object detection using fully convolutional neural network |
CN109903292A (en) * | 2019-01-24 | 2019-06-18 | 西安交通大学 | A kind of three-dimensional image segmentation method and system based on full convolutional neural networks |
CN110264483A (en) * | 2019-06-19 | 2019-09-20 | 东北大学 | A kind of semantic image dividing method based on deep learning |
Non-Patent Citations (1)
Title |
---|
Xiao Zhaoxia et al.: "A Survey of Research on Image Semantic Segmentation", Software Guide (软件导刊) *
Also Published As
Publication number | Publication date |
---|---|
CN111028235B (en) | 2023-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111028235B (en) | Image segmentation method for enhancing edge and detail information by utilizing feature fusion | |
Anwar et al. | Image colorization: A survey and dataset | |
CN107644006B (en) | Automatic generation method of handwritten Chinese character library based on deep neural network | |
CN108830855B (en) | Full convolution network semantic segmentation method based on multi-scale low-level feature fusion | |
CN111028177B (en) | Edge-based deep learning image motion blur removing method | |
CN109087258B (en) | Deep learning-based image rain removing method and device | |
CN108647560B (en) | CNN-based face transfer method for keeping expression information | |
CN110276354B (en) | High-resolution streetscape picture semantic segmentation training and real-time segmentation method | |
CN113408471B (en) | Non-green-curtain portrait real-time matting algorithm based on multitask deep learning | |
CN113569865B (en) | Single sample image segmentation method based on class prototype learning | |
CN111915627A (en) | Semantic segmentation method, network, device and computer storage medium | |
CN110689599A (en) | 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement | |
CN114936605A (en) | Knowledge distillation-based neural network training method, device and storage medium | |
WO2023212997A1 (en) | Knowledge distillation based neural network training method, device, and storage medium | |
CN113689434B (en) | Image semantic segmentation method based on strip pooling | |
CN113066025B (en) | Image defogging method based on incremental learning and feature and attention transfer | |
CN112950477A (en) | High-resolution saliency target detection method based on dual-path processing | |
CN113888547A (en) | Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN112270366B (en) | Micro target detection method based on self-adaptive multi-feature fusion | |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium | |
WO2020043296A1 (en) | Device and method for separating a picture into foreground and background using deep learning | |
CN115984747A (en) | Video saliency target detection method based on dynamic filter | |
CN113139551A (en) | Improved semantic segmentation method based on deep Labv3+ | |
CN114022497A (en) | Image processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||