CN111028235A - Image segmentation method for enhancing edge and detail information by utilizing feature fusion - Google Patents

Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Info

Publication number
CN111028235A
CN111028235A (application CN201911094462.3A)
Authority
CN
China
Prior art keywords
feature map
feature
fusion
conv
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911094462.3A
Other languages
Chinese (zh)
Other versions
CN111028235B (en)
Inventor
朱和贵
苗艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN201911094462.3A
Publication of CN111028235A
Application granted
Publication of CN111028235B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image segmentation method for enhancing edge and detail information by utilizing feature fusion, and relates to the technical field of computer vision. The method uses a convolutional neural network to extract features from an input image; feeds the extracted features into a decoding structure with additional feature fusion, which enriches edge and detail information while restoring the image resolution and yields a dense feature map; outputs the maximum class probabilities through a normalization (Softmax) method; and computes a cross-entropy loss function, updating the network weights by stochastic gradient descent. While restoring the resolution of the feature map, the method recovers the position and boundary detail information lost in the encoding stage, enriches the information of the picture, and obtains a dense feature map that compensates for the sparse feature map produced by direct upsampling, so that the segmented boundaries and details are clearer and the segmentation of small, detailed objects is improved.

Description

Image segmentation method for enhancing edge and detail information by utilizing feature fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to an image segmentation method for enhancing edge and detail information by utilizing feature fusion.
Background
With the continuous progress of science and technology and the rapid development of the national economy, artificial intelligence has gradually entered people's field of view, plays an increasingly important role in human production and life, and is widely applied in many fields. As an important research direction of artificial intelligence, computer vision is a crucial means of realizing automatic scene understanding and can be applied in many areas such as automatic driving systems and unmanned vehicles.
Image semantic segmentation is an important branch of computer vision and machine learning: it processes an input image and automatically segments and identifies the content in the image. Before deep learning was applied to computer vision, classifiers for image semantic segmentation were usually built on texton forests or random forests. With the emergence and rapid development of deep convolutional neural networks, an effective approach to semantic segmentation became available; applying CNNs to semantic segmentation has brought good progress, promoted the development of the field, and achieved remarkable results in many application areas.
After deep learning was applied to semantic segmentation, many classical segmentation methods appeared, such as the fully convolutional network (FCN), the SegNet network with its encoder-decoder structure, and DeepLab with hole (dilated) convolution. However, as the CNN hierarchy deepens, repeated pooling and downsampling cause the position information and boundary detail information of the picture to be lost; this process is irreversible and the discarded information cannot be fully recovered, so the feature maps upsampled in the decoding stage become sparse. These methods therefore have certain limitations.
In the fully convolutional network FCN and the traditional SegNet network, position and edge details are lost through downsampling and the lost information does not reappear during upsampling in the decoding stage, so the resulting feature maps are sparse; although the SegNet network recovers position information through the pooling index and enriches boundary and detail information with convolution operations, a large amount of information is still lost.
Hole (dilated) convolution is a convolutional layer capable of producing dense feature maps, but its computational cost is high and it occupies a large amount of memory when processing many high-resolution feature maps.
Existing image semantic segmentation methods therefore still need improvement in how well edge detail features and position information are preserved, and their segmentation accuracy also needs to be improved.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an image segmentation method for enhancing edge and detail information by feature fusion, so as to realize the segmentation of images.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an image segmentation method for enhancing edge and detail information by using feature fusion comprises the following steps:
Step 1: process the images in the training data set to obtain images with a uniform resolution;
Step 1.1: scale and crop the images in the training data set so that the input images have a uniform size;
Step 1.2: fix the resolution of the input images to 360 × 480;
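The patent does not prescribe a particular toolkit for this preprocessing. As a rough illustration only, the scaling and cropping of step 1 to a fixed 360 × 480 resolution could be expressed with torchvision transforms (the framework choice is an assumption, not part of the disclosure):

```python
# Minimal preprocessing sketch (assumes PyTorch/torchvision; the patent names no framework).
# Scales and center-crops every training image to the fixed 360x480 input resolution.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(480),             # scale so the image covers the target size
    transforms.CenterCrop((360, 480)),  # crop to the fixed 360x480 resolution of step 1.2
    transforms.ToTensor(),              # convert to a CHW float tensor for the network
])
```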
Step 2: input the images into the encoding structure for feature extraction; the encoding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, with a max-pooling index added during pooling to remember the maximum pixel values in the image and their positions;
The convolution kernel size of each convolutional layer in the encoding structure is 3 × 3, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1,2,3,4,5; j = 1,2 when i = 1,2; and j = 1,2,3 when i = 3,4,5. Each convolutional layer is followed by Batch Normalization and a ReLU activation function. A max-pooling index is added to each pooling layer, downsampling is realized with 2 × 2 non-overlapping max pooling, and the position of the maximum pixel value is kept through the max-pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r = 1,2,3,4,5;
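As a concrete illustration of the encoding structure just described (the first 13 convolutional layers of VGG-16, each 3 × 3 convolution followed by Batch Normalization and ReLU, and 2 × 2 non-overlapping max pooling that returns its indices), a minimal PyTorch sketch is given below. The framework choice is an assumption, and the channel widths 64/128/256/512/512 are taken from the standard VGG-16 configuration rather than from the patent text:

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out):
    """3x3 convolution (padding keeps the spatial size), BatchNorm, ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """First 13 convolutional layers of VGG-16; every pooling layer also returns its max indices."""
    def __init__(self, in_channels=3):
        super().__init__()
        # (input channels, output channels, number of convolutions) for stages 1..5 of VGG-16
        cfg = [(in_channels, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
        self.stages = nn.ModuleList()
        for c_in, c_out, n in cfg:
            layers = [conv_bn_relu(c_in, c_out)] + [conv_bn_relu(c_out, c_out) for _ in range(n - 1)]
            self.stages.append(nn.Sequential(*layers))
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

    def forward(self, x):
        conv_maps, pool_maps, indices = {}, {}, {}
        for r, stage in enumerate(self.stages, start=1):
            for j, layer in enumerate(stage, start=1):
                x = layer(x)
                conv_maps[f"conv_{r}_{j}"] = x      # conv_i_j feature maps kept for later fusion
            x, idx = self.pool(x)
            pool_maps[f"pool_{r}"] = x              # pool_r feature map
            indices[f"pool_{r}"] = idx              # max-pooling index of stage r
        return conv_maps, pool_maps, indices
```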
The specific method for remembering the maximum pixel values in the image and their positions by adding a max-pooling index during pooling is as follows:
For an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where $h$ and $w$ are the height and width of the feature map and $c$ is the number of channels, 2 × 2 non-overlapping max pooling yields the feature map

$$Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$$

where the value at pixel $(i, j)$ is

$$Y_{i,j} = \max\left\{X_{2i-1,\,2j-1},\ X_{2i-1,\,2j},\ X_{2i,\,2j-1},\ X_{2i,\,2j}\right\}$$

and the position corresponding to the maximum pixel value is recorded as $(m_i, n_j)$:

$$(m_i, n_j) = \underset{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}}{\arg\max}\ X_{p,q}$$
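To make the index bookkeeping concrete, here is a small numeric check of the formulas above (PyTorch assumed): 2 × 2 non-overlapping max pooling of a single-channel 4 × 4 map returns both the window maxima $Y_{i,j}$ and the row-major flattened positions of those maxima, which play the role of $(m_i, n_j)$:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 3., 2., 0.],
                    [4., 2., 1., 5.],
                    [0., 1., 2., 2.],
                    [3., 0., 1., 4.]]]])   # shape (1, 1, 4, 4)

y, idx = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)
print(y)    # [[4., 5.], [3., 4.]]   the window maxima Y_{i,j}
print(idx)  # [[4, 7], [12, 15]]     flattened positions of the maxima, i.e. (m_i, n_j)
```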
Step 3: input the pooled feature map pool_5 obtained from the encoding structure into a decoding structure with additional feature fusion; use the max-pooling index to release each maximum pixel value at its original position and fill the remaining positions with 0, realizing 2× upsampling and obtaining a sparse feature map upsampling5;
The decoding structure comprises three-layer convolution structures and two-layer convolution structures (three of the former and two of the latter); each convolutional layer in the decoding structure is followed by Batch Normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u, v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value at pixel $(u, v)$ in the sparse feature map upsampling5;
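The sparse upsampling of step 3 is exactly a max-unpooling with the stored index: each maximum is released at its original position and every other position is filled with 0. A minimal sketch (PyTorch assumed; the tensor size is an arbitrary stand-in, not the actual feature-map size):

```python
import torch
import torch.nn.functional as F

feature = torch.rand(1, 512, 24, 32)                 # stand-in for a pre-pooling feature map
pool_5, idx_5 = F.max_pool2d(feature, 2, stride=2, return_indices=True)

# Release each maximum at its stored position; all other positions stay 0:
upsampling5 = F.max_unpool2d(pool_5, idx_5, kernel_size=2, stride=2)
assert upsampling5.shape == feature.shape            # 2x upsampling back to the pre-pooling size
assert (upsampling5 != 0).sum() <= pool_5.numel()    # sparse: at most one nonzero per 2x2 window
```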
Step 4: perform one feature fusion operation through the decoding structure: fuse the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, and fuse the result with the pooled feature map pool_4 of the corresponding size, obtaining a fused feature map F1;
The fusion process adds the pixel values at corresponding positions in the feature maps;
The fused feature map F1 is input into the first three-layer convolution structure for convolution, obtaining a dense feature map conv_decode5 and compensating for the information loss caused by pooling and downsampling;
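Taken literally, the fusion of step 4 is an element-wise addition of four same-sized feature maps followed by a three-layer convolution block. A minimal PyTorch sketch of this single stage; the framework, the dummy tensors, and the 512-channel width (taken from the VGG-16 stage-5 configuration) are assumptions rather than part of the patent text:

```python
import torch
import torch.nn as nn

def conv_block(channels, n_convs):
    """n_convs repetitions of (3x3 conv + BatchNorm + ReLU) with a fixed channel count."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(channels, channels, 3, padding=1),
                   nn.BatchNorm2d(channels),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

decode5 = conv_block(512, n_convs=3)   # the "first three-layer convolution structure"

# Dummy same-shaped maps standing in for upsampling5, conv_5_1, conv_5_2 and pool_4
# (22x30 matches the pool_4 resolution of a 360x480 input):
upsampling5, conv_5_1, conv_5_2, pool_4 = (torch.rand(1, 512, 22, 30) for _ in range(4))

F1 = upsampling5 + conv_5_1 + conv_5_2 + pool_4   # fusion = pixel-wise addition of the maps
conv_decode5 = decode5(F1)                        # dense feature map conv_decode5
```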
Step 5: perform four more feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size;
Step 5.1: perform the second feature fusion through the decoding structure to restore image information;
Step 5.1.1: perform 2× upsampling on conv_decode5 using the max-pooling index stored when the pooled feature map pool_4 was generated, obtaining a sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_3, obtaining a fused feature map F2;
Step 5.1.3: input the fused feature map F2 into the second three-layer convolution structure for convolution, obtaining a dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to restore image information;
Step 5.2.1: perform 2× upsampling on the feature map conv_decode4 using the max-pooling index stored when the pooled feature map pool_3 was generated, obtaining a sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_2, obtaining a fused feature map F3;
Step 5.2.3: input the fused feature map F3 into the third three-layer convolution structure for convolution, obtaining a dense feature map conv_decode3;
Step 5.3: perform the fourth feature fusion through the decoding structure to recover the detail information of the image;
Step 5.3.1: perform 2× upsampling on the feature map conv_decode3 using the max-pooling index stored when the pooled feature map pool_2 was generated, obtaining a sparse feature map upsampling2;
Step 5.3.2: fuse the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooled feature map pool_1, obtaining a fused feature map F4;
Step 5.3.3: following the symmetry of the SegNet network, input the fused feature map F4 into the first two-layer convolution structure for convolution, obtaining a dense feature map conv_decode2;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover the edge information of the image;
Step 5.4.1: perform 2× upsampling on the feature map conv_decode2 using the max-pooling index stored when the pooled feature map pool_1 was generated, obtaining a sparse feature map upsampling1;
Step 5.4.2: fuse the sparse feature map upsampling1 with the convolution feature map conv_1_1, obtaining a fused feature map F5;
Step 5.4.3: input the fused feature map F5 into the second two-layer convolution structure for convolution, obtaining a dense feature map conv_decode1;
Step 6: input the dense feature map conv_decode1 into a Softmax layer to obtain the maximum probability of pixel classification in the image;
Step 7: compute the cross-entropy loss function from the maximum probability of pixel classification in the image, and update the convolution kernel parameters of each convolutional layer and pooling layer in the encoding and decoding structures through stochastic gradient descent, realizing image segmentation.
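Steps 6 and 7 amount to a per-pixel Softmax classification trained with a cross-entropy loss and stochastic gradient descent. A minimal runnable sketch follows (PyTorch assumed); the stand-in model, the number of classes, and the learning-rate and momentum values are placeholders, since the patent specifies none of them:

```python
import torch
import torch.nn as nn

num_classes = 12                                               # placeholder label count
model = nn.Conv2d(3, num_classes, kernel_size=3, padding=1)    # stand-in for the full encoder-decoder
criterion = nn.CrossEntropyLoss()                              # Softmax + cross-entropy over pixels (steps 6-7)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # assumed hyperparameters

images = torch.rand(2, 3, 360, 480)                            # dummy batch at the fixed 360x480 resolution
labels = torch.randint(0, num_classes, (2, 360, 480))          # dummy per-pixel ground truth

logits = model(images)                      # stands in for the class scores of conv_decode1
loss = criterion(logits, labels)            # cross-entropy loss of step 7
optimizer.zero_grad()
loss.backward()
optimizer.step()                            # one stochastic-gradient-descent weight update
```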
The technical principle of the method is as follows: on the basis of the original SegNet network, the decoding stage is improved so that image position and boundary detail information is recovered while the resolution of the feature map is restored, yielding a dense feature map. Features of the image are extracted with the convolutional and pooling layers of the encoding structure; convolutional and pooling layers at different depths extract information at different scales: the shallow structure extracts global low-level semantic information such as edges, orientation, texture, and chroma, while the deep structure extracts local high-level semantic information such as object shape. The deeper the network level, the more abstract the extracted features; to extract more abstract high-level features, the model uses max pooling rather than average pooling in the encoding structure.
Since the maximum pixel values extracted from the feature map and their positions are important, and pooling loses not only edge detail information but also position information as the resolution of the feature map shrinks, a pooling index is added to the encoding structure to remember the position of each maximum pixel value. Through this pooling index, the decoding structure releases each maximum pixel value at its original position and fills the remaining positions with 0, which realizes 2× upsampling, recovers important position information, and reduces error.
However, as the network hierarchy deepens, the extracted features become more and more abstract and much edge detail information is lost, with each layer losing information at a different scale. In the feature map obtained after upsampling in the decoding structure, every position except the maxima is 0, so the feature map is sparse and the lost information does not reappear in it. Feature fusion is therefore added to the decoding structure to recover this information: the sparse feature map obtained after each upsampling is superposed with the feature maps of corresponding size produced by convolution and pooling in the encoding stage. In this way, each upsampled feature map is fed into the fusion structure, the information lost in the encoding stage is gradually recovered, and the fusion result is then passed through convolutional layers to further enrich the information, producing a denser feature map, a better segmentation result, and higher accuracy.
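A rough sketch of this repeated decoder pattern (PyTorch assumed; the helper name is hypothetical). The patent describes each fusion as a pixel-wise addition of same-sized maps and does not spell out how channel widths are matched between stages, so the sketch simply assumes every fused map already has the shape of the upsampled one:

```python
import torch.nn.functional as F

def fuse_and_decode(x, pool_indices, encoder_maps, decode_convs):
    """One decoding stage: sparse 2x upsampling through the stored max-pooling index,
    pixel-wise addition with same-sized encoder feature maps, then a convolution block."""
    sparse = F.max_unpool2d(x, pool_indices, kernel_size=2, stride=2)  # maxima restored, zeros elsewhere
    fused = sparse
    for feature_map in encoder_maps:       # e.g. [conv_5_1, conv_5_2, pool_4] for the first fusion
        fused = fused + feature_map        # fusion = addition of corresponding pixel values
    return decode_convs(fused)             # dense feature map of this decoding stage
```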
The beneficial effect of the above technical solution is as follows: the image segmentation method for enhancing edge and detail information by utilizing feature fusion recovers the position and edge detail information lost in the encoding stage while restoring the resolution of the feature map, enriches the information of the image, and obtains a dense feature map that compensates for the sparse feature map produced by direct upsampling, making the segmented edges and details clearer, improving the segmentation of fine, small objects, and raising the average segmentation accuracy and mIoU.
Drawings
Fig. 1 is a flowchart of an image segmentation method for enhancing edge and detail information by feature fusion according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, an image segmentation method for enhancing edge and detail information by feature fusion, as shown in fig. 1, includes the following steps:
Step 1: process the images in the training data set to obtain images with a uniform resolution;
Step 1.1: scale and crop the images in the training data set so that the input images have a uniform size;
Step 1.2: fix the resolution of the input images to 360 × 480;
Step 2: input the images into the encoding structure for feature extraction; the encoding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, with a max-pooling index added during pooling to remember the maximum pixel values in the image and their positions;
The convolution kernel size of each convolutional layer in the encoding structure is 3 × 3, which keeps the image size unchanged, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1,2,3,4,5; j = 1,2 when i = 1,2; and j = 1,2,3 when i = 3,4,5. Each convolutional layer is followed by Batch Normalization and a ReLU activation function: Batch Normalization accelerates the convergence of the model and, to a certain extent, alleviates the gradient-dispersion problem in deep networks, making the deep model easier and more stable to train, while the ReLU activation function helps against vanishing gradients and relieves overfitting of the network. A max-pooling index is added to each pooling layer, downsampling is realized with 2 × 2 non-overlapping max pooling, and the position of the maximum pixel value is kept through the max-pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r = 1,2,3,4,5;
The encoding structure uses the first 13 layers of VGG-16 to extract the features of the picture, using convolutional and pooling layers to extract image features at different scales. The first 4 layers of the structure can be regarded as the shallow structure, which obtains low-level semantic information, and the remaining 9 layers can be regarded as the deep structure, which obtains high-level abstract information; features of different scales can thus be obtained through the encoding structure;
For an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where $h$ and $w$ are the height and width of the feature map and $c$ is the number of channels, 2 × 2 non-overlapping max pooling yields the feature map

$$Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$$

where the value at pixel $(i, j)$ is

$$Y_{i,j} = \max\left\{X_{2i-1,\,2j-1},\ X_{2i-1,\,2j},\ X_{2i,\,2j-1},\ X_{2i,\,2j}\right\}$$

and the position corresponding to the maximum pixel value is recorded as $(m_i, n_j)$:

$$(m_i, n_j) = \underset{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}}{\arg\max}\ X_{p,q}$$

Step 3: input the pooled feature map pool_5 obtained from the encoding structure into the decoding structure with additional feature fusion; use the max-pooling index to release each maximum pixel value at its original position and fill the remaining positions with 0, realizing 2× upsampling and obtaining a sparse feature map upsampling5;
The decoding structure comprises three-layer convolution structures and two-layer convolution structures (three of the former and two of the latter); each convolutional layer in the decoding structure is followed by Batch Normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u, v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value at pixel $(u, v)$ in the sparse feature map upsampling5.
Step 4: since the feature map obtained by upsampling is sparse, one feature fusion operation is performed through the decoding structure. The convolution feature maps extracted in the encoding structure with the same resolution as the sparse feature map upsampling5 are conv_5_1, conv_5_2 and conv_5_3; because pool_5 is obtained by directly pooling conv_5_3, part of that information is already recovered during the 2× upsampling, so, to reduce the training parameters of the model, only the convolution feature maps conv_5_1 and conv_5_2 are fused with the sparse feature map upsampling5, and the result is further fused with the pooled feature map pool_4 of the corresponding size, obtaining a fused feature map F1.
The fusion process adds the pixel values at corresponding positions in the feature maps;
To maintain the symmetry of the original SegNet network, the fused feature map F1 is input into the first three-layer convolution structure for convolution, obtaining a dense feature map conv_decode5, further enriching the information of the picture and compensating for the information loss caused by pooling and downsampling;
Step 4 corresponds to the first feature fusion operation; five feature fusions are required in the decoding process of the method, and they fall into three different fusion forms according to the depth at which the upsampling occurs: the first three fusions share the same form, and the remaining two fusions each take a different form, as summarized in the sketch below.
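For reference, the five fusions of steps 4 and 5 can be summarized as a mapping from each sparse upsampled map to the encoder feature maps that are added to it. The Python dictionary below only restates the text, with names mirroring the patent's feature-map labels:

```python
# Encoder feature maps added to each sparse upsampled map (step 4 and steps 5.1-5.4):
fusion_plan = {
    "upsampling5": ["conv_5_1", "conv_5_2", "pool_4"],   # -> F1 -> conv_decode5
    "upsampling4": ["conv_4_1", "conv_4_2", "pool_3"],   # -> F2 -> conv_decode4
    "upsampling3": ["conv_3_1", "conv_3_2", "pool_2"],   # -> F3 -> conv_decode3
    "upsampling2": ["conv_2_1", "pool_1"],               # -> F4 -> conv_decode2
    "upsampling1": ["conv_1_1"],                         # -> F5 -> conv_decode1
}
```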
Step 5: perform four more feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size, obtaining a dense feature map conv_decode1;
Step 5.1: perform the second feature fusion through the decoding structure to restore image information;
Step 5.1.1: after step 4, the resolution of the feature map conv_decode5 is the same as that of the pooled feature map pool_4; perform 2× upsampling on conv_decode5 using the max-pooling index stored when the pooled feature map pool_4 was generated, obtaining a sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_3, obtaining a fused feature map F2;
Step 5.1.3: input the fused feature map F2 into the second three-layer convolution structure for convolution, obtaining a dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to restore image information;
Step 5.2.1: perform 2× upsampling on the feature map conv_decode4 using the max-pooling index stored when the pooled feature map pool_3 was generated, obtaining a sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_2, obtaining a fused feature map F3;
Step 5.2.3: input the fused feature map F3 into the third three-layer convolution structure for convolution, obtaining a dense feature map conv_decode3;
The first three feature fusions correspond to the encoding feature maps of three stages and share the same fusion structure; the feature maps taking part in them have lower resolution and carry local abstract features, so the same fusion form is used to recover these local abstract features.
Step 5.3: perform the fourth feature fusion through the decoding structure to recover the detail information of the image;
Step 5.3.1: perform 2× upsampling on the feature map conv_decode3 using the max-pooling index stored when the pooled feature map pool_2 was generated, obtaining a sparse feature map upsampling2;
Step 5.3.2: after step 5.3.1, the resolution of the feature map is restored to 1/2 of that of the original image; the encoding feature maps of the corresponding size are conv_2_1, conv_2_2 and pool_1, and to reduce the parameters of model training only the convolution feature map conv_2_1 and the pooled feature map pool_1 are fused with the sparse feature map upsampling2, obtaining a fused feature map F4;
Step 5.3.3: following the symmetry of the SegNet network, the fused feature map F4 is input into the first two-layer convolution structure for convolution, obtaining a dense feature map conv_decode2;
Unlike the first three feature fusions, this fusion corresponds to the encoding feature maps of two stages and is used to recover detail information, so its fusion form is different;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover the edge information of the image;
Step 5.4.1: perform 2× upsampling on the feature map conv_decode2 using the max-pooling index stored when the pooled feature map pool_1 was generated, obtaining a sparse feature map upsampling1;
Step 5.4.2: after step 5.4.1 the resolution of the feature map is restored to the original size, and the encoding structure provides convolution feature maps conv_1_1 and conv_1_2 of the same resolution; to reduce the parameters of model training, only the convolution feature map conv_1_1 is fused with the sparse feature map upsampling1, obtaining a fused feature map F5;
Step 5.4.3: the fused feature map F5 is input into the second two-layer convolution structure for convolution, obtaining a dense feature map conv_decode1;
In this fifth fusion only the encoding feature map of a single stage participates, and it is used to recover edge information.
Step 6: input the dense feature map conv_decode1 into the Softmax layer to obtain the maximum probability of pixel classification in the image.
Step 7: compute the cross-entropy loss function from the maximum probability of pixel classification in the image, and update the convolution kernel parameters of each convolutional layer and pooling layer in the encoding and decoding structures through stochastic gradient descent, realizing image segmentation.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (6)

1. An image segmentation method for enhancing edge and detail information by using feature fusion, characterized in that the method comprises the following steps:
Step 1: process the images in the training data set to obtain images with a uniform resolution;
Step 2: input the images into an encoding structure for feature extraction; the encoding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, with a max-pooling index added during pooling to remember the maximum pixel values in the image and their positions;
The convolution kernel size of each convolutional layer in the encoding structure is 3 × 3, and the feature map after each convolutional layer is denoted conv_i_j, where i = 1,2,3,4,5; j = 1,2 when i = 1,2; and j = 1,2,3 when i = 3,4,5; each convolutional layer is followed by Batch Normalization and a ReLU activation function; a max-pooling index is added to each pooling layer, downsampling is realized with 2 × 2 non-overlapping max pooling, the position of the maximum pixel value is kept through the max-pooling index, and the feature map obtained by each pooling layer is denoted pool_r, where r = 1,2,3,4,5;
Step 3: input the pooled feature map pool_5 obtained from the encoding structure into a decoding structure with additional feature fusion; use the max-pooling index to release each maximum pixel value at its original position and fill the remaining positions with 0, realizing 2× upsampling and obtaining a sparse feature map upsampling5;
Step 4: perform one feature fusion operation through the decoding structure: fuse the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, and fuse the result with the pooled feature map pool_4 of the corresponding size, obtaining a fused feature map F1;
Input the fused feature map F1 into the first three-layer convolution structure for convolution, obtaining a dense feature map conv_decode5 and compensating for the information loss caused by pooling and downsampling;
Step 5: perform four more feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size, obtaining a dense feature map conv_decode1;
Step 6: input the dense feature map conv_decode1 into a Softmax layer to obtain the maximum probability of pixel classification in the image;
Step 7: compute the cross-entropy loss function from the maximum probability of pixel classification in the image, and update the convolution kernel parameters of each convolutional layer and pooling layer in the encoding and decoding structures through stochastic gradient descent, realizing image segmentation.
2. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein the specific method of step 1 comprises the following steps:
Step 1.1: scale and crop the images in the training data set so that the input images have a uniform size;
Step 1.2: fix the resolution of the input images to 360 × 480.
3. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein the specific method in step 2 of adding a max-pooling index during pooling to remember the maximum pixel values in the image and their positions is as follows:
For an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where $h$ and $w$ are the height and width of the feature map and $c$ is the number of channels, 2 × 2 non-overlapping max pooling yields the feature map

$$Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$$

where the value at pixel $(i, j)$ is

$$Y_{i,j} = \max\left\{X_{2i-1,\,2j-1},\ X_{2i-1,\,2j},\ X_{2i,\,2j-1},\ X_{2i,\,2j}\right\}$$

and the position corresponding to the maximum pixel value is recorded as $(m_i, n_j)$:

$$(m_i, n_j) = \underset{(p,\,q)\,\in\,\{2i-1,\,2i\}\times\{2j-1,\,2j\}}{\arg\max}\ X_{p,q}$$
4. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 3, wherein in step 3 the decoding structure comprises three-layer convolution structures and two-layer convolution structures, and each convolutional layer in the decoding structure is followed by Batch Normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u, v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value at pixel $(u, v)$ in the sparse feature map upsampling5.
5. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 1, wherein in step 4 the fusion process adds the pixel values at corresponding positions in the feature maps.
6. The image segmentation method for enhancing edge and detail information by feature fusion according to claim 4, wherein the specific method of step 5 comprises the following steps:
Step 5.1: perform the second feature fusion through the decoding structure to restore image information;
Step 5.1.1: perform 2× upsampling on conv_decode5 using the max-pooling index stored when the pooled feature map pool_4 was generated, obtaining a sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_3, obtaining a fused feature map F2;
Step 5.1.3: input the fused feature map F2 into the second three-layer convolution structure for convolution, obtaining a dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to restore image information;
Step 5.2.1: perform 2× upsampling on the feature map conv_decode4 using the max-pooling index stored when the pooled feature map pool_3 was generated, obtaining a sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted in the encoding structure and with the pooled feature map pool_2, obtaining a fused feature map F3;
Step 5.2.3: input the fused feature map F3 into the third three-layer convolution structure for convolution, obtaining a dense feature map conv_decode3;
Step 5.3: perform the fourth feature fusion through the decoding structure to recover the detail information of the image;
Step 5.3.1: perform 2× upsampling on the feature map conv_decode3 using the max-pooling index stored when the pooled feature map pool_2 was generated, obtaining a sparse feature map upsampling2;
Step 5.3.2: fuse the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooled feature map pool_1, obtaining a fused feature map F4;
Step 5.3.3: following the symmetry of the SegNet network, input the fused feature map F4 into the first two-layer convolution structure for convolution, obtaining a dense feature map conv_decode2;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover the edge information of the image;
Step 5.4.1: perform 2× upsampling on the feature map conv_decode2 using the max-pooling index stored when the pooled feature map pool_1 was generated, obtaining a sparse feature map upsampling1;
Step 5.4.2: fuse the sparse feature map upsampling1 with the convolution feature map conv_1_1, obtaining a fused feature map F5;
Step 5.4.3: input the fused feature map F5 into the second two-layer convolution structure for convolution, obtaining a dense feature map conv_decode1.
CN201911094462.3A 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion Active CN111028235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911094462.3A CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911094462.3A CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Publications (2)

Publication Number Publication Date
CN111028235A true CN111028235A (en) 2020-04-17
CN111028235B CN111028235B (en) 2023-08-22

Family

ID=70205321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911094462.3A Active CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Country Status (1)

Country Link
CN (1) CN111028235B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582111A (en) * 2020-04-29 2020-08-25 电子科技大学 Cell component segmentation method based on semantic segmentation
CN111666842A (en) * 2020-05-25 2020-09-15 东华大学 Shadow detection method based on double-current-cavity convolution neural network
CN111784642A (en) * 2020-06-10 2020-10-16 中铁四局集团有限公司 Image processing method, target recognition model training method and target recognition method
CN113052159A (en) * 2021-04-14 2021-06-29 ***通信集团陕西有限公司 Image identification method, device, equipment and computer storage medium
CN113192200A (en) * 2021-04-26 2021-07-30 泰瑞数创科技(北京)有限公司 Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN113280820A (en) * 2021-06-09 2021-08-20 华南农业大学 Orchard visual navigation path extraction method and system based on neural network
CN113496453A (en) * 2021-06-29 2021-10-12 上海电力大学 Anti-network image steganography method based on multi-level feature fusion
CN113724269A (en) * 2021-08-12 2021-11-30 浙江大华技术股份有限公司 Example segmentation method, training method of example segmentation network and related equipment
CN115828079A (en) * 2022-04-20 2023-03-21 北京爱芯科技有限公司 Method and device for maximum pooling operation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN110264483A (en) * 2019-06-19 2019-09-20 东北大学 A kind of semantic image dividing method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN110264483A (en) * 2019-06-19 2019-09-20 东北大学 A kind of semantic image dividing method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAO Zhaoxia et al.: "A Survey of Research on Image Semantic Segmentation" (图像语义分割问题研究综述), Software Guide (软件导刊)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582111B (en) * 2020-04-29 2022-04-29 电子科技大学 Cell component segmentation method based on semantic segmentation
CN111582111A (en) * 2020-04-29 2020-08-25 电子科技大学 Cell component segmentation method based on semantic segmentation
CN111666842A (en) * 2020-05-25 2020-09-15 东华大学 Shadow detection method based on double-current-cavity convolution neural network
CN111666842B (en) * 2020-05-25 2022-08-26 东华大学 Shadow detection method based on double-current-cavity convolution neural network
CN111784642A (en) * 2020-06-10 2020-10-16 中铁四局集团有限公司 Image processing method, target recognition model training method and target recognition method
CN113052159A (en) * 2021-04-14 2021-06-29 ***通信集团陕西有限公司 Image identification method, device, equipment and computer storage medium
CN113052159B (en) * 2021-04-14 2024-06-07 ***通信集团陕西有限公司 Image recognition method, device, equipment and computer storage medium
CN113192200A (en) * 2021-04-26 2021-07-30 泰瑞数创科技(北京)有限公司 Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN113280820A (en) * 2021-06-09 2021-08-20 华南农业大学 Orchard visual navigation path extraction method and system based on neural network
CN113280820B (en) * 2021-06-09 2022-11-29 华南农业大学 Orchard visual navigation path extraction method and system based on neural network
CN113496453A (en) * 2021-06-29 2021-10-12 上海电力大学 Anti-network image steganography method based on multi-level feature fusion
CN113724269A (en) * 2021-08-12 2021-11-30 浙江大华技术股份有限公司 Example segmentation method, training method of example segmentation network and related equipment
CN115828079A (en) * 2022-04-20 2023-03-21 北京爱芯科技有限公司 Method and device for maximum pooling operation
CN115828079B (en) * 2022-04-20 2023-08-11 北京爱芯科技有限公司 Method and device for maximum pooling operation

Also Published As

Publication number Publication date
CN111028235B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN111028235B (en) Image segmentation method for enhancing edge and detail information by utilizing feature fusion
Anwar et al. Image colorization: A survey and dataset
CN107644006B (en) Automatic generation method of handwritten Chinese character library based on deep neural network
CN108830855B (en) Full convolution network semantic segmentation method based on multi-scale low-level feature fusion
CN111028177B (en) Edge-based deep learning image motion blur removing method
CN109087258B (en) Deep learning-based image rain removing method and device
CN108647560B (en) CNN-based face transfer method for keeping expression information
CN110276354B (en) High-resolution streetscape picture semantic segmentation training and real-time segmentation method
CN113408471B (en) Non-green-curtain portrait real-time matting algorithm based on multitask deep learning
CN113569865B (en) Single sample image segmentation method based on class prototype learning
CN111915627A (en) Semantic segmentation method, network, device and computer storage medium
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN113689434B (en) Image semantic segmentation method based on strip pooling
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN112270366B (en) Micro target detection method based on self-adaptive multi-feature fusion
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
WO2020043296A1 (en) Device and method for separating a picture into foreground and background using deep learning
CN115984747A (en) Video saliency target detection method based on dynamic filter
CN113139551A (en) Improved semantic segmentation method based on deep Labv3+
CN114022497A (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant