CN116629322B - Segmentation method of complex morphological target - Google Patents

Segmentation method of complex morphological target

Info

Publication number
CN116629322B
CN116629322B (Application CN202310919327.8A)
Authority
CN
China
Prior art keywords
module, convolution, improved, bottleneck, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310919327.8A
Other languages
Chinese (zh)
Other versions
CN116629322A (en)
Inventor
马千里
李棋
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202310919327.8A
Publication of CN116629322A
Application granted
Publication of CN116629322B

Links

Classifications

    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Learning methods
    • G06V 10/7715 — Feature extraction, e.g. by transforming the feature space
    • G06V 10/806 — Fusion of extracted features
    • Y02P 90/30 — Computing systems specially adapted for manufacturing
    • Y02T 10/40 — Engine management systems


Abstract

The invention provides a segmentation method for complex-morphology targets, comprising the following steps: constructing an instance segmentation dataset and dividing it into a training set and a verification set; inputting the training set into an improved YOLOv5 instance segmentation network for iterative training; evaluating each iteration's model on the verification set and judging convergence; and, once the iterative training has converged, outputting the instance segmentation network model. Compared with the prior art, the method realizes image-based instance segmentation, detects both the morphological and positioning information of complex-morphology targets, improves detection precision for such targets, and maintains a good detection speed.

Description

Segmentation method of complex morphological target
Technical Field
The invention relates to a segmentation method of a complex morphological target, and belongs to the technical field of image processing.
Background
In most real-world scenes, targets of different categories must be detected according to different requirements, so that their category, position and morphology information can be obtained and subsequent processing can provide the required functions. Approaches to this problem fall into image processing, machine learning and deep learning methods. Image processing algorithms are difficult to design and generalize poorly; machine learning makes feature extraction difficult; convolutional neural networks in deep learning learn and extract image features through convolution operations, achieve good detection precision and speed, and have therefore become a popular research field.
Computer vision tasks in deep learning divide into image classification, object detection and instance segmentation: image classification predicts a single class for a whole image; object detection achieves multi-class recognition and localization of targets within an image; and instance segmentation additionally predicts pixel-level morphology for each detected target. Although instance segmentation subsumes the classification and detection tasks, its precision and speed are generally limited, and detection precision is poor for complex-morphology targets such as cracks and industrial debris. To meet the precision requirements of industrial scenes, improving the instance segmentation precision of complex-morphology targets while guaranteeing detection speed is therefore a direction of research value.
In view of the foregoing, it is necessary to propose a method for segmenting a complex-form object to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a method for segmenting complex-morphology targets which enhances the complex-morphology feature extraction capability of a YOLOv5 instance segmentation network, resolves the conflict between classification and regression tasks, and improves the detection precision of crack instance segmentation.
To achieve the above object, the present invention provides a method for segmenting a complex-morphology target, comprising the following steps:
constructing an instance segmentation dataset and dividing it into a training set and a verification set;
inputting the training set into an improved YOLOv5 instance segmentation network for iterative training;
evaluating each iteration's model on the verification set and judging convergence;
and finally, after the iterative training converges, outputting the instance segmentation network model.
As a further improvement of the present invention, the improved YOLOv5 instance segmentation network comprises: an improved feature extraction network, which improves feature extraction for complex-morphology targets, and an improved detection head, which resolves the conflict between classification and regression tasks caused by the coupled head structure.
As a further improvement of the invention, the improved feature extraction network modifies the original feature extraction network in two ways: a novel C3 module (NewC3), built on a rich gradient-flow structure and embedding a deformable convolution (DCN) and a CA attention module, replaces the C3 module; and a hybrid dilated spatial pyramid pooling (ASPP_HDC) module, based on a hybrid dilated convolution structure, replaces the fast spatial pyramid pooling (SPPF) module.
As a further improvement of the invention, the improved detection head processes the feature map through three decoupled branches to predict confidence, classification, localization and mask weights, wherein confidence and classification share the same branch and the confidence prediction also serves as the classification information.
As a further improvement of the present invention, the novel C3 module (NewC3) is improved from the C3 module as follows:
the bottleneck layer (Bottleneck) module is replaced with a deformable bottleneck layer (DCNBottleneck) module embedding a deformable convolution (DCN);
the original serial arrangement of Bottleneck modules is changed to parallel DCNBottleneck modules;
the 3×3 convolution layer before the output is replaced with the CA attention module.
As a further improvement of the present invention, the hybrid dilated spatial pyramid pooling (ASPP_HDC) module is improved from the atrous spatial pyramid pooling (ASPP) module and comprises the following parts:
processing the input features with a 1×1 convolution;
obtaining image-level features through global average pooling, followed by a 1×1 convolution and bilinear upsampling to the specified size;
obtaining multi-scale features through the hybrid dilated convolution structure after a max-pooling layer;
and finally fusing the feature maps obtained by the above processing.
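The three branches above can be sketched as follows; channel widths, the pooling kernel, and the ReLU activations are illustrative assumptions, and the dilation rates 1, 2, 3 follow the serial hybrid-dilation design described elsewhere in this document:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPP_HDC(nn.Module):
    """Sketch of ASPP_HDC: 1x1 branch, image-level branch, serial dilated branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch1 = nn.Conv2d(c_in, c_out, 1)            # 1x1 projection branch
        self.image_pool = nn.Sequential(                     # image-level branch
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c_in, c_out, 1))
        self.pool = nn.MaxPool2d(3, stride=1, padding=1)     # max pooling before HDC
        # serial hybrid dilated convolutions with rates 1, 2, 3
        self.hdc = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=3, dilation=3), nn.ReLU())
        self.fuse = nn.Conv2d(3 * c_out, c_out, 1)           # final fusion

    def forward(self, x):
        h, w = x.shape[2:]
        b1 = self.branch1(x)
        # bilinear upsampling restores the image-level features to the input size
        b2 = F.interpolate(self.image_pool(x), size=(h, w),
                           mode="bilinear", align_corners=False)
        b3 = self.hdc(self.pool(x))
        return self.fuse(torch.cat([b1, b2, b3], dim=1))
```

Setting `padding` equal to `dilation` on each 3×3 dilated convolution keeps the spatial size unchanged, so all three branches can be concatenated directly.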
As a further improvement of the invention, the deformable bottleneck layer (DCNBottleneck) module is based on the Bottleneck module: the 3×3 ordinary convolution in the residual structure is replaced with a deformable convolution (DCN), realizing deformable receptive-field sampling and improving the extraction of complex morphological features of the target.
As a further improvement of the invention, the hybrid dilated convolution structure uses a serial arrangement: dilated convolutions with fixed dilation rates are stacked in series to raise the information utilization rate while preserving the multi-scale receptive field of the original network; after the max-pooling layer, dilated convolutions with dilation rates 1, 2 and 3 are applied in turn.
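The choice of rates 1, 2, 3 can be motivated by the commonly cited hybrid-dilated-convolution (HDC) "gridding" check; the small helper below is an illustrative sketch of that rule (the recurrence M_i = max(M_{i+1} − 2·r_i, M_{i+1} − 2·(M_{i+1} − r_i), r_i), requiring M_2 ≤ kernel size), not part of the claimed method:

```python
def hdc_ok(rates, k=3):
    """Check the HDC gridding condition for a serial stack of k x k dilated convs.

    Computes M_2 via the recurrence
      M_i = max(M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i)
    starting from M_n = r_n, and requires M_2 <= k so that the final layer's
    sampling positions fully cover the feature map (no gridding holes).
    """
    m = rates[-1]
    for r in reversed(rates[1:-1]):
        m = max(m - 2 * r, m - 2 * (m - r), r)
    return m <= k


def receptive_field(rates, k=3):
    """Receptive field of serially stacked k x k convs with the given dilations."""
    return 1 + sum((k - 1) * r for r in rates)
```

Rates (1, 2, 3) pass the check with a 13-pixel receptive field, whereas power-of-two rates such as (2, 4, 8) fail it, which is consistent with the serial fixed-rate design described above.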
As a further improvement of the invention, after the original detection head, each decoupled prediction branch applies a 3×3 convolution to obtain a feature map specialized for its prediction task, which is then processed by a 1×1 convolution to produce the final prediction information of that task.
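A minimal sketch of such a decoupled head follows. The number of anchors, classes and mask coefficients are illustrative assumptions; as the text states, confidence and classification share one branch while box regression and mask weights each get their own:

```python
import torch
import torch.nn as nn


class DecoupledHead(nn.Module):
    """Decoupled head sketch: separate 3x3 -> 1x1 stacks per prediction task."""
    def __init__(self, c_in, n_classes, n_masks=32, n_anchors=3):
        super().__init__()

        def branch(c_pred):
            # 3x3 conv specializes the shared feature map for this task,
            # then a 1x1 conv emits the per-anchor predictions
            return nn.Sequential(
                nn.Conv2d(c_in, c_in, 3, padding=1), nn.ReLU(),
                nn.Conv2d(c_in, n_anchors * c_pred, 1))

        self.cls_obj = branch(1 + n_classes)  # confidence + classes share a branch
        self.box = branch(4)                  # box regression branch
        self.mask = branch(n_masks)           # mask-coefficient branch

    def forward(self, x):
        return self.cls_obj(x), self.box(x), self.mask(x)
```

Because each branch owns its 3×3 and 1×1 weights, classification and regression no longer compete for the parameters of a single shared 1×1 layer, which is the conflict the decoupling is meant to remove.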
As a further improvement of the invention, convergence of the iterative training is declared when the model's performance on the verification set fails to exceed the best recorded performance for a number of consecutive iterations; after convergence, the best-performing model is output.
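The convergence rule above is a standard patience-based early stop; a minimal sketch (the `max_epochs` and `patience` values, and the `evaluate` callback, are illustrative assumptions):

```python
def train_with_early_stopping(evaluate, max_epochs=300, patience=50):
    """Stop when the verification score has not exceeded the best recorded
    value for `patience` consecutive epochs; return the best epoch and score.

    `evaluate(epoch)` is assumed to train one iteration and return the
    model's score on the verification set.
    """
    best_score, best_epoch, since_best = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        score = evaluate(epoch)
        if score > best_score:
            best_score, best_epoch, since_best = score, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # converged: no improvement within the patience window
    return best_epoch, best_score
```

The model saved at `best_epoch` is the one output as the final instance segmentation network.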
The beneficial effects of the invention are as follows: the method can realize image-based instance segmentation, achieves the detection of the morphological information and the positioning information of the complex morphological target, improves the detection precision of the complex morphological target, and simultaneously keeps a good detection speed.
Drawings
FIG. 1 is a flow chart of an improved YOLOv5 complex morphology object instance segmentation method of the present invention.
Fig. 2 is a structural comparison diagram of the feature extraction network before and after improvement.
Fig. 3 is a structural comparison diagram of the detection head before and after modification.
FIG. 4 is a comparison of the recognition results of the models before and after improvement on images of the same class as the dataset.
FIG. 5 is a comparison of the recognition results of the models before and after improvement on images outside the dataset.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Here, in order to avoid obscuring the invention with unnecessary detail, the drawings show only the structures and/or processing steps closely related to the invention, and other details of little relevance are omitted.
In addition, it should be further noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides a complex-morphology target instance segmentation method based on improved YOLOv5. In this embodiment, concrete cracks serve as the detection target, and a crack instance segmentation dataset is produced by annotating the original images with the labelme package. The method comprises the following steps:
s1, manufacturing a crack instance segmentation data set, and randomly dividing the data set into a training set and a verification set.
Specifically, the data set consists of a crack image and example segmentation label information, and is divided into a training set and a verification set by a certain proportion.
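A random split of the annotated pairs can be sketched as follows; the 0.8 ratio and fixed seed are illustrative assumptions, since the text only specifies "a certain proportion":

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Randomly split (image, label) samples into training and verification sets.

    A fixed seed makes the split reproducible across runs.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```

The training portion feeds the iterative training of step S2, while the held-out portion is used only for the convergence check of step S5.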
S2, inputting the training set into the improved YOLOv5 network for iterative training.
Specifically, referring to fig. 2, the improved YOLOv5 network includes: an improved feature extraction network, mainly used to strengthen crack feature extraction and the connectivity of feature pixels while raising the information utilization rate of the multi-scale feature maps; and an improved detection head, which mainly resolves the conflict that arises because the original coupled detection head uses the same biased feature map for both classification and regression tasks.
And S3, enhancing the crack feature extraction capacity by using an improved feature extraction network.
Specifically, the improved feature extraction network is based on the original feature network. To improve crack feature extraction and retain as many feature maps as possible, the novel C3 module (NewC3), with its rich gradient-flow structure and embedded deformable convolution (DCN) and CA attention module, replaces the C3 module; to improve the connectivity and information utilization of feature pixels, the hybrid dilated spatial pyramid pooling (ASPP_HDC) module, based on a hybrid dilated convolution structure, replaces the fast spatial pyramid pooling (SPPF) module.
S3.1, the novel C3 module (NewC3) is based on the original C3 module. The specific improvements comprise: the deformable bottleneck layer (DCNBottleneck) module replaces the Bottleneck module; the serial Bottleneck structure is replaced by parallel DCNBottleneck modules; and the convolution layer before the output is replaced with the CA attention module.
S3.1.1, specifically, the DCNBottleneck module replaces the 3×3 ordinary convolution in the residual structure of the Bottleneck module with a deformable convolution (DCN). An additional convolution layer learns x- and y-axis offsets for the sampling points of the ordinary convolution, deforming the receptive field and thereby enhancing the extraction of complex crack morphology.
S3.2, specifically, to achieve multi-scale feature fusion in the improved feature extraction network and to enhance the connectivity and information utilization among feature pixels, the hybrid dilated spatial pyramid pooling (ASPP_HDC) module based on a hybrid dilated convolution structure replaces the fast spatial pyramid pooling (SPPF) module.
S3.2.1, the ASPP_HDC module is based on the ASPP module of DeepLabv3, changing the dilated convolutions of the original parallel structure into a serial, hybrid-dilation structure. Dilated convolution aggregates multi-scale context without losing resolution; the hybrid dilation pattern strengthens the connectivity among feature pixels and raises the information utilization rate within the receptive field.
S4, using an improved detection head to learn different weights aiming at different tasks, and solving the conflict problem of classification tasks and regression tasks.
S4.1, the improved detection head decouples the predictions of confidence, classification, localization and mask weights after the original detection head. Because the feature-map weights suited to each prediction differ, forcing them to share the weight parameters of a single 1×1 convolution layer severely harms model convergence. The decoupled head therefore realizes each kind of prediction with its own 3×3 convolution followed by a 1×1 convolution, resolving the conflict between classification and regression tasks.
S5, the performance of the model trained in each iteration is evaluated on the verification set and the best performance is recorded. When the performance of the iterative training model fails to exceed the best recorded performance over several consecutive iterations, the network is judged to have converged, and the best-performing model is output as the crack instance segmentation model, providing crack instance segmentation for input images.
Compared with the YOLOv5 network before improvement, the invention strengthens crack feature extraction, the connectivity among feature pixels and the utilization of feature information through the improved feature extraction network, and uses the improved decoupled detection head to resolve the conflict between the classification and regression tasks of the YOLOv5 coupled head. Experimental results show that the crack instance segmentation model generated by the improved method has higher detection precision, better generalization and better robustness. On the basis of classifying and localizing crack targets, the method can predict their morphological information for different scene requirements.
Example 2
Referring to fig. 4 and 5, in the above embodiment, in order to verify the advantageous effects thereof, a detailed description is provided for a second embodiment of the present invention.
From the above description of the embodiments, those skilled in the art will recognize that the present invention provides a crack instance segmentation method based on improved YOLOv5, verified with a concrete crack dataset and a YOLOv5 network model. Fig. 4 compares the detection results on images of the dataset's class before and after the improvement: crack detection precision is clearly improved and the improved model is better suited to complex crack images. As shown in fig. 5, the detection effect on crack images outside the dataset is also very good. The ablation experiment for the two improvement points is as follows:
Table 1. Ablation experiment data before and after improvement
Experimental results show that the improved model has better generalization and robustness. Each of the two improvement points, the improved feature extraction network and the improved detection head, raises all four evaluation metrics. When the two are combined, one of the four metrics falls slightly short of the single best result, achieving a 2.7% improvement, 0.9% lower than the improved feature extraction network alone; the other three metrics all reach their best values with the combined model, improving by 14.7%, 2.3% and 5.2% respectively. The gains are larger at IoU thresholds above 50%, and the final detection results are more accurate. Meanwhile, because YOLOv5 is a single-stage network, the detection speed remains at 86 FPS, which satisfies the detection requirements of industrial scenes.
In conclusion, the method can realize image-based instance segmentation, achieves detection of the morphological information and the positioning information of the complex morphological target, improves the detection precision of the complex morphological target, and simultaneously keeps good detection speed.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention.

Claims (6)

1. A method for segmenting a complex-morphology target, characterized by comprising the following steps:
constructing an instance segmentation dataset and dividing it into a training set and a verification set;
inputting the training set into an improved YOLOv5 instance segmentation network for iterative training;
evaluating each iteration's model on the verification set and judging convergence;
outputting the instance segmentation network model after the iterative training converges;
wherein the improved YOLOv5 instance segmentation network comprises: an improved feature extraction network for improving feature extraction for complex-morphology targets, and an improved detection head for resolving the conflict between classification and regression tasks caused by the coupled structure;
the improved feature extraction network is based on the original feature extraction network: a novel C3 module (NewC3) built on a rich gradient-flow structure, embedding a deformable convolution (DCN) and a CA attention module, replaces the C3 module; and a hybrid dilated spatial pyramid pooling (ASPP_HDC) module based on a hybrid dilated convolution structure replaces the fast spatial pyramid pooling (SPPF) module;
the novel C3 module (NewC3) is improved from the C3 module as follows:
the Bottleneck module is replaced with a deformable bottleneck layer (DCNBottleneck) module embedding a deformable convolution (DCN);
the original serial Bottleneck modules are changed to parallel DCNBottleneck modules;
the 3×3 convolution layer before the output is replaced with the CA attention module;
the DCNBottleneck module is based on the Bottleneck module: the 3×3 ordinary convolution in the residual structure is replaced with a deformable convolution (DCN), realizing deformable receptive-field sampling and improving the extraction of complex morphological features of the target.
2. The method for segmenting a complex-morphology target according to claim 1, characterized in that: the improved detection head processes the feature map through three decoupled branches to predict confidence, classification, localization and mask weights, wherein confidence and classification share the same branch and the confidence prediction also serves as the classification information.
3. The method for segmenting a complex-morphology target according to claim 1, characterized in that: the hybrid dilated spatial pyramid pooling (ASPP_HDC) module is improved from the atrous spatial pyramid pooling (ASPP) module and comprises the following parts:
processing the input features with a 1×1 convolution;
obtaining image-level features through global average pooling, followed by a 1×1 convolution and bilinear upsampling to the specified size;
obtaining multi-scale features through the hybrid dilated convolution structure after a max-pooling layer;
and finally fusing the feature maps obtained by the above processing.
4. The method for segmenting a complex-morphology target according to claim 3, characterized in that: the hybrid dilated convolution structure uses a serial arrangement: dilated convolutions with fixed dilation rates are stacked in series to raise the information utilization rate while preserving the multi-scale receptive field of the original network; after the max-pooling layer, dilated convolutions with dilation rates 1, 2 and 3 are applied in turn.
5. The method for segmenting a complex-morphology target according to claim 2, characterized in that: after the original detection head, each decoupled prediction branch applies a 3×3 convolution to obtain a feature map specialized for its prediction task, which is then processed by a 1×1 convolution to produce the final prediction information of that task.
6. The method for segmenting a complex-morphology target according to claim 1, characterized in that: convergence of the iterative training means that the model's performance on the verification set fails to exceed the best recorded performance for a number of consecutive iterations; after convergence, the best-performing model is output.
CN202310919327.8A 2023-07-26 2023-07-26 Segmentation method of complex morphological target Active CN116629322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310919327.8A CN116629322B (en) 2023-07-26 2023-07-26 Segmentation method of complex morphological target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310919327.8A CN116629322B (en) 2023-07-26 2023-07-26 Segmentation method of complex morphological target

Publications (2)

Publication Number | Publication Date
CN116629322A | 2023-08-22
CN116629322B | 2023-11-10

Family

ID=87613897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310919327.8A Active CN116629322B (en) 2023-07-26 2023-07-26 Segmentation method of complex morphological target

Country Status (1)

Country Link
CN (1) CN116629322B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN112215664A (en) * 2020-10-29 2021-01-12 支付宝(杭州)信息技术有限公司 Information recommendation method and device
CN114118124A (en) * 2021-09-29 2022-03-01 北京百度网讯科技有限公司 Image detection method and device
CN114663380A (en) * 2022-03-17 2022-06-24 合肥学院 Aluminum product surface defect detection method, storage medium and computer system
CN115187583A (en) * 2022-08-19 2022-10-14 南京信息工程大学 Lightweight road defect detection method based on improved YOLOv5
CN115937659A (en) * 2022-12-18 2023-04-07 重庆工商大学 Mask-RCNN-based multi-target detection method in indoor complex environment
CN115984172A (en) * 2022-11-29 2023-04-18 上海师范大学 Small target detection method based on enhanced feature extraction
CN116310997A (en) * 2023-04-14 2023-06-23 大连海事大学 Deep learning-based marine small target detection method


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Retracted: Automatic Detection and Segmentation of Ovarian Cancer Using a Multitask Model in Pelvic CT Images;Xun Wang等;Oxidative Medicine and Cellular Longevity;1-14 *
TGC-YOLOv5: An Enhanced YOLOv5 Drone Detection Model Based on Transformer, GAM & CA Attention Mechanism;Yuliang Zhao等;Drones;第7卷(第7期);1-21 *
Pavement crack detection under complex backgrounds based on improved YOLOv5; Sun Jiancheng et al.; China Sciencepaper; Vol. 18, No. 07; 779-785 *
Research on bearing fault diagnosis methods based on attention-mechanism deep learning networks; Jiang Su; China Master's Theses Full-text Database, Engineering Science and Technology II; C029-525 *
Lightweight DCN-YOLO for strip steel surface defect detection in complex environments; Lu Junzhe et al.; Computer Engineering and Applications; Vol. 59, No. 15; 318-328 *

Also Published As

Publication number Publication date
CN116629322A (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN109508663B (en) Pedestrian re-identification method based on multi-level supervision network
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
CN109741331B (en) Image foreground object segmentation method
CN111967470A (en) Text recognition method and system based on decoupling attention mechanism
CN107506765B (en) License plate inclination correction method based on neural network
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN109934272B (en) Image matching method based on full convolution network
CN110956082A (en) Face key point detection method and detection system based on deep learning
CN111274958B (en) Pedestrian re-identification method and system with network parameter self-correction function
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN111931792B (en) Yao nationality pattern symbol identification method based on target detection
CN111950561A (en) Semantic SLAM dynamic point removing method based on semantic segmentation
CN112528845A (en) Physical circuit diagram identification method based on deep learning and application thereof
Huang et al. Mobile phone component object detection algorithm based on improved SSD
CN116092134A (en) Fingerprint living body detection method based on deep learning and feature fusion
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN110852214A (en) Light-weight face recognition method facing edge calculation
CN114693966A (en) Target detection method based on deep learning
CN114022914A (en) Palm print identification method based on fusion depth network
CN113807237A (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
CN116629322B (en) Segmentation method of complex morphological target
WO2023206964A1 (en) Pedestrian re-identification method, system and device, and computer-readable storage medium
CN113903043B (en) Method for identifying printed Chinese character font based on twin metric model
CN115795394A (en) Biological feature fusion identity recognition method for hierarchical multi-modal and advanced incremental learning
CN115661123A (en) Industrial product surface defect position detection method based on weak supervision target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant