CN111598854B - Segmentation method for small defects of complex textures based on rich robust convolution feature model - Google Patents


Info

Publication number
CN111598854B
Authority
CN
China
Prior art keywords
layer
stage
convolution
feature
side output
Prior art date
Legal status
Active
Application number
CN202010368806.1A
Other languages
Chinese (zh)
Other versions
CN111598854A (en)
Inventor
陈海永
刘聪
王霜
刘卫朋
张建华
Current Assignee
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date
Filing date
Publication date
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202010368806.1A
Publication of CN111598854A
Application granted
Publication of CN111598854B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84Systems specially adapted for particular applications
    • G01N21/88Investigating the presence of flaws or contamination
    • G01N21/8851Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84Systems specially adapted for particular applications
    • G01N21/88Investigating the presence of flaws or contamination
    • G01N21/8851Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N2021/8887Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges based on image processing techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Signal Processing (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a segmentation method for small defects in complex textures based on a rich robust convolution feature model. The method comprises: obtaining an image containing the object to be segmented, and performing feature recombination on it with the rich robust convolution feature model to obtain a feature map for each side output layer; connecting the feature map of each side output layer in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage's side output layer; and adding a fusion layer to the model, fusing the deconvolved feature maps of all side output layers together, and connecting a fusion-layer fine loss function to obtain the final prediction map, thereby realizing defect segmentation. The method solves the problem of inaccurate predictions caused by the extremely unbalanced ratio between target pixels and background pixels, and can predict fine targets.

Description

Segmentation method for small defects of complex textures based on rich robust convolution feature model
Technical Field
The invention relates to the technical field of lithium battery surface defect detection, and in particular to a segmentation method for small defects of complex textures based on a rich robust convolution feature model.
Background
Surface defect detection has become an important technical means of controlling the surface quality of lithium batteries; good surface quality not only prolongs the service life of the battery assembly but also improves the power generation efficiency of the battery.
Crack segmentation methods based on convolutional neural networks (CNNs) generally face two problems: cracks are severely missed or falsely detected, and the predicted crack segmentation result is too thick, so fine cracks can only be obtained through complex post-processing. The complex non-uniform texture background on the lithium battery surface is one main cause of these problems; the other is that the ratio of crack pixels to background pixels in a defect image is extremely unbalanced — for example, a defect image may contain one million pixels while the defect occupies only a few dozen or even a dozen pixels.
The information captured by different convolution layers becomes coarser as the network deepens. The lower convolution layers contain the complex random texture background together with target detail information, so the distinction between target and background is not obvious and the network learns only non-discriminative information such as shapes and corner features; the higher convolution layers retain the important target information; and the middle convolution layers contain essential target details. However, a typical convolutional neural network model uses only the output features of the last convolution layer, or of the convolution layer before the pooling layer of each stage, and ignores the target detail information contained in the middle convolution layers. For crack segmentation, the key problem is the high similarity between background and target information, and excessive fusion tends to cause serious false detections.
Although segmentation methods based on convolutional neural networks are good at predicting features rich in semantic information such as contours and edges, analysis shows that the prediction obtained by directly applying a convolutional neural network to crack segmentation is much thicker than the crack annotated in the ground-truth label, so crack pixels cannot be located accurately. The problem of predicting cracks, edges, contours or lines too thickly is rarely discussed in the literature. One possible reason is that these methods usually apply a post-processing step to thin the cracks, edges, contours or lines after generating the initial prediction, so the width of the processed prediction appears to have little effect on the result; in fact this reduces the prediction accuracy, and therefore such methods cannot meet detection tasks that demand accurate pixel-level localization.
The loss function evaluates the degree of inconsistency between the model's prediction and the ground truth; the smaller the loss, the better the robustness of the model, and the loss function guides the model's learning. In a lithium battery crack defect image the ratio of crack pixels to background pixels is extremely unbalanced, so negative samples (background pixels) account for most of the model loss, the learning process sinks into a local minimum of the loss function, and the prediction is biased toward background pixels, leaving the trained model unable to detect cracks. This influence of the unbalanced ratio between crack and background pixels on the loss function is the root cause of the overly thick crack segmentation results.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a segmentation method for small defects of complex textures based on a rich robust convolution characteristic model.
The technical scheme adopted for solving the technical problems is as follows:
A segmentation method for small defects of complex textures based on a rich robust convolution feature model, characterized by comprising: obtaining an image containing an object to be segmented, and performing feature recombination on the image with the rich robust convolution feature model to obtain a feature map of each side output layer; connecting the feature map of each side output layer in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage's side output layer;
meanwhile, a fusion layer is added to the model, the deconvolved feature maps of all side output layers are fused together, and a fusion-layer fine loss function is then connected to obtain the final prediction map, realizing defect segmentation;
wherein, the fine loss function of the side output layer satisfies the formula (1):
ℓ_side^(k)(W, w^(k)) = L(W, w^(k)) + L^(k)(P_side, G)  (1)
P_side = σ(A_side), A_side = {a_j, j = 1, …, |Y|}  (2)
wherein L^(k)(P_side, G) represents the distance loss function at the k-th stage; L(W, w^(k)) represents the weighted cross-entropy loss function of the k-th stage; P_side represents the prediction feature map of the k-th stage side output layer; σ is the sigmoid activation function; A_side represents the set of activation values at all pixels of the prediction feature map of the k-th stage side output layer; a_j represents the activation value at any pixel j in that prediction feature map; |Y| represents the total number of defective and non-defective pixels in the image;
the fusion layer fine loss function is obtained by the following formula:
L_fuse(W, w) = L_c(P_fuse, G)  (3)
P_fuse = σ(Σ_{k=1}^{K} h_k A_side^(k))  (4)
wherein L_c represents the standard cross-entropy loss function; P_fuse represents the fusion of the prediction feature maps of the K stage side output layers, h_k being the fusion-layer weights; K represents the total number of stages;
the fusion-layer fine loss function and the side-output-layer fine loss functions of all stages are summarized with an argmin function to obtain the objective function L, as shown in formula (5);
(W, w)* = argmin( Σ_{k=1}^{K} ℓ_side^(k)(W, w^(k)) + L_fuse(W, w) )  (5)
Finally, the objective function is optimized to obtain the weights of the side-output-layer fine loss functions and of the fusion-layer fine loss function.
The specific process of feature recombination by using the rich robust convolution feature model is as follows:
removing a full connection layer and a pooling layer in a fifth stage on the basis of an original ResNet40 network, wherein an identification block layer in a first stage and an identification block layer in a second stage of the original ResNet40 network are respectively and laterally connected with a convolution layer to obtain feature images of a first-stage side output layer and a second-stage side output layer;
and laterally connecting a convolution layer after each block layer in the third, fourth and fifth stages of the original ResNet40 network to obtain characteristic diagrams after the convolution of the respective block layers, and then respectively adding the characteristic diagrams after the convolution of all the block layers in the same stage element by element to obtain the characteristic diagrams of the side output layers in the corresponding stages.
The convolution kernel size of the convolution layers laterally connected to the first-stage and second-stage identification block layers is 1×1, with a step size of 1 and 1 channel; the convolution kernel size of the convolution layers laterally connected after each block layer in the third, fourth and fifth stages is 1×1, with a step size of 1 and 21 channels.
The original ResNet40 network comprises 40 convolution layers and a fully connected layer at the last layer of the network and is divided into 5 stages, each stage comprising one convolution block layer and one or more identification block layers: the first and second stages each comprise one convolution block layer and one identification block layer, and the third, fourth and fifth stages each comprise one convolution block layer and two identification block layers; each convolution block layer and identification block layer comprises several convolution layers. Each stage adds, after all its identification block layers, a pooling layer with a pooling window size of 2×2 and a step size of 2.
The specific structure of the original ResNet40 network is as follows: firstly, the input target image passes in turn through a convolution with a kernel size of 5×5, a stride of 1 and 32 channels and a max-pooling layer with a kernel size of 2×2 and a stride of 2, giving the input features of the first stage; the first-stage input features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 32 channels, with a residual connection through one convolution with a kernel size of 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage convolution block layer; these pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage identification block layer; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the first stage;
the first-stage output features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 64 channels, with a residual connection through one convolution with a kernel size of 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage convolution block layer; these pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage identification block layer; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the second stage;
the second-stage output features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, with a residual connection through one convolution with a kernel size of 1×1, a stride of 1 and 256 channels, giving the output features of the third-stage convolution block layer; these pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, giving the output features of the first identification block layer of the third stage; these pass through another three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, giving the output features of the second identification block layer of the third stage; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the third stage;
the operation of the fourth stage is the same as that of the third stage; repeating the third-stage operation on the third-stage output features gives the output features of the fourth stage;
the operation of the fifth stage is the same as that of the convolution block layer and the two identification block layers of the fourth stage; repeating those operations on the fourth-stage output features gives the output features of the fifth stage.
A segmentation method of complex texture small defects based on a rich robust convolution feature model comprises the following specific steps:
s1 image preprocessing
Collecting an image containing a defect to be segmented, and normalizing the collected image into 1024×1024 pixels; adding pixel-level labels to the normalized image, wherein the images with the labels added are target images; dividing the target image into different sample sets according to the proportion;
s2 constructing rich robust convolution characteristic model
Based on an original ResNet40 network, a convolution layer is laterally connected with an identification block layer in a first stage and an identification block layer in a second stage of the original ResNet40 network respectively to obtain feature diagrams of a first-stage side output layer and a second-stage side output layer;
connecting a convolution layer laterally after each block layer in the third, fourth and fifth stages of the original ResNet40 network to obtain feature images after the convolution of the respective block layers, and then adding the feature images after the convolution of all the block layers in the same stage element by element to obtain feature images of side output layers in the corresponding stages;
respectively connecting the feature graphs of the five stage side output layers with a deconvolution layer (deconv) for up-sampling to obtain feature graphs after deconvolution of the respective stages, and respectively connecting the feature graphs after deconvolution of each stage with a side output layer fine loss function for pixel-by-pixel classification to obtain a prediction feature graph of each stage side output layer;
connecting the deconvoluted feature images of all the stages together, and then fusing all the deconvoluted feature images through a convolution layer with the convolution kernel size of 1 multiplied by 1 and the step length of 1 to obtain a fused layer feature image; finally connecting a fusion layer fine loss function with the fusion layer feature map to obtain a final prediction feature map;
s3 model training and testing
Initializing model parameters, and inputting the target images for training together with their corresponding pixel-level labels; during model training, the loss is propagated to the weights of each convolution layer by stochastic gradient descent and the weight values are updated, with a momentum of 0.9 and a weight decay of 0.0005; one image is randomly sampled in each training step, and training stops when the number of iteration epochs reaches 100, completing the training of the model;
the target images for testing are scaled to 1024×1024 pixels and input into the trained model; the test time for a single image is 0.1 s; repeating the above operation completes the model test.
The object to be segmented is a crack, an edge or a linear structure.
Compared with the prior art, the invention has the beneficial effects that:
From the perspective of making reasonable use of the convolution features and of designing the loss function, the invention aims to let the model learn defect features that are as rich and complete as possible, to predict fine defects robustly without any post-processing, and to let the learned defect features generate a prediction feature map as close as possible to the real label. To this end, the invention builds a rich robust convolution feature model on the basis of an original ResNet40 network and performs end-to-end deep learning under the Keras 1.13 deep learning framework. The model adopts a network structure with multi-scale, multi-level features: more high-level features (third, fourth and fifth stages) and fewer low-level features (first and second stages) are fused, and within each stage the features are superimposed and fused by convolutions with a kernel size of 1×1, so that all convolution features are packed into a richer and more robust representation and the expressive power of the features is improved. The output feature maps of the middle layers of each stage are also used, overcoming the shortcoming of conventional convolutional neural network models that use only the output features of the last convolution layer, or of the convolution layer before the pooling layer of each stage, and ignore the target detail information contained in the middle layers.
Aiming at the problem of inaccurate prediction results caused by unbalanced duty ratio between crack pixels and non-crack pixels, in order to predict fine cracks, a side output layer fine loss function is respectively introduced into a side output layer of each stage, a fusion layer fine loss function is introduced into a fusion layer of a model, the side output layer fine loss function is combined with a weighted cross entropy loss function and a distance loss function, defect characteristics in a prediction characteristic diagram of the fine side output layer are detected, and meanwhile, the weight of each convolution layer in the side output layer is optimized through the side output layer fine loss function in the model training process; the fusion layer fine loss function fuses the side output layer fine loss function, the defect characteristics in the characteristic diagram are predicted finely and finally, meanwhile, the weight of each convolution layer in the fusion layer is optimized through the fusion layer fine loss function in the model training process, and the crack is predicted globally to locally.
Compared with the traditional filter segmentation method and the conventional convolutional neural network, the method has the advantages that finer cracks can be predicted by using the abundant robust convolutional feature model, so that crack segmentation recognition accuracy can reach 79.64%.
The method can provide thought for target segmentation with extremely high aspect ratio similar to a crack structure and high requirement on fineness.
Drawings
FIG. 1 is a network structure diagram of a robust-rich convolution feature model of the present invention;
FIG. 2 is a graph of crack segmentation results for different segmentation methods of the present invention;
FIG. 3 is a graph showing the comparison of the evaluation results of different segmentation methods according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made more fully hereinafter with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method of the present application is described in detail below with reference to its application to detecting surface crack defects of lithium batteries.
The invention provides a segmentation method (abbreviated as method) of complex texture small defects based on a rich robust convolution characteristic model, which comprises the following steps:
s1 image preprocessing
S1-1 acquiring an image
Lithium battery images are collected with a 1.4-megapixel near-infrared camera; the actual size of the collected lithium battery image is 165 mm × 165 mm. The collected images are normalized to 1024×1024 pixels, and these 1024×1024-pixel images serve as the original images, so the original images need no complex preprocessing and, after size normalization, can be used directly as model input. This image size is almost equal to the size acquired by the original camera, so the original information of the image is better preserved, no complex processing is needed, the processing speed of the algorithm is improved, and the real-time requirement of production-line detection is met. The original images include images containing the object to be segmented and images not containing the object to be segmented;
s1-2 image label
Manually labeling all original images containing objects to be segmented in the step S1-1 by using Labelimg software, adding pixel-level labels, wherein the pixel-level labels comprise the area size and the spatial position information of defects, and the images added with the labels are target images for model training, testing and verification;
s1-3 preparation of sample set
Grouping the target images in the step S1-2, randomly extracting 20% (default value) of target images from the target images to serve as test sample sets, and randomly dividing the rest target images into a training sample set and a verification sample set according to the ratio of 4:1;
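For illustration only, the following is a minimal Python sketch of the sample-set split described in S1-3; the function name and the use of simple path lists are assumptions and not part of the patent.

```python
# Hypothetical helper: split labelled target images into training / validation / test sets
# (20% test, remaining images split 4:1 into training and validation), as in step S1-3.
import random

def split_samples(image_paths, test_ratio=0.2, train_val_ratio=4, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_test = int(len(paths) * test_ratio)
    test_set = paths[:n_test]                      # 20% test sample set
    rest = paths[n_test:]
    n_val = len(rest) // (train_val_ratio + 1)     # 4:1 -> one fifth for validation
    return rest[n_val:], rest[:n_val], test_set    # train, validation, test
```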
s2 construction of a robust convolutional feature model (Rich and Robust Convolutional Features, RRCF)
S2-1 original ResNet40 network
The invention is an improvement based on the original ResNet40 network. The original ResNet40 network includes 40 convolution layers (Conv) and a fully connected layer (Fully Connected Layer) at the last layer of the network, and is divided into 5 stages (Stage). Each stage includes one convolution block layer (Conv Block) and one or more identification block layers (Identity Block), denoted stagek_blockm, where k is the stage number and m is the number of the block layer within that stage. The first and second stages each include one convolution block layer and one identification block layer; the third, fourth and fifth stages each include one convolution block layer and two identification block layers; each convolution block layer and identification block layer includes several convolution layers. A pooling layer with a pooling window size of 2×2 and a stride of 2 is added after all the identification block layers of each stage. The specific parameters of each convolution layer of the original ResNet40 network are listed in Table 1;
firstly, the input target image passes in turn through a convolution with a kernel size of 5×5, a stride of 1 and 32 channels and a max-pooling layer (Maxpool) with a kernel size of 2×2 and a stride of 2, giving the input features of the first stage; the first-stage input features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 32 channels, with a residual connection (Shortcut) through one convolution with a kernel size of 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage convolution block layer; these pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage identification block layer; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the first stage;
the first-stage output features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 64 channels, with a residual connection through one convolution with a kernel size of 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage convolution block layer; these pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage identification block layer; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the second stage;
the second-stage output features pass in turn through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, with a residual connection through one convolution with a kernel size of 1×1, a stride of 1 and 256 channels, giving the output features of the third-stage convolution block layer; these pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, giving the output features of the first identification block layer of the third stage; these pass through another three convolutions with kernel sizes of 1×1, 3×3 and 1×1, a stride of 1 and 256 channels, giving the output features of the second identification block layer of the third stage; these then pass through a pooling layer with a kernel size of 2×2 and a stride of 2, giving the output features of the third stage;
the operation of the fourth stage is the same as that of the third stage; repeating the third-stage operation on the third-stage output features gives the output features of the fourth stage;
the operation of the fifth stage is the same as that of the convolution block layer and the two identification block layers of the fourth stage; repeating those operations on the fourth-stage output features gives the output features of the fifth stage;
the input target image is calculated layer by layer to extract features, for example, the size of the target image is 1024×1024×32, wherein both the length and the width are 1024, and the number of channels is 32; the output size after convolution with a convolution kernel size of 5×5, a step size of 1, and a channel number of 32 is 1024×1024×32; then the output size after the maximum pooling with the convolution kernel size of 2 x2 and the step size of 2 is 512 x 32, namely the size of the input feature in the first stage is 512 x 32; after the operation, the output feature size of the first stage is 256×256×32, the output feature size of the second stage is 128×128×64, the output feature size of the third stage is 64×64×256, the output feature size of the fourth stage is 32×32×256, and the output feature size of the fifth stage is 16×16×256;
TABLE 1 specific parameters of the original ResNet40 network
Stage | Composition | Convolutions (kernel size / stride / channels) | Output size
Input | Conv 5×5 + Maxpool 2×2 (stride 2) | 5×5 / 1 / 32 | 512×512×32
Stage 1 | Conv Block + Identity Block + Pool 2×2 (stride 2) | 1×1, 3×3, 1×1 / 1 / 32, shortcut 1×1 / 1 / 32 | 256×256×32
Stage 2 | Conv Block + Identity Block + Pool 2×2 (stride 2) | 1×1, 3×3, 1×1 / 1 / 64, shortcut 1×1 / 1 / 64 | 128×128×64
Stage 3 | Conv Block + Identity Block×2 + Pool 2×2 (stride 2) | 1×1, 3×3, 1×1 / 1 / 256, shortcut 1×1 / 1 / 256 | 64×64×256
Stage 4 | Conv Block + Identity Block×2 + Pool 2×2 (stride 2) | as in Stage 3 | 32×32×256
Stage 5 | Conv Block + Identity Block×2 + Pool 2×2 (stride 2) | as in Stage 3 | 16×16×256
In the table, identity Block×2 indicates that the Identity Block operation has been performed twice;
s2-2 feature map reorganization of rich robust convolution feature model
The fully connected layer and the fifth-stage pooling layer are removed from the original ResNet40 network constructed in step S2-1. On the one hand, removing the fully connected layer yields a fully convolutional network that outputs image-to-image predictions and also reduces the computational complexity of the model; on the other hand, the fifth-stage pooling layer would double the stride again and harm defect localization. Although pooling affects localization, the pooling layers of the first four stages are kept mainly to speed up training;
the method comprises the steps that a convolution layer with the convolution kernel size of 1 multiplied by 1 and the step length and the channel number of 1 is respectively and laterally connected with an identification block layer (stage 1_block2) in the first stage and an identification block layer (stage 2_block2) in the second stage of an original ResNet40 network to reduce the dimension of the channel number, so that feature graphs of a first-stage side output layer and a second-stage side output layer are respectively obtained, and integration of feature information is realized;
each block layer in the third, fourth and fifth stages of the original ResNet40 network, namely, a convolution layer with a convolution kernel size of 1 multiplied by 1, a step size of 1 and a channel number of 21 is laterally connected after each block layer, namely, a step 3_block1, a step 3_block2, a step 3_block3, a step 4_block1, a step 4_block2, a step 4_block3, a step 5_block1, a step 5_block2 and a step 5_block3, so as to obtain a feature map after convolution of each block layer, and then, the feature maps after convolution of all block layers in the same stage are added element by element to obtain a feature map of a side output layer of the corresponding stage;
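A minimal sketch of this side-output construction is given below; the helper-function names are hypothetical and the tensors passed in are assumed to be the block outputs named above.

```python
# Sketch of the side-output layers of the rich robust convolution feature model (S2-2).
from tensorflow.keras import layers

def low_stage_side_output(block_feature):
    # stages 1 and 2: one lateral 1x1 convolution, stride 1, 1 channel
    return layers.Conv2D(1, 1, strides=1, padding='same')(block_feature)

def high_stage_side_output(block_features):
    # stages 3-5: a 1x1, stride-1, 21-channel convolution after every block layer,
    # then element-wise addition of all convolved maps of the same stage
    convolved = [layers.Conv2D(21, 1, strides=1, padding='same')(f) for f in block_features]
    return layers.Add()(convolved)
```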
s2-3 constructing predictive feature graphs of rich robust convolution feature models
The feature maps of the five stage side output layers are each connected to a deconvolution layer (deconv) for up-sampling, so that feature maps with the same scale as the target image are predicted; the deconvolved feature maps of the respective stages are obtained, and the spatial position information of the defects in the target image is preserved in the deconvolved feature maps;
the deconvolved feature map of each stage is connected to a side-output-layer fine loss function for pixel-by-pixel classification, giving the prediction feature map of each stage's side output layer: every pixel of the deconvolved feature map is classified, the defect features in the prediction feature map of the side output layer are finely obtained, and the weights of the convolution layers in the side output layer are optimized through the side-output-layer fine loss function during training;
in order to directly use the prediction feature maps of the side output layers of each stage, a fusion layer is added to the model and its weights are learned during training: the deconvolved feature maps of all stages are concatenated (concat) together, and all the deconvolved feature maps are then fused by a convolution layer with a kernel size of 1×1 and a stride of 1 to obtain the fusion-layer feature map; finally, the fusion-layer fine loss function is connected to the fusion-layer feature map to obtain the final prediction feature map, the defect features in the feature map are finely predicted, and the weights of the convolution layers in the fusion layer are optimized through the fusion-layer fine loss function during training; the final prediction feature map is the prediction feature map of the rich robust convolution feature model;
s3 design of fine loss function
S3-1 weighted cross entropy loss function
Since, in a defect image, small defects are distributed very unevenly over the pixels of a complex texture background (for example crack pixels versus non-crack pixels), and most pixels are randomly distributed non-defective pixels, i.e. background, the defective pixels cannot be segmented accurately from the non-defective pixels by directly using the cross-entropy loss function. The weighted cross-entropy loss function (weighted cross-entropy loss) introduces a class-balance weight coefficient β to offset the imbalance between defective and non-defective pixels; the loss at each pixel satisfies formula (1):
ℓ(W, w^(k)) = −β Σ_{j∈Y+} log Pr(y_j = 1 | X; W, w^(k)) − (1 − β) Σ_{j∈Y−} log Pr(y_j = 0 | X; W, w^(k))  (1)
β = |Y−| / |Y|, 1 − β = |Y+| / |Y|  (2)
wherein X represents the target image; W represents the set of all network-layer parameters; w^(k) represents the weights of the prediction feature map of the k-th stage side output layer; Y+ and Y− represent the defective and non-defective pixels, respectively; β represents the class-balance weight coefficient; |Y| = |Y+| + |Y−|; y_j represents any pixel in the target image; Pr(y_j = 1 | X; W, w^(k)) is the class score at pixel y_j computed with the sigmoid activation function, with Pr ∈ [0, 1];
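A minimal sketch of this class-balanced loss is shown below, assuming y_true is a binary defect mask and y_pred holds sigmoid probabilities of the same shape; TensorFlow is used only for illustration.

```python
# Sketch of the weighted (class-balanced) cross-entropy loss of formulas (1)-(2).
import tensorflow as tf

def weighted_cross_entropy(y_true, y_pred, eps=1e-7):
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    n_pos = tf.reduce_sum(y_true)            # |Y+|, defective pixels
    n_neg = tf.reduce_sum(1.0 - y_true)      # |Y-|, non-defective pixels
    beta = n_neg / (n_pos + n_neg)           # class-balance weight coefficient
    loss_pos = -beta * y_true * tf.math.log(y_pred)
    loss_neg = -(1.0 - beta) * (1.0 - y_true) * tf.math.log(1.0 - y_pred)
    return tf.reduce_sum(loss_pos + loss_neg)
```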
S3-2 distance loss function (Dice, written as Dice loss function)
Given a target image X and its corresponding real label G, and letting P be the predicted image of X, the distance loss function (Dice loss function) compares the similarity between the predicted image P and the real label G and can minimize the distance between them. The Dice loss function Dist(P, G) is given by:
Dist(P, G) = 1 − (2 Σ_{j=1}^{N} p_j g_j) / (Σ_{j=1}^{N} p_j² + Σ_{j=1}^{N} g_j²)  (3)
wherein p_j ∈ P is any pixel in the predicted image P; g_j ∈ G is any pixel in the real label G; N represents the total number of pixels in the target image;
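A corresponding sketch of the Dice distance loss, under the same assumptions as the previous snippet:

```python
# Sketch of the distance (Dice) loss of formula (3), written as 1 minus the Dice coefficient.
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-7):
    p = tf.reshape(y_pred, [-1])
    g = tf.reshape(y_true, [-1])
    intersection = tf.reduce_sum(p * g)
    denom = tf.reduce_sum(p * p) + tf.reduce_sum(g * g)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)
```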
design of S3-3 Fine loss function
In order to obtain better defect prediction performance, a fine loss function (Precise Loss Function) is proposed that combines a weighted cross entropy loss function and a Dice loss function; wherein the Dice loss function is considered as a loss at the image level, focusing on the similarity between two groups of image pixels, the Dice loss function can reduce redundant information, is a key for generating fine cracks in the application, and is easy to generate phenomena of incomplete prediction and target loss, such as a part of predicted cracks; the weighted cross entropy loss function focuses on the pixel-level variability, because the weighted cross entropy loss function is the sum of the distances between each corresponding pixel between the predicted image and the real label, the prediction is comprehensive, no target loss is caused, but the weighted cross entropy loss function easily introduces more background information, so that the prediction result is inaccurate; therefore, the two can be combined to minimize the distance between the image level and the pixel level, and the prediction from the global to the local is realized;
in order to obtain finer prediction feature diagrams of the side output layers of each stage, a side output layer fine loss function is provided, and the following formula is satisfied:
ℓ_side^(k)(W, w^(k)) = L(W, w^(k)) + L^(k)(P_side, G)  (4)
P_side = σ(A_side), A_side = {a_j, j = 1, …, |Y|}  (5)
wherein L^(k)(P_side, G) represents the distance loss function at the k-th stage; L(W, w^(k)) represents the weighted cross-entropy loss function of the k-th stage; P_side represents the prediction feature map of the k-th stage side output layer; σ is the sigmoid activation function; A_side represents the set of activation values at all pixels of the prediction feature map of the k-th stage side output layer; a_j represents the activation value at any pixel j in that prediction feature map;
the fusion layer fine loss function is obtained by the following formula:
L_fuse(W, w) = L_c(P_fuse, G)  (6)
P_fuse = σ(Σ_{k=1}^{K} h_k A_side^(k))  (7)
wherein L_c represents the standard cross-entropy loss function; P_fuse represents the fusion of the prediction feature maps of the K stage side output layers, h_k being the fusion-layer weights; K represents the total number of stages;
The fusion-layer fine loss function and the side-output-layer fine loss functions of all stages are summarized with an argmin function (which returns the variable values at which the objective function reaches its minimum) to obtain the objective function, as shown in formula (8); the objective function is optimized by standard stochastic gradient descent, which in turn optimizes the weight of each side-output-layer fine loss function and the weight of the fusion-layer fine loss function;
(W, w)* = argmin( Σ_{k=1}^{K} ℓ_side^(k)(W, w^(k)) + L_fuse(W, w) )  (8)
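Putting the pieces together, the sketch below shows one possible way to assemble the side-output fine loss, the fusion-layer loss and the overall objective of formula (8), reusing the weighted_cross_entropy and dice_loss sketches above; the unweighted sum of the two loss terms and the equal weighting of the K side losses are assumptions.

```python
# Sketch of the fine loss functions and the overall objective (formulas (4)-(8)).
import tensorflow as tf

def side_fine_loss(y_true, y_pred):
    # per-stage fine loss: weighted cross-entropy plus Dice distance
    return weighted_cross_entropy(y_true, y_pred) + dice_loss(y_true, y_pred)

def fuse_fine_loss(y_true, y_pred, eps=1e-7):
    # fusion-layer loss: standard (unweighted) cross-entropy
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_sum(y_true * tf.math.log(y_pred)
                          + (1.0 - y_true) * tf.math.log(1.0 - y_pred))

def total_objective(y_true, side_preds, fuse_pred):
    # objective: sum of the K side-output fine losses plus the fusion fine loss
    return tf.add_n([side_fine_loss(y_true, p) for p in side_preds]) \
           + fuse_fine_loss(y_true, fuse_pred)
```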
s4 model training and testing
S4-1 Initializing model parameters: all weight values, bias values and batch-normalization scale factors are initialized, the initialized parameters are loaded into the rich robust convolution feature model built in step S2, and the initial learning rate of the model is set to λ = 0.001; the weights of the convolution layers in the first to fifth stages are initialized with a standard deviation of 0.01 and their biases to 0; the weights of all convolution layers of the fusion layer are initialized with a standard deviation of 0.2 and their biases to 0;
s4-2 model training: inputting the target image in the training sample set and the corresponding pixel-level label into the rich robust convolution feature model after initializing parameters in the step S4-1; transmitting the loss to the weight of each convolution layer through a random gradient descent method (Stochastic gradientdescent, SGD) in the model training process, and updating the weight value of the loss, wherein the momentum of the random gradient descent method is 0.9, and the weight attenuation is 0.0005; 1 image is randomly sampled in each training process, training is stopped when the iteration cycle number reaches 100 cycles, and training of the rich robust convolution characteristic model is completed; the above operations are all completed under the window10 system, the training computer CPU is the Kuri 7 series, the memory is 32GB, and the graphic card is NIVIDIAGeforce GTX2080ti; training of the model is realized based on a Keras1.13 deep learning framework;
s4-3 model test: scaling and adjusting the target image in the test sample set to 1024x1024 pixels, and inputting the scaled target image into the rich robust convolution feature model trained in the step S4-2; the test time of a single image is 0.1S so as to meet the requirement of production efficiency, and the operation of the step S4-2 is repeated to complete the model test.
In order to verify the effectiveness of the method, experiments were carried out with the method on lithium battery images containing crack defects, and the method was compared with a traditional segmentation method (Gabor filter) and a common convolutional neural network method (UNet, U-shaped network); the comparison results are shown in Fig. 2, where (a1) is an original image containing a crack defect and (a5) is its corresponding real label; (a2) is the result of feature extraction with the Gabor filter method; (a3) is the result of feature extraction with the UNet (U-shaped network) model; (a4) is the result of feature extraction with the rich robust convolution feature model (RRCF) proposed by the method;
As can be seen from Fig. 2, the RRCF model proposed by the method superimposes and fuses the convolution features of each stage, so more crack information is learned; this overcomes the tendency of the Gabor filter method to falsely detect grid-line structures similar to the crack structure and the tendency of the UNet model to falsely detect regions occluded by crystal grains as cracks. The crack lines predicted by the RRCF model are thinner and closer to the real label, and the results show that the two fine loss functions of the method help predict fine cracks, improving on the insufficiently fine predictions of the Gabor filter and the UNet model and giving higher prediction accuracy.
In order to quantitatively evaluate the performance of each method, three indexes of cpt (integrity), crt (accuracy) and F-measure (F measure) are respectively used for quantitative analysis, wherein F-measure is a calculation result obtained based on the cpt and the crt, and the higher the F-measure value is, the more effective the adopted method is; each expression is shown in formulas (9) - (11);
cpt = L / L_g  (9)
crt = L / L_t  (10)
F-measure = 2 × cpt × crt / (cpt + crt)  (11)
wherein L_g represents the number of crack pixels in the manually annotated real label; L_t is the number of pixels extracted by the detection method; L is the number of extracted pixels that match the real label;
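For illustration, a small NumPy sketch of how these indices could be computed from a binarized prediction and its ground-truth mask:

```python
# Sketch of the evaluation indices of formulas (9)-(11) on binary masks.
import numpy as np

def evaluate(pred_mask, gt_mask):
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    L_g = gt.sum()                                   # crack pixels in the real label
    L_t = pred.sum()                                 # pixels extracted by the method
    L = np.logical_and(pred, gt).sum()               # extracted pixels matching the label
    cpt = L / L_g if L_g else 0.0
    crt = L / L_t if L_t else 0.0
    f_measure = 2 * cpt * crt / (cpt + crt) if (cpt + crt) else 0.0
    return cpt, crt, f_measure
```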
the index values of the three methods are shown in fig. 3, and the UNet model and the rich robust convolution feature model both show higher integrity cpt and reflect the advantage of the convolution neural network in solving the problem of crack detection under complex background interference; the F-measure of the abundant robust convolution characteristic model is 85.81%, and the performance is superior to the other two methods; the integrity and the accuracy of the abundant robust convolution feature model are 93.02% and 79.64%, on one hand, the integrity of crack segmentation is improved by multi-level fusion of a network, on the other hand, two fine loss functions are designed according to the characteristic of the crack, namely the extreme aspect ratio, so that the background information interference is reduced, the accuracy is improved, the identification accuracy is remarkably improved, crack features are not easy to lose in the process, and crack omission is avoided; the accuracy of the UNet model is the lowest (69.5%), and is the lowest because the UNet model is greatly influenced by background interference, excessive background information is introduced, fine segmentation of cracks cannot be realized, and therefore the UNet model has the lowest accuracy; in summary, the method has the highest integrity and accuracy of crack segmentation, has the best segmentation effect, and can realize fine crack segmentation.
Where not described in detail, the invention follows the prior art.

Claims (6)

1. A segmentation method for small defects of complex textures based on a rich robust convolution feature model, characterized by comprising: obtaining an image containing an object to be segmented, and performing feature recombination on the image with the rich robust convolution feature model to obtain a feature map of each side output layer; connecting the feature map of each side output layer in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage's side output layer;
meanwhile, a fusion layer is added in the model, feature images after deconvolution of all the side output layer feature images are fused together, and then a fusion layer fine loss function is connected to obtain a final prediction image, so that defect segmentation is realized;
the structure of the rich robust convolution feature model is as follows:
removing a full connection layer and a pooling layer in a fifth stage on the basis of an original ResNet40 network, wherein an identification block layer in a first stage and an identification block layer in a second stage of the original ResNet40 network are respectively and laterally connected with a convolution layer to obtain feature images of a first-stage side output layer and a second-stage side output layer;
connecting a convolution layer laterally after each block layer in the third, fourth and fifth stages of the original ResNet40 network to obtain feature images after the convolution of the respective block layers, and then adding the feature images after the convolution of all the block layers in the same stage element by element to obtain feature images of side output layers in the corresponding stages;
wherein, the fine loss function of the side output layer satisfies the formula (1):
ℓ_side^(k)(W, w^(k)) = L(W, w^(k)) + L^(k)(P_side, G)  (1)
P_side = σ(A_side), A_side = {a_j, j = 1, …, |Y|}  (2)
wherein L^(k)(P_side, G) represents the distance loss function at the k-th stage; L(W, w^(k)) represents the weighted cross-entropy loss function of the k-th stage; P_side represents the prediction feature map of the k-th stage side output layer; σ is the sigmoid activation function; A_side represents the set of activation values at all pixels of the prediction feature map of the k-th stage side output layer; a_j represents the activation value at any pixel j in that prediction feature map; |Y| represents the total number of defective and non-defective pixels in the image;
the fusion layer fine loss function is obtained by the following formula:
L_fuse(W, w) = L_c(P_fuse, G)  (3)
P_fuse = σ(Σ_{k=1}^{K} h_k A_side^(k))  (4)
wherein L_c represents the standard cross-entropy loss function; P_fuse represents the fusion of the prediction feature maps of the K stage side output layers, h_k being the fusion-layer weights; K represents the total number of stages;
the fusion-layer fine loss function and the side-output-layer fine loss functions of all stages are summarized with an argmin function to obtain the objective function L, as shown in formula (5);
(W, w)* = argmin( Σ_{k=1}^{K} ℓ_side^(k)(W, w^(k)) + L_fuse(W, w) )  (5)
and finally, the objective function is optimized to obtain the weights of the side-output-layer fine loss functions and of the fusion-layer fine loss function.
2. The segmentation method according to claim 1, wherein the convolution kernel size of the convolution layer laterally connected by the first and second identification block layers is 1×1, and the step size and the channel number are all 1; the convolution kernel size of the laterally connected convolution layers behind each block layer in the third, fourth and fifth stages is 1×1, the step size is 1, and the channel number is 21.
3. The segmentation method according to claim 1, wherein the original ResNet40 network comprises 40 convolution layers and a full connection layer positioned at the last layer of the network, and is divided into 5 stages, each stage comprises one convolution block layer and one or more identification block layers, wherein the first stage and the second stage respectively comprise one convolution block layer and one identification block layer, the third stage, the fourth stage and the fifth stage respectively comprise one convolution block layer and two identification block layers, and each convolution block layer and each identification block layer comprise a plurality of convolution layers; each stage adds a pooling layer with a pooling window size of 2×2 and a step size of 2 after all the identification block layers.
4. A segmentation method according to claim 3, characterized in that the specific structure of the original res net40 network is:
firstly, an input target image sequentially passes through convolution with a convolution kernel size of 5 multiplied by 5, a step length of 1 and a channel number of 32 and a maximum pooling layer with a convolution kernel size of 2 multiplied by 2 and a step length of 2 to obtain an input characteristic of a first stage; the input features of the first stage are sequentially subjected to residual connection of three convolutions with convolution kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and channel numbers of 32 and one convolution kernel with the size of 1×1, step sizes of 1 and channel numbers of 32 to obtain the output features of the convolution block layer of the first stage; the output characteristics of the first stage convolution block layer are obtained after three convolutions with convolution kernel sizes of 1 multiplied by 1, 3 multiplied by 3 and 1 multiplied by 1, step sizes of 1 and channel numbers of 32 are sequentially carried out; the output characteristics of the first stage identification block layer are obtained after passing through a pooling layer with the convolution kernel size of 2 multiplied by 2 and the step length of 2;
the output features of the first stage pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and 64 channels, residually connected with one convolution with a kernel size of 1×1, a step size of 1 and 64 channels, to obtain the output features of the second-stage convolution block layer; the output features of the second-stage convolution block layer then pass sequentially through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and 64 channels to obtain the output features of the second-stage identification block layer; the output features of the second-stage identification block layer pass through a pooling layer with a window size of 2×2 and a step size of 2 to obtain the output features of the second stage;
the output features of the second stage pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and 256 channels, residually connected with one convolution with a kernel size of 1×1, a step size of 1 and 256 channels, to obtain the output features of the third-stage convolution block layer; the output features of the third-stage convolution block layer then pass sequentially through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and 256 channels to obtain the output features of the third-stage first identification block layer; the output features of the third-stage first identification block layer pass sequentially through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, step sizes of 1 and 256 channels to obtain the output features of the third-stage second identification block layer; the output features of the third-stage second identification block layer pass through a pooling layer with a window size of 2×2 and a step size of 2 to obtain the output features of the third stage;
the operation of the fourth stage is the same as that of the third stage: the output features of the third stage are passed through the same sequence of operations to obtain the output features of the fourth stage;
the operation of the fifth stage is the same as that of the convolution block layer and the two identification block layers of the fourth stage: the output features of the fourth stage are passed through the same convolution block layer and two identification block layers to obtain the output features of the fifth stage.
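The convolution block layer and identification block layer described above could be sketched in PyTorch as follows; the batch-normalization and ReLU placement, and the identity shortcut in the identification block, are assumptions beyond what the claim specifies.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution block layer of claim 4: three convolutions (1x1, 3x3, 1x1, step size 1,
    C channels) residually connected with a 1x1, step-size-1, C-channel shortcut convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1, 1), nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, 1)   # projection shortcut

    def forward(self, x):
        return torch.relu(self.branch(x) + self.shortcut(x))


class IdentificationBlock(nn.Module):
    """Identification block layer of claim 4: the same three convolutions, here with an
    assumed identity shortcut in line with standard ResNet identity blocks."""
    def __init__(self, ch):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(ch, ch, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1, 1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(self.branch(x) + x)


# First stage as described above: convolution block, identification block, then 2x2 pooling.
stage1 = nn.Sequential(ConvBlock(32, 32), IdentificationBlock(32), nn.MaxPool2d(2, 2))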
5. The segmentation method according to any one of claims 1-4, characterized in that the method comprises the specific steps of:
S1: image preprocessing
collecting images containing the defects to be segmented and normalizing the collected images to 1024×1024 pixels; adding pixel-level labels to the normalized images, the labelled images being the target images; dividing the target images into different sample sets according to a set proportion;
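A brief sketch of the S1 normalization step, assuming OpenCV for resizing; the function name and interpolation choices are illustrative, not specified by the claim.

import cv2  # assumption: OpenCV for resizing; the claim does not name a library

def normalise_pair(image, mask, size=(1024, 1024)):
    """Resize an image and its pixel-level label to 1024x1024, per step S1."""
    image = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
    mask = cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)  # nearest keeps the labels binary
    return image, mask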
S2: constructing the rich robust convolution feature model
based on the original ResNet40 network, a convolution layer is laterally connected to the identification block layer of the first stage and to the identification block layer of the second stage of the original ResNet40 network, respectively, to obtain the feature maps of the first-stage and second-stage side output layers;
a convolution layer is laterally connected after each block layer in the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and the convolved feature maps of all the block layers in the same stage are then added element by element to obtain the feature map of the side output layer of the corresponding stage;
the feature maps of the five stage side output layers are each connected to a deconvolution layer for up-sampling to obtain the deconvolved feature map of each stage, and each deconvolved feature map is connected to a side-output-layer fine loss function for pixel-by-pixel classification to obtain the prediction feature map of each stage's side output layer;
the deconvolved feature maps of all the stages are concatenated and then fused through a convolution layer with a kernel size of 1×1 and a step size of 1 to obtain the fusion-layer feature map; finally, a fusion-layer fine loss function is connected to the fusion-layer feature map to obtain the final prediction feature map;
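A simplified PyTorch sketch of the S2 side-output and fusion head described above, using one feature map per stage for brevity (in the claim, stages 3-5 first convolve and element-wise add the outputs of all their block layers); the stage strides of 2, 4, 8, 16 and 32 and the deconvolution kernel sizes are assumptions consistent with claims 2-4.

import torch
import torch.nn as nn

class SideFusionHead(nn.Module):
    """Side-output and fusion head: 1x1 lateral convolutions, deconvolution up-sampling,
    concatenation and a 1x1 fusion convolution, as in step S2."""
    def __init__(self, stage_channels=(32, 64, 256, 256, 256), side_channels=(1, 1, 21, 21, 21)):
        super().__init__()
        self.side_convs = nn.ModuleList(
            nn.Conv2d(cin, cout, kernel_size=1, stride=1)
            for cin, cout in zip(stage_channels, side_channels))
        self.deconvs = nn.ModuleList(
            nn.ConvTranspose2d(cout, 1, kernel_size=2 ** k, stride=2 ** k)  # upsample back to input size
            for k, cout in enumerate(side_channels, start=1))
        self.fuse = nn.Conv2d(5, 1, kernel_size=1, stride=1)  # 1x1, step-size-1 fusion convolution

    def forward(self, stage_feats):
        side_maps = [dc(sc(f)) for sc, dc, f in zip(self.side_convs, self.deconvs, stage_feats)]
        fused = self.fuse(torch.cat(side_maps, dim=1))        # concatenate, then fuse
        return side_maps, fused                               # logits; sigmoid gives the prediction maps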
S3: model training and testing
initializing the model parameters, and inputting the target images for training together with their corresponding pixel-level labels; during model training, the loss is propagated to the weights of each convolution layer by stochastic gradient descent and the weight values are updated, the momentum of the stochastic gradient descent being 0.9 and the weight decay 0.0005; 1 image is randomly sampled in each training step, training stops when the number of iteration epochs reaches 100, and the training of the model is completed;
scaling the target images for testing to 1024×1024 pixels and inputting the scaled images into the trained model; the test time for a single image is 0.1 s; this operation of the model is repeated to complete the model test.
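A minimal training-loop sketch for step S3, assuming PyTorch SGD; only the momentum, weight decay, batch size and epoch count are taken from the claim, while the learning rate and the (model, dataset, objective) interfaces are assumptions tied to the earlier sketches.

import torch
from torch.utils.data import DataLoader

def train(model, dataset, objective, epochs=100, lr=1e-4):
    """Train the full network (backbone plus side/fusion head) with SGD,
    momentum 0.9 and weight decay 0.0005, one image per step, for 100 epochs."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=0.0005)
    loader = DataLoader(dataset, batch_size=1, shuffle=True)   # 1 randomly sampled image per step
    for _ in range(epochs):
        for image, label in loader:
            side_maps, fused = model(image)
            loss = objective(side_maps, fused, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()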
6. The segmentation method according to claim 1, wherein the object to be segmented is a crack, an edge or a linear structure.
CN202010368806.1A 2020-05-01 2020-05-01 Segmentation method for small defects of complex textures based on rich robust convolution feature model Active CN111598854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010368806.1A CN111598854B (en) 2020-05-01 2020-05-01 Segmentation method for small defects of complex textures based on rich robust convolution feature model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010368806.1A CN111598854B (en) 2020-05-01 2020-05-01 Segmentation method for small defects of complex textures based on rich robust convolution feature model

Publications (2)

Publication Number Publication Date
CN111598854A CN111598854A (en) 2020-08-28
CN111598854B true CN111598854B (en) 2023-04-28

Family

ID=72186940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010368806.1A Active CN111598854B (en) 2020-05-01 2020-05-01 Segmentation method for small defects of complex textures based on rich robust convolution feature model

Country Status (1)

Country Link
CN (1) CN111598854B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215819B (en) * 2020-10-13 2023-06-30 中国民航大学 Airport pavement crack detection method based on depth feature fusion
CN112489023A (en) * 2020-12-02 2021-03-12 重庆邮电大学 Pavement crack detection method based on multiple scales and multiple layers
CN114397306B (en) * 2022-03-25 2022-07-29 南方电网数字电网研究院有限公司 Power grid grading ring hypercomplex category defect multi-stage model joint detection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Haiyong et al., "Detection of Cracks in Electroluminescence Images by Fusing Deep Learning and Structural Decoupling", IEEE, 2020, full text. *
Zhou Ying; Mao Li; Zhang Yan; Chen Haiyong. "Application of an improved CNN in solar panel defect detection", Computer Simulation, 2020, No. 03, full text. *

Also Published As

Publication number Publication date
CN111598854A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111179251B (en) Defect detection system and method based on twin neural network and by utilizing template comparison
CN111598854B (en) Segmentation method for small defects of complex textures based on rich robust convolution feature model
WO2021134871A1 (en) Forensics method for synthesized face image based on local binary pattern and deep learning
CN110097053B (en) Improved fast-RCNN-based electric power equipment appearance defect detection method
CN111222580A (en) High-precision crack detection method
CN110197205A (en) A kind of image-recognizing method of multiple features source residual error network
CN111652273B (en) Deep learning-based RGB-D image classification method
CN114092832A (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN111612051A (en) Weak supervision target detection method based on graph convolution neural network
Savino et al. Automated classification of civil structure defects based on convolutional neural network
CN113643268A (en) Industrial product defect quality inspection method and device based on deep learning and storage medium
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN110084812A (en) A kind of terahertz image defect inspection method, device, system and storage medium
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN114155474A (en) Damage identification technology based on video semantic segmentation algorithm
CN114494780A (en) Semi-supervised industrial defect detection method and system based on feature comparison
CN110991247B (en) Electronic component identification method based on deep learning and NCA fusion
CN112308040A (en) River sewage outlet detection method and system based on high-definition images
Feng et al. Lane detection base on deep learning
CN112084860A (en) Target object detection method and device and thermal power plant detection method and device
CN111738052A (en) Multi-feature fusion hyperspectral remote sensing ground object classification method based on deep learning
CN114187272A (en) Industrial part surface defect detection method based on deep learning
CN117422695A (en) CR-deep-based anomaly detection method
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN116630989A (en) Visual fault detection method and system for intelligent ammeter, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant