CN115049848B - General elliptical target detection method based on deep learning - Google Patents


Info

Publication number
CN115049848B
Authority
CN
China
Prior art keywords
target
regression
elliptical
loss function
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210768594.5A
Other languages
Chinese (zh)
Other versions
CN115049848A (en)
Inventor
Wang Tianhao (王田浩)
Xia Siyu (夏思宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210768594.5A
Publication of CN115049848A
Application granted
Publication of CN115049848B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a general elliptical target detection method based on deep learning, comprising the following steps: constructing a general elliptical target detection data set; building a convolutional neural network structure based on Anchor-free target detection; training the parameters of the built convolutional neural network with the general elliptical target detection data set; and inputting the picture to be detected into the trained convolutional neural network for detection and outputting a detection result graph. By using the Anchor-free target detection convolutional neural network, the invention can automatically and accurately detect and localize arbitrary elliptical targets.

Description

General elliptical target detection method based on deep learning
Technical Field
The invention relates to the field of pattern recognition, in particular to a general elliptical target detection method based on deep learning.
Background
Geometry is one of the main features of an object and is of great significance for target detection and recognition. In some application scenarios, such as medical image diagnosis and object counting, shape information is often of greater concern. Elliptical shapes are common in real scenes, fit many objects well, and are easy to express with mathematical parameters, so they are widely used in shape detection. Traditional ellipse detection methods usually build a mathematical model from image edges, contours, curvature and the like to fit a target ellipse, but their detection performance depends heavily on the quality of image preprocessing and edge extraction; they are vulnerable to irrelevant interference lines and noise and therefore lack robustness. We therefore use a deep learning approach to enhance the robustness and generalization of the detection method.
Disclosure of Invention
In order to solve the above problems, the invention discloses a general elliptical target detection method based on deep learning, which realizes detection and parameter regression of arbitrary elliptical targets.
Technical scheme: to achieve the purpose of the invention, the adopted technical scheme is as follows. The invention designs a general elliptical target detection method based on deep learning, comprising the following steps:
step 1: constructing a general elliptical target detection data set;
Step 2: building a convolutional neural network structure based on Anchor-free target detection. The method uses graying and Laplacian edge extraction as data augmentation, so that the model focuses more on the shape information of the target. A weight based on the ratio of the major axis to the minor axis is added to the loss function of the angle parameter regression, mitigating the adverse effect of this ratio on angle regression accuracy. In the design of the loss function, the Wasserstein distance between two-dimensional Gaussian distributions is used to measure the similarity of the ground-truth and predicted bounding boxes, and this similarity is taken as part of the loss function, improving the regression accuracy of the model. Meanwhile, a binary mask prediction branch is used; based on the idea of hard parameter sharing in multi-task learning, it further optimizes the model parameters and thereby improves regression accuracy.
Step 3: training the parameters of the built convolutional neural network with the general elliptical target detection data set;
Step 4: inputting the picture to be detected into the trained convolutional neural network for detection, and outputting a detection result graph.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
(1) Elliptical target detection is performed by a deep learning method, giving stronger robustness and generalization;
(2) The model structure is improved specifically for elliptical targets, yielding higher parameter regression accuracy.
Drawings
FIG. 1 is a schematic diagram of a general ellipse target detection method based on deep learning, which is implemented by the present invention;
FIG. 2 is the convolutional neural network structure built for Anchor-free target detection.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention. It should be noted that the words "front", "rear", "left", "right", "upper" and "lower" used in the following description refer to directions in the drawings, and the words "inner" and "outer" refer to directions toward or away from, respectively, the geometric center of a particular component.
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of the embodiments may be combined with each other, and the resulting technical solutions all fall within the protection scope of the present invention.
Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that herein.
Fig. 1 is a schematic diagram of a general ellipse target detection method based on deep learning according to the present invention, and each step is described in detail below with reference to fig. 1.
Step 1, constructing a general elliptical target detection data set. The deep learning method requires a large data set for training, so we collected pictures of elliptical targets in real scenes from the Internet, covering categories such as tires, dishes, balls and buttons, and annotated them to construct a general elliptical target detection data set.
Step 2, building a convolutional neural network structure based on Anchor-free target detection, as shown in fig. 2. The model is based on CenterNet, a commonly used Anchor-free target detection network, with several targeted improvements. Before a picture is input into the network, its size is unified to 512 × 512; data augmentation by graying and 3 × 3 Laplacian edge extraction is then applied with probability p. The picture is then fed into the model's backbone network DLA-34 for feature extraction with a downsampling ratio of 4, i.e., the output feature map is 1/4 of the input size, namely 128 × 128. The resulting feature map is passed to five parallel output heads, which respectively predict the center-point coordinate heat map, the major and minor axes, the offset, the rotation angle, and the binary mask map of the target.
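The five-head arrangement described above can be sketched as follows. This is a minimal illustration under assumed channel widths, not the patented network: the DLA-34 backbone is replaced by an arbitrary stride-4 feature map, and the class name `EllipseHeads` and its channel counts are hypothetical.

```python
import torch
import torch.nn as nn

class EllipseHeads(nn.Module):
    """Five parallel heads on a stride-4 feature map (128x128 for a 512x512 input)."""
    def __init__(self, in_ch=64, num_classes=1):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1))
        self.heatmap = head(num_classes)  # center-point coordinate heat map
        self.axes = head(2)               # major and minor axis lengths
        self.offset = head(2)             # sub-pixel center offset
        self.angle = head(1)              # rotation angle
        self.mask = head(1)               # binary ellipse mask

    def forward(self, feat):
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "axes": self.axes(feat),
            "offset": self.offset(feat),
            "angle": self.angle(feat),
            "mask": torch.sigmoid(self.mask(feat)),
        }

# A stand-in for the DLA-34 backbone output: any stride-4 feature extractor fits here.
feat = torch.randn(1, 64, 128, 128)
out = EllipseHeads()(feat)
```

All five heads share the backbone features, which is what makes the hard-parameter-sharing effect of the mask branch possible.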
In the center-point coordinate heat map regression, Focal Loss is used as the loss function during training for parameter optimization; following CenterNet, the calculation formula is:

$L_{k}=-\frac{1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log(\hat{Y}_{xyc}), & Y_{xyc}=1\\(1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}), & \text{otherwise}\end{cases}$

where xyc indexes a position in the feature map, α and β are two hyperparameters, N is the number of keypoints, Y_xyc is the ground-truth label, and Ŷ_xyc is the predicted value. The ground-truth label is generated with a Gaussian kernel: the value at the center point is 1 and decreases with distance from the center. From the 128 × 128 center-point heat map output by the head, local maxima are first obtained by non-maximum suppression, and the top K points by prediction score are then selected as the center points of the predicted targets.
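A sketch of the heat-map loss and peak extraction, assuming the standard CenterNet formulation (the 3 × 3 max-pool trick is the usual way such "non-maximum suppression" is realized on heat maps; function names are ours):

```python
import torch
import torch.nn.functional as F

def centernet_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """CenterNet-style focal loss on a heat map; pred and gt lie in [0, 1]."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred) ** alpha * torch.log(pred.clamp(min=1e-6))
    neg_loss = neg * (1 - gt) ** beta * pred ** alpha * torch.log((1 - pred).clamp(min=1e-6))
    n = pos.sum().clamp(min=1)  # N = number of keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / n

def topk_centers(heatmap, k=10):
    """Keep local maxima via a 3x3 max-pool 'NMS', then take the top-K scores."""
    hmax = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    peaks = heatmap * (hmax == heatmap).float()   # zero out non-maxima
    scores, idx = torch.topk(peaks.flatten(), k)
    w = heatmap.shape[-1]
    return scores, idx % w, idx // w              # score, x, y (batch/channel of 1)
```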
In both the major/minor-axis regression and the offset regression, smooth-L1 Loss is used as the loss function during training for parameter optimization; the calculation formula is:

$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\|x|-0.5, & \text{otherwise}\end{cases}$

The output head outputs K groups of major/minor axes and offsets. The offset is used to compensate for the accuracy loss introduced by the model during downsampling.
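The role of the offset branch can be illustrated with a small worked example: with stride 4, a ground-truth center at pixel (213, 101) falls in feature cell (53, 25), and the fractional remainder (0.25, 0.25) is exactly the quantization error the offset head must regress (helper name is ours):

```python
import math

def center_to_cell_and_offset(cx, cy, stride=4):
    """Split a full-resolution center into an integer feature cell and a sub-cell offset."""
    fx, fy = cx / stride, cy / stride
    ix, iy = math.floor(fx), math.floor(fy)
    return (ix, iy), (fx - ix, fy - iy)

cell, offset = center_to_cell_and_offset(213, 101)  # → (53, 25), (0.25, 0.25)
```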
In the rotation angle regression, smooth-L1 Loss is likewise used as the loss function during training for parameter optimization. On this basis, the model uses a weight based on the ratio of the major axis to the minor axis, which is multiplied with the angle loss function in the loss calculation. Specifically, given a threshold, when the predicted major-to-minor-axis ratio of a target exceeds the threshold, the weight is set to 2; otherwise it is set to 1. The model thus pays more attention to the rotation-angle regression accuracy of elongated elliptical targets, for which the rotation angle has a more pronounced effect on the fitting quality. The overall calculation formula is:

$w_{\theta}=\begin{cases}2, & a/b>R\\1, & \text{otherwise}\end{cases},\qquad L_{\theta}=w_{\theta}\cdot\mathrm{smooth}_{L1}(\theta_{p}-\theta_{g})$

where w_θ is the weight based on the major-to-minor-axis ratio, R is the threshold on that ratio, θ_p is the predicted rotation angle, and θ_g is the ground-truth rotation angle.
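A sketch of the ratio-weighted angle loss described above; the threshold value R = 1.5 here is purely illustrative, since the patent leaves the threshold unspecified:

```python
def smooth_l1(x):
    """Standard smooth-L1: quadratic near zero, linear beyond |x| = 1."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

def weighted_angle_loss(theta_p, theta_g, a_pred, b_pred, ratio_threshold=1.5):
    """Angle loss doubled for elongated ellipses (major/minor ratio above threshold)."""
    w = 2.0 if a_pred / b_pred > ratio_threshold else 1.0
    return w * smooth_l1(theta_p - theta_g)
```

With the same angular error, an elongated ellipse (a/b = 3) thus incurs twice the loss of a near-circular one, steering the optimizer toward the cases where angle matters most.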
In the binary mask prediction, binary cross entropy is used as the loss function for parameter optimization. Its ground-truth labels are elliptical masks drawn from the five ground-truth ellipse parameters scaled to 1/4 of the original size. The feature map output by the head is 128 × 128, and the value of each point is 1 or 0: 1 denotes the target region and 0 denotes the non-target background region. This branch does not directly participate in the regression of the ellipse parameters; instead, based on hard parameter sharing in multi-task learning, it further optimizes the model parameters and thereby indirectly improves detection accuracy.
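Rasterizing such an elliptical ground-truth mask can be sketched with NumPy as follows (assuming a and b here are the semi-axes already scaled to the 1/4-size map; the patent does not fix this convention, so the helper is an illustration):

```python
import numpy as np

def ellipse_mask(h, w, cx, cy, a, b, theta):
    """Rasterize a filled ellipse as a binary mask (1 = target, 0 = background)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Rotate each pixel into the ellipse's own frame, then test the ellipse inequality.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return ((u / a) ** 2 + (v / b) ** 2 <= 1.0).astype(np.uint8)

mask = ellipse_mask(128, 128, cx=64, cy=64, a=20, b=10, theta=0.0)
```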
Furthermore, in the loss function calculation, the model adds a loss function based on the Wasserstein distance between two-dimensional Gaussian distributions. A two-dimensional Gaussian distribution is represented by a mean and a covariance. For the conversion from ellipse parameters to a two-dimensional Gaussian distribution, the mean equals the center-point coordinates, and the covariance matrix can be expressed from the major and minor axes of the ellipse and the rotation angle, as follows:

$\mu=[x,y]^{T}$

$\Sigma^{1/2}=R_{\theta}\begin{bmatrix}a/2 & 0\\0 & b/2\end{bmatrix}R_{\theta}^{T},\qquad R_{\theta}=\begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}$

where μ and Σ denote the mean and covariance of the two-dimensional Gaussian distribution, x and y are the abscissa and ordinate of the ellipse center point, a and b are the major and minor axes of the ellipse, and θ is the ellipse rotation angle. The degree of fit between the ground-truth and predicted bounding boxes is then measured by the Wasserstein distance, whose calculation formula is:

$W_{2}^{2}(\mathcal{N}_{1},\mathcal{N}_{2})=\lVert\mu_{1}-\mu_{2}\rVert_{2}^{2}+\mathrm{Tr}\left(\Sigma_{1}+\Sigma_{2}-2\left(\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2}\right)^{1/2}\right)$

where μ_1, Σ_1 and μ_2, Σ_2 are the means and covariance matrices of the two-dimensional Gaussian distributions of the predicted and ground-truth values, respectively.
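The ellipse-to-Gaussian conversion and the Wasserstein distance can be sketched as follows (assuming a and b are full axis lengths, so the half-axes a/2 and b/2 enter the covariance; the closed form for the 2-Wasserstein distance between Gaussians is standard):

```python
import numpy as np

def ellipse_to_gaussian(x, y, a, b, theta):
    """Mean = ellipse center; covariance built from half-axes a/2, b/2 and rotation theta."""
    mu = np.array([x, y], dtype=float)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([(a / 2.0) ** 2, (b / 2.0) ** 2])
    return mu, R @ S @ R.T

def _sqrtm_psd(m):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def wasserstein2(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two Gaussians (closed form)."""
    s1h = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1h @ sigma2 @ s1h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross))
```

For two identical circles at different centers, the covariance terms cancel and the distance reduces to the squared center distance, which makes the measure easy to sanity-check.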
Step 3, training the parameters of the built convolutional neural network model with the constructed general elliptical target detection data set.
Step 4, inputting the picture to be predicted into the trained convolutional neural network model to obtain a detection result graph.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features.

Claims (2)

1. A general elliptical target detection method based on deep learning, characterized by comprising the following steps:
step (1) constructing a general elliptical target detection data set;
Step (2): building a convolutional neural network structure based on Anchor-free target detection; in this structure, graying and Laplacian edge extraction are used as data augmentation, so that the model focuses more on the shape information of the target; a weight based on the ratio of the major axis to the minor axis is added to the loss function of the angle parameter regression, mitigating the adverse effect of this ratio on angle regression accuracy; in the design of the loss function, the Wasserstein distance between two-dimensional Gaussian distributions is used to measure the similarity of the ground-truth and predicted bounding boxes, and this similarity is taken as part of the loss function, improving the regression accuracy of the model; meanwhile, a binary mask prediction branch is used, which, based on the idea of hard parameter sharing in multi-task learning, further optimizes the model parameters and thereby improves regression accuracy; in step (2): before a picture is input into the network, its size is unified to 512 × 512, and data augmentation by graying and 3 × 3 Laplacian edge extraction is applied with probability p; the picture is then fed into the model's backbone network DLA-34 for feature extraction with a downsampling ratio of 4, i.e., the output feature map is 1/4 of the input size, namely 128 × 128; the resulting feature map is passed to five parallel output heads, which respectively predict the center-point coordinate heat map, the major and minor axes, the offset, the rotation angle, and the binary mask map of the target;
In the center point coordinate heat map regression, focalLoss is used as a loss function in training to perform parameter optimization, and the calculation formula is as follows:
Where xyc represents its coordinates in the feature map, alpha and beta are two superparameters, N is the number of keypoints, Y xyc represents the real label, Representing the predicted value; the real label uses Gaussian kernel calculation, namely, the center point value is 1, and the farther the center point value is, the smaller the value is; the method comprises the steps of firstly obtaining a local maximum value through non-maximum inhibition of a 128 x 128 central point coordinate heat map output by an output head, and then selecting the maximum K points as central points of a prediction target according to the size of a prediction score;
In both the major/minor-axis regression and the offset regression, smooth-L1 Loss is used as the loss function during training for parameter optimization; the calculation formula is:

$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\|x|-0.5, & \text{otherwise}\end{cases}$

the output head outputs K groups of major/minor axes and offsets; the offset is used to compensate for the accuracy loss introduced by the model during downsampling;
In the rotation angle regression, smooth-L1 Loss is likewise used as the loss function during training for parameter optimization; on this basis, the model uses a weight based on the ratio of the major axis to the minor axis, which is multiplied with the angle loss function in the loss calculation; specifically, given a threshold, when the predicted major-to-minor-axis ratio of a target exceeds the threshold, the weight is set to 2, otherwise to 1; the model thus pays more attention to the rotation-angle regression accuracy of elongated elliptical targets, and the overall calculation formula is:

$w_{\theta}=\begin{cases}2, & a/b>R\\1, & \text{otherwise}\end{cases},\qquad L_{\theta}=w_{\theta}\cdot\mathrm{smooth}_{L1}(\theta_{p}-\theta_{g})$

wherein w_θ is the weight based on the major-to-minor-axis ratio, R is the threshold on that ratio, θ_p is the predicted rotation angle, and θ_g is the ground-truth rotation angle;
in the binary mask prediction, binary cross entropy is used as the loss function for parameter optimization; its ground-truth labels are elliptical masks drawn from the five ground-truth ellipse parameters scaled to 1/4 of the original size; the feature map output by the head is 128 × 128, and the value of each point is 1 or 0: 1 denotes the target region and 0 denotes the non-target background region; this branch does not directly participate in the regression of the ellipse parameters, but, based on hard parameter sharing in multi-task learning, further optimizes the model parameters and thereby indirectly improves detection accuracy;
In addition, in the loss function calculation, the model also adds a loss function based on the Wasserstein distance between two-dimensional Gaussian distributions; a two-dimensional Gaussian distribution is represented by a mean and a covariance, and for the conversion from ellipse parameters to a two-dimensional Gaussian distribution, the mean equals the center-point coordinates, while the covariance matrix can be expressed from the major and minor axes of the ellipse and the rotation angle, as follows:

$\mu=[x,y]^{T}$

$\Sigma^{1/2}=R_{\theta}\begin{bmatrix}a/2 & 0\\0 & b/2\end{bmatrix}R_{\theta}^{T},\qquad R_{\theta}=\begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}$

wherein μ and Σ denote the mean and covariance of the two-dimensional Gaussian distribution, x and y are the abscissa and ordinate of the ellipse center point, a and b are the major and minor axes of the ellipse, and θ is the ellipse rotation angle; the degree of fit between the ground-truth and predicted bounding boxes is then calculated by the Wasserstein distance, whose calculation formula is:

$W_{2}^{2}(\mathcal{N}_{1},\mathcal{N}_{2})=\lVert\mu_{1}-\mu_{2}\rVert_{2}^{2}+\mathrm{Tr}\left(\Sigma_{1}+\Sigma_{2}-2\left(\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2}\right)^{1/2}\right)$

wherein μ_1, Σ_1 and μ_2, Σ_2 are the means and covariance matrices of the two-dimensional Gaussian distributions of the predicted and ground-truth values, respectively;
Step (3): training the parameters of the built convolutional neural network with the general elliptical target detection data set;
Step (4): inputting the picture to be detected into the trained convolutional neural network for detection, and outputting a detection result graph.
2. The general elliptical target detection method based on deep learning according to claim 1, wherein in step (1), a general elliptical target detection data set is constructed; the data set contains multiple categories of elliptical objects in real scenes, 537 pictures in total.
CN202210768594.5A 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning Active CN115049848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210768594.5A CN115049848B (en) 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning


Publications (2)

Publication Number Publication Date
CN115049848A CN115049848A (en) 2022-09-13
CN115049848B 2024-07-05

Family

ID=83165299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210768594.5A Active CN115049848B (en) 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN115049848B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298169A (en) * 2021-06-02 2021-08-24 浙江工业大学 Convolutional neural network-based rotating target detection method and device
CN114372502A (en) * 2021-12-02 2022-04-19 北京工业大学 Angle self-adaptive ellipse template target detector

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259930B (en) * 2020-01-09 2023-04-25 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298169A (en) * 2021-06-02 2021-08-24 浙江工业大学 Convolutional neural network-based rotating target detection method and device
CN114372502A (en) * 2021-12-02 2022-04-19 北京工业大学 Angle self-adaptive ellipse template target detector

Also Published As

Publication number Publication date
CN115049848A (en) 2022-09-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant