CN115049848B - General elliptical target detection method based on deep learning - Google Patents


Info

Publication number
CN115049848B
Authority
CN
China
Prior art keywords
target
regression
elliptical
loss function
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210768594.5A
Other languages
Chinese (zh)
Other versions
CN115049848A (en)
Inventor
Wang Tianhao (王田浩)
Xia Siyu (夏思宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210768594.5A
Publication of CN115049848A
Application granted
Publication of CN115049848B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a general elliptical target detection method based on deep learning, comprising the following steps: constructing a general elliptical target detection data set; building a convolutional neural network structure based on Anchor-free target detection; training the parameters of the built convolutional neural network with the general elliptical target detection data set; and inputting the picture to be detected into the trained convolutional neural network for detection and outputting a detection result graph. By using the Anchor-free target detection convolutional neural network, the invention can automatically and accurately detect and localize arbitrary elliptical targets.

Description

General elliptical target detection method based on deep learning
Technical Field
The invention relates to the field of pattern recognition, in particular to a general elliptical target detection method based on deep learning.
Background
Geometry is one of the main features of an object and is of great significance for target detection and recognition. In some application scenarios, such as medical image diagnosis and object counting, shape information is often of greater concern. Elliptical shapes are common in real scenes, fit many objects well, and are easy to express with mathematical parameters, so they are widely used in shape detection. Traditional ellipse detection methods usually build a mathematical model from image edges, contours, curvature and the like to fit a target ellipse, but their detection performance depends heavily on the quality of image preprocessing and edge extraction; they are vulnerable to irrelevant interference lines and noise and therefore lack robustness. We therefore use a deep learning approach to enhance the robustness and generalization of the detection method.
Disclosure of Invention
In order to solve the above problems, the invention discloses a general elliptical target detection method based on deep learning, which realizes detection and parameter regression of arbitrary elliptical targets.
Technical scheme: to achieve the purpose of the invention, the adopted technical scheme is as follows. The invention designs a general elliptical target detection method based on deep learning, comprising the following steps:
step 1: constructing a general elliptical target detection data set;
Step 2: building a convolutional neural network structure based on Anchor-free target detection. The method uses graying and Laplacian edge extraction as data augmentation, so that the model focuses more on the shape information of the target. A weight based on the ratio of the major axis to the minor axis is added to the loss function of the angle parameter regression, mitigating the adverse effect of this ratio on angle regression accuracy. In the design of the loss function, the Wasserstein distance between two-dimensional Gaussian distributions is used to measure the similarity of the ground-truth and predicted bounding boxes, and this similarity is taken as part of the loss function, improving the regression accuracy of the model. Meanwhile, a binary mask prediction branch is used; based on the idea of hard parameter sharing in multi-task learning, it further optimizes the model parameters and thereby improves regression accuracy.
Step 3: training the parameters of the built convolutional neural network with the general elliptical target detection data set;
Step 4: inputting the picture to be detected into the trained convolutional neural network for detection, and outputting a detection result graph.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
(1) Elliptical target detection is performed by a deep learning method, giving stronger robustness and generalization;
(2) The model structure is improved specifically for elliptical targets, yielding higher parameter regression accuracy.
Drawings
FIG. 1 is a schematic diagram of a general ellipse target detection method based on deep learning, which is implemented by the present invention;
FIG. 2 is the convolutional neural network structure built for Anchor-free target detection.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention. It should be noted that the words "front", "rear", "left", "right", "upper" and "lower" used in the following description refer to directions in the drawings, and the words "inner" and "outer" refer to directions toward or away from, respectively, the geometric center of a particular component.
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of the embodiments may be combined with each other, and the resulting technical solutions all fall within the protection scope of the present invention.
Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that herein.
Fig. 1 is a schematic diagram of a general ellipse target detection method based on deep learning according to the present invention, and each step is described in detail below with reference to fig. 1.
Step 1, constructing a general elliptical target detection data set. The deep learning method requires a large data set for training, so we collected pictures of elliptical targets in real scenes from the Internet, covering categories such as tires, dishes, balls and buttons, and annotated them to construct a general elliptical target detection data set.
Step 2, building a convolutional neural network structure based on Anchor-free target detection, as shown in fig. 2. The model is based on CenterNet, a commonly used Anchor-free target detection network, with several targeted improvements. Before a picture is input into the network, its size is unified to 512 × 512; data augmentation by graying and 3 × 3 Laplacian edge extraction is then applied with probability p. The picture is then fed into the model's backbone network DLA-34 for feature extraction with a downsampling ratio of 4, i.e., the output feature map is 1/4 of the input size, namely 128 × 128. The resulting feature map is passed to five parallel output heads, which respectively predict the center-point coordinate heat map, the major and minor axes, the offset, the rotation angle, and the binary mask map of the target.
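The five-head arrangement described above can be sketched as follows. This is a minimal illustration under assumed channel widths, not the patented network: the DLA-34 backbone is replaced by an arbitrary stride-4 feature map, and the class name `EllipseHeads` and its channel counts are hypothetical.

```python
import torch
import torch.nn as nn

class EllipseHeads(nn.Module):
    """Five parallel heads on a stride-4 feature map (128x128 for a 512x512 input)."""
    def __init__(self, in_ch=64, num_classes=1):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1))
        self.heatmap = head(num_classes)  # center-point coordinate heat map
        self.axes = head(2)               # major and minor axis lengths
        self.offset = head(2)             # sub-pixel center offset
        self.angle = head(1)              # rotation angle
        self.mask = head(1)               # binary ellipse mask

    def forward(self, feat):
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "axes": self.axes(feat),
            "offset": self.offset(feat),
            "angle": self.angle(feat),
            "mask": torch.sigmoid(self.mask(feat)),
        }

# A stand-in for the DLA-34 backbone output: any stride-4 feature extractor fits here.
feat = torch.randn(1, 64, 128, 128)
out = EllipseHeads()(feat)
```

All five heads share the backbone features, which is what makes the hard-parameter-sharing effect of the mask branch possible.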
In the center-point coordinate heat map regression, Focal Loss is used as the loss function during training for parameter optimization; following CenterNet, the calculation formula is:

$L_{k}=-\frac{1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log(\hat{Y}_{xyc}), & Y_{xyc}=1\\(1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}), & \text{otherwise}\end{cases}$

where xyc indexes a position in the feature map, α and β are two hyperparameters, N is the number of keypoints, Y_xyc is the ground-truth label, and Ŷ_xyc is the predicted value. The ground-truth label is generated with a Gaussian kernel: the value at the center point is 1 and decreases with distance from the center. From the 128 × 128 center-point heat map output by the head, local maxima are first obtained by non-maximum suppression, and the top K points by prediction score are then selected as the center points of the predicted targets.
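A sketch of the heat-map loss and peak extraction, assuming the standard CenterNet formulation (the 3 × 3 max-pool trick is the usual way such "non-maximum suppression" is realized on heat maps; function names are ours):

```python
import torch
import torch.nn.functional as F

def centernet_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """CenterNet-style focal loss on a heat map; pred and gt lie in [0, 1]."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred) ** alpha * torch.log(pred.clamp(min=1e-6))
    neg_loss = neg * (1 - gt) ** beta * pred ** alpha * torch.log((1 - pred).clamp(min=1e-6))
    n = pos.sum().clamp(min=1)  # N = number of keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / n

def topk_centers(heatmap, k=10):
    """Keep local maxima via a 3x3 max-pool 'NMS', then take the top-K scores."""
    hmax = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    peaks = heatmap * (hmax == heatmap).float()   # zero out non-maxima
    scores, idx = torch.topk(peaks.flatten(), k)
    w = heatmap.shape[-1]
    return scores, idx % w, idx // w              # score, x, y (batch/channel of 1)
```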
In both the major/minor-axis regression and the offset regression, smooth-L1 Loss is used as the loss function during training for parameter optimization; the calculation formula is:

$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\|x|-0.5, & \text{otherwise}\end{cases}$

The output head outputs K groups of major/minor axes and offsets. The offset is used to compensate for the accuracy loss introduced by the model during downsampling.
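The role of the offset branch can be illustrated with a small worked example: with stride 4, a ground-truth center at pixel (213, 101) falls in feature cell (53, 25), and the fractional remainder (0.25, 0.25) is exactly the quantization error the offset head must regress (helper name is ours):

```python
import math

def center_to_cell_and_offset(cx, cy, stride=4):
    """Split a full-resolution center into an integer feature cell and a sub-cell offset."""
    fx, fy = cx / stride, cy / stride
    ix, iy = math.floor(fx), math.floor(fy)
    return (ix, iy), (fx - ix, fy - iy)

cell, offset = center_to_cell_and_offset(213, 101)  # → (53, 25), (0.25, 0.25)
```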
In the rotation angle regression, smooth-L1 Loss is likewise used as the loss function during training for parameter optimization. On this basis, the model uses a weight based on the ratio of the major axis to the minor axis, which is multiplied with the angle loss function in the loss calculation. Specifically, given a threshold, when the predicted major-to-minor-axis ratio of a target exceeds the threshold, the weight is set to 2; otherwise it is set to 1. The model thus pays more attention to the rotation-angle regression accuracy of elongated elliptical targets, for which the rotation angle has a more pronounced effect on the fitting quality. The overall calculation formula is:

$w_{\theta}=\begin{cases}2, & a/b>R\\1, & \text{otherwise}\end{cases},\qquad L_{\theta}=w_{\theta}\cdot\mathrm{smooth}_{L1}(\theta_{p}-\theta_{g})$

where w_θ is the weight based on the major-to-minor-axis ratio, R is the threshold on that ratio, θ_p is the predicted rotation angle, and θ_g is the ground-truth rotation angle.
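A sketch of the ratio-weighted angle loss described above; the threshold value R = 1.5 here is purely illustrative, since the patent leaves the threshold unspecified:

```python
def smooth_l1(x):
    """Standard smooth-L1: quadratic near zero, linear beyond |x| = 1."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

def weighted_angle_loss(theta_p, theta_g, a_pred, b_pred, ratio_threshold=1.5):
    """Angle loss doubled for elongated ellipses (major/minor ratio above threshold)."""
    w = 2.0 if a_pred / b_pred > ratio_threshold else 1.0
    return w * smooth_l1(theta_p - theta_g)
```

With the same angular error, an elongated ellipse (a/b = 3) thus incurs twice the loss of a near-circular one, steering the optimizer toward the cases where angle matters most.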
In the binary mask prediction, binary cross entropy is used as the loss function for parameter optimization. Its ground-truth labels are elliptical masks drawn from the five ground-truth ellipse parameters scaled to 1/4 of the original size. The feature map output by the head is 128 × 128, and the value of each point is 1 or 0: 1 denotes the target region and 0 denotes the non-target background region. This branch does not directly participate in the regression of the ellipse parameters; instead, based on hard parameter sharing in multi-task learning, it further optimizes the model parameters and thereby indirectly improves detection accuracy.
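Rasterizing such an elliptical ground-truth mask can be sketched with NumPy as follows (assuming a and b here are the semi-axes already scaled to the 1/4-size map; the patent does not fix this convention, so the helper is an illustration):

```python
import numpy as np

def ellipse_mask(h, w, cx, cy, a, b, theta):
    """Rasterize a filled ellipse as a binary mask (1 = target, 0 = background)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Rotate each pixel into the ellipse's own frame, then test the ellipse inequality.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return ((u / a) ** 2 + (v / b) ** 2 <= 1.0).astype(np.uint8)

mask = ellipse_mask(128, 128, cx=64, cy=64, a=20, b=10, theta=0.0)
```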
Furthermore, in the loss function calculation, the model adds a loss function based on the Wasserstein distance between two-dimensional Gaussian distributions. A two-dimensional Gaussian distribution is represented by a mean and a covariance. For the conversion from ellipse parameters to a two-dimensional Gaussian distribution, the mean equals the center-point coordinates, and the covariance matrix can be expressed from the major and minor axes of the ellipse and the rotation angle, as follows:

$\mu=[x,y]^{T}$

$\Sigma^{1/2}=R_{\theta}\begin{bmatrix}a/2 & 0\\0 & b/2\end{bmatrix}R_{\theta}^{T},\qquad R_{\theta}=\begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}$

where μ and Σ denote the mean and covariance of the two-dimensional Gaussian distribution, x and y are the abscissa and ordinate of the ellipse center point, a and b are the major and minor axes of the ellipse, and θ is the ellipse rotation angle. The degree of fit between the ground-truth and predicted bounding boxes is then measured by the Wasserstein distance, whose calculation formula is:

$W_{2}^{2}(\mathcal{N}_{1},\mathcal{N}_{2})=\lVert\mu_{1}-\mu_{2}\rVert_{2}^{2}+\mathrm{Tr}\left(\Sigma_{1}+\Sigma_{2}-2\left(\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2}\right)^{1/2}\right)$

where μ_1, Σ_1 and μ_2, Σ_2 are the means and covariance matrices of the two-dimensional Gaussian distributions of the predicted and ground-truth values, respectively.
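The ellipse-to-Gaussian conversion and the Wasserstein distance can be sketched as follows (assuming a and b are full axis lengths, so the half-axes a/2 and b/2 enter the covariance; the closed form for the 2-Wasserstein distance between Gaussians is standard):

```python
import numpy as np

def ellipse_to_gaussian(x, y, a, b, theta):
    """Mean = ellipse center; covariance built from half-axes a/2, b/2 and rotation theta."""
    mu = np.array([x, y], dtype=float)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([(a / 2.0) ** 2, (b / 2.0) ** 2])
    return mu, R @ S @ R.T

def _sqrtm_psd(m):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def wasserstein2(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two Gaussians (closed form)."""
    s1h = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1h @ sigma2 @ s1h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross))
```

For two identical circles at different centers, the covariance terms cancel and the distance reduces to the squared center distance, which makes the measure easy to sanity-check.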
Step 3, training the parameters of the built convolutional neural network model with the constructed general elliptical target detection data set.
Step 4, inputting the picture to be predicted into the trained convolutional neural network model to obtain a detection result graph.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features.

Claims (2)

1. A general elliptical target detection method based on deep learning, characterized by comprising the following steps:
step (1) constructing a general elliptical target detection data set;
Step (2): building a convolutional neural network structure based on Anchor-free target detection; in this structure, graying and Laplacian edge extraction are used as data augmentation, so that the model focuses more on the shape information of the target; a weight based on the ratio of the major axis to the minor axis is added to the loss function of the angle parameter regression, mitigating the adverse effect of this ratio on angle regression accuracy; in the design of the loss function, the Wasserstein distance between two-dimensional Gaussian distributions is used to measure the similarity of the ground-truth and predicted bounding boxes, and this similarity is taken as part of the loss function, improving the regression accuracy of the model; meanwhile, a binary mask prediction branch is used, which, based on the idea of hard parameter sharing in multi-task learning, further optimizes the model parameters and thereby improves regression accuracy; in step (2): before a picture is input into the network, its size is unified to 512 × 512, and data augmentation by graying and 3 × 3 Laplacian edge extraction is applied with probability p; the picture is then fed into the model's backbone network DLA-34 for feature extraction with a downsampling ratio of 4, i.e., the output feature map is 1/4 of the input size, namely 128 × 128; the resulting feature map is passed to five parallel output heads, which respectively predict the center-point coordinate heat map, the major and minor axes, the offset, the rotation angle, and the binary mask map of the target;
In the center point coordinate heat map regression, focalLoss is used as a loss function in training to perform parameter optimization, and the calculation formula is as follows:
Where xyc represents its coordinates in the feature map, alpha and beta are two superparameters, N is the number of keypoints, Y xyc represents the real label, Representing the predicted value; the real label uses Gaussian kernel calculation, namely, the center point value is 1, and the farther the center point value is, the smaller the value is; the method comprises the steps of firstly obtaining a local maximum value through non-maximum inhibition of a 128 x 128 central point coordinate heat map output by an output head, and then selecting the maximum K points as central points of a prediction target according to the size of a prediction score;
In both the major/minor-axis regression and the offset regression, smooth-L1 Loss is used as the loss function during training for parameter optimization; the calculation formula is:

$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\|x|-0.5, & \text{otherwise}\end{cases}$

the output head outputs K groups of major/minor axes and offsets; the offset is used to compensate for the accuracy loss introduced by the model during downsampling;
In the rotation angle regression, smooth-L1 Loss is likewise used as the loss function during training for parameter optimization; on this basis, the model uses a weight based on the ratio of the major axis to the minor axis, which is multiplied with the angle loss function in the loss calculation; specifically, given a threshold, when the predicted major-to-minor-axis ratio of a target exceeds the threshold, the weight is set to 2, otherwise to 1; the model thus pays more attention to the rotation-angle regression accuracy of elongated elliptical targets, and the overall calculation formula is:

$w_{\theta}=\begin{cases}2, & a/b>R\\1, & \text{otherwise}\end{cases},\qquad L_{\theta}=w_{\theta}\cdot\mathrm{smooth}_{L1}(\theta_{p}-\theta_{g})$

wherein w_θ is the weight based on the major-to-minor-axis ratio, R is the threshold on that ratio, θ_p is the predicted rotation angle, and θ_g is the ground-truth rotation angle;
in the binary mask prediction, binary cross entropy is used as the loss function for parameter optimization; its ground-truth labels are elliptical masks drawn from the five ground-truth ellipse parameters scaled to 1/4 of the original size; the feature map output by the head is 128 × 128, and the value of each point is 1 or 0: 1 denotes the target region and 0 denotes the non-target background region; this branch does not directly participate in the regression of the ellipse parameters, but, based on hard parameter sharing in multi-task learning, further optimizes the model parameters and thereby indirectly improves detection accuracy;
In addition, in the loss function calculation, the model also adds a loss function based on the Wasserstein distance between two-dimensional Gaussian distributions; a two-dimensional Gaussian distribution is represented by a mean and a covariance, and for the conversion from ellipse parameters to a two-dimensional Gaussian distribution, the mean equals the center-point coordinates, while the covariance matrix can be expressed from the major and minor axes of the ellipse and the rotation angle, as follows:

$\mu=[x,y]^{T}$

$\Sigma^{1/2}=R_{\theta}\begin{bmatrix}a/2 & 0\\0 & b/2\end{bmatrix}R_{\theta}^{T},\qquad R_{\theta}=\begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}$

wherein μ and Σ denote the mean and covariance of the two-dimensional Gaussian distribution, x and y are the abscissa and ordinate of the ellipse center point, a and b are the major and minor axes of the ellipse, and θ is the ellipse rotation angle; the degree of fit between the ground-truth and predicted bounding boxes is then calculated by the Wasserstein distance, whose calculation formula is:

$W_{2}^{2}(\mathcal{N}_{1},\mathcal{N}_{2})=\lVert\mu_{1}-\mu_{2}\rVert_{2}^{2}+\mathrm{Tr}\left(\Sigma_{1}+\Sigma_{2}-2\left(\Sigma_{1}^{1/2}\Sigma_{2}\Sigma_{1}^{1/2}\right)^{1/2}\right)$

wherein μ_1, Σ_1 and μ_2, Σ_2 are the means and covariance matrices of the two-dimensional Gaussian distributions of the predicted and ground-truth values, respectively;
Step (3): training the parameters of the built convolutional neural network with the general elliptical target detection data set;
Step (4): inputting the picture to be detected into the trained convolutional neural network for detection, and outputting a detection result graph.
2. The general elliptical target detection method based on deep learning according to claim 1, wherein in step (1), a general elliptical target detection data set is constructed; the data set contains multiple categories of elliptical objects in real scenes, 537 pictures in total.
CN202210768594.5A 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning Active CN115049848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210768594.5A CN115049848B (en) 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning


Publications (2)

Publication Number Publication Date
CN115049848A CN115049848A (en) 2022-09-13
CN115049848B 2024-07-05

Family

ID=83165299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210768594.5A Active CN115049848B (en) 2022-07-01 2022-07-01 General elliptical target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN115049848B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298169A (en) * 2021-06-02 2021-08-24 浙江工业大学 Convolutional neural network-based rotating target detection method and device
CN114372502A (en) * 2021-12-02 2022-04-19 北京工业大学 Angle self-adaptive ellipse template target detector

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259930B (en) * 2020-01-09 2023-04-25 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298169A (en) * 2021-06-02 2021-08-24 浙江工业大学 Convolutional neural network-based rotating target detection method and device
CN114372502A (en) * 2021-12-02 2022-04-19 北京工业大学 Angle self-adaptive ellipse template target detector

Also Published As

Publication number Publication date
CN115049848A (en) 2022-09-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant