CN113673540A - Target detection method based on positioning information guidance - Google Patents

Target detection method based on positioning information guidance

Info

Publication number
CN113673540A
Authority
CN
China
Prior art keywords: rectangular anchor frame, matrix, target detection, prior rectangular
Prior art date
Legal status: Pending (the status shown is an assumption and is not a legal conclusion)
Application number
CN202110960804.6A
Other languages
Chinese (zh)
Inventor
缪玲娟
明奇
周志强
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date: 2021-08-20; Filing date: 2021-08-20; Publication date: 2021-11-19
Application filed by Beijing Institute of Technology BIT
Priority to CN202110960804.6A
Publication of CN113673540A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/136: Segmentation; Edge detection involving thresholding
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method guided by positioning information, in which the positioning information represented by the intersection-over-union (IoU) ratio is embedded into the classification task as a training label, and a hierarchical IoU function discretizes the continuous IoU so that it suits the classification task better. On the one hand, compared with a discrete IoU, learning a continuous IoU in the classification task brings no large performance gain but makes the optimization process slower or even non-convergent; on the other hand, for anchor frames with low overlap with any object, learning the IoU is meaningless. The invention therefore improves the ground-truth label of the classification branch, guiding the classification task with an IoU index consistent with the localization accuracy of the detection frame, and applies the hierarchical IoU function to discretize the IoU vector into levels, thereby preserving the consistency of the classification and regression tasks and effectively improving detection accuracy.

Description

Target detection method based on positioning information guidance
Technical Field
The invention belongs to the technical fields of computer vision recognition, artificial intelligence and target detection, and particularly relates to a target detection method based on positioning information guidance.
Background
Over the last decade, deep learning has developed rapidly and gradually become the mainstream direction in the field of artificial intelligence. Artificial intelligence methods represented by deep learning have brought breakthrough progress to computer vision, natural language processing and other fields. Computer vision has long been a major area of computer technology and receives widespread attention in academia and industry. Thanks to the revolutionary progress of deep learning, artificial intelligence has also brought new ideas and methods to computer vision and achieved good results.
Object detection is a basic task of computer vision that aims to identify and localize the objects appearing in an image. Its task can be defined specifically as: given an input image, judge whether the image contains targets and, if so, give the position and category of each target. Target detection underlies applications such as target tracking, object segmentation and assisted navigation, so it has wide application scenarios and great significance. However, traditional target detection algorithms hit bottlenecks in both detection accuracy and running speed and cannot meet broader application requirements, so innovation in target detection technology is urgently needed.
The successful application of deep learning in this field has broken through the constraints of traditional methods and achieved great success. A family of algorithms represented by the convolutional neural network can efficiently and automatically extract image features for the target detection task, achieving speed and accuracy far beyond traditional algorithms. A model based on a convolutional neural network first extracts a feature map from the image to be detected through convolution and pooling operations. Then, on the extracted multi-scale features, it predicts the deviation of initially placed prior frames (also called anchor frames) relative to the targets in the image, and predicts the category of each target. When outputting the detection result, redundant detection frames are suppressed (non-maximum suppression) to obtain sparse, effective detection frames, completing the detection of targets in the image.
At present, most algorithms based on convolutional neural networks divide the target detection task into two subtasks, classification and localization, which are predicted separately without mutual interaction. In the non-maximum suppression stage, the classification score is used as the basis for selecting prediction frames, and the detection result is output. However, because the algorithm handles classification and localization separately, a large divergence arises between the two tasks: the classification score does not characterize the localization accuracy of the prediction frame. This makes the detection results output after non-maximum suppression unreliable and reduces detection performance.
Disclosure of Invention
To solve these problems, the invention provides a target detection method based on positioning information guidance, which adapts better to the classification task and improves detection accuracy.
A target detection method based on positioning information guidance inputs an image to be detected into a trained target detection model to obtain the category and the positioning result of an object contained in the image to be detected; the training method of the target detection model comprises the following steps:
s1: performing feature extraction on the sample image with a first convolution branch to obtain a feature map, and meanwhile setting a number of prior rectangular anchor frames on the feature map, wherein the number and positions of the prior rectangular anchor frames are sufficient to cover the whole feature map;
s2: performing convolution operations on the feature map with a second convolution branch and a third convolution branch respectively, correspondingly obtaining a classification matrix S of the prior rectangular anchor frames and an offset matrix D of the prior rectangular anchor frames, wherein D = B + P, B being the position matrix of the prior rectangular anchor frames and P the predicted offset of each detection frame relative to its prior rectangular anchor frame;
s3: constructing a total loss function L, calculating it and judging whether it is smaller than a set value; if not, changing the convolution layer parameters of the first, second and third convolution branches and repeating steps S1 to S3 until the total loss function L is smaller than the set value; once it is smaller, the first, second and third convolution branches constitute the final target detection model;
the total loss function L is constructed by the following method:
s31: respectively obtaining the intersection-over-union (IoU) between each prior rectangular anchor frame and each object bounding frame in the sample image to obtain an M × N IoU matrix IoU, wherein M is the number of prior rectangular anchor frames and N is the number of object bounding frames;
s32: extracting the maximum element of each row of the IoU matrix IoU to construct an M × 1 IoU vector IoU_max, and matching the prior rectangular anchor frame corresponding to the row of each maximum element with the position of the object bounding frame corresponding to its column, obtaining the matrix G′ of object positions each prior rectangular anchor frame is expected to learn; meanwhile, forming the class numbers of the objects in G′ into an M × 1 class vector CLS;
s33: processing the IoU vector IoU_max hierarchically with a hierarchical IoU function to obtain the hierarchical vector hIoU;
s34: constructing a total loss function L of the target detection model according to the hierarchical vector hIoU, the class vector CLS, the classification matrix S and the offset matrix D as follows:
L = |S - hIoU·hIoU^T·OneHot(CLS)| + |D[pos] - G′[pos]|
wherein ^T denotes transposition, OneHot(CLS) denotes converting CLS into one-hot codes, D[pos] denotes the shifted position coordinates, in the offset matrix D, of the prior rectangular anchor frames selected as positive samples, and G′[pos] denotes the position coordinates of the objects those prior rectangular anchor frames are expected to learn.
Further, the method for selecting the prior rectangular anchor frame as the positive sample comprises the following steps:
the cross-over-ratio vector IoU is cross-over-ratio using a hierarchical cross-over-ratio function as followsmaxCarrying out hierarchical processing to obtain a hierarchical vector hIoU:
hIoU(m) = δ·E(IoU_max(m)/δ), if IoU_max(m) > 0.5
hIoU(m) = 0, if IoU_max(m) ≤ 0.5
where IoU_max(m) is the m-th element of the IoU vector IoU_max, hIoU(m) is the m-th element of the hierarchical vector hIoU, δ is a set interval-division threshold, and E(·) is the round-down (floor) function;
prior rectangular anchor frames satisfying IoU_max(m) > 0.5 are taken as the positive samples used for training.
Further, the IoU between each prior rectangular anchor frame and each object bounding frame in the sample image is calculated as follows:
IoU(A, B) = |A ∩ B| / |A ∪ B|
wherein IoU denotes the intersection-over-union between a prior rectangular anchor frame A and an object bounding frame B, A ∩ B denotes their intersection, and A ∪ B denotes their union.
Further, after the image to be detected is input into the trained target detection model, non-maximum suppression and redundant-frame elimination are performed on the obtained categories and positioning results, and the results after non-maximum suppression and redundant-frame elimination are taken as the final categories and positioning results of the objects contained in the image to be detected.
Further, a classification matrix S of the prior rectangular anchor frame is an M x C dimensional matrix, wherein C is the number of all possible classes of the object, and each element in the classification matrix S represents the probability value of each prior rectangular anchor frame belonging to each class;
and taking the category corresponding to the maximum element in each row of the classification matrix S as the category of the prior rectangular anchor frame corresponding to each row, wherein each maximum element is the category score of the category of each prior rectangular anchor frame.
Beneficial effects:
1. The invention provides a target detection method based on positioning information guidance, in which the positioning information represented by the IoU is embedded into the classification task as a training label, and a hierarchical IoU function discretizes the continuous IoU so that it suits the classification task better. On the one hand, compared with a discrete IoU, learning a continuous IoU in the classification task brings no large performance gain but makes the optimization process slower or even non-convergent; on the other hand, for anchor frames with low overlap with any object, learning the IoU is meaningless. The invention therefore improves the ground-truth label of the classification branch, guiding the classification task with an IoU index consistent with the localization accuracy of the detection frame, and also applies the hierarchical IoU function to the IoU vector, thereby preserving the consistency of the classification and regression tasks and effectively improving detection accuracy.
2. The invention provides a target detection method based on positioning information guidance, which expands each class number in the class vector CLS into one-hot form via OneHot(CLS) and, after weighting by the hierarchical IoU, obtains the expected discrete IoU, thereby optimizing the representation of the IoU in the classification task; this IoU then replaces the traditional one-hot code as the ground-truth label, introducing fine-grained discrete positioning-information supervision into the classification task, adapting to it better and enabling a more accurate evaluation of the prediction results.
3. The invention provides a target detection method based on positioning information guidance in which the hierarchical IoU processing function directly sets the IoU labels of low-overlap anchor frames to 0, which aids the optimization and convergence of the algorithm without affecting the final detection performance.
4. The invention provides a target detection method based on positioning information guidance in which each element of the classification matrix S represents the probability that a prior rectangular anchor frame belongs to a given category, so the target detection model obtains not only the final category of each object contained in the image to be detected but also the score of that category, which supports a more accurate evaluation of the classification prediction.
Drawings
Fig. 1 is a flowchart of a training method of a target detection model provided in the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Most existing target detection algorithms based on convolutional neural networks divide the detection task into an independent multi-class classification task and a target localization task, which splits the two tasks excessively, so misjudgments easily occur when non-maximum suppression screens out repeated detection frames. The invention provides a target detection method based on positioning information guidance that, on the one hand, improves the ground-truth label of the classification branch and guides the classification task with an IoU index consistent with the localization accuracy of the detection frame; on the other hand, it optimizes the representation of the IoU in the classification task, adopting discrete positioning-information supervision that suits the classification task better. In the end it preserves the consistency of the classification and regression tasks and improves detection accuracy.
A target detection method based on positioning information guidance comprises the following specific implementation steps:
Input the image to be detected into the trained target detection model to obtain the initial categories and initial positioning results of the objects contained in the image; perform non-maximum suppression and redundant-frame elimination on these initial results, and take the results after non-maximum suppression and redundant-frame elimination as the final category cls* and positioning result D* of each object contained in the image to be detected.
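For illustration only, the following is a minimal NumPy sketch of the greedy non-maximum suppression step just described; the function name nms, the corner-format boxes and the IoU threshold of 0.5 are assumptions for the example rather than values fixed by the patent.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes:  (K, 4) array of (x1, y1, x2, y2) corner coordinates.
    scores: (K,) class scores; higher means more confident.
    Returns the indices of the boxes that are kept.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # process highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # discard the redundant boxes overlapping it above the threshold
        order = order[1:][iou <= iou_thresh]
    return keep
```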
As shown in fig. 1, the training method of the target detection model includes the following steps:
S1: performing feature extraction on the sample image with a first convolution branch to obtain a feature map, and meanwhile setting a number of prior rectangular anchor frames on the feature map, wherein the number and positions of the prior rectangular anchor frames are sufficient to cover the whole feature map.
It should be noted that, in the present invention, for an input image I, a convolution operation is adopted to extract features, and five times of downsampling are performed to obtain a feature map, where a formula of the convolution operation is:
o(i, j) = Σ_m Σ_n X(m, n) · K(i - m, j - n)
In the formula, X(m, n) denotes the value at each pixel position of the input image or feature map, and K(i - m, j - n) denotes the value of the convolution kernel at the corresponding position.
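As an illustration of the formula above, a minimal NumPy sketch that evaluates the double sum directly is given below (a real detector would use an optimized convolution library; the restriction to the "valid" output region and the function name are assumptions for the example):

```python
import numpy as np

def conv2d(X, K):
    """Direct evaluation of o(i, j) = sum_{m,n} X(m, n) * K(i - m, j - n)
    over the 'valid' output region (no padding).  The kernel is flipped
    so the sliding multiply-accumulate matches true convolution."""
    Kf = K[::-1, ::-1]                       # flip kernel in both axes
    H = X.shape[0] - K.shape[0] + 1
    W = X.shape[1] - K.shape[1] + 1
    O = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            O[i, j] = np.sum(X[i:i + K.shape[0], j:j + K.shape[1]] * Kf)
    return O
```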
Meanwhile, the invention denotes a single anchor frame as b = (x, y, w, h). b is a four-dimensional vector, where (x, y) represents the position of the center point of the rectangular anchor frame on the feature map and (w, h) represents its width and height. All anchor frames are represented as B, an M × 4 matrix, where M is the total number of anchor frames; all objects in the image are represented as G, an N × 4 matrix, where N is the total number of objects.
S2: performing convolution operations on the feature map with a second convolution branch and a third convolution branch respectively, correspondingly obtaining the classification matrix S of the prior rectangular anchor frames and the offset matrix D of the prior rectangular anchor frames, where D = B + P, B being the position matrix of the prior rectangular anchor frames and P the predicted offset of each detection frame relative to its prior rectangular anchor frame.
It should be noted that the second convolution branch serves as the classification branch and is used to obtain the classification prediction; its output S is an M × C matrix, where C is the number of all possible object classes. Each element of S represents the probability that an anchor frame belongs to a category. Further, the final category score of each anchor frame can be calculated by the following formulas:
s = max(S)
cls = arg max(S)
the above expression represents that the category corresponding to the largest element in each row of the classification matrix S is used as the category to which the prior rectangular anchor frame corresponding to each row belongs, and each largest element is the category score of the category to which each prior rectangular anchor frame belongs and the probability scores S of all anchor frames at the same time, and the corresponding category index cls is obtained. Therefore, the target detection model of the invention obtains the final class cls of the object contained in the image to be detected*The score, that is, the probability, of the category to which the object included in the image to be measured finally belongs can also be obtained.
The third convolution branch serves as the localization branch and is used to obtain the localization prediction; its output D is an M × 4 matrix representing the sum of the anchor frame matrix B and the offset P. That is, the third convolution branch first obtains the offset of each anchor frame and then superimposes it on the original position coordinates of the anchor frame to obtain the predicted coordinates.
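Taken together, the readout of the two branches can be sketched as follows (NumPy; the function and variable names are assumptions, while the row-wise max/argmax and the additive decoding D = B + P follow the description above):

```python
import numpy as np

def decode_predictions(S, B, P):
    """S: (M, C) class probabilities, B: (M, 4) anchor frames,
    P: (M, 4) predicted offsets.  Returns per-anchor score, class
    index, and the decoded box D = B + P as described in the text."""
    s = S.max(axis=1)        # category score: row-wise maximum of S
    cls = S.argmax(axis=1)   # category index of that maximum
    D = B + P                # anchor coordinates shifted by the offsets
    return s, cls, D
```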
S3: constructing a total loss function L, calculating it and judging whether it is smaller than a set value; if not, changing the convolution layer parameters of the first, second and third convolution branches and repeating steps S1 to S3 until the total loss function L is smaller than the set value; once it is smaller, the first, second and third convolution branches constitute the final target detection model.
The total loss function L is constructed by the following method:
S31: respectively obtaining the intersection-over-union (IoU) between each prior rectangular anchor frame and each object bounding frame in the sample image to obtain an M × N IoU matrix IoU, where M is the number of prior rectangular anchor frames and N is the number of object bounding frames.
It should be noted that the IoU measures the degree of overlap between two bounding boxes and is calculated as follows:
IoU(A, B) = |A ∩ B| / |A ∪ B|
where A and B represent two different rectangular bounding boxes, A ∩ B represents their intersection, and A ∪ B represents their union. The IoU between all anchor frames and all target objects in the image is obtained and denoted IoU, with the formula:
IoU=IoU(B,G)
where IoU is a matrix of dimension M x N.
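A minimal vectorized sketch of computing this M × N matrix is given below (NumPy; it assumes boxes stored as (x1, y1, x2, y2) corners, so the center/width/height anchors described above would be converted first):

```python
import numpy as np

def iou_matrix(B, G):
    """B: (M, 4) anchor frames, G: (N, 4) object frames, both as
    (x1, y1, x2, y2).  Returns the (M, N) matrix of pairwise IoU."""
    # pairwise intersection rectangle, via broadcasting to (M, N)
    xx1 = np.maximum(B[:, None, 0], G[None, :, 0])
    yy1 = np.maximum(B[:, None, 1], G[None, :, 1])
    xx2 = np.minimum(B[:, None, 2], G[None, :, 2])
    yy2 = np.minimum(B[:, None, 3], G[None, :, 3])
    inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
    area_b = (B[:, 2] - B[:, 0]) * (B[:, 3] - B[:, 1])
    area_g = (G[:, 2] - G[:, 0]) * (G[:, 3] - G[:, 1])
    union = area_b[:, None] + area_g[None, :] - inter
    return inter / union
```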
S32: extracting the maximum element of each row of the IoU matrix IoU to construct an M × 1 IoU vector IoU_max, and matching the prior rectangular anchor frame corresponding to the row of each maximum element with the position of the object bounding frame corresponding to its column, obtaining the matrix G′ of object positions each prior rectangular anchor frame is expected to learn; meanwhile, forming the class numbers of the objects in G′ into an M × 1 class vector CLS.
That is, the invention finds the largest element of each row together with its position index within the row; these elements form an M × 1 IoU vector, denoted IoU_max. According to the position indices, each anchor frame is assigned the target with which it has the largest IoU and is made responsible for predicting that target. These targets constitute an M × 4 matrix G′, which represents the position coordinates of the objects the anchor frames are expected to learn. Meanwhile, the class numbers of the objects in G′ form an M × 1 class vector, denoted CLS.
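The row-wise matching just described might be implemented as in the following sketch (NumPy; it assumes the iou_matrix helper above and an array gt_labels holding the class number of each of the N objects):

```python
import numpy as np

def assign_targets(IoU, G, gt_labels):
    """IoU: (M, N) anchor-to-object IoU matrix, G: (N, 4) object boxes,
    gt_labels: (N,) class numbers.  Returns IoU_max (M,), the matched
    boxes G' (M, 4) each anchor should learn, and the class vector CLS."""
    best = IoU.argmax(axis=1)                        # column of each row's maximum
    IoU_max = IoU[np.arange(IoU.shape[0]), best]     # value of that maximum
    G_prime = G[best]                                # object box assigned to each anchor
    CLS = gt_labels[best]                            # class number of that object
    return IoU_max, G_prime, CLS
```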
S33: processing the IoU vector IoU_max hierarchically with the hierarchical IoU function to obtain the hierarchical vector hIoU.
Wherein, the hierarchical intersection ratio function is as follows:
hIoU(m) = δ·E(IoU_max(m)/δ), if IoU_max(m) > 0.5
hIoU(m) = 0, if IoU_max(m) ≤ 0.5
where IoU_max(m) is the m-th element of the IoU vector IoU_max, hIoU(m) is the m-th element of the hierarchical vector hIoU, δ is a set interval-division threshold responsible for dividing the IoU into discrete intervals, taken here as 0.1, and E(·) is the round-down (floor) function, with the specific formula:
E(x) = ⌊x⌋, i.e. the largest integer not exceeding x
further, IoU will be satisfiedmaxAnd (m) > 0.5, taking a priori rectangular anchor box as a positive sample used for training, participating in the calculation of the positioning loss, and recording the index of the anchor box in all anchor boxes as pos.
S34: constructing a total loss function L of the target detection model according to the hierarchical vector hIoU, the class vector CLS, the classification matrix S and the offset matrix D as follows:
L = |S - hIoU·hIoU^T·OneHot(CLS)| + |D[pos] - G′[pos]|
where ^T denotes transposition, D[pos] denotes the shifted position coordinates, in the offset matrix D, of the prior rectangular anchor frames selected as positive samples, and G′[pos] denotes the position coordinates of the objects those anchor frames are expected to learn. OneHot(CLS) is an M × C matrix obtained by converting CLS into one-hot codes, i.e. row vectors that are 1 at the position of the class number and 0 elsewhere. After this matrix is weighted by the hierarchical IoU, the expected discrete IoU is obtained; using this IoU as the ground-truth label in place of the traditional one-hot code introduces fine-grained positioning-information supervision into the classification task and enables a more accurate evaluation of the prediction results.
It should be noted that the total loss function L is actually composed of a classification loss function and a positioning loss function, and specifically, the following is:
L = L_cls + L_loc
L_cls = |S - hIoU·hIoU^T·OneHot(CLS)|
L_loc = |D[pos] - G′[pos]|
where L_cls is the classification loss function, representing the deviation between the prediction of the classification task and the ground-truth label, and L_loc is the localization loss function, representing the deviation between the predicted positions of the anchor frames taken as positive samples and the positions of the objects they are expected to learn.
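Putting the pieces together, a sketch of the total loss is given below (NumPy; it reads |·| as a sum of absolute deviations and reuses the helpers sketched above, both of which are assumptions for the example):

```python
import numpy as np

def total_loss(S, D, hIoU, CLS, G_prime, pos, num_classes):
    """L = |S - hIoU . hIoU^T . OneHot(CLS)| + |D[pos] - G'[pos]|,
    applied literally, with |.| taken as a sum of absolute deviations."""
    onehot = np.eye(num_classes)[CLS]            # (M, C) one-hot codes
    target = np.outer(hIoU, hIoU) @ onehot       # hIoU . hIoU^T . OneHot(CLS)
    L_cls = np.abs(S - target).sum()             # classification loss
    L_loc = np.abs(D[pos] - G_prime[pos]).sum()  # localization loss
    return L_cls + L_loc
```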
In summary, common target detection algorithms compute the difference between the prediction and the true category using a cross-entropy loss over one-hot codes, but this index has no connection to the localization task, so a divergence exists between the two and detection performance suffers. The invention embeds the positioning information represented by the IoU into the classification task as a training label, and the hierarchical IoU function adapts the continuous IoU to the classification task better than using it directly. On the one hand, compared with a discrete IoU, learning a continuous IoU in the classification task brings no large performance gain but makes the optimization process slower or even non-convergent. On the other hand, learning the IoU is meaningless for anchor frames with low overlap with any object, so the hierarchical IoU processing function directly sets their IoU labels to 0, which aids the optimization and convergence of the algorithm without affecting the final detection performance.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A target detection method based on positioning information guidance is characterized in that an image to be detected is input into a trained target detection model, and the category and the positioning result of an object contained in the image to be detected are obtained; the training method of the target detection model comprises the following steps:
s1: performing feature extraction on the sample image with a first convolution branch to obtain a feature map, and meanwhile setting a number of prior rectangular anchor frames on the feature map, wherein the number and positions of the prior rectangular anchor frames are sufficient to cover the whole feature map;
s2: performing convolution operations on the feature map with a second convolution branch and a third convolution branch respectively, correspondingly obtaining a classification matrix S of the prior rectangular anchor frames and an offset matrix D of the prior rectangular anchor frames, wherein D = B + P, B being the position matrix of the prior rectangular anchor frames and P the predicted offset of each detection frame relative to its prior rectangular anchor frame;
s3: constructing a total loss function L, calculating it and judging whether it is smaller than a set value; if not, changing the convolution layer parameters of the first, second and third convolution branches and repeating steps S1 to S3 until the total loss function L is smaller than the set value; once it is smaller, the first, second and third convolution branches constitute the final target detection model;
the total loss function L is constructed by the following method:
s31: respectively obtaining the intersection-over-union (IoU) between each prior rectangular anchor frame and each object bounding frame in the sample image to obtain an M × N IoU matrix IoU, wherein M is the number of prior rectangular anchor frames and N is the number of object bounding frames;
s32: extracting the maximum element of each row of the IoU matrix IoU to construct an M × 1 IoU vector IoU_max, and matching the prior rectangular anchor frame corresponding to the row of each maximum element with the position of the object bounding frame corresponding to its column, obtaining the matrix G′ of object positions each prior rectangular anchor frame is expected to learn; meanwhile, forming the class numbers of the objects in G′ into an M × 1 class vector CLS;
s33: processing the IoU vector IoU_max hierarchically with a hierarchical IoU function to obtain the hierarchical vector hIoU;
s34: constructing a total loss function L of the target detection model according to the hierarchical vector hIoU, the class vector CLS, the classification matrix S and the offset matrix D as follows:
L = |S - hIoU·hIoU^T·OneHot(CLS)| + |D[pos] - G′[pos]|
wherein ^T denotes transposition, OneHot(CLS) denotes converting CLS into one-hot codes, D[pos] denotes the shifted position coordinates, in the offset matrix D, of the prior rectangular anchor frames selected as positive samples, and G′[pos] denotes the position coordinates of the objects those prior rectangular anchor frames are expected to learn.
2. The target detection method based on the guidance of the positioning information as claimed in claim 1, wherein the prior rectangular anchor frame as the positive sample is selected by:
the cross-over-ratio vector IoU is cross-over-ratio using a hierarchical cross-over-ratio function as followsmaxCarrying out hierarchical processing to obtain a hierarchical vector hIoU:
hIoU(m) = δ·E(IoU_max(m)/δ), if IoU_max(m) > 0.5
hIoU(m) = 0, if IoU_max(m) ≤ 0.5
where IoU_max(m) is the m-th element of the IoU vector IoU_max, hIoU(m) is the m-th element of the hierarchical vector hIoU, δ is a set interval-division threshold, and E(·) is the round-down (floor) function;
will satisfy IoUmaxThe a priori rectangular anchor box with (m) > 0.5 serves as a positive sample for training.
3. The positioning-information-guided target detection method as claimed in claim 1, wherein the intersection-over-union between each prior rectangular anchor frame and each object bounding frame in the sample image is calculated as follows:
IoU(A, B) = |A ∩ B| / |A ∪ B|
wherein IoU denotes the intersection-over-union between a prior rectangular anchor frame A and an object bounding frame B, A ∩ B denotes their intersection, and A ∪ B denotes their union.
4. The method as claimed in claim 1, wherein after the image to be detected is input into the trained target detection model, non-maximum suppression and redundant-frame elimination are performed on the obtained categories and positioning results, and the results after non-maximum suppression and redundant-frame elimination are taken as the final categories and positioning results of the objects contained in the image to be detected.
5. The positioning information guidance-based object detection method as claimed in claim 1, wherein the classification matrix S of the prior rectangular anchor boxes is an M × C matrix, where C is the number of all possible classes of the object, and each element in the classification matrix S represents the probability value of each prior rectangular anchor box belonging to each class;
and taking the category corresponding to the maximum element in each row of the classification matrix S as the category of the prior rectangular anchor frame corresponding to each row, wherein each maximum element is the category score of the category of each prior rectangular anchor frame.
CN202110960804.6A 2021-08-20 2021-08-20 Target detection method based on positioning information guidance Pending CN113673540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110960804.6A CN113673540A (en) 2021-08-20 2021-08-20 Target detection method based on positioning information guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110960804.6A CN113673540A (en) 2021-08-20 2021-08-20 Target detection method based on positioning information guidance

Publications (1)

Publication Number Publication Date
CN113673540A true CN113673540A (en) 2021-11-19

Family

ID=78544538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110960804.6A Pending CN113673540A (en) 2021-08-20 2021-08-20 Target detection method based on positioning information guidance

Country Status (1)

Country Link
CN (1) CN113673540A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114372502A (en) * 2021-12-02 2022-04-19 北京工业大学 Angle self-adaptive ellipse template target detector
CN114372502B (en) * 2021-12-02 2024-05-28 北京工业大学 Angle-adaptive elliptical template target detector
CN114638784A (en) * 2022-02-17 2022-06-17 中南大学 Method and device for detecting surface defects of copper pipe based on FE-YOLO

Similar Documents

Publication Publication Date Title
Gao et al. A mutually supervised graph attention network for few-shot segmentation: The perspective of fully utilizing limited samples
Wang et al. Hybrid feature aligned network for salient object detection in optical remote sensing imagery
CN108288088B (en) Scene text detection method based on end-to-end full convolution neural network
CN110930454B (en) Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN108399406B (en) Method and system for detecting weakly supervised salient object based on deep learning
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
CN113673540A (en) Target detection method based on positioning information guidance
CN111507222A (en) Three-dimensional object detection framework based on multi-source data knowledge migration
CN114913386A (en) Training method of multi-target tracking model and multi-target tracking method
CN117252904B (en) Target tracking method and system based on long-range space perception and channel enhancement
CN113298036A (en) Unsupervised video target segmentation method
CN112734803A (en) Single target tracking method, device, equipment and storage medium based on character description
Liu et al. Building outline delineation from VHR remote sensing images using the convolutional recurrent neural network embedded with line segment information
CN114241250A (en) Cascade regression target detection method and device and computer readable storage medium
Ding et al. Cf-yolo: Cross fusion yolo for object detection in adverse weather with a high-quality real snow dataset
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN114187653A (en) Behavior identification method based on multi-stream fusion graph convolution network
Peng et al. Semi-supervised bolt anomaly detection based on local feature reconstruction
Li A deep learning-based text detection and recognition approach for natural scenes
Xu et al. Representative feature alignment for adaptive object detection
CN112418358A (en) Vehicle multi-attribute classification method for strengthening deep fusion network
CN112651294A (en) Method for recognizing human body shielding posture based on multi-scale fusion

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination