CN114663879B - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN114663879B
CN114663879B
Authority
CN
China
Prior art keywords
target
dimensional
prediction result
result
detection model
Prior art date
Legal status
Active
Application number
CN202210122800.5A
Other languages
Chinese (zh)
Other versions
CN114663879A (en)
Inventor
张兆翔
张驰
陈文博
裴仪瑶
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202210122800.5A
Publication of CN114663879A
Application granted
Publication of CN114663879B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention provides a target detection method and apparatus, an electronic device and a storage medium. The method includes: acquiring a target point cloud sequence; and inputting the target point cloud sequence into a three-dimensional target detection model to obtain a three-dimensional target detection result corresponding to the target point cloud sequence. The three-dimensional target detection model is trained with virtual samples and real samples. The pseudo label corresponding to a real sample is determined based on a first prediction result and a second prediction result; the first prediction result is obtained by predicting the real sample with a pre-trained three-dimensional target detection model, and the second prediction result is obtained by propagating the first prediction result along the time dimension. In the embodiments of the invention, the second prediction result can be obtained by propagating the first prediction result along the time dimension, and the pseudo label can then be obtained from the first and second prediction results, so that the three-dimensional target detection model can be trained without manually labeled data while still achieving a good detection effect.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
Two-dimensional target detection can only provide the two-dimensional position of an object in an image. With the rapid development of application fields such as autonomous driving, intelligent robots, augmented reality and security and border defense, information about objects in three-dimensional space is increasingly needed in order to locate and recognize targets more accurately. The input of three-dimensional target detection is a two-dimensional image or three-dimensional data, and its output is the position of an object's bounding box in three-dimensional space together with a classification result.
In the prior art, most three-dimensional tasks such as autonomous driving use lidar point cloud data to obtain more accurate three-dimensional spatial information. Unlike conventional image data, point cloud data is high-dimensional and unordered, which makes annotation difficult and keeps data sets small; compared with two-dimensional target detection, annotations for three-dimensional target detection are therefore much harder to obtain.
Disclosure of Invention
The invention provides a target detection method and apparatus, an electronic device and a storage medium, which are intended to overcome the difficulty of obtaining three-dimensional target detection annotations in the prior art, so that a three-dimensional target detection model can be trained without manually labeled data while achieving a good detection effect.
In a first aspect, the present invention provides a target detection method, including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Optionally, according to a target detection method provided by the present invention, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining a classification result based on the target point cloud sequence and a classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Optionally, according to the target detection method provided by the present invention, the three-dimensional target detection model is constructed in the following manner:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
acquiring the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, according to a target detection method provided by the present invention, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, according to the target detection method provided by the present invention, in a case that the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Gaussian-based uncertainty regression loss L_un_reg_Gaussian.]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Laplace-based uncertainty regression loss L_un_reg_Laplace.]
where t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
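The two loss formulas above appear only as images in the original document. For reference, a common formulation of such uncertainty-aware regression losses in the literature, which models each regression target with a Gaussian or Laplace likelihood, takes the following form; this is an assumed general form, not a reproduction of the patent's exact equations:

L_{\mathrm{un\_reg\_Gaussian}} = \sum_{i \in \{x,y,z,w,h,l,\theta\}} \left[ \frac{(t_i - \mu_i)^2}{2(\sigma_i^2 + \epsilon)} + \frac{1}{2}\log\left(\sigma_i^2 + \epsilon\right) \right]

L_{\mathrm{un\_reg\_Laplace}} = \sum_{i \in \{x,y,z,w,h,l,\theta\}} \left[ \frac{|t_i - \mu_i|}{\sigma_i + \epsilon} + \log\left(\sigma_i + \epsilon\right) \right]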
Optionally, according to a target detection method provided by the present invention, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, according to a target detection method provided by the present invention, the determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls) to obtain the classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - Σ_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i) / 7 to obtain the positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis;
using the formula conf_total = conf_cls · conf_loc to obtain the first target confidence conf_total corresponding to the third prediction result.
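Since the three confidence formulas above are stated explicitly, they translate directly into code. The following is a minimal NumPy sketch for a single candidate box; the function and variable names are illustrative and do not appear in the patent:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_target_confidence(output_cls, sigma):
    # conf_total = conf_cls * conf_loc, following the formulas above.
    # output_cls: raw classification output for one candidate box.
    # sigma: the 7 uncertainty regression results for (x, y, z, w, h, l, theta).
    conf_cls = sigmoid(output_cls)
    conf_loc = 1.0 - np.sum(sigmoid(np.asarray(sigma))) / 7.0
    return conf_cls * conf_loc

# Example: a confident classification with low per-dimension uncertainties.
print(first_target_confidence(2.5, [-3.0, -3.0, -2.5, -4.0, -4.0, -4.0, -3.0]))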
Optionally, according to a target detection method provided by the present invention, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:
obtaining an ego-motion trajectory based on the point cloud sequence corresponding to the real sample and a simultaneous localization and mapping (SLAM) algorithm;
obtaining a target conversion matrix based on the target propagation range and the ego-motion trajectory;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
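For illustration, the target conversion matrix between two frames can be derived from the ego poses produced by SLAM and then applied to coordinates of the first prediction result. The sketch below assumes each pose is a 4x4 homogeneous lidar-to-world transform; this convention and the helper names are assumptions, not details stated in the patent:

import numpy as np

def conversion_matrix(pose_n, pose_j):
    # Transform taking coordinates expressed in frame n into frame j,
    # assuming pose_n and pose_j are 4x4 lidar-to-world transforms.
    return np.linalg.inv(pose_j) @ pose_n

def transform_points(points, T):
    # Apply a 4x4 transform to an (N, 3) array of points.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ T.T)[:, :3]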
Optionally, according to a target detection method provided by the present invention, the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, according to a target detection method provided by the present invention, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applications of
Figure GDA0003901051550000051
Obtaining a second angular point form V corresponding to the j frame j
Where k is the target propagation range, s n The method is used for representing that the three-dimensional target in the nth frame is in a motion state or a static state; at s n The value of the function I (.) is 0 when the three-dimensional target in the n-th frame is in a motion state, and s is n Representing that the value of the function I (.) is 1 under the condition that the three-dimensional target in the nth frame is in a static state; t is a unit of nj Representing the target transition matrix from the nth frame to the jth frame, V n Indicating the first corner form corresponding to the nth frame.
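As a hedged illustration of the per-box step, the corner form of a static target can be carried from frame n into frame j with the target conversion matrix; the full aggregation over the propagation window is shown only as an image in the original, so the sketch below covers a single propagation step under that assumption:

import numpy as np

def propagate_corners(corners_n, T_nj, is_static):
    # Propagate an (8, 3) corner-form box from frame n into frame j.
    # Only static targets are propagated (I(s_n) = 1); moving targets are
    # skipped, mirroring the indicator function described above.
    if not is_static:
        return None
    homo = np.hstack([corners_n, np.ones((8, 1))])  # homogeneous coordinates
    return (homo @ T_nj.T)[:, :3]                   # corners expressed in frame j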
Optionally, according to a target detection method provided by the present invention, the obtaining the pseudo tag based on the first target confidence, the first prediction result, and the second prediction result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, according to a target detection method provided by the present invention, the obtaining, based on the first target confidence, the first prediction result, and the second prediction result, a second target confidence corresponding to the prediction result of the target frame includes:
in the case that the target frame is the j-th frame, applying the confidence propagation formulas (shown as images in the original) to obtain the second target confidence conf'_j corresponding to the prediction result of the j-th frame;
where conf_n represents the first target confidence corresponding to the prediction result of the n-th frame, α is a preset attenuation factor, and k is the target propagation range.
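The exact expression for conf'_j appears only as images in the original. A plausible sketch, under the assumption that the propagated confidences decay geometrically with the frame distance |n - j| and are then averaged over the propagation window, is:

def second_target_confidence(confs, j, k, alpha):
    # confs: dict mapping frame index n -> first target confidence conf_n.
    # k: target propagation range; alpha: preset attenuation factor.
    # The decayed-average aggregation is an assumption, not the patent's formula.
    window = [n for n in range(j - k, j + k + 1) if n in confs]
    decayed = [confs[n] * (alpha ** abs(n - j)) for n in window]
    return sum(decayed) / len(decayed) if decayed else 0.0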
In a second aspect, the present invention further provides an object detecting apparatus, including:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the above-mentioned object detection methods.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
According to the target detection method and apparatus, the electronic device and the storage medium provided by the invention, the initial three-dimensional target detection model is trained with virtual samples to obtain a pre-trained three-dimensional target detection model. Real samples are input into the pre-trained model to obtain a first prediction result, which is propagated along the time dimension to obtain a second prediction result; a pseudo label corresponding to each real sample can then be determined from the first and second prediction results. The pre-trained model is further trained with the real samples and pseudo labels to obtain the final three-dimensional target detection model. Inputting a target point cloud sequence into this model yields an accurate three-dimensional target detection result, so the model can be trained without manually labeled data while achieving a good detection effect.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below illustrate some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a target detection method provided by the present invention;
FIG. 2 is a second schematic flow chart of a target detection method provided by the present invention;
FIG. 3 is a third schematic flow chart of a target detection method provided by the present invention;
FIG. 4 is a fourth schematic flowchart of a target detection method provided by the present invention;
FIG. 5 is a schematic structural diagram of an object detecting device provided in the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The object detection method and apparatus of the present invention are described below with reference to the drawings.
FIG. 1 is a schematic flow chart of the target detection method provided by the present invention. As shown in FIG. 1, the method may be executed by an electronic device, such as a mobile phone or a server, and comprises the following steps 101 to 102:
step 101, acquiring a target point cloud sequence;
step 102, inputting the target point cloud sequence into a three-dimensional target detection model, and obtaining a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Specifically, after the target point cloud sequence is obtained, the target point cloud sequence may be input to the three-dimensional target detection model, and then a three-dimensional target detection result corresponding to the target point cloud sequence may be obtained.
For example, a target point cloud sequence a of a certain unmanned vehicle may be obtained, and then the target point cloud sequence a may be input into a three-dimensional target detection model, and then a three-dimensional target detection result a corresponding to the target point cloud sequence a may be obtained, where the three-dimensional target may be a target object on a motion trajectory of the unmanned vehicle, and the three-dimensional target detection result a may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
For example, a target point cloud sequence B of an intelligent robot may be obtained, and then the target point cloud sequence B may be input into a three-dimensional target detection model, and then a three-dimensional target detection result B corresponding to the target point cloud sequence B may be obtained, where the three-dimensional target may be a target object of a motion trajectory of the intelligent robot, and the three-dimensional target detection result B may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
The above examples are merely illustrative of the embodiments of the present invention, and are not intended to limit the embodiments of the present invention.
It can be understood that the pre-trained three-dimensional target detection model can be obtained by training the initial three-dimensional target detection model through the virtual sample, the real sample is input into the pre-trained three-dimensional target detection model to obtain a first prediction result, the first prediction result is propagated along the time dimension to obtain a second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, and then the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
According to the target detection method provided by the invention, the initial three-dimensional target detection model is trained with virtual samples to obtain a pre-trained three-dimensional target detection model. Real samples are input into the pre-trained model to obtain a first prediction result, which is propagated along the time dimension to obtain a second prediction result; a pseudo label corresponding to each real sample can then be determined from the first and second prediction results. The pre-trained model is further trained with the real samples and pseudo labels to obtain the final three-dimensional target detection model. Inputting the target point cloud sequence into this model yields an accurate three-dimensional target detection result, so the model can be trained without manually labeled data while achieving a good detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Specifically, the target point cloud sequence can be recognized by each branch of the three-dimensional target detection model to obtain a classification result, a three-dimensional bounding box regression result, a direction classification result, a three-dimensional bounding box uncertainty regression result, and the like.
Optionally, through a classification branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a classification result corresponding to the target point cloud sequence is obtained.
Optionally, through a first regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box regression result and a direction classification result corresponding to the target point cloud sequence are obtained.
Optionally, through a second regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result may be obtained, where the three-dimensional bounding box uncertainty regression result may be used to represent uncertainty of the three-dimensional bounding box regression result.
Optionally, the three-dimensional target detection model may include a second regression branch and a SECOND network, where the SECOND network may include a classification branch and a first regression branch.
Optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the first prediction result may include a three-dimensional bounding box regression result after the filtering, a classification result corresponding to the three-dimensional bounding box regression result after the filtering, and a direction classification result corresponding to the three-dimensional bounding box regression result after the filtering.
Optionally, the target point cloud sequence may be identified through a three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the three-dimensional target detection result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the three-dimensional target detection result may include a filtered three-dimensional bounding box regression result, a classification result corresponding to the filtered three-dimensional bounding box regression result, and a direction classification result corresponding to the filtered three-dimensional bounding box regression result.
Alternatively, the second regression branch may be a regression model based on a gaussian distribution.
Alternatively, the second regression branch may be a regression model based on a laplace distribution.
It is understood that the three-dimensional bounding box uncertainty regression results may be used to obtain more accurate first prediction results in the training session. Specifically, in the training process, a real sample is input into a pre-trained three-dimensional target detection model, a three-dimensional bounding box regression result and a three-dimensional bounding box uncertainty regression result can be obtained, then candidate boxes in the three-dimensional bounding box regression result can be screened based on the three-dimensional bounding box uncertainty regression result, and a more accurate first prediction result can be obtained based on the screened three-dimensional bounding box regression result.
It can be understood that the three-dimensional bounding box uncertainty regression results can be used to obtain more accurate pseudo labels in the training session. Specifically, in the training process, a plurality of candidate frames can be determined based on a first prediction result and a second prediction result, the candidate frames can be screened based on a three-dimensional boundary frame uncertainty regression result, and then a more accurate pseudo label can be obtained based on the screened candidate frames.
Therefore, the three-dimensional boundary frame uncertainty regression result can be obtained through the second regression branch, the accurate pseudo label can be obtained based on the three-dimensional boundary frame uncertainty regression result, fine tuning training can be carried out on the pre-trained three-dimensional target detection model based on the accurate pseudo label and the real sample, and the three-dimensional target detection model obtained through training can achieve a good detection effect under the condition that no manual labeling is used at all.
Optionally, the three-dimensional object detection model is constructed by:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Specifically, the initial three-dimensional target detection model is trained with virtual samples to obtain the pre-trained three-dimensional target detection model. Real samples are recognized by the pre-trained model to obtain a first prediction result and a first target confidence, and the first prediction result is propagated along the time dimension to obtain a second prediction result. The first and second prediction results can then be screened and fused based on the first target confidence to obtain pseudo labels, and the pre-trained model can be fine-tuned based on the pseudo labels and the real samples to obtain the three-dimensional target detection model.
Alternatively, the target propagation range may be a propagation range between adjacent frames, or may be a propagation range between non-adjacent frames.
Optionally, virtual samples may be generated by the CARLA simulator, and may include depth images and point cloud data acquired by a lidar and a depth sensor in the CARLA simulator.
Optionally, the prediction result of the n-th frame in the first prediction result may be B_n; specifically, B_n may be {B_1, B_2, ..., B_m}, where B = {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis.
Optionally, FIG. 2 is a second schematic flow chart of the target detection method provided by the present invention, and FIG. 3 is a third schematic flow chart of the target detection method provided by the present invention; FIG. 2 and FIG. 3 are optional examples of the present invention and do not limit it. As shown in FIG. 2, the three-dimensional target detection model is constructed in a manner including the following steps 201 to 205:
step 201, training an initial three-dimensional target detection model based on a virtual sample and a label of the virtual sample to obtain a pre-trained three-dimensional target detection model;
step 202, inputting a continuous point cloud frame of a real sample into a pre-trained three-dimensional target detection model, and acquiring a first prediction result and a first target confidence coefficient;
optionally, the real sample may be identified through a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the first prediction result and the first target confidence may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the real sample may include continuous point cloud frames, and the first prediction result may include a prediction result corresponding to each point cloud frame, for example, as shown in fig. 3, in the case that the real sample includes a j-th point cloud frame, the first prediction result may include a prediction result corresponding to the j-th point cloud frame.
Step 203, propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
optionally, the first prediction result may include a prediction result corresponding to each point cloud frame, and then the prediction results corresponding to each point cloud frame in the first prediction result may be propagated along the time dimension, respectively, to obtain the second prediction result.
For example, as shown in fig. 3, the first prediction result may include a prediction result A1 of a (j-1) th frame, a prediction result B1 of a j-th frame, and a prediction result C1 of a (j + 1) th frame, the prediction result A1 of the (j-1) th frame in the first prediction result may be propagated along a time dimension to a prediction result B2 of the j-th frame, the prediction result C1 of the (j + 1) th frame in the first prediction result may be propagated along the time dimension to a prediction result B3 of the j-th frame, and the second prediction result may include a prediction result B2 of the j-th frame and a prediction result B3 of the j-th frame.
It is understood that the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and a prediction result B3 of the j-th frame, and thus, for the j-th frame, the prediction result B1, the prediction result B2, and the prediction result B3 may be obtained.
It will be appreciated that after the second prediction is obtained, the prediction of the real sample may include the first prediction and the second prediction.
Step 204, acquiring a pseudo label based on the first target confidence coefficient, the first prediction result and the second prediction result;
optionally, candidate frames in the first prediction result and the second prediction result may be filtered based on the first target confidence, and a pseudo tag may be determined based on the filtered candidate frames.
For example, as shown in fig. 3, the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and a prediction result B3 of the j-th frame, and the candidate frame corresponding to the prediction result B1, the candidate frame corresponding to the prediction result B2, and the candidate frame corresponding to the prediction result B3 may be filtered based on the first target confidence, so as to obtain the filtered candidate frames, and then the pseudo tag may be determined based on the filtered candidate frames.
Step 205, training (fine tuning) the pre-trained three-dimensional target detection model based on the real sample and the pseudo label, and obtaining the three-dimensional target detection model.
It can be understood that the pre-trained three-dimensional target detection model obtained through virtual sample training has basic target detection capability, but its detection performance in a real environment may be reduced; by training the pre-trained model with real samples and pseudo labels, the detection performance of the three-dimensional target detection model in the real environment can be improved.
Therefore, the first prediction result is propagated along the time dimension, the second prediction result can be obtained, an accurate pseudo label can be determined based on the first target confidence coefficient, the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label, the three-dimensional target detection model is obtained, and a good detection effect is achieved.
Optionally, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, based on the target loss function, the virtual sample and the label of the virtual sample may be input to the initial three-dimensional target detection model for training until the loss function value corresponding to the target loss function is smaller than the first threshold.
Optionally, based on the target loss function, the virtual samples and the labels of the virtual samples may be input to the initial three-dimensional target detection model for training until the number of training times is greater than the second threshold.
Optionally, the classification branch in the three-dimensional target detection model may correspond to a classification loss function, the first regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box regression loss function and a direction classification loss function, and the second regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box uncertainty regression loss function, so that the target loss function may be determined based on the classification loss function, the three-dimensional bounding box regression loss function, the direction classification loss function, and the three-dimensional bounding box uncertainty regression loss function.
Alternatively, the target loss function may be:
L_total = β_1·L_cls + β_2·L_reg + β_3·L_un_reg + β_4·L_dir
where L_cls may be a classification loss function, L_reg may be a three-dimensional bounding box regression loss function, L_dir may be a direction classification loss function, L_un_reg may be a three-dimensional bounding box uncertainty regression loss function, and β_1, β_2, β_3 and β_4 are hyperparameters, i.e., weight coefficients that balance the four loss functions.
Optionally, L_cls may specifically be the focal loss function, L_reg may specifically be the Smooth L1 loss function, and L_dir may specifically be the Smooth L1 loss function.
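For illustration, the weighted combination of the four branch losses can be sketched as follows in PyTorch. The focal-loss parameters, the Smooth L1 reductions and the default beta weights are assumptions; the uncertainty regression loss is passed in precomputed, since its exact form is shown only as an image in the original:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard binary focal loss; targets are 0/1 floats. alpha/gamma defaults are assumptions.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def target_loss(cls_logits, cls_targets, box_pred, box_target,
                dir_pred, dir_target, l_un_reg,
                betas=(1.0, 2.0, 1.0, 0.2)):
    # L_total = b1*L_cls + b2*L_reg + b3*L_un_reg + b4*L_dir (weight values assumed).
    l_cls = focal_loss(cls_logits, cls_targets)
    l_reg = F.smooth_l1_loss(box_pred, box_target)
    l_dir = F.smooth_l1_loss(dir_pred, dir_target)  # Smooth L1 for L_dir, as stated above
    return betas[0] * l_cls + betas[1] * l_reg + betas[2] * l_un_reg + betas[3] * l_dir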
Optionally, FIG. 4 is a fourth schematic flowchart of the target detection method provided by the present invention; FIG. 4 is an optional example of the present invention and does not limit it. As shown in FIG. 4, the process of training the three-dimensional target detection model may include the following steps 401 to 407:
step 401, based on the point cloud data in the virtual sample and the point cloud data in the real sample, a point cloud database D can be constructed;
Optionally, the point cloud database D may include one or more point cloud data D_i, where:
D_i = (x_i, y_i, z_i, R_i),  i = 1, 2, ..., N
where D_i represents the i-th point cloud data in the point cloud database D, x_i, y_i and z_i represent the three-dimensional position of the i-th point relative to the lidar, R_i represents the reflectivity of the i-th point in the laser point cloud, and N is the number of points in the laser point cloud.
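As a concrete illustration of this data structure, a lidar sweep can be stored as an (N, 4) array whose rows are (x_i, y_i, z_i, R_i); the numeric values below are made up for illustration only:

import numpy as np

# Five illustrative points: xyz relative to the lidar, plus reflectivity R.
D = np.array([
    [12.3, -4.1, 0.8, 0.27],
    [ 7.9,  2.6, 1.1, 0.45],
    [30.2,  0.4, 1.9, 0.12],
    [ 5.5, -1.2, 0.3, 0.88],
    [18.0,  6.7, 1.4, 0.33],
])  # shape (N, 4), N = 5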
Step 402, carrying out voxelization coding on point cloud data;
optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a virtual sample in the point cloud database D may be coded, and a first voxelization coding corresponding to the virtual sample may be obtained.
Optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a real sample in the point cloud database D may be coded, and a second voxelization coding corresponding to the real sample may be obtained.
Step 403, performing voxel characteristic extraction on the voxelized code;
Optionally, based on a voxel feature extractor in the three-dimensional target detection model, a spatially sparse first voxel feature corresponding to the first voxelization coding may be obtained.
Optionally, based on the voxel feature extractor in the three-dimensional target detection model, a spatially sparse second voxel feature corresponding to the second voxelization coding may be obtained.
Step 404, acquiring a sample-level feature map based on voxel features;
optionally, the first voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a first spatial feature map is obtained, the first spatial feature map is projected to a top view, dimension compression in the vertical direction may be performed, and a sample-level first feature map may be obtained.
Optionally, the second voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a second spatial feature map is obtained, the second spatial feature map is projected to the top view, dimension compression in the vertical direction may be performed, and a sample-level second feature map may be obtained.
Step 405, generating a network through the candidate area, predicting the characteristic diagram of the sample level, and obtaining a three-dimensional target detection result;
optionally, the candidate region generation network in the three-dimensional object detection model may include a classification branch, a first regression branch, and a second regression branch.
Optionally, the classification branch may predict the feature map of the sample stage to obtain a classification result.
Optionally, the feature map of the sample level may be predicted through the first regression branch, and a three-dimensional bounding box regression result and a direction classification result are obtained.
Optionally, the three-dimensional bounding box (3D bounding box) regression result may be {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}, where {μ_x, μ_y, μ_z} denotes the center point coordinates of the three-dimensional bounding box, μ_w, μ_h and μ_l respectively denote the three side lengths of the three-dimensional bounding box, and μ_θ denotes the rotation angle of the three-dimensional bounding box about the y-axis.
Optionally, the feature map of the sample level may be predicted through the second regression branch, and a three-dimensional bounding box uncertainty regression result is obtained.
Optionally, the three-dimensional bounding box uncertainty regression result may be {σ_x, σ_y, σ_z, σ_w, σ_h, σ_l, σ_θ}, corresponding element-wise to the three-dimensional bounding box regression result {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}; for example, σ_x represents the uncertainty of μ_x and σ_y represents the uncertainty of μ_y.
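The three prediction branches of the candidate region generation network described in step 405 can be illustrated with a minimal PyTorch-style detection head operating on the sample-level (bird's-eye-view) feature map. The channel counts, the anchor number and the use of 1x1 convolutions are assumptions for illustration, not the patent's implementation:

import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    # Classification branch, first regression branch (7 box values + 2 direction
    # logits) and second regression branch (7 uncertainty values) per anchor.
    def __init__(self, in_channels=256, num_anchors=2, num_classes=1):
        super().__init__()
        self.cls_branch = nn.Conv2d(in_channels, num_anchors * num_classes, 1)
        self.box_branch = nn.Conv2d(in_channels, num_anchors * 7, 1)
        self.dir_branch = nn.Conv2d(in_channels, num_anchors * 2, 1)
        self.unc_branch = nn.Conv2d(in_channels, num_anchors * 7, 1)

    def forward(self, bev_features):
        return {
            "cls": self.cls_branch(bev_features),
            "box": self.box_branch(bev_features),
            "dir": self.dir_branch(bev_features),
            "box_uncertainty": self.unc_branch(bev_features),
        }

# Example: a 200 x 176 bird's-eye-view feature map with 256 channels.
outputs = DetectionHead()(torch.randn(1, 256, 200, 176))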
In step 406, a loss value is obtained based on the target loss function.
Optionally, based on the classification loss function corresponding to the classification branch, a loss value of the classification result may be obtained.
Optionally, based on the three-dimensional bounding box regression loss function corresponding to the first regression branch, a loss value of the three-dimensional bounding box regression result may be obtained.
Optionally, based on the direction classification loss function corresponding to the first regression branch, a loss value of the direction classification result may be obtained.
Optionally, a loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on the three-dimensional bounding box uncertainty regression loss function corresponding to the second regression branch.
Optionally, the loss value corresponding to the target loss function is determined based on the loss value of the classification result, the loss value of the three-dimensional bounding box regression result, the loss value of the direction classification result, and the loss value of the three-dimensional bounding box uncertainty regression result.
In step 407, based on the loss value corresponding to the target loss function, it may be determined whether to end the training.
Alternatively, in a case where the loss function value corresponding to the target loss function is smaller than the first threshold, it may be determined to end the training.
Alternatively, in the case where the number of training times is greater than the second threshold value, it may be determined to end the training.
Therefore, based on the virtual sample, the label of the virtual sample, and the target loss function, the initial three-dimensional target detection model can be trained, and then a pre-trained three-dimensional target detection model can be obtained.
Optionally, when the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Gaussian-based uncertainty regression loss L_un_reg_Gaussian.]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Laplace-based uncertainty regression loss L_un_reg_Laplace.]
where t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
Specifically, in a case that the second regression branch is a regression model based on a Gaussian distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Gaussian.
Specifically, in a case that the second regression branch is a regression model based on a Laplace distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Laplace.
Optionally, the target regression value t_i can be obtained by the following formulas:
[Equations shown as images in the original: the definitions of the target regression values t_i in terms of the ground-truth values g_i and the anchor values a_i.]
where g_i represents the ground-truth value of the i-th dimension of the three-dimensional bounding box and a_i represents the corresponding value of the predefined anchor box in the i-th dimension; i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis.
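The encoding formulas themselves are shown only as images in the original. A commonly used anchor-based encoding in voxel-based 3D detectors, given here as an assumed example rather than the patent's exact definition, is:

t_i = \frac{g_i - a_i}{d_a} \;(i \in \{x, y\}), \quad t_z = \frac{g_z - a_z}{a_h}, \quad t_i = \log\frac{g_i}{a_i} \;(i \in \{w, h, l\}), \quad t_\theta = g_\theta - a_\theta, \quad \text{with } d_a = \sqrt{a_w^2 + a_l^2}.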
Therefore, the loss value of the uncertainty regression result of the three-dimensional bounding box can be determined through the loss function corresponding to the second regression branch, and the loss value corresponding to the target loss function can be determined by combining the loss value of the classification result, the loss value of the regression result of the three-dimensional bounding box and the loss value of the direction classification result.
Optionally, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Specifically, a third prediction result may be obtained by predicting the real sample with the pre-trained three-dimensional target detection model. A first target confidence corresponding to the third prediction result may then be determined based on the classification result and the three-dimensional bounding box uncertainty regression result. Based on the first target confidence and a third threshold, the candidate frames corresponding to the third prediction result may be screened to obtain one or more first target candidate frames. All the first target candidate frames may then be fused to obtain fused candidate frames, from which the first prediction result and its corresponding first target confidence may be determined.
Optionally, all the first target candidate frames may be sorted from high to low according to the confidence degrees, and then the first target candidate frames with overlapped bounding frames may be screened out based on a Non-Maximum Suppression (NMS) algorithm, so as to obtain the fused candidate frames, and then the first prediction result and the first target confidence degree corresponding to the first prediction result may be determined based on the fused candidate frames.
Therefore, based on the three-dimensional bounding box uncertainty regression result, a first target confidence coefficient can be determined, based on the first target confidence coefficient, candidate boxes in the prediction result can be screened and fused, and further the first prediction result and the first target confidence coefficient corresponding to the first prediction result can be determined.
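As a minimal sketch of the screening-and-fusion step just described, the snippet below filters candidate boxes by the first target confidence and fuses the overlapping survivors with NMS; the box layout and the nms_3d helper are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def fuse_first_candidates(boxes, confidences, third_threshold, nms_3d, iou_threshold=0.5):
    """Screen candidates by first target confidence, then fuse overlaps with NMS.

    boxes:        (N, 7) array, one row per candidate [x, y, z, w, h, l, theta]
    confidences:  (N,) first target confidences (conf_total of each candidate)
    nms_3d:       assumed helper: callable(boxes, scores, iou_threshold) -> kept indices
    """
    # Keep only candidates whose first target confidence reaches the third threshold.
    keep = confidences >= third_threshold
    boxes, confidences = boxes[keep], confidences[keep]

    # Sort the survivors from high to low confidence.
    order = np.argsort(-confidences)
    boxes, confidences = boxes[order], confidences[order]

    # Drop candidates whose bounding boxes overlap an already-kept box.
    kept = nms_3d(boxes, confidences, iou_threshold)
    return boxes[kept], confidences[kept]
```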
Optionally, the determining, based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result, the first target confidence corresponding to the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
Specifically, the classification confidence corresponding to the classification result can be determined from the classification result and the formula conf_cls = sigmoid(output_cls); the positioning uncertainty confidence corresponding to the three-dimensional bounding box uncertainty regression result can be determined from that regression result and the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7; and the first target confidence corresponding to the third prediction result can then be determined based on the classification confidence and the positioning uncertainty confidence.
Therefore, the first target confidence corresponding to the third prediction result can be determined through the classification result and the three-dimensional boundary box uncertainty regression result, the candidate boxes in the prediction result can be screened and fused based on the first target confidence, and the first target confidence corresponding to the first prediction result and the first prediction result can be further determined.
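The confidence computation above maps directly to a few lines of code; the sketch below assumes output_cls is a single raw logit per candidate and sigma holds the seven uncertainty outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_target_confidence(output_cls, sigma):
    """conf_total = conf_cls * conf_loc for one candidate box.

    output_cls: raw classification output of the candidate
    sigma:      the 7 uncertainty regression results for (x, y, z, w, h, l, theta)
    """
    conf_cls = sigmoid(output_cls)                             # classification confidence
    conf_loc = 1.0 - np.sum(sigmoid(np.asarray(sigma))) / 7.0  # localization confidence
    return conf_cls * conf_loc
```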
Optionally, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Specifically, after the first prediction result is obtained, the self-movement trajectory can be obtained based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm; the target transformation matrix can then be obtained based on the target propagation range and the self-movement trajectory; and the first prediction result can then be converted based on the target transformation matrix to obtain the second prediction result.
Alternatively, the point cloud sequence corresponding to the real sample may be written as {P_1, P_2, ..., P_num}, where P_1 represents the point cloud data at time 1, P_2 represents the point cloud data at time 2, and so on, with P_num representing the point cloud data at time num, where num may be an integer greater than or equal to 1.
Optionally, based on the SLAM algorithm, the self-movement trajectory p_t ∈ R^{3×4} can be calculated from the point cloud sequence {P_1, P_2, ..., P_num}, where t may be any value from 1 to num.
Optionally, based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined.
For example, in the case where the target propagation range is k, n ∈ [j-k, j+k]; based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined, and in the case of n = j, T_nj is an identity matrix.
Therefore, by the SLAM algorithm, the trajectory of the self-movement can be acquired, and then based on the trajectory of the self-movement and the target propagation range, the target transformation matrix can be determined, and then based on the target transformation matrix and the first prediction result, the second prediction result can be acquired.
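A minimal sketch of how the target transformation matrix can be assembled from the SLAM ego poses is given below; it assumes each pose p_t maps sensor coordinates at time t into a common odometry frame, so that T_nj = p_j^{-1} · p_n and reduces to the identity when n = j. This convention is an assumption for illustration only.

```python
import numpy as np

def to_homogeneous(pose_3x4):
    """Expand a 3x4 ego pose (rotation | translation) into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :4] = pose_3x4
    return T

def target_transform_matrix(poses, n, j):
    """T_nj: maps frame-n coordinates into frame-j coordinates (identity when n == j).

    poses: mapping frame index -> 3x4 self-movement pose p_t from the SLAM trajectory;
           the sensor-to-odometry convention assumed here is illustrative only.
    """
    p_n = to_homogeneous(poses[n])
    p_j = to_homogeneous(poses[j])
    return np.linalg.inv(p_j) @ p_n
```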
Optionally, converting the first prediction result based on the target conversion matrix to obtain the second prediction result, where the converting includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Alternatively, the prediction result of the nth frame in the first prediction result may be B_n; the prediction result B_n of the nth frame may specifically be {B_1, B_2, ..., B_m}, where B = {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis.
Alternatively, through the bounding-box-to-corner transformation, B_n may be converted into the 8-corner-point form V_n of the three-dimensional bounding box (the first corner form).
Optionally, in the case of a target propagation range of k, n ∈ [j-k, j+k]; based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined (T_nj is an identity matrix in the case of n = j), and based on the target transformation matrix T_nj, the corner form V_n can be converted to obtain the corner form V_j (the second corner form).
Optionally, based on the corner form V_j (the second corner form), a candidate box B_j corresponding to the second corner form can be obtained.
Optionally, based on the candidate box B_j and the first prediction result, the classification result, three-dimensional bounding box regression result, direction classification result, three-dimensional bounding box uncertainty regression result and first target confidence corresponding to the candidate box B_j can be determined. It will be appreciated that, based on all candidate boxes B_j and the first prediction result, the second prediction result can be determined.
Therefore, after the three-dimensional bounding box regression result in the first prediction result is converted into the first corner form, the first corner form can be converted based on the target conversion matrix, the second corner form can be obtained, and the second prediction result can be obtained based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
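The box-to-corner conversion and the corner transformation can be sketched as follows; the local axis convention (length along x, height along y, width along z, rotation about the y axis) is an assumption for illustration and may not match the patent's exact definition.

```python
import numpy as np

def box_to_corners(box):
    """8-corner form V of a box B = [x, y, z, w, h, l, theta] (first corner form)."""
    x, y, z, w, h, l, theta = box
    # Corner offsets in the box's local frame (assumed: l along x, h along y, w along z).
    xs = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2.0
    ys = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * h / 2.0
    zs = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * w / 2.0
    corners = np.stack([xs, ys, zs])                       # (3, 8)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])                       # rotation about the y axis
    return rot_y @ corners + np.array([[x], [y], [z]])     # (3, 8)

def transform_corners(corners, T_nj):
    """Apply the 4x4 target transformation matrix to a (3, 8) corner form."""
    homogeneous = np.vstack([corners, np.ones((1, corners.shape[1]))])
    return (T_nj @ homogeneous)[:3]                        # second corner form
```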
Optionally, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
Specifically, after the target transformation matrix T_nj from the nth frame to the jth frame is determined, the corner form V_n can be converted through the above formula to obtain the corner form V_j (the second corner form).
Therefore, based on the target conversion matrix, the first corner form can be converted to obtain the second corner form, and further, based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result, the second prediction result can be obtained.
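Since the propagation formula itself is only available as an image, the sketch below shows one plausible reading of it: predictions from frames in the propagation window are carried into frame j through T_nj, and the indicator I(s_n) simply excludes moving targets from propagation. The helper names reuse the earlier sketches and are assumptions, not the patent's implementation.

```python
def propagate_corners_to_frame(corner_sets, motion_states, poses, j, k):
    """Collect corner forms from frames [j - k, j + k] expressed in frame j.

    corner_sets[n]:   list of (3, 8) corner forms V_n predicted in frame n
    motion_states[n]: matching list of flags, True when the target is static (I(s_n) = 1)
    poses:            SLAM self-movement trajectory used by target_transform_matrix()
    """
    propagated = []
    for n in range(j - k, j + k + 1):
        if n not in corner_sets:
            continue
        T_nj = target_transform_matrix(poses, n, j)     # identity when n == j
        for V_n, is_static in zip(corner_sets[n], motion_states[n]):
            if n == j or is_static:                     # moving targets are not propagated
                propagated.append(transform_corners(V_n, T_nj))
    return propagated
```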
Optionally, the obtaining the pseudo tag based on the first target confidence, the first predicted result, and the second predicted result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Specifically, the first prediction result may include a prediction result of the target frame, the second prediction result may include a prediction result of the target frame, and based on the first target confidence, the first prediction result, and the second prediction result, the second target confidence corresponding to the prediction result of the target frame may be obtained, and further based on the second target confidence and the fourth threshold, the candidate frames corresponding to the prediction result of the target frame may be screened to obtain the second target candidate frame, and further, all the second target candidate frames may be fused to obtain the pseudo tag corresponding to the real sample.
Optionally, all the second target candidate frames may be sorted from high to low according to the confidence, and then the second target candidate frames with overlapped bounding frames may be screened out based on the NMS algorithm, so as to obtain fused candidate frames, and then the pseudo label may be determined based on the fused candidate frames.
Therefore, based on the second target confidence corresponding to the prediction result of the target frame, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and further the pseudo label corresponding to the real sample can be obtained.
Optionally, the obtaining a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result, and the second prediction result includes:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, and α is a preset attenuation factor.
Specifically, in the case that the target frame is the jth frame, the attenuation factor α may be preset, and the second target confidence conf'_j corresponding to the prediction result of the jth frame may then be obtained.
Alternatively, the preset attenuation factor α may be 0.7, 0.8, 0.9, or the like, which is not limited thereto.
Optionally, based on the second target confidence conf'_j and a fourth threshold conf_thred, the candidate boxes B_j corresponding to the prediction result of the jth frame can be screened to obtain the second target candidate boxes B'_j of the jth frame; all second target candidate boxes B'_j of the jth frame can then be sorted from high to low by confidence, the second target candidate boxes with overlapping bounding boxes can be removed based on the NMS algorithm, and the fused candidate boxes of the jth frame can thereby be obtained.
Therefore, the second target confidence corresponding to the prediction result of the target frame can be obtained, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and the pseudo label corresponding to the real sample can be obtained.
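The exact formula for conf'_j is likewise only shown as an image; a natural, hedged reading is a decay-weighted combination of the first target confidences over the propagation window, as sketched below (the weighting α^|n-j| and the normalization are assumptions, not the patent's stated formula).

```python
def second_target_confidence(first_confidences, j, k, alpha=0.9):
    """Decay-weighted second target confidence conf'_j for frame j (hedged sketch).

    first_confidences[n]: first target confidence conf_n of the matched prediction in frame n
    alpha:                preset attenuation factor (e.g. 0.7, 0.8 or 0.9)
    """
    weighted, normaliser = 0.0, 0.0
    for n in range(j - k, j + k + 1):
        if n not in first_confidences:
            continue
        weight = alpha ** abs(n - j)          # farther frames contribute less
        weighted += weight * first_confidences[n]
        normaliser += weight
    return weighted / normaliser if normaliser > 0.0 else 0.0
```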
Optionally, the public autonomous-driving data set KITTI Odometry may be used as the real sample and input into the pre-trained three-dimensional target detection model (which may include a SECOND network) to obtain the first prediction result and the first target confidence corresponding to the first prediction result; the first prediction result may then be propagated along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result; the pseudo label may then be obtained based on the first target confidence, the first prediction result and the second prediction result; and the pre-trained three-dimensional target detection model may then be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Alternatively, the three-dimensional target detection model may be tested on the val split of the KITTI 3D Object Detection data set to obtain its three-dimensional target detection results on the KITTI data set, as shown in Table 1. The evaluation indices of the three-dimensional target detection results may include the bird's-eye-view average precision (BEV AP) and the three-dimensional box average precision (3D AP); Easy, Moderate and Hard respectively denote the simple, medium and difficult samples in the KITTI data set; model A is the model obtained in the related art by training the original SECOND network on pseudo labels directly generated by the original SECOND network; and model B is the three-dimensional target detection model (including the SECOND network) provided by the embodiment of the present invention.
TABLE 1 three-dimensional target detection results
(The table is provided as an image in the source; its numerical values are not reproduced here.)
It can be seen from the data in Table 1 that the target detection method provided in the embodiment of the present invention achieves a significant performance improvement over the original model (e.g., the original SECOND network) without using any manually labeled real data.
According to the target detection method provided by the invention, the initial three-dimensional target detection model is trained through the virtual sample, the pre-trained three-dimensional target detection model can be obtained, the real sample is input into the pre-trained three-dimensional target detection model to obtain the first prediction result, the first prediction result is propagated along the time dimension to obtain the second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model, the target point cloud sequence is input into the three-dimensional target detection model, the accurate three-dimensional target detection result can be obtained, the training and obtaining of the three-dimensional target detection model can be realized under the condition of no manual labeling data, and a better detection effect is achieved.
The object detection device provided by the present invention is described below, and the object detection device described below and the object detection method described above may be referred to in correspondence with each other.
Fig. 5 is a schematic structural diagram of the object detection apparatus provided in the present invention, and as shown in fig. 5, the apparatus includes a first obtaining module 501 and a second obtaining module 502, where:
a first obtaining module 501, configured to obtain a target point cloud sequence;
a second obtaining module 502, configured to input the target point cloud sequence into a three-dimensional target detection model, and obtain a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
It can be understood that the target detection apparatus trains the initial three-dimensional target detection model through the virtual sample, so as to obtain a pre-trained three-dimensional target detection model, inputs the real sample into the pre-trained three-dimensional target detection model to obtain a first prediction result, propagates the first prediction result along the time dimension to obtain a second prediction result, determines the pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, and trains the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
The target detection device provided by the invention can obtain a pre-trained three-dimensional target detection model by training an initial three-dimensional target detection model through a virtual sample, can obtain a first prediction result by inputting a real sample into the pre-trained three-dimensional target detection model, can obtain a second prediction result by transmitting the first prediction result along a time dimension, can determine a pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, can train the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model, further inputs a target point cloud sequence into the three-dimensional target detection model, can obtain an accurate three-dimensional target detection result, and can train and obtain the three-dimensional target detection model without manually marked data and achieve a better detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the second obtaining module is specifically configured to:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Optionally, the apparatus further comprises a training module configured to:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, the training module is specifically configured to:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, in the case that the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
or
When the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
where t_i represents the target regression value of the ith dimension, μ_i represents the three-dimensional bounding box regression result of the ith dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
Optionally, the training module is specifically configured to:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, the training module is specifically configured to:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
Optionally, the training module is specifically configured to:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Optionally, the training module is specifically configured to:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, the training module is specifically configured to:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
Optionally, the training module is specifically configured to:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, the training module is specifically configured to:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, α is a preset attenuation factor, and k is the target propagation range.
The target detection device provided by the invention can obtain a pre-trained three-dimensional target detection model by training an initial three-dimensional target detection model through a virtual sample, can obtain a first prediction result by inputting a real sample into the pre-trained three-dimensional target detection model, can obtain a second prediction result by transmitting the first prediction result along a time dimension, can determine a pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, can train the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model, further inputs a target point cloud sequence into the three-dimensional target detection model, can obtain an accurate three-dimensional target detection result, and can train and obtain the three-dimensional target detection model without manually marked data and achieve a better detection effect.
Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 6, the electronic device may include: a processor (processor) 610, a communication Interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a target detection method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the object detection method provided by the above methods, the method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to perform the object detection method provided by the above methods, the method including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of object detection, comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample;
the three-dimensional target detection result corresponding to the target point cloud sequence comprises a classification result, a three-dimensional boundary frame regression result, a direction classification result and a three-dimensional boundary frame uncertainty regression result corresponding to the three-dimensional boundary frame regression result, the target point cloud sequence is input into a three-dimensional target detection model, and the three-dimensional target detection result corresponding to the target point cloud sequence is obtained, and the method comprises the following steps:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model;
the three-dimensional target detection model is constructed in the following way:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model;
the target propagation range based on the time dimension propagates the first prediction result along the time dimension to obtain the second prediction result, and the method comprises the following steps:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
converting the first prediction result based on the target conversion matrix to obtain a second prediction result;
the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
obtaining a second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result;
the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
2. The method for detecting the target of claim 1, wherein the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model comprises:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
3. The method according to claim 2, wherein, when the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
or
When the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
where t_i represents the target regression value of the ith dimension, μ_i represents the three-dimensional bounding box regression result of the ith dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
4. The target detection method of claim 1, wherein the inputting the real sample into the pre-trained three-dimensional target detection model to obtain the first prediction result and a first target confidence corresponding to the first prediction result comprises:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
5. The method for detecting the target according to claim 4, wherein the determining the confidence of the first target corresponding to the third predicted result based on the classification result in the third predicted result and the three-dimensional bounding box uncertainty regression result in the third predicted result comprises:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
6. The object detection method of claim 1, wherein said obtaining the pseudo tag based on the first object confidence, the first predicted result, and the second predicted result comprises:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
7. The object detection method of claim 6, wherein obtaining a second object confidence corresponding to the prediction result of the object frame based on the first object confidence, the first prediction result and the second prediction result comprises:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, α is a preset attenuation factor, and k is the target propagation range.
8. An object detection device, comprising:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample;
the three-dimensional target detection result corresponding to the target point cloud sequence comprises a classification result, a three-dimensional boundary frame regression result, a direction classification result and a three-dimensional boundary frame uncertainty regression result corresponding to the three-dimensional boundary frame regression result, the target point cloud sequence is input into a three-dimensional target detection model, and the three-dimensional target detection result corresponding to the target point cloud sequence is obtained, and the method comprises the following steps:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model;
the three-dimensional target detection model is constructed in the following way:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model;
the target propagation range based on the time dimension propagates the first prediction result along the time dimension to obtain the second prediction result, and the method comprises the following steps:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target conversion matrix based on the target propagation range and the self-movement track;
converting the first prediction result based on the target conversion matrix to obtain a second prediction result;
the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
obtaining a second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result;
the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the object detection method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 7.
CN202210122800.5A 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium Active CN114663879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122800.5A CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114663879A CN114663879A (en) 2022-06-24
CN114663879B true CN114663879B (en) 2023-02-21

Family

ID=82026631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122800.5A Active CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114663879B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152770B (en) * 2023-04-19 2023-09-22 深圳佑驾创新科技股份有限公司 3D target matching model building method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784333B (en) * 2019-01-22 2021-09-28 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel characteristics
CN111474953B (en) * 2020-03-30 2021-09-17 清华大学 Multi-dynamic-view-angle-coordinated aerial target identification method and system
CN113409410B (en) * 2021-05-19 2024-04-02 杭州电子科技大学 Multi-feature fusion IGV positioning and mapping method based on 3D laser radar
CN113920370A (en) * 2021-10-25 2022-01-11 上海商汤智能科技有限公司 Model training method, target detection method, device, equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154118A (en) * 2017-12-25 2018-06-12 北京航空航天大学 Target detection system and method based on adaptive combined filter with multistage detection
WO2021081808A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Artificial neural network-based object detection system and method
CN110879994A (en) * 2019-12-02 2020-03-13 中国科学院自动化研究所 Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN111462236A (en) * 2020-04-02 2020-07-28 集美大学 Method and system for detecting relative pose between ships
CN111983600A (en) * 2020-08-31 2020-11-24 杭州海康威视数字技术股份有限公司 Target detection method, device and equipment
CN112257605A (en) * 2020-10-23 2021-01-22 中国科学院自动化研究所 Three-dimensional target detection method, system and device based on self-labeling training sample
CN112613378A (en) * 2020-12-17 2021-04-06 上海交通大学 3D target detection method, system, medium and terminal
CN113269098A (en) * 2021-05-27 2021-08-17 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113763430A (en) * 2021-09-13 2021-12-07 智道网联科技(北京)有限公司 Method, apparatus and computer-readable storage medium for detecting moving object

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention; Junbo Yin et al.; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020-12-31; pp. 11492-11501 *
SARPNET: Shape attention regional proposal network for LiDAR-based 3D object detection; Yangyang Ye et al.; Neurocomputing; 2019-10-17; pp. 53-63 *
Automatic re-calibration method for camera and LiDAR based on sensor-fusion odometry; Peng Pai et al.; Journal of Mechanical Engineering; 2021-10-31; Vol. 57, No. 20; pp. 206-214 *
Robust 3D target detection method based on localization uncertainty; Pei Yiyao et al.; Journal of Computer Applications; 2021-10-10; Vol. 41, No. 10; pp. 2979-2984 *
Research on autonomous driving scene recognition based on deep learning; Cheng Zhiwei; China Master's Theses Full-text Database, Engineering Science and Technology II; 2022-01-15; Vol. 2022, No. 01; p. C035-229 *

Also Published As

Publication number Publication date
CN114663879A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN113039563B (en) Learning to generate synthetic data sets for training neural networks
CN109643383B (en) Domain split neural network
CN111079619B (en) Method and apparatus for detecting target object in image
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
KR20190021187A (en) Vehicle license plate classification methods, systems, electronic devices and media based on deep learning
CN110490959B (en) Three-dimensional image processing method and device, virtual image generating method and electronic equipment
WO2019053052A1 (en) A method for (re-)training a machine learning component
CN112347550A (en) Coupling type indoor three-dimensional semantic graph building and modeling method
EP4075382A1 (en) A method for training a neural network to deliver the viewpoints of objects using pairs of images under different viewpoints
CN114758337A (en) Semantic instance reconstruction method, device, equipment and medium
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRF
CN114663879B (en) Target detection method and device, electronic equipment and storage medium
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN113326826A (en) Network model training method and device, electronic equipment and storage medium
CN115147798A (en) Method, model and device for predicting travelable area and vehicle
CN114943870A (en) Training method and device of line feature extraction model and point cloud matching method and device
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
CN114565953A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN115115699A (en) Attitude estimation method and device, related equipment and computer product
CN115331194A (en) Occlusion target detection method and related equipment
CN113537258A (en) Action track prediction method and device, computer readable medium and electronic equipment
CN115468778B (en) Vehicle testing method and device, electronic equipment and storage medium
Anju et al. Faster Training of Edge-attention Aided 6D Pose Estimation Model using Transfer Learning and Small Customized Dataset
CN114842304A (en) Machine learning model training method and device, and semantic segmentation method and device
CN116597402A (en) Scene perception method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant