CN114663879A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN114663879A
Authority
CN
China
Prior art keywords
target
dimensional
prediction result
result
detection model
Prior art date
Legal status
Granted
Application number
CN202210122800.5A
Other languages
Chinese (zh)
Other versions
CN114663879B (en)
Inventor
张兆翔
张驰
陈文博
裴仪瑶
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202210122800.5A priority Critical patent/CN114663879B/en
Publication of CN114663879A publication Critical patent/CN114663879A/en
Application granted granted Critical
Publication of CN114663879B publication Critical patent/CN114663879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The invention provides a target detection method and device, an electronic device and a storage medium, wherein the method includes: acquiring a target point cloud sequence; and inputting the target point cloud sequence into a three-dimensional target detection model to obtain a three-dimensional target detection result corresponding to the target point cloud sequence. The three-dimensional target detection model is obtained by training on virtual samples and real samples. The pseudo label corresponding to a real sample is determined based on a first prediction result and a second prediction result; the first prediction result is obtained by predicting the real sample data with a pre-trained three-dimensional target detection model, and the second prediction result is obtained by propagating the first prediction result along the time dimension. According to embodiments of the invention, the second prediction result can be obtained by propagating the first prediction result along the time dimension, and the pseudo label can be obtained based on the first prediction result and the second prediction result, so that the three-dimensional target detection model can be trained without any manually labeled data while achieving a good detection effect.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
Two-dimensional target detection can only provide the two-dimensional position of an object in an image. With the rapid development of application fields such as autonomous driving, intelligent robots, augmented reality, and security and border defense, information about objects in three-dimensional space is often needed to locate and identify targets more accurately. The input of three-dimensional target detection is a two-dimensional image or three-dimensional data, and the output is the position of an object bounding box in three-dimensional space together with a classification result.
In the prior art, most three-dimensional tasks such as autonomous driving rely on lidar point cloud data to obtain more accurate three-dimensional spatial information. Unlike raw image data, point cloud data is high-dimensional and unordered, which makes data annotation difficult and keeps data sets small; compared with two-dimensional target detection, annotations for three-dimensional target detection are therefore much harder to obtain.
Disclosure of Invention
The invention provides a target detection method and device, an electronic device and a storage medium, which are used to overcome the difficulty of obtaining three-dimensional target detection annotations in the prior art, so that a three-dimensional target detection model can be trained without manually labeled data while achieving a good detection effect.
In a first aspect, the present invention provides a target detection method, including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Optionally, according to a target detection method provided by the present invention, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
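Purely as an illustrative sketch of the three branches just described (not the patented implementation), the heads can be realized as parallel convolutions over a shared bird's-eye-view feature map; the module names, channel counts and anchor settings below are assumptions.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Illustrative three-branch head on a shared BEV feature map."""

    def __init__(self, in_channels=128, num_anchors=2, num_classes=1):
        super().__init__()
        a = num_anchors
        # classification branch
        self.cls_branch = nn.Conv2d(in_channels, a * num_classes, 1)
        # first regression branch: 7 box parameters (x, y, z, w, h, l, theta) plus direction classification
        self.box_branch = nn.Conv2d(in_channels, a * 7, 1)
        self.dir_branch = nn.Conv2d(in_channels, a * 2, 1)
        # second regression branch: per-dimension box uncertainty
        self.unc_branch = nn.Conv2d(in_channels, a * 7, 1)

    def forward(self, bev_features):
        return {
            "cls": self.cls_branch(bev_features),
            "box": self.box_branch(bev_features),
            "dir": self.dir_branch(bev_features),
            "unc": self.unc_branch(bev_features),
        }
```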
Optionally, according to the target detection method provided by the present invention, the three-dimensional target detection model is constructed in the following manner:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
acquiring the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, according to a target detection method provided by the present invention, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, according to the target detection method provided by the present invention, in a case that the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:

L_un_reg_Gaussian = Σ_{i∈{x,y,z,w,h,l,θ}} [ (t_i − μ_i)² / (2·(σ_i² + ε)) + (1/2)·log(σ_i² + ε) ]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:

L_un_reg_Laplace = Σ_{i∈{x,y,z,w,h,l,θ}} [ |t_i − μ_i| / (σ_i + ε) + log(σ_i + ε) ]
wherein t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
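A minimal sketch of the two uncertainty-regression loss variants, written as negative-log-likelihood-style terms consistent with the definitions above; the exact placement of the constant ε, the positivity constraint on σ_i and the reduction over dimensions are assumptions.

```python
import torch

def uncertainty_loss_gaussian(t, mu, sigma, eps=1e-6):
    """Gaussian-form uncertainty regression loss over the 7 box dimensions.
    t, mu, sigma: tensors of shape (N, 7) for (x, y, z, w, h, l, theta);
    sigma is assumed to be constrained positive (e.g. via softplus)."""
    var = sigma ** 2 + eps
    return ((t - mu) ** 2 / (2.0 * var) + 0.5 * torch.log(var)).sum(dim=-1).mean()

def uncertainty_loss_laplace(t, mu, sigma, eps=1e-6):
    """Laplace-form counterpart: L1 residual scaled by the predicted uncertainty."""
    scale = sigma + eps
    return (torch.abs(t - mu) / scale + torch.log(scale)).sum(dim=-1).mean()
```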
Optionally, according to a target detection method provided by the present invention, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, according to a target detection method provided by the present invention, the determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls) to obtain a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 − (1/7)·Σ_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i) to obtain a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis;
using the formula conf_total = conf_cls · conf_loc to obtain a first target confidence conf_total corresponding to the third prediction result.
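As a small sketch of the confidence computation described by the three formulas above (tensor shapes are assumptions):

```python
import torch

def first_target_confidence(cls_logit, sigma):
    """Sketch of conf_total = conf_cls * conf_loc for a batch of candidate boxes.
    cls_logit: raw classification output, shape (N,)
    sigma:     7-dim uncertainty regression result per box, shape (N, 7)"""
    conf_cls = torch.sigmoid(cls_logit)
    conf_loc = 1.0 - torch.sigmoid(sigma).sum(dim=-1) / 7.0
    return conf_cls * conf_loc
```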
Optionally, according to a target detection method provided by the present invention, the propagating the first prediction result along a time dimension based on a target propagation range of the time dimension to obtain the second prediction result includes:
acquiring an ego-motion trajectory based on the point cloud sequence corresponding to the real sample and a simultaneous localization and mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the ego-motion trajectory;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Optionally, according to a target detection method provided by the present invention, the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, according to a target detection method provided by the present invention, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applications of
Figure BDA0003499181750000051
Obtaining a second angular point form V corresponding to the j framej
Where k is the target propagation range, tiThe method is used for representing that the three-dimensional target in the ith frame is in a motion state or a static state; at tiThe value of the function I (.) is 0 under the condition that the three-dimensional target in the ith frame is represented in a motion state, and the function I (.) is at tiRepresenting that the value of the function I (.) is 1 under the condition that the three-dimensional target in the ith frame is in a static state; t isijRepresenting the target transition matrix from frame i to frame j, ViIndicating a first corner pattern corresponding to the ith frame.
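Below is a small sketch, under stated assumptions (boxes kept as 8-corner arrays per frame, a boolean static mask standing in for I(t_i), and a callable returning T_ij), of how first predictions within the propagation range k could be mapped into frame j:

```python
import numpy as np

def propagate_corners(first_corners, is_static, transform, j, k):
    """Sketch of propagating first-prediction boxes (as 8-corner arrays) into frame j.

    first_corners : dict {frame index: (N_i, 8, 3) corner array of first predictions}
    is_static     : dict {frame index: (N_i,) bool array, True where the target is static}
    transform     : callable transform(i, j) -> (4, 4) homogeneous matrix T_ij
    k             : target propagation range along the time dimension
    """
    propagated = []
    for i in range(j - k, j + k + 1):
        if i == j or i not in first_corners:
            continue
        corners = first_corners[i][is_static[i]]   # I(t_i): only static targets are propagated
        if corners.size == 0:
            continue
        ones = np.ones((*corners.shape[:2], 1))
        homo = np.concatenate([corners, ones], axis=-1)   # (N, 8, 4) homogeneous corners
        moved = homo @ transform(i, j).T                  # apply T_ij to every corner
        propagated.append(moved[..., :3])                 # contribution of frame i to the second corner form
    return propagated
```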
Optionally, according to a target detection method provided by the present invention, the obtaining the pseudo tag based on the first target confidence, the first prediction result, and the second prediction result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, according to a target detection method provided by the present invention, the obtaining, based on the first target confidence, the first prediction result, and the second prediction result, a second target confidence corresponding to the prediction result of the target frame includes:
applying, in the case that the target frame is the j-th frame,

conf_j = conf_i · α^|i−j|

to obtain a second target confidence conf_j corresponding to the prediction result of the j-th frame;

wherein conf_i represents the first target confidence corresponding to the prediction result of the i-th frame, and α is a preset attenuation factor.
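A one-line sketch of the decayed confidence assigned to a prediction propagated from frame i to frame j, consistent with the attenuation formula above; the default value of α is an assumption.

```python
def propagated_confidence(conf_i, i, j, alpha=0.9):
    # conf_j = conf_i * alpha ** |i - j|: confidence decays with temporal distance
    return conf_i * alpha ** abs(i - j)
```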
In a second aspect, the present invention further provides an object detecting apparatus, including:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the above-mentioned object detection methods.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
According to the target detection method and device, electronic device and storage medium provided by the invention, a pre-trained three-dimensional target detection model can be obtained by training an initial three-dimensional target detection model on virtual samples. A first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, and a second prediction result can be obtained by propagating the first prediction result along the time dimension. A pseudo label corresponding to the real sample can then be determined based on the first prediction result and the second prediction result, and the pre-trained three-dimensional target detection model can be trained on the real sample and the pseudo label to obtain the three-dimensional target detection model. Inputting a target point cloud sequence into the three-dimensional target detection model then yields an accurate three-dimensional target detection result; the three-dimensional target detection model can thus be trained without manually labeled data while achieving a good detection effect.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a target detection method provided by the present invention;
FIG. 2 is a second schematic flow chart of a target detection method provided by the present invention;
FIG. 3 is a third schematic flow chart of a target detection method provided by the present invention;
FIG. 4 is a fourth schematic flowchart of a target detection method provided by the present invention;
FIG. 5 is a schematic structural diagram of an object detecting device provided in the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The object detection method and apparatus of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an object detection method provided by the present invention, and as shown in fig. 1, an execution subject of the object detection method may be an electronic device, such as a mobile phone or a server. The method comprises the following steps 101 to 102:
step 101, acquiring a target point cloud sequence;
step 102, inputting the target point cloud sequence into a three-dimensional target detection model, and obtaining a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Specifically, after the target point cloud sequence is obtained, the target point cloud sequence may be input to the three-dimensional target detection model, and then a three-dimensional target detection result corresponding to the target point cloud sequence may be obtained.
For example, a target point cloud sequence a of a certain unmanned vehicle may be obtained, and then the target point cloud sequence a may be input into a three-dimensional target detection model, and then a three-dimensional target detection result a corresponding to the target point cloud sequence a may be obtained, where the three-dimensional target may be a target object on a motion trajectory of the unmanned vehicle, and the three-dimensional target detection result a may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
For example, a target point cloud sequence B of an intelligent robot may be obtained, and then the target point cloud sequence B may be input into a three-dimensional target detection model, and then a three-dimensional target detection result B corresponding to the target point cloud sequence B may be obtained, where the three-dimensional target may be a target object of a motion trajectory of the intelligent robot, and the three-dimensional target detection result B may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
The above examples are merely illustrative of the embodiments of the present invention, and are not intended to limit the embodiments of the present invention.
It can be understood that the pre-trained three-dimensional target detection model can be obtained by training the initial three-dimensional target detection model through the virtual sample, the real sample is input into the pre-trained three-dimensional target detection model to obtain a first prediction result, the first prediction result is propagated along the time dimension to obtain a second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, and then the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
According to the target detection method provided by the invention, an initial three-dimensional target detection model is trained on virtual samples to obtain a pre-trained three-dimensional target detection model. A first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, and a second prediction result can be obtained by propagating the first prediction result along the time dimension. A pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, and the pre-trained three-dimensional target detection model can then be trained on the real sample and the pseudo label to obtain the three-dimensional target detection model. Inputting the target point cloud sequence into the three-dimensional target detection model yields an accurate three-dimensional target detection result; in this way, the three-dimensional target detection model can be trained without manually labeled data while achieving a good detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Specifically, through the branch models of the three-dimensional target detection model, the target point cloud sequence can be identified to obtain a classification result, a three-dimensional bounding box regression result, a direction classification result, a three-dimensional bounding box uncertainty regression result, and the like.
Optionally, through a classification branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a classification result corresponding to the target point cloud sequence is obtained.
Optionally, through a first regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box regression result and a direction classification result corresponding to the target point cloud sequence are obtained.
Optionally, through a second regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result may be obtained, where the three-dimensional bounding box uncertainty regression result may be used to represent uncertainty of the three-dimensional bounding box regression result.
Optionally, the three-dimensional target detection model may include a second regression branch and a SECOND network, wherein the SECOND network may include a classification branch and a first regression branch.
Optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the first prediction result may include a three-dimensional bounding box regression result after the filtering, a classification result corresponding to the three-dimensional bounding box regression result after the filtering, and a direction classification result corresponding to the three-dimensional bounding box regression result after the filtering.
Optionally, the target point cloud sequence may be identified through a three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the three-dimensional target detection result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the three-dimensional target detection result may include a three-dimensional bounding box regression result after being screened, a classification result corresponding to the three-dimensional bounding box regression result after being screened, and a direction classification result corresponding to the three-dimensional bounding box regression result after being screened.
Alternatively, the second regression branch may be a regression model based on a gaussian distribution.
Alternatively, the second regression branch may be a regression model based on a laplace distribution.
It is understood that the three-dimensional bounding box uncertainty regression results may be used to obtain more accurate first prediction results in the training session. Specifically, in the training process, a real sample is input into a pre-trained three-dimensional target detection model, a three-dimensional bounding box regression result and a three-dimensional bounding box uncertainty regression result can be obtained, then candidate boxes in the three-dimensional bounding box regression result can be screened based on the three-dimensional bounding box uncertainty regression result, and a more accurate first prediction result can be obtained based on the screened three-dimensional bounding box regression result.
It can be understood that the three-dimensional bounding box uncertainty regression results can be used to obtain more accurate pseudo labels in the training session. Specifically, in the training process, a plurality of candidate frames can be determined based on a first prediction result and a second prediction result, the candidate frames can be screened based on a three-dimensional boundary frame uncertainty regression result, and then a more accurate pseudo label can be obtained based on the screened candidate frames.
Therefore, the three-dimensional boundary frame uncertainty regression result can be obtained through the second regression branch, the accurate pseudo label can be obtained based on the three-dimensional boundary frame uncertainty regression result, fine tuning training can be carried out on the pre-trained three-dimensional target detection model based on the accurate pseudo label and the real sample, and the three-dimensional target detection model obtained through training can achieve a good detection effect under the condition that no manual labeling is used at all.
Optionally, the three-dimensional object detection model is constructed by:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Specifically, the initial three-dimensional target detection model is trained on virtual samples to obtain a pre-trained three-dimensional target detection model. A real sample is then identified with the pre-trained three-dimensional target detection model to obtain a first prediction result and a first target confidence; the first prediction result is propagated along the time dimension to obtain a second prediction result; the first prediction result and the second prediction result are filtered and fused based on the first target confidence to obtain a pseudo label; and the pre-trained model is fine-tuned based on the pseudo label and the real sample to obtain the three-dimensional target detection model.
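The outline below restates this construction flow as a dependency-injected sketch; every step (pre-training, prediction, propagation, fusion, fine-tuning) is passed in as a callable, since the concrete implementations are not specified here and the function names are placeholders.

```python
def build_three_dimensional_detector(pretrain, predict, propagate, fuse, finetune,
                                     virtual_data, real_sequences, k):
    """Illustrative outline of the construction flow (all steps injected as callables)."""
    model = pretrain(*virtual_data)                                   # pre-train on virtual samples and labels
    pseudo_labels = {}
    for seq_id, sequence in real_sequences.items():
        first_preds, confidences = predict(model, sequence)           # first prediction + first target confidence
        second_preds = propagate(first_preds, sequence, k)            # propagate along the time dimension
        pseudo_labels[seq_id] = fuse(first_preds, second_preds, confidences)  # pseudo labels
    return finetune(model, real_sequences, pseudo_labels)             # fine-tune on real samples + pseudo labels
```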
Alternatively, the target propagation range may be a propagation range between adjacent frames, or may be a propagation range between non-adjacent frames.
Optionally, virtual samples may be generated by the CARLA simulator; the virtual samples may include depth images and point cloud data acquired by a lidar and a depth sensor in the CARLA simulator.
Optionally, the prediction result of the i-th frame in the first prediction result may be B_i, and the first prediction result may specifically be {B_1, B_2, ..., B_m}; B_i may be {x, y, z, w, h, l, θ}, where {x, y, z} indicates the coordinates of the center point of the three-dimensional bounding box, w, h and l respectively indicate the three side lengths of the three-dimensional bounding box, and θ indicates the rotation angle of the three-dimensional bounding box around the y-axis.
Optionally, fig. 2 is a second schematic flow chart of the object detection method provided by the present invention, fig. 3 is a third schematic flow chart of the object detection method provided by the present invention, and fig. 2 or fig. 3 are an optional example of the present invention, but are not limited to the present invention; as shown in fig. 2, the three-dimensional object detection model is constructed in a manner including the following steps 201 to 205:
step 201, training an initial three-dimensional target detection model based on a virtual sample and a label of the virtual sample to obtain a pre-trained three-dimensional target detection model;
step 202, inputting a continuous point cloud frame of a real sample into a pre-trained three-dimensional target detection model, and acquiring a first prediction result and a first target confidence coefficient;
optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result and the first target confidence may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the real sample may include continuous point cloud frames, and the first prediction result may include a prediction result corresponding to each point cloud frame, for example, as shown in fig. 3, in the case that the real sample includes a j-th point cloud frame, the first prediction result may include a prediction result corresponding to the j-th point cloud frame.
Step 203, propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
optionally, the first prediction result may include a prediction result corresponding to each point cloud frame, and then the prediction results corresponding to each point cloud frame in the first prediction result may be propagated along the time dimension, respectively, to obtain the second prediction result.
For example, as shown in fig. 3, the first prediction result may include a prediction result A1 of the (j−1)-th frame, a prediction result B1 of the j-th frame, and a prediction result C1 of the (j+1)-th frame; the prediction result A1 of the (j−1)-th frame in the first prediction result may be propagated along the time dimension to obtain a prediction result B2 of the j-th frame, and the prediction result C1 of the (j+1)-th frame in the first prediction result may be propagated along the time dimension to obtain a prediction result B3 of the j-th frame; the second prediction result may then include the prediction result B2 of the j-th frame and the prediction result B3 of the j-th frame.
It is understood that the first prediction result may include the prediction result B1 of the j-th frame, the second prediction result may include the prediction result B2 of the j-th frame and the prediction result B3 of the j-th frame, and thus, for the j-th frame, the prediction result B1, the prediction result B2 and the prediction result B3 may be obtained.
It will be appreciated that after the second prediction is obtained, the prediction of the real sample may include the first prediction and the second prediction.
Step 204, acquiring a pseudo label based on the first target confidence coefficient, the first prediction result and the second prediction result;
optionally, candidate frames in the first prediction result and the second prediction result may be filtered based on the first target confidence, and a pseudo tag may be determined based on the filtered candidate frames.
For example, as shown in fig. 3, the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and the prediction result B3 of the j-th frame, and the candidate frame corresponding to the prediction result B1, the candidate frame corresponding to the prediction result B2, and the candidate frame corresponding to the prediction result B3 may be filtered based on the first target confidence, and the filtered candidate frames may be obtained, and then the pseudo label may be determined based on the filtered candidate frames.
Step 205, training (fine tuning) the pre-trained three-dimensional target detection model based on the real sample and the pseudo label, and obtaining the three-dimensional target detection model.
It can be understood that the pre-trained three-dimensional target detection model obtained through virtual-sample training has a basic target detection capability, but its detection performance in a real environment may be reduced; the pre-trained three-dimensional target detection model can therefore be trained on real samples and pseudo labels to improve the detection performance of the three-dimensional target detection model in the real environment.
Therefore, the first prediction result is propagated along the time dimension, the second prediction result can be obtained, an accurate pseudo label can be determined based on the first target confidence coefficient, the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label, the three-dimensional target detection model is obtained, and a good detection effect is achieved.
Optionally, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, based on the target loss function, the virtual sample and the label of the virtual sample may be input to the initial three-dimensional target detection model for training until the loss function value corresponding to the target loss function is smaller than the first threshold.
Optionally, based on the target loss function, the virtual samples and the labels of the virtual samples may be input to the initial three-dimensional target detection model for training until the training times are greater than the second threshold.
Alternatively, the classification branch in the three-dimensional target detection model may correspond to a classification loss function, the first regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box regression loss function and a directional classification loss function, and the second regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box uncertainty regression loss function, and then the target loss function may be determined based on the classification loss function, the three-dimensional bounding box regression loss function, the directional classification loss function, and the three-dimensional bounding box uncertainty regression loss function.
Alternatively, the target loss function may be:
L_total = β1·L_cls + β2·L_reg + β3·L_un_reg + β4·L_dir
wherein L_cls may be a classification loss function, L_reg may be a three-dimensional bounding box regression loss function, L_dir may be a direction classification loss function, L_un_reg may be a three-dimensional bounding box uncertainty regression loss function, and β1, β2, β3 and β4 are hyperparameters that serve as weight coefficients balancing the four loss functions.
Optionally, L_cls may specifically be the Focal Loss function, L_reg may specifically be the Smooth L1 loss function, and L_dir may specifically be the Smooth L1 loss function.
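A trivial sketch of combining the four loss terms as above; the default weight values are placeholders only, since the patent leaves β1 through β4 as hyperparameters.

```python
def total_loss(l_cls, l_reg, l_un_reg, l_dir, betas=(1.0, 2.0, 1.0, 0.2)):
    """L_total = beta1*L_cls + beta2*L_reg + beta3*L_un_reg + beta4*L_dir."""
    b1, b2, b3, b4 = betas
    return b1 * l_cls + b2 * l_reg + b3 * l_un_reg + b4 * l_dir
```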
Optionally, fig. 4 is a fourth schematic flowchart of the target detection method provided by the present invention, and fig. 4 is an optional example of the present invention, but not limiting the present invention; as shown in fig. 4, the process of training the three-dimensional target detection model may include the following steps 401 to 407:
step 401, based on the point cloud data in the virtual sample and the point cloud data in the real sample, a point cloud database D can be constructed;
optionally, the point cloud database D may include one or more point cloud data D_i, wherein:

D_i = (x_i, y_i, z_i, R_i), i = 1, 2, ..., N

where D_i represents the i-th point cloud data in the point cloud database D, x_i, y_i, z_i represent the three-dimensional position of the i-th point relative to the lidar, R_i represents the reflectivity of the i-th point in the laser point cloud, and N is the number of points in the laser point cloud.
Step 402, carrying out voxelization coding on point cloud data;
optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a virtual sample in the point cloud database D may be coded, and a first voxelization coding corresponding to the virtual sample may be obtained.
Optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a real sample in the point cloud database D may be coded, and a second voxelization coding corresponding to the real sample may be obtained.
Step 403, performing voxel characteristic extraction on the voxelized code;
optionally, based on a voxel feature extractor in the three-dimensional target detection model, a spatially sparse first voxel feature corresponding to the first voxelization code may be acquired.
Optionally, based on the voxel feature extractor in the three-dimensional target detection model, a spatially sparse second voxel feature corresponding to the second voxelization code may be obtained.
Step 404, acquiring a sample-level feature map based on voxel features;
optionally, the first voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a first spatial feature map is obtained, the first spatial feature map is projected to a top view, dimension compression in the vertical direction may be performed, and a sample-level first feature map may be obtained.
Optionally, the second voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a second spatial feature map is obtained, the second spatial feature map is projected to the top view, dimension compression in the vertical direction may be performed, and a sample-level second feature map may be obtained.
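A toy, numpy-only sketch of the voxelization-and-compression idea in steps 402 to 404; the real model uses a learned voxel feature extractor and sparse 3D convolutions, and the voxel size and range values below are assumptions.

```python
import numpy as np

def points_to_bev(points, voxel_size=(0.2, 0.2, 0.4),
                  pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Voxelize (x, y, z, R) points, pool a per-voxel value, and compress the
    vertical axis into a BEV map; this only illustrates the data flow."""
    x0, y0, z0, x1, y1, z1 = pc_range
    vx, vy, vz = voxel_size
    nx = int(round((x1 - x0) / vx))
    ny = int(round((y1 - y0) / vy))
    nz = int(round((z1 - z0) / vz))

    # keep only points inside the detection range
    m = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
         (points[:, 1] >= y0) & (points[:, 1] < y1) &
         (points[:, 2] >= z0) & (points[:, 2] < z1))
    pts = points[m]

    # voxel indices of every remaining point
    ix = ((pts[:, 0] - x0) / vx).astype(int)
    iy = ((pts[:, 1] - y0) / vy).astype(int)
    iz = ((pts[:, 2] - z0) / vz).astype(int)

    # accumulate reflectivity per voxel, then compress the vertical dimension
    grid = np.zeros((nz, ny, nx), dtype=np.float32)
    np.add.at(grid, (iz, iy, ix), pts[:, 3])
    return grid.max(axis=0)  # sample-level BEV feature map of shape (ny, nx)
```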
Step 405, generating a network through a candidate area, predicting a sample-level feature map, and acquiring a three-dimensional target detection result;
optionally, the candidate region generation network in the three-dimensional object detection model may include a classification branch, a first regression branch, and a second regression branch.
Optionally, the classification branch may predict the feature map of the sample stage to obtain a classification result.
Optionally, the feature map of the sample level may be predicted through the first regression branch, and a three-dimensional bounding box regression result and a direction classification result are obtained.
Optionally, the three-dimensional bounding box (3D bounding box) regression result may be {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}, where {μ_x, μ_y, μ_z} denotes the coordinates of the center point of the three-dimensional bounding box, μ_w, μ_h and μ_l respectively denote the three side lengths of the three-dimensional bounding box, and μ_θ denotes the rotation angle of the three-dimensional bounding box about the y-axis.
Optionally, the feature map of the sample level may be predicted through the second regression branch, and a three-dimensional bounding box uncertainty regression result is obtained.
Optionally, the three-dimensional bounding box uncertainty regression result may be {σ_x, σ_y, σ_z, σ_w, σ_h, σ_l, σ_θ}, corresponding to the three-dimensional bounding box regression result {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}; for example, σ_x represents the uncertainty of μ_x, and σ_y represents the uncertainty of μ_y.
In step 406, a loss value is obtained based on the target loss function.
Optionally, based on the classification loss function corresponding to the classification branch, a loss value of the classification result may be obtained.
Optionally, based on the three-dimensional bounding box regression loss function corresponding to the first regression branch, a loss value of the three-dimensional bounding box regression result may be obtained.
Optionally, based on the directional classification loss function corresponding to the first regression branch, a loss value of the directional classification result may be obtained.
Optionally, a loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on the three-dimensional bounding box uncertainty regression loss function corresponding to the second regression branch.
Optionally, the loss value corresponding to the target loss function is determined based on the loss value of the classification result, the loss value of the three-dimensional bounding box regression result, the loss value of the direction classification result, and the loss value of the three-dimensional bounding box uncertainty regression result.
In step 407, based on the loss value corresponding to the target loss function, it may be determined whether to end the training.
Alternatively, in a case where the loss function value corresponding to the target loss function is smaller than the first threshold, it may be determined to end the training.
Alternatively, in the case where the number of times of training is greater than the second threshold value, it may be determined to end the training.
Therefore, based on the virtual sample, the label of the virtual sample and the target loss function, the initial three-dimensional target detection model can be trained, and then a pre-trained three-dimensional target detection model can be obtained.
Optionally, when the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:

L_un_reg_Gaussian = Σ_{i∈{x,y,z,w,h,l,θ}} [ (t_i − μ_i)² / (2·(σ_i² + ε)) + (1/2)·log(σ_i² + ε) ]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:

L_un_reg_Laplace = Σ_{i∈{x,y,z,w,h,l,θ}} [ |t_i − μ_i| / (σ_i + ε) + log(σ_i + ε) ]
wherein t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
Specifically, in the case that the second regression branch is a regression model based on a Gaussian distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Gaussian.
Specifically, in the case that the second regression branch is a regression model based on a Laplace distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Laplace.
Optionally, the target regression value t_i may be obtained by the following formulas:

t_x = (g_x − a_x)/d_a,  t_y = (g_y − a_y)/d_a,  t_z = (g_z − a_z)/a_h

t_w = log(g_w/a_w),  t_h = log(g_h/a_h),  t_l = log(g_l/a_l)

t_θ = g_θ − a_θ

d_a = sqrt(a_w² + a_l²)

where g_i represents the ground-truth value of the three-dimensional bounding box in the i-th dimension, and a_i represents the predefined anchor box value of the three-dimensional bounding box in the i-th dimension; i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis.
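The snippet below sketches the target-value encoding from a ground-truth box and an anchor box; the diagonal normalization follows the common SECOND-style convention and is an assumption here, not necessarily the exact encoding used in the patent.

```python
import numpy as np

def encode_targets(gt_box, anchor):
    """Compute target regression values t_i from a ground-truth box and an
    anchor box, both given as (x, y, z, w, h, l, theta)."""
    gx, gy, gz, gw, gh, gl, gtheta = gt_box
    ax, ay, az, aw, ah, al, atheta = anchor
    da = np.sqrt(aw ** 2 + al ** 2)                      # anchor BEV diagonal
    return np.array([
        (gx - ax) / da, (gy - ay) / da, (gz - az) / ah,  # center offsets
        np.log(gw / aw), np.log(gh / ah), np.log(gl / al),  # size ratios
        gtheta - atheta,                                 # t_theta = g_theta - a_theta
    ])
```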
Therefore, the loss value of the uncertainty regression result of the three-dimensional bounding box can be determined through the loss function corresponding to the second regression branch, and the loss value corresponding to the target loss function can be determined by combining the loss value of the classification result, the loss value of the regression result of the three-dimensional bounding box and the loss value of the direction classification result.
Optionally, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Specifically, based on a pre-trained three-dimensional target detection model, a real sample can be predicted to obtain a third prediction result, and further based on a classification result and a three-dimensional bounding box uncertainty regression result, a first target confidence corresponding to the third prediction result can be determined, and further based on the first target confidence and a third threshold, candidate frames corresponding to the third prediction result can be screened to obtain one or more first target candidate frames, and further all the first target candidate frames can be fused to obtain fused candidate frames, and further based on the fused candidate frames, the first prediction result and the first target confidence corresponding to the first prediction result can be determined.
Optionally, all the first target candidate boxes may be sorted from high to low according to the confidence degrees, and then the first target candidate boxes with overlapped bounding boxes may be screened out based on a Non-Maximum Suppression (NMS) algorithm, so as to obtain the fused candidate boxes, and then based on the fused candidate boxes, the first prediction result and the first target confidence degree corresponding to the first prediction result may be determined.
Therefore, based on the three-dimensional bounding box uncertainty regression result, a first target confidence coefficient can be determined, based on the first target confidence coefficient, candidate boxes in the prediction result can be screened and fused, and further the first prediction result and the first target confidence coefficient corresponding to the first prediction result can be determined.
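As a rough illustration of this screening-and-fusion step, the following Python sketch thresholds candidate boxes by confidence and then suppresses overlapping ones. It uses an axis-aligned bird's-eye-view IoU as a simplification (the boxes in this disclosure are rotated, so a rotated-IoU routine would be used in practice), and the threshold values are illustrative assumptions.

import numpy as np

def bev_iou_axis_aligned(box_a, box_b):
    # Approximate bird's-eye-view IoU of two boxes given as (x, z, w, l), ignoring rotation.
    ax1, az1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, az2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, bz1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, bz2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    il = max(0.0, min(az2, bz2) - max(az1, bz1))
    inter = iw * il
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def screen_and_fuse(boxes, confidences, conf_threshold=0.3, iou_threshold=0.5):
    # boxes: (N, 4) array of (x, z, w, l); confidences: (N,) array.
    # Keep boxes whose confidence >= conf_threshold, then apply NMS from high to low confidence.
    keep_mask = confidences >= conf_threshold
    boxes, confidences = boxes[keep_mask], confidences[keep_mask]
    order = np.argsort(-confidences)
    kept = []
    for idx in order:
        if all(bev_iou_axis_aligned(boxes[idx], boxes[k]) < iou_threshold for k in kept):
            kept.append(idx)
    return boxes[kept], confidences[kept]

boxes = np.array([[10.0, 20.0, 1.6, 3.9], [10.2, 20.1, 1.6, 3.9], [30.0, 5.0, 1.6, 3.9]])
confs = np.array([0.9, 0.6, 0.2])
print(screen_and_fuse(boxes, confs))   # keeps only the first box in this toy example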
Optionally, the determining, based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result, a first target confidence corresponding to the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;

using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i)) / 7, acquiring a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, wherein σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;

using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
In particular, through the classification result and the formula conf_cls = sigmoid(output_cls), the classification confidence corresponding to the classification result can be determined; through the three-dimensional bounding box uncertainty regression result and the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i)) / 7, the positioning uncertainty confidence corresponding to the three-dimensional bounding box uncertainty regression result can be determined; and based on the classification confidence and the positioning uncertainty confidence, the first target confidence corresponding to the third prediction result can be determined.
Therefore, the first target confidence corresponding to the third prediction result can be determined through the classification result and the three-dimensional boundary box uncertainty regression result, the candidate boxes in the prediction result can be screened and fused based on the first target confidence, and the first target confidence corresponding to the first prediction result and the first prediction result can be further determined.
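The two confidence terms above translate directly into a few lines of Python; the division by 7 averages the per-dimension uncertainties over the seven box dimensions, and the example values are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_target_confidence(output_cls, sigma):
    # output_cls: raw classification score of a candidate box.
    # sigma: array of 7 uncertainty regression outputs, one per dimension in {x, y, z, w, h, l, theta}.
    conf_cls = sigmoid(output_cls)
    conf_loc = 1.0 - np.sum(sigmoid(sigma)) / 7.0   # average uncertainty over the 7 dimensions
    return conf_cls * conf_loc

print(first_target_confidence(2.0, np.array([-3.0, -2.5, -3.5, -4.0, -4.0, -3.0, -2.0])))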
Optionally, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:
acquiring a self-moving trajectory based on the point cloud sequence encoding corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Specifically, after the first prediction result is obtained, a self-moving trajectory can be acquired based on the point cloud sequence encoding corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm; a target transformation matrix can then be acquired based on the target propagation range and the self-moving trajectory, and the first prediction result can be transformed based on the target transformation matrix to obtain the second prediction result.
Alternatively, the point cloud sequence encoding corresponding to the real sample may be {P_1, P_2, ..., P_n}, wherein P_1 represents the point cloud data at the 1st time, P_2 represents the point cloud data at the 2nd time, and so on, and P_n represents the point cloud data at the n-th time, where n may be an integer greater than or equal to 1.
Optionally, based on the SLAM algorithm, the self-moving trajectory p_t ∈ R^{3×4} can be calculated from the point cloud sequence encoding {P_1, P_2, ..., P_n}, where t may be any value from 1 to n.
Optionally, based on the self-moving trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_ij from the i-th frame to the j-th frame can be determined.
For example, in the case where the target propagation range is k, i ∈ [j-k, j+k]; based on the self-moving trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_ij from the i-th frame to the j-th frame can be determined, and in the case where i = j, T_ij is an identity matrix.
Therefore, by the SLAM algorithm, the trajectory of the self-movement can be acquired, and then based on the trajectory of the self-movement and the target propagation range, the target transformation matrix can be determined, and then based on the target transformation matrix and the first prediction result, the second prediction result can be acquired.
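A minimal sketch of this step is given below, assuming each ego pose p_t ∈ R^{3×4} maps sensor coordinates of frame t into a common world frame; under that assumption T_ij = inv(P_j) · P_i, and T_ii reduces to the identity as stated above. The pose convention is an assumption, not a detail taken from this disclosure.

import numpy as np

def to_homogeneous(pose_3x4):
    # Promote a 3x4 ego pose [R | t] to a 4x4 homogeneous matrix.
    pose = np.eye(4)
    pose[:3, :4] = pose_3x4
    return pose

def target_transformation_matrix(poses, i, j):
    # poses: list of 3x4 ego poses produced by the SLAM trajectory, one per frame.
    # Assumes each pose maps sensor coordinates into a common world frame,
    # so T_ij = inv(P_j) @ P_i; T_ii is the identity, as stated in the text.
    P_i = to_homogeneous(poses[i])
    P_j = to_homogeneous(poses[j])
    return np.linalg.inv(P_j) @ P_i

# Illustrative two-frame trajectory: the sensor moves 1.5 m forward along z between frames.
poses = [np.hstack([np.eye(3), np.zeros((3, 1))]),
         np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.5]])])]
print(target_transformation_matrix(poses, 0, 1))   # shifts frame-0 points by -1.5 m in z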
Optionally, converting the first prediction result based on the target conversion matrix to obtain the second prediction result, where the converting includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Alternatively, the prediction result of the i-th frame in the first prediction result may be B_i, and the prediction result B_i of the i-th frame may specifically be {B_1, B_2, ..., B_m}, where each candidate box B includes {x, y, z, w, h, l, θ}: {x, y, z} indicates the coordinates of the center point of the three-dimensional bounding box, w, h and l respectively indicate the three side lengths of the three-dimensional bounding box, and θ indicates the rotation angle of the three-dimensional bounding box about the y axis.
Alternatively, through the transformation from the bounding box to the corners of the bounding box, B_i may be converted into the 8-corner-point form V_i of the three-dimensional bounding box (the first corner form).
Alternatively, in the case where the target propagation range is k and i ∈ [j-k, j+k], the target transformation matrix T_ij from the i-th frame to the j-th frame can be determined based on the self-moving trajectory p_t ∈ R^{3×4} and the target propagation range, and in the case where i = j, T_ij is an identity matrix. The corner form V_i can then be converted based on the target transformation matrix T_ij to obtain the corner form V_j (the second corner form).
Optionally, based on the corner form V_j (the second corner form), a candidate box B_j corresponding to the second corner form can be obtained.
Optionally, based on the candidate box B_j and the first prediction result, the classification result, three-dimensional bounding box regression result, direction classification result, three-dimensional bounding box uncertainty regression result and first target confidence corresponding to the candidate box B_j can be determined. It will be appreciated that, based on all candidate boxes B_j and the first prediction result, the second prediction result can be determined.
Therefore, after the three-dimensional bounding box regression result in the first prediction result is converted into the first corner form, the first corner form can be converted based on the target conversion matrix, the second corner form can be obtained, and the second prediction result can be obtained based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
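The conversion between the bounding-box form and the 8-corner form, and the rigid transform in between, can be sketched as follows; the corner ordering and the mapping of w, h, l onto the local axes are implementation assumptions, while the rotation about the y axis follows the definition used throughout this disclosure. Combining transform_corners with corners_to_box recovers the propagated candidate box of the target frame.

import numpy as np

def box_to_corners(box):
    # Convert (x, y, z, w, h, l, theta) to the 8-corner form V (8 x 3).
    # Assumption: length l lies along local x, height h along y, width w along z.
    x, y, z, w, h, l, theta = box
    dx, dy, dz = l / 2, h / 2, w / 2
    corners = np.array([[ dx,  dy,  dz], [ dx,  dy, -dz], [-dx,  dy, -dz], [-dx,  dy,  dz],
                        [ dx, -dy,  dz], [ dx, -dy, -dz], [-dx, -dy, -dz], [-dx, -dy,  dz]])
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])          # rotation about the y axis
    return corners @ rot_y.T + np.array([x, y, z])

def transform_corners(corners, T_ij):
    # Apply a 4x4 rigid transform T_ij to the 8-corner form.
    homo = np.hstack([corners, np.ones((8, 1))])
    return (homo @ T_ij.T)[:, :3]

def corners_to_box(corners):
    # Recover (x, y, z, w, h, l, theta) from the 8-corner form produced above.
    center = corners.mean(axis=0)
    length_vec = corners[0] - corners[3]    # edge along the local x (length) axis
    width_vec = corners[0] - corners[1]     # edge along the local z (width) axis
    height_vec = corners[0] - corners[4]    # edge along the y (height) axis
    l, w, h = np.linalg.norm(length_vec), np.linalg.norm(width_vec), np.linalg.norm(height_vec)
    theta = np.arctan2(-length_vec[2], length_vec[0])
    return np.array([*center, w, h, l, theta])

box_i = np.array([10.0, 1.0, 20.0, 1.6, 1.5, 3.9, 0.3])
T_ij = np.eye(4); T_ij[2, 3] = -1.5         # illustrative rigid transform (1.5 m shift along z)
print(corners_to_box(transform_corners(box_to_corners(box_i), T_ij)))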
Optionally, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying

V_j = ∪_{i=j-k}^{j+k} I(t_i) · (T_ij · V_i)

to obtain the second corner form V_j corresponding to the j-th frame;
wherein k is the target propagation range; t_i is used for representing whether the three-dimensional target in the i-th frame is in a motion state or a static state; in the case where t_i represents that the three-dimensional target in the i-th frame is in a motion state, the value of the function I(·) is 0, and in the case where t_i represents that the three-dimensional target in the i-th frame is in a static state, the value of the function I(·) is 1; T_ij represents the target transformation matrix from the i-th frame to the j-th frame; and V_i represents the first corner form corresponding to the i-th frame.
Specifically, after the target transformation matrix T_ij from the i-th frame to the j-th frame is determined, the corner form V_i can be converted through

V_j = ∪_{i=j-k}^{j+k} I(t_i) · (T_ij · V_i)

to obtain the corner form V_j (the second corner form).
Therefore, based on the target transformation matrix, the first corner form can be transformed to obtain the second corner form, and further, based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result, the second prediction result can be obtained.
Optionally, the obtaining the pseudo tag based on the first target confidence, the first predicted result, and the second predicted result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Specifically, the first prediction result may include a prediction result of the target frame, the second prediction result may include a prediction result of the target frame, and based on the first target confidence, the first prediction result, and the second prediction result, the second target confidence corresponding to the prediction result of the target frame may be obtained, and further based on the second target confidence and the fourth threshold, the candidate frames corresponding to the prediction result of the target frame may be screened to obtain the second target candidate frame, and further, all the second target candidate frames may be fused to obtain the pseudo tag corresponding to the real sample.
Optionally, all the second target candidate frames may be sorted from high to low according to the confidence, and then the second target candidate frames with overlapped bounding frames may be screened out based on the NMS algorithm, so as to obtain fused candidate frames, and then the pseudo label may be determined based on the fused candidate frames.
Therefore, based on the second target confidence corresponding to the prediction result of the target frame, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and further the pseudo label corresponding to the real sample can be obtained.
Optionally, the obtaining a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result, and the second prediction result includes:
in the case where the target frame is the j-th frame, applying

conf'_j = ( ∑_{i=j-k}^{j+k} α^|i-j| · conf_i ) / ( ∑_{i=j-k}^{j+k} α^|i-j| )

to acquire the second target confidence conf'_j corresponding to the prediction result of the j-th frame;
wherein conf_i represents the first target confidence corresponding to the prediction result of the i-th frame, and α is a preset attenuation factor.
Specifically, when the target frame is the j-th frame, the attenuation factor α may be preset, and the second target confidence conf'_j corresponding to the prediction result of the j-th frame may be obtained.
Alternatively, the preset attenuation factor α may be 0.7, 0.8, 0.9, or the like, which is not limited thereto.
Optionally, based on the second target confidence conf'_j and a fourth threshold conf_thred, the candidate boxes B_j corresponding to the prediction result of the j-th frame may be screened to obtain the second target candidate boxes B'_j of the j-th frame. All second target candidate boxes B'_j of the j-th frame may then be sorted from high to low confidence, and the second target candidate boxes with overlapping bounding boxes may be filtered out based on the NMS algorithm, so as to obtain the fused candidate boxes of the j-th frame.
Therefore, the second target confidence corresponding to the prediction result of the target frame can be obtained, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and the pseudo label corresponding to the real sample can be obtained.
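A small sketch of the decay-weighted aggregation is given below, assuming the α^|i-j| weighting reconstructed above (the precise weighting scheme is an assumption); frames farther from the target frame j contribute less to the second target confidence.

import numpy as np

def second_target_confidence(first_confidences, j, k, alpha=0.9):
    # Aggregate first target confidences over the propagation window [j-k, j+k],
    # weighting each frame by alpha^|i-j| so that distant frames contribute less.
    lo, hi = max(0, j - k), min(len(first_confidences) - 1, j + k)
    weights = np.array([alpha ** abs(i - j) for i in range(lo, hi + 1)])
    confs = np.array(first_confidences[lo:hi + 1])
    return float(np.sum(weights * confs) / np.sum(weights))

print(second_target_confidence([0.55, 0.62, 0.91, 0.58, 0.49], j=2, k=2, alpha=0.8))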
Optionally, the public autonomous driving dataset KITTI Odometry may be used as the real samples and input to the pre-trained three-dimensional target detection model (which may include a SECOND network) to obtain the first prediction result and the first target confidence corresponding to the first prediction result. The first prediction result is then propagated along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result, and the pseudo label is obtained based on the first target confidence, the first prediction result and the second prediction result. The pre-trained three-dimensional target detection model is then trained based on the real samples and the pseudo labels to obtain the three-dimensional target detection model.
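To make the overall flow of this embodiment easier to follow, the sketch below strings the stages into a single function; every helper it receives (train_on_virtual, predict_with_confidence, propagate_over_time, build_pseudo_labels, finetune) is a hypothetical callable standing in for the corresponding step and is not an API of SECOND, the KITTI tooling, or this disclosure.

from typing import Any, Callable, Sequence

def self_training_pipeline(
    virtual_samples: Sequence[Any],
    real_sequence: Sequence[Any],
    k: int,
    conf_threshold: float,
    train_on_virtual: Callable,
    predict_with_confidence: Callable,
    propagate_over_time: Callable,
    build_pseudo_labels: Callable,
    finetune: Callable,
):
    # Pre-train the 3D detector (e.g., a SECOND-style network) on virtual samples.
    model = train_on_virtual(virtual_samples)

    # First prediction results and first target confidences on the real point cloud sequence.
    first_preds, first_confs = predict_with_confidence(model, real_sequence)

    # Propagate the first prediction results along the time dimension within range k,
    # using the self-moving trajectory estimated by SLAM.
    second_preds = propagate_over_time(first_preds, real_sequence, k)

    # Screen by confidence and fuse with NMS to obtain pseudo labels for the real samples.
    pseudo_labels = build_pseudo_labels(first_preds, second_preds, first_confs, conf_threshold)

    # Fine-tune the pre-trained detector on the real samples with the pseudo labels.
    return finetune(model, real_sequence, pseudo_labels)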
Alternatively, the three-dimensional target detection model is tested on the val split of the KITTI 3D Object Detection benchmark to obtain the three-dimensional target detection results of the model on the KITTI dataset, as shown in Table 1. The evaluation indexes of the three-dimensional target detection results may include the bird's eye view average precision (BEV AP) and the three-dimensional box average precision (3D AP); Easy, Moderate and Hard respectively represent the easy, moderate and hard samples in the KITTI dataset. Model A is a model obtained in the related art by training an original SECOND network with pseudo labels directly generated by the original SECOND network, and model B is the three-dimensional target detection model (including a SECOND network) provided by the embodiment of the present invention.
TABLE 1 three-dimensional target detection results
(Table 1 appears as an image in the original publication; it reports the BEV AP and 3D AP of model A and model B on the Easy, Moderate and Hard splits of the KITTI dataset.)
It can be understood from the data in Table 1 that the target detection method provided in the embodiment of the present invention can achieve a significant performance improvement over the original model (e.g., the original SECOND network) without using any manually labeled real data.
The target detection method provided by the invention trains the initial three-dimensional target detection model with the virtual samples to obtain a pre-trained three-dimensional target detection model. A first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along the time dimension, and a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result. The pre-trained three-dimensional target detection model can then be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model, and an accurate three-dimensional target detection result can be obtained by inputting the target point cloud sequence into the three-dimensional target detection model. The method can thus train and obtain a three-dimensional target detection model without any manually labeled data and achieve a good detection effect.
The object detection device provided by the present invention is described below, and the object detection device described below and the object detection method described above may be referred to in correspondence with each other.
Fig. 5 is a schematic structural diagram of the object detection apparatus provided in the present invention, and as shown in fig. 5, the apparatus includes a first obtaining module 501 and a second obtaining module 502, where:
a first obtaining module 501, configured to obtain a target point cloud sequence;
a second obtaining module 502, configured to input the target point cloud sequence into a three-dimensional target detection model, and obtain a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
It can be understood that the target detection apparatus trains the initial three-dimensional target detection model through the virtual sample, so as to obtain a pre-trained three-dimensional target detection model, inputs the real sample into the pre-trained three-dimensional target detection model to obtain a first prediction result, propagates the first prediction result along the time dimension to obtain a second prediction result, determines the pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, and trains the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
The target detection device provided by the invention trains the initial three-dimensional target detection model with the virtual samples to obtain a pre-trained three-dimensional target detection model. A first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along the time dimension, and a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result. The pre-trained three-dimensional target detection model can then be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model, and an accurate three-dimensional target detection result can be obtained by inputting the target point cloud sequence into the three-dimensional target detection model. The device can thus obtain a three-dimensional target detection model trained without any manually labeled data and achieve a good detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the second obtaining module is specifically configured to:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Optionally, the apparatus further comprises a training module configured to:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, the training module is specifically configured to:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, when the second regression branch is a regression model based on gaussian distribution, the loss function corresponding to the second regression branch is specifically:
∑_{i∈{x,y,z,w,h,l,θ}} [ (t_i - μ_i)^2 / (2(σ_i + ε)^2) + log(σ_i + ε) ]
or
When the second regression branch is a regression model based on laplace distribution, the loss function corresponding to the second regression branch is specifically:
∑_{i∈{x,y,z,w,h,l,θ}} [ |t_i - μ_i| / (σ_i + ε) + log(σ_i + ε) ]
wherein t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
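A minimal numerical sketch of these two losses, in the form reconstructed above, is given below; the constant eps plays the role of ε, and treating σ_i as a non-negative scale value is an assumption.

import numpy as np

def gaussian_uncertainty_loss(t, mu, sigma, eps=1e-3):
    # Gaussian negative log-likelihood with predicted standard deviation sigma,
    # summed over the 7 box dimensions; sigma is assumed to be non-negative.
    s = sigma + eps
    return float(np.sum((t - mu) ** 2 / (2.0 * s ** 2) + np.log(s)))

def laplace_uncertainty_loss(t, mu, sigma, eps=1e-3):
    # Laplace negative log-likelihood with predicted scale sigma.
    s = sigma + eps
    return float(np.sum(np.abs(t - mu) / s + np.log(s)))

t     = np.array([0.10, -0.05, 0.02, 0.00, 0.03, -0.01, 0.05])   # target regression values
mu    = np.array([0.08, -0.02, 0.01, 0.01, 0.02,  0.00, 0.04])   # predicted regression values
sigma = np.array([0.20,  0.25, 0.15, 0.10, 0.10,  0.12, 0.30])   # predicted uncertainties
print(gaussian_uncertainty_loss(t, mu, sigma), laplace_uncertainty_loss(t, mu, sigma))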
Optionally, the training module is specifically configured to:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, the training module is specifically configured to:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;

using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i)) / 7, acquiring a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, wherein σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;

using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
Optionally, the training module is specifically configured to:
acquiring a self-moving trajectory based on the point cloud sequence encoding corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Optionally, the training module is specifically configured to:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, the training module is specifically configured to:
applying

V_j = ∪_{i=j-k}^{j+k} I(t_i) · (T_ij · V_i)

to obtain the second corner form V_j corresponding to the j-th frame;
wherein k is the target propagation range; t_i is used for representing whether the three-dimensional target in the i-th frame is in a motion state or a static state; in the case where t_i represents that the three-dimensional target in the i-th frame is in a motion state, the value of the function I(·) is 0, and in the case where t_i represents that the three-dimensional target in the i-th frame is in a static state, the value of the function I(·) is 1; T_ij represents the target transformation matrix from the i-th frame to the j-th frame; and V_i represents the first corner form corresponding to the i-th frame.
Optionally, the training module is specifically configured to:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, the training module is specifically configured to:
in the case where the target frame is the j-th frame, applying

conf'_j = ( ∑_{i=j-k}^{j+k} α^|i-j| · conf_i ) / ( ∑_{i=j-k}^{j+k} α^|i-j| )

to acquire the second target confidence conf'_j corresponding to the prediction result of the j-th frame;
wherein conf_i represents the first target confidence corresponding to the prediction result of the i-th frame, and α is a preset attenuation factor.
The target detection device provided by the invention trains the initial three-dimensional target detection model with the virtual samples to obtain a pre-trained three-dimensional target detection model. A first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along the time dimension, and a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result. The pre-trained three-dimensional target detection model can then be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model, and an accurate three-dimensional target detection result can be obtained by inputting the target point cloud sequence into the three-dimensional target detection model. The device can thus obtain a three-dimensional target detection model trained without any manually labeled data and achieve a good detection effect.
Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention. As shown in Fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communications interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a target detection method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the object detection method provided by the above methods, the method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing an object detection method provided by the above methods, the method including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on the understanding, the above technical solutions substantially or otherwise contributing to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (16)

1. A method of target detection, comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
2. The target detection method of claim 1, wherein the three-dimensional target detection results corresponding to the target point cloud sequence comprise classification results, three-dimensional bounding box regression results, direction classification results, and three-dimensional bounding box uncertainty regression results corresponding to the three-dimensional bounding box regression results, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection results corresponding to the target point cloud sequence comprises:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
3. The object detection method according to claim 2, wherein the three-dimensional object detection model is constructed by:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
4. The method according to claim 3, wherein the training an initial three-dimensional object detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional object detection model comprises:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
5. The method according to claim 4, wherein, when the second regression branch is a regression model based on Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
∑_{i∈{x,y,z,w,h,l,θ}} [ (t_i - μ_i)^2 / (2(σ_i + ε)^2) + log(σ_i + ε) ]
or
When the second regression branch is a regression model based on laplace distribution, the loss function corresponding to the second regression branch is specifically:
∑_{i∈{x,y,z,w,h,l,θ}} [ |t_i - μ_i| / (σ_i + ε) + log(σ_i + ε) ]
wherein t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
6. The target detection method of claim 3, wherein the inputting the real sample into the pre-trained three-dimensional target detection model to obtain the first prediction result and a first target confidence corresponding to the first prediction result comprises:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
7. The method for detecting the target according to claim 6, wherein the determining the confidence of the first target corresponding to the third predicted result based on the classification result in the third predicted result and the three-dimensional bounding box uncertainty regression result in the third predicted result comprises:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;

using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i)) / 7, acquiring a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, wherein σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;

using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
8. The target detection method of claim 3, wherein the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result comprises:
acquiring a self-moving trajectory based on the point cloud sequence encoding corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
9. The object detection method of claim 8, wherein the converting the first prediction result based on the object transformation matrix to obtain the second prediction result comprises:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
10. The method according to claim 9, wherein the converting the first corner form based on the target transformation matrix to obtain a second corner form comprises:
applying

V_j = ∪_{i=j-k}^{j+k} I(t_i) · (T_ij · V_i)

to obtain the second corner form V_j corresponding to the j-th frame;
wherein k is the target propagation range; t_i is used for representing whether the three-dimensional target in the i-th frame is in a motion state or a static state; in the case where t_i represents that the three-dimensional target in the i-th frame is in a motion state, the value of the function I(·) is 0, and in the case where t_i represents that the three-dimensional target in the i-th frame is in a static state, the value of the function I(·) is 1; T_ij represents the target transformation matrix from the i-th frame to the j-th frame; and V_i represents the first corner form corresponding to the i-th frame.
11. The object detection method of claim 3, wherein said obtaining the pseudo tag based on the first object confidence, the first predicted result, and the second predicted result comprises:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
12. The object detection method of claim 11, wherein obtaining a second object confidence corresponding to the prediction result of the object frame based on the first object confidence, the first prediction result, and the second prediction result comprises:
in the case where the target frame is the j-th frame, applying

conf'_j = ( ∑_{i=j-k}^{j+k} α^|i-j| · conf_i ) / ( ∑_{i=j-k}^{j+k} α^|i-j| )

to acquire a second target confidence conf'_j corresponding to the prediction result of the j-th frame;
wherein conf_i represents the first target confidence corresponding to the prediction result of the i-th frame, and α is a preset attenuation factor.
13. An object detection device, comprising:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the object detection method according to any of claims 1 to 12 are implemented when the processor executes the program.
15. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 12.
16. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the object detection method according to any one of claims 1 to 12 when executed by a processor.
CN202210122800.5A 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium Active CN114663879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122800.5A CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210122800.5A CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114663879A true CN114663879A (en) 2022-06-24
CN114663879B CN114663879B (en) 2023-02-21

Family

ID=82026631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122800.5A Active CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114663879B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154118A (en) * 2017-12-25 2018-06-12 北京航空航天大学 A kind of target detection system and method based on adaptive combined filter with multistage detection
US20210042929A1 (en) * 2019-01-22 2021-02-11 Institute Of Automation, Chinese Academy Of Sciences Three-dimensional object detection method and system based on weighted channel features of a point cloud
WO2021081808A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Artificial neural network-based object detection system and method
CN110879994A (en) * 2019-12-02 2020-03-13 中国科学院自动化研究所 Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN111474953A (en) * 2020-03-30 2020-07-31 清华大学 Multi-dynamic-view-angle-coordinated aerial target identification method and system
CN111462236A (en) * 2020-04-02 2020-07-28 集美大学 Method and system for detecting relative pose between ships
CN111983600A (en) * 2020-08-31 2020-11-24 杭州海康威视数字技术股份有限公司 Target detection method, device and equipment
CN112257605A (en) * 2020-10-23 2021-01-22 中国科学院自动化研究所 Three-dimensional target detection method, system and device based on self-labeling training sample
CN112613378A (en) * 2020-12-17 2021-04-06 上海交通大学 3D target detection method, system, medium and terminal
CN113409410A (en) * 2021-05-19 2021-09-17 杭州电子科技大学 Multi-feature fusion IGV positioning and mapping method based on 3D laser radar
CN113269098A (en) * 2021-05-27 2021-08-17 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113763430A (en) * 2021-09-13 2021-12-07 智道网联科技(北京)有限公司 Method, apparatus and computer-readable storage medium for detecting moving object
CN113920370A (en) * 2021-10-25 2022-01-11 上海商汤智能科技有限公司 Model training method, target detection method, device, equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JUNBO YIN et al.: "LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
YANGYANG YE et al.: "SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection", Neurocomputing *
PENG Pai et al.: "Automatic re-calibration method for camera and LiDAR based on sensor-fusion odometry", Journal of Mechanical Engineering *
CHENG Zhiwei: "Research on autonomous driving scene recognition based on deep learning", China Masters' Theses Full-text Database, Engineering Science and Technology II *
PEI Yiyao et al.: "Robust 3D object detection method based on localization uncertainty", Journal of Computer Applications *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152770A (en) * 2023-04-19 2023-05-23 深圳佑驾创新科技有限公司 3D target matching model building method and device
CN116152770B (en) * 2023-04-19 2023-09-22 深圳佑驾创新科技股份有限公司 3D target matching model building method and device

Also Published As

Publication number Publication date
CN114663879B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
CN113039563B (en) Learning to generate synthetic data sets for training neural networks
CN109643383B (en) Domain split neural network
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
CN111079619B (en) Method and apparatus for detecting target object in image
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
CN112347550A (en) Coupling type indoor three-dimensional semantic graph building and modeling method
US20220327730A1 (en) Method for training neural network, system for training neural network, and neural network
CN114758337A (en) Semantic instance reconstruction method, device, equipment and medium
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN114663879B (en) Target detection method and device, electronic equipment and storage medium
CN115147798A (en) Method, model and device for predicting travelable area and vehicle
CN116012387A (en) Virtual view selection method and device for three-dimensional semantic segmentation of indoor scene
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
CN114565953A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN115115699A (en) Attitude estimation method and device, related equipment and computer product
CN115331194A (en) Occlusion target detection method and related equipment
CN113487374A (en) Block E-commerce platform transaction system based on 5G network
CN113537258A (en) Action track prediction method and device, computer readable medium and electronic equipment
Wang et al. Keyframe image processing of semantic 3D point clouds based on deep learning
EP4057222A1 (en) Machine-learning for 3d segmentation
Singer View-Agnostic Point Cloud Generation
Anju et al. Faster Training of Edge-attention Aided 6D Pose Estimation Model using Transfer Learning and Small Customized Dataset
Hussein et al. Deep Learning in Distance Awareness Using Deep Learning Method
CN116503601A (en) Multi-view-based point cloud semantic segmentation model, method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant