CN114663879B - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN114663879B
CN114663879B
Authority
CN
China
Prior art keywords
target
dimensional
prediction result
result
detection model
Prior art date
Legal status
Active
Application number
CN202210122800.5A
Other languages
Chinese (zh)
Other versions
CN114663879A (en)
Inventor
张兆翔
张驰
陈文博
裴仪瑶
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202210122800.5A
Publication of CN114663879A
Application granted
Publication of CN114663879B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention provides a target detection method and apparatus, an electronic device and a storage medium. The method includes: acquiring a target point cloud sequence; and inputting the target point cloud sequence into a three-dimensional target detection model to obtain a three-dimensional target detection result corresponding to the target point cloud sequence. The three-dimensional target detection model is trained with virtual samples and real samples. The pseudo label corresponding to a real sample is determined based on a first prediction result and a second prediction result; the first prediction result is obtained by predicting the real sample with a pre-trained three-dimensional target detection model, and the second prediction result is obtained by propagating the first prediction result along the time dimension. In the embodiments of the invention, the second prediction result can be obtained by propagating the first prediction result along the time dimension, and the pseudo label can then be obtained from the first and second prediction results, so that the three-dimensional target detection model can be trained without manually labeled data while still achieving a good detection effect.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
Two-dimensional target detection can only provide the two-dimensional position of an object in an image. With the rapid development of application fields such as autonomous driving, intelligent robots, augmented reality and security and border defense, information about objects in three-dimensional space is increasingly needed in order to locate and recognize targets more accurately. The input of three-dimensional target detection is a two-dimensional image or three-dimensional data, and its output is the position of an object's bounding box in three-dimensional space together with a classification result.
In the prior art, most three-dimensional tasks such as autonomous driving use lidar point cloud data to obtain more accurate three-dimensional spatial information. Unlike conventional image data, point cloud data is high-dimensional and unordered, which makes annotation difficult and keeps data sets small; compared with two-dimensional target detection, annotations for three-dimensional target detection are therefore much harder to obtain.
Disclosure of Invention
The invention provides a target detection method and apparatus, an electronic device and a storage medium, which are intended to overcome the difficulty of obtaining three-dimensional target detection annotations in the prior art, so that a three-dimensional target detection model can be trained without manually labeled data while achieving a good detection effect.
In a first aspect, the present invention provides a target detection method, including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Optionally, according to a target detection method provided by the present invention, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining a classification result based on the target point cloud sequence and a classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Optionally, according to the target detection method provided by the present invention, the three-dimensional target detection model is constructed in the following manner:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
acquiring the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, according to a target detection method provided by the present invention, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, according to the target detection method provided by the present invention, in a case that the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Gaussian-based uncertainty regression loss L_un_reg_Gaussian.]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Laplace-based uncertainty regression loss L_un_reg_Laplace.]
where t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
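The two loss formulas above appear only as images in the original document. For reference, a common formulation of such uncertainty-aware regression losses in the literature, which models each regression target with a Gaussian or Laplace likelihood, takes the following form; this is an assumed general form, not a reproduction of the patent's exact equations:

L_{\mathrm{un\_reg\_Gaussian}} = \sum_{i \in \{x,y,z,w,h,l,\theta\}} \left[ \frac{(t_i - \mu_i)^2}{2(\sigma_i^2 + \epsilon)} + \frac{1}{2}\log\left(\sigma_i^2 + \epsilon\right) \right]

L_{\mathrm{un\_reg\_Laplace}} = \sum_{i \in \{x,y,z,w,h,l,\theta\}} \left[ \frac{|t_i - \mu_i|}{\sigma_i + \epsilon} + \log\left(\sigma_i + \epsilon\right) \right]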
Optionally, according to a target detection method provided by the present invention, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, according to a target detection method provided by the present invention, the determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls) to obtain the classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - Σ_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i) / 7 to obtain the positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis;
using the formula conf_total = conf_cls · conf_loc to obtain the first target confidence conf_total corresponding to the third prediction result.
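Since the three confidence formulas above are stated explicitly, they translate directly into code. The following is a minimal NumPy sketch for a single candidate box; the function and variable names are illustrative and do not appear in the patent:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_target_confidence(output_cls, sigma):
    # conf_total = conf_cls * conf_loc, following the formulas above.
    # output_cls: raw classification output for one candidate box.
    # sigma: the 7 uncertainty regression results for (x, y, z, w, h, l, theta).
    conf_cls = sigmoid(output_cls)
    conf_loc = 1.0 - np.sum(sigmoid(np.asarray(sigma))) / 7.0
    return conf_cls * conf_loc

# Example: a confident classification with low per-dimension uncertainties.
print(first_target_confidence(2.5, [-3.0, -3.0, -2.5, -4.0, -4.0, -4.0, -3.0]))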
Optionally, according to a target detection method provided by the present invention, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:
obtaining an ego-motion trajectory based on the point cloud sequence corresponding to the real sample and a simultaneous localization and mapping (SLAM) algorithm;
obtaining a target conversion matrix based on the target propagation range and the ego-motion trajectory;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
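For illustration, the target conversion matrix between two frames can be derived from the ego poses produced by SLAM and then applied to coordinates of the first prediction result. The sketch below assumes each pose is a 4x4 homogeneous lidar-to-world transform; this convention and the helper names are assumptions, not details stated in the patent:

import numpy as np

def conversion_matrix(pose_n, pose_j):
    # Transform taking coordinates expressed in frame n into frame j,
    # assuming pose_n and pose_j are 4x4 lidar-to-world transforms.
    return np.linalg.inv(pose_j) @ pose_n

def transform_points(points, T):
    # Apply a 4x4 transform to an (N, 3) array of points.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ T.T)[:, :3]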
Optionally, according to a target detection method provided by the present invention, the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, according to a target detection method provided by the present invention, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applications of
Figure GDA0003901051550000051
Obtaining a second angular point form V corresponding to the j frame j
Where k is the target propagation range, s n The method is used for representing that the three-dimensional target in the nth frame is in a motion state or a static state; at s n The value of the function I (.) is 0 when the three-dimensional target in the n-th frame is in a motion state, and s is n Representing that the value of the function I (.) is 1 under the condition that the three-dimensional target in the nth frame is in a static state; t is a unit of nj Representing the target transition matrix from the nth frame to the jth frame, V n Indicating the first corner form corresponding to the nth frame.
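As a hedged illustration of the per-box step, the corner form of a static target can be carried from frame n into frame j with the target conversion matrix; the full aggregation over the propagation window is shown only as an image in the original, so the sketch below covers a single propagation step under that assumption:

import numpy as np

def propagate_corners(corners_n, T_nj, is_static):
    # Propagate an (8, 3) corner-form box from frame n into frame j.
    # Only static targets are propagated (I(s_n) = 1); moving targets are
    # skipped, mirroring the indicator function described above.
    if not is_static:
        return None
    homo = np.hstack([corners_n, np.ones((8, 1))])  # homogeneous coordinates
    return (homo @ T_nj.T)[:, :3]                   # corners expressed in frame j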
Optionally, according to a target detection method provided by the present invention, the obtaining the pseudo tag based on the first target confidence, the first prediction result, and the second prediction result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, according to a target detection method provided by the present invention, the obtaining, based on the first target confidence, the first prediction result, and the second prediction result, a second target confidence corresponding to the prediction result of the target frame includes:
in the case that the target frame is the j-th frame, applying the confidence propagation formulas (shown as images in the original) to obtain the second target confidence conf'_j corresponding to the prediction result of the j-th frame;
where conf_n represents the first target confidence corresponding to the prediction result of the n-th frame, α is a preset attenuation factor, and k is the target propagation range.
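The exact expression for conf'_j appears only as images in the original. A plausible sketch, under the assumption that the propagated confidences decay geometrically with the frame distance |n - j| and are then averaged over the propagation window, is:

def second_target_confidence(confs, j, k, alpha):
    # confs: dict mapping frame index n -> first target confidence conf_n.
    # k: target propagation range; alpha: preset attenuation factor.
    # The decayed-average aggregation is an assumption, not the patent's formula.
    window = [n for n in range(j - k, j + k + 1) if n in confs]
    decayed = [confs[n] * (alpha ** abs(n - j)) for n in window]
    return sum(decayed) / len(decayed) if decayed else 0.0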
In a second aspect, the present invention further provides an object detecting apparatus, including:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the above-mentioned object detection methods.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
According to the target detection method and apparatus, the electronic device and the storage medium provided by the invention, the initial three-dimensional target detection model is trained with virtual samples to obtain a pre-trained three-dimensional target detection model. Real samples are input into the pre-trained model to obtain a first prediction result, which is propagated along the time dimension to obtain a second prediction result; a pseudo label corresponding to each real sample can then be determined from the first and second prediction results. The pre-trained model is further trained with the real samples and pseudo labels to obtain the final three-dimensional target detection model. Inputting a target point cloud sequence into this model yields an accurate three-dimensional target detection result, so the model can be trained without manually labeled data while achieving a good detection effect.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below illustrate some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a target detection method provided by the present invention;
FIG. 2 is a second schematic flow chart of a target detection method provided by the present invention;
FIG. 3 is a third schematic flow chart of a target detection method provided by the present invention;
FIG. 4 is a fourth schematic flowchart of a target detection method provided by the present invention;
FIG. 5 is a schematic structural diagram of an object detecting device provided in the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The object detection method and apparatus of the present invention are described below with reference to the drawings.
FIG. 1 is a schematic flow chart of the target detection method provided by the present invention. As shown in FIG. 1, the method may be executed by an electronic device, such as a mobile phone or a server, and comprises the following steps 101 to 102:
step 101, acquiring a target point cloud sequence;
step 102, inputting the target point cloud sequence into a three-dimensional target detection model, and obtaining a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
Specifically, after the target point cloud sequence is obtained, the target point cloud sequence may be input to the three-dimensional target detection model, and then a three-dimensional target detection result corresponding to the target point cloud sequence may be obtained.
For example, a target point cloud sequence a of a certain unmanned vehicle may be obtained, and then the target point cloud sequence a may be input into a three-dimensional target detection model, and then a three-dimensional target detection result a corresponding to the target point cloud sequence a may be obtained, where the three-dimensional target may be a target object on a motion trajectory of the unmanned vehicle, and the three-dimensional target detection result a may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
For example, a target point cloud sequence B of an intelligent robot may be obtained, and then the target point cloud sequence B may be input into a three-dimensional target detection model, and then a three-dimensional target detection result B corresponding to the target point cloud sequence B may be obtained, where the three-dimensional target may be a target object of a motion trajectory of the intelligent robot, and the three-dimensional target detection result B may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.
The above examples are merely illustrative of the embodiments of the present invention, and are not intended to limit the embodiments of the present invention.
It can be understood that the pre-trained three-dimensional target detection model can be obtained by training the initial three-dimensional target detection model through the virtual sample, the real sample is input into the pre-trained three-dimensional target detection model to obtain a first prediction result, the first prediction result is propagated along the time dimension to obtain a second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, and then the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
According to the target detection method provided by the invention, the initial three-dimensional target detection model is trained with virtual samples to obtain a pre-trained three-dimensional target detection model. Real samples are input into the pre-trained model to obtain a first prediction result, which is propagated along the time dimension to obtain a second prediction result; a pseudo label corresponding to each real sample can then be determined from the first and second prediction results. The pre-trained model is further trained with the real samples and pseudo labels to obtain the final three-dimensional target detection model. Inputting the target point cloud sequence into this model yields an accurate three-dimensional target detection result, so the model can be trained without manually labeled data while achieving a good detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Specifically, the target point cloud sequence can be recognized by each branch of the three-dimensional target detection model to obtain a classification result, a three-dimensional bounding box regression result, a direction classification result, a three-dimensional bounding box uncertainty regression result, and the like.
Optionally, through a classification branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a classification result corresponding to the target point cloud sequence is obtained.
Optionally, through a first regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box regression result and a direction classification result corresponding to the target point cloud sequence are obtained.
Optionally, through a second regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result may be obtained, where the three-dimensional bounding box uncertainty regression result may be used to represent uncertainty of the three-dimensional bounding box regression result.
Optionally, the three-dimensional target detection model may include a second regression branch and a SECOND network, where the SECOND network may include a classification branch and a first regression branch.
Optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the first prediction result may include a three-dimensional bounding box regression result after the filtering, a classification result corresponding to the three-dimensional bounding box regression result after the filtering, and a direction classification result corresponding to the three-dimensional bounding box regression result after the filtering.
Optionally, the target point cloud sequence may be identified through a three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the three-dimensional target detection result may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the three-dimensional target detection result may include a filtered three-dimensional bounding box regression result, a classification result corresponding to the filtered three-dimensional bounding box regression result, and a direction classification result corresponding to the filtered three-dimensional bounding box regression result.
Alternatively, the second regression branch may be a regression model based on a gaussian distribution.
Alternatively, the second regression branch may be a regression model based on a laplace distribution.
It is understood that the three-dimensional bounding box uncertainty regression results may be used to obtain more accurate first prediction results in the training session. Specifically, in the training process, a real sample is input into a pre-trained three-dimensional target detection model, a three-dimensional bounding box regression result and a three-dimensional bounding box uncertainty regression result can be obtained, then candidate boxes in the three-dimensional bounding box regression result can be screened based on the three-dimensional bounding box uncertainty regression result, and a more accurate first prediction result can be obtained based on the screened three-dimensional bounding box regression result.
It can be understood that the three-dimensional bounding box uncertainty regression results can be used to obtain more accurate pseudo labels in the training session. Specifically, in the training process, a plurality of candidate frames can be determined based on a first prediction result and a second prediction result, the candidate frames can be screened based on a three-dimensional boundary frame uncertainty regression result, and then a more accurate pseudo label can be obtained based on the screened candidate frames.
Therefore, the three-dimensional boundary frame uncertainty regression result can be obtained through the second regression branch, the accurate pseudo label can be obtained based on the three-dimensional boundary frame uncertainty regression result, fine tuning training can be carried out on the pre-trained three-dimensional target detection model based on the accurate pseudo label and the real sample, and the three-dimensional target detection model obtained through training can achieve a good detection effect under the condition that no manual labeling is used at all.
Optionally, the three-dimensional object detection model is constructed by:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Specifically, the initial three-dimensional target detection model is trained with virtual samples to obtain the pre-trained three-dimensional target detection model. Real samples are recognized by the pre-trained model to obtain a first prediction result and a first target confidence, and the first prediction result is propagated along the time dimension to obtain a second prediction result. The first and second prediction results can then be screened and fused based on the first target confidence to obtain pseudo labels, and the pre-trained model can be fine-tuned based on the pseudo labels and the real samples to obtain the three-dimensional target detection model.
Alternatively, the target propagation range may be a propagation range between adjacent frames, or may be a propagation range between non-adjacent frames.
Optionally, virtual samples may be generated by the CARLA simulator, and may include depth images and point cloud data acquired by a lidar and a depth sensor in the CARLA simulator.
Optionally, the prediction result of the n-th frame in the first prediction result may be B_n; specifically, B_n may be {B_1, B_2, ..., B_m}, where B = {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis.
Optionally, FIG. 2 is a second schematic flow chart of the target detection method provided by the present invention, and FIG. 3 is a third schematic flow chart of the target detection method provided by the present invention; FIG. 2 and FIG. 3 are optional examples of the present invention and do not limit it. As shown in FIG. 2, the three-dimensional target detection model is constructed in a manner including the following steps 201 to 205:
step 201, training an initial three-dimensional target detection model based on a virtual sample and a label of the virtual sample to obtain a pre-trained three-dimensional target detection model;
step 202, inputting a continuous point cloud frame of a real sample into a pre-trained three-dimensional target detection model, and acquiring a first prediction result and a first target confidence coefficient;
optionally, the real sample may be identified through a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the first prediction result and the first target confidence may be determined based on the screened three-dimensional bounding box regression result.
It is understood that the real sample may include continuous point cloud frames, and the first prediction result may include a prediction result corresponding to each point cloud frame, for example, as shown in fig. 3, in the case that the real sample includes a j-th point cloud frame, the first prediction result may include a prediction result corresponding to the j-th point cloud frame.
Step 203, propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
optionally, the first prediction result may include a prediction result corresponding to each point cloud frame, and then the prediction results corresponding to each point cloud frame in the first prediction result may be propagated along the time dimension, respectively, to obtain the second prediction result.
For example, as shown in fig. 3, the first prediction result may include a prediction result A1 of a (j-1) th frame, a prediction result B1 of a j-th frame, and a prediction result C1 of a (j + 1) th frame, the prediction result A1 of the (j-1) th frame in the first prediction result may be propagated along a time dimension to a prediction result B2 of the j-th frame, the prediction result C1 of the (j + 1) th frame in the first prediction result may be propagated along the time dimension to a prediction result B3 of the j-th frame, and the second prediction result may include a prediction result B2 of the j-th frame and a prediction result B3 of the j-th frame.
It is understood that the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and a prediction result B3 of the j-th frame, and thus, for the j-th frame, the prediction result B1, the prediction result B2, and the prediction result B3 may be obtained.
It will be appreciated that after the second prediction is obtained, the prediction of the real sample may include the first prediction and the second prediction.
Step 204, acquiring a pseudo label based on the first target confidence coefficient, the first prediction result and the second prediction result;
optionally, candidate frames in the first prediction result and the second prediction result may be filtered based on the first target confidence, and a pseudo tag may be determined based on the filtered candidate frames.
For example, as shown in fig. 3, the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and a prediction result B3 of the j-th frame, and the candidate frame corresponding to the prediction result B1, the candidate frame corresponding to the prediction result B2, and the candidate frame corresponding to the prediction result B3 may be filtered based on the first target confidence, so as to obtain the filtered candidate frames, and then the pseudo tag may be determined based on the filtered candidate frames.
Step 205, training (fine tuning) the pre-trained three-dimensional target detection model based on the real sample and the pseudo label, and obtaining the three-dimensional target detection model.
It can be understood that the pre-trained three-dimensional target detection model obtained through virtual sample training has basic target detection capability, but its detection performance in a real environment may be reduced; by training the pre-trained model with real samples and pseudo labels, the detection performance of the three-dimensional target detection model in the real environment can be improved.
Therefore, the first prediction result is propagated along the time dimension, the second prediction result can be obtained, an accurate pseudo label can be determined based on the first target confidence coefficient, the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label, the three-dimensional target detection model is obtained, and a good detection effect is achieved.
Optionally, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, based on the target loss function, the virtual sample and the label of the virtual sample may be input to the initial three-dimensional target detection model for training until the loss function value corresponding to the target loss function is smaller than the first threshold.
Optionally, based on the target loss function, the virtual samples and the labels of the virtual samples may be input to the initial three-dimensional target detection model for training until the number of training times is greater than the second threshold.
Optionally, the classification branch in the three-dimensional target detection model may correspond to a classification loss function, the first regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box regression loss function and a direction classification loss function, and the second regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box uncertainty regression loss function, so that the target loss function may be determined based on the classification loss function, the three-dimensional bounding box regression loss function, the direction classification loss function, and the three-dimensional bounding box uncertainty regression loss function.
Alternatively, the target loss function may be:
L_total = β_1·L_cls + β_2·L_reg + β_3·L_un_reg + β_4·L_dir
where L_cls may be a classification loss function, L_reg may be a three-dimensional bounding box regression loss function, L_dir may be a direction classification loss function, L_un_reg may be a three-dimensional bounding box uncertainty regression loss function, and β_1, β_2, β_3 and β_4 are hyperparameters, i.e., weight coefficients that balance the four loss functions.
Optionally, L_cls may specifically be the focal loss function, L_reg may specifically be the Smooth L1 loss function, and L_dir may specifically be the Smooth L1 loss function.
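For illustration, the weighted combination of the four branch losses can be sketched as follows in PyTorch. The focal-loss parameters, the Smooth L1 reductions and the default beta weights are assumptions; the uncertainty regression loss is passed in precomputed, since its exact form is shown only as an image in the original:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard binary focal loss; targets are 0/1 floats. alpha/gamma defaults are assumptions.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def target_loss(cls_logits, cls_targets, box_pred, box_target,
                dir_pred, dir_target, l_un_reg,
                betas=(1.0, 2.0, 1.0, 0.2)):
    # L_total = b1*L_cls + b2*L_reg + b3*L_un_reg + b4*L_dir (weight values assumed).
    l_cls = focal_loss(cls_logits, cls_targets)
    l_reg = F.smooth_l1_loss(box_pred, box_target)
    l_dir = F.smooth_l1_loss(dir_pred, dir_target)  # Smooth L1 for L_dir, as stated above
    return betas[0] * l_cls + betas[1] * l_reg + betas[2] * l_un_reg + betas[3] * l_dir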
Optionally, FIG. 4 is a fourth schematic flowchart of the target detection method provided by the present invention; FIG. 4 is an optional example of the present invention and does not limit it. As shown in FIG. 4, the process of training the three-dimensional target detection model may include the following steps 401 to 407:
step 401, based on the point cloud data in the virtual sample and the point cloud data in the real sample, a point cloud database D can be constructed;
Optionally, the point cloud database D may include one or more point cloud data D_i, where:
D_i = (x_i, y_i, z_i, R_i),  i = 1, 2, ..., N
where D_i represents the i-th point cloud data in the point cloud database D, x_i, y_i and z_i represent the three-dimensional position of the i-th point relative to the lidar, R_i represents the reflectivity of the i-th point in the laser point cloud, and N is the number of points in the laser point cloud.
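As a concrete illustration of this data structure, a lidar sweep can be stored as an (N, 4) array whose rows are (x_i, y_i, z_i, R_i); the numeric values below are made up for illustration only:

import numpy as np

# Five illustrative points: xyz relative to the lidar, plus reflectivity R.
D = np.array([
    [12.3, -4.1, 0.8, 0.27],
    [ 7.9,  2.6, 1.1, 0.45],
    [30.2,  0.4, 1.9, 0.12],
    [ 5.5, -1.2, 0.3, 0.88],
    [18.0,  6.7, 1.4, 0.33],
])  # shape (N, 4), N = 5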
Step 402, carrying out voxelization coding on point cloud data;
optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a virtual sample in the point cloud database D may be coded, and a first voxelization coding corresponding to the virtual sample may be obtained.
Optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a real sample in the point cloud database D may be coded, and a second voxelization coding corresponding to the real sample may be obtained.
Step 403, performing voxel characteristic extraction on the voxelized code;
Optionally, based on a voxel feature extractor in the three-dimensional target detection model, a spatially sparse first voxel feature corresponding to the first voxelization coding may be obtained.
Optionally, based on the voxel feature extractor in the three-dimensional target detection model, a spatially sparse second voxel feature corresponding to the second voxelization coding may be obtained.
Step 404, acquiring a sample-level feature map based on voxel features;
optionally, the first voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a first spatial feature map is obtained, the first spatial feature map is projected to a top view, dimension compression in the vertical direction may be performed, and a sample-level first feature map may be obtained.
Optionally, the second voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a second spatial feature map is obtained, the second spatial feature map is projected to the top view, dimension compression in the vertical direction may be performed, and a sample-level second feature map may be obtained.
Step 405, generating a network through the candidate area, predicting the characteristic diagram of the sample level, and obtaining a three-dimensional target detection result;
optionally, the candidate region generation network in the three-dimensional object detection model may include a classification branch, a first regression branch, and a second regression branch.
Optionally, the classification branch may predict the feature map of the sample stage to obtain a classification result.
Optionally, the feature map of the sample level may be predicted through the first regression branch, and a three-dimensional bounding box regression result and a direction classification result are obtained.
Optionally, the three-dimensional bounding box (3D bounding box) regression result may be {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}, where {μ_x, μ_y, μ_z} denotes the center point coordinates of the three-dimensional bounding box, μ_w, μ_h and μ_l respectively denote the three side lengths of the three-dimensional bounding box, and μ_θ denotes the rotation angle of the three-dimensional bounding box about the y-axis.
Optionally, the feature map of the sample level may be predicted through the second regression branch, and a three-dimensional bounding box uncertainty regression result is obtained.
Optionally, the three-dimensional bounding box uncertainty regression result may be {σ_x, σ_y, σ_z, σ_w, σ_h, σ_l, σ_θ}, corresponding element-wise to the three-dimensional bounding box regression result {μ_x, μ_y, μ_z, μ_w, μ_h, μ_l, μ_θ}; for example, σ_x represents the uncertainty of μ_x and σ_y represents the uncertainty of μ_y.
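The three prediction branches of the candidate region generation network described in step 405 can be illustrated with a minimal PyTorch-style detection head operating on the sample-level (bird's-eye-view) feature map. The channel counts, the anchor number and the use of 1x1 convolutions are assumptions for illustration, not the patent's implementation:

import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    # Classification branch, first regression branch (7 box values + 2 direction
    # logits) and second regression branch (7 uncertainty values) per anchor.
    def __init__(self, in_channels=256, num_anchors=2, num_classes=1):
        super().__init__()
        self.cls_branch = nn.Conv2d(in_channels, num_anchors * num_classes, 1)
        self.box_branch = nn.Conv2d(in_channels, num_anchors * 7, 1)
        self.dir_branch = nn.Conv2d(in_channels, num_anchors * 2, 1)
        self.unc_branch = nn.Conv2d(in_channels, num_anchors * 7, 1)

    def forward(self, bev_features):
        return {
            "cls": self.cls_branch(bev_features),
            "box": self.box_branch(bev_features),
            "dir": self.dir_branch(bev_features),
            "box_uncertainty": self.unc_branch(bev_features),
        }

# Example: a 200 x 176 bird's-eye-view feature map with 256 channels.
outputs = DetectionHead()(torch.randn(1, 256, 200, 176))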
In step 406, a loss value is obtained based on the target loss function.
Optionally, based on the classification loss function corresponding to the classification branch, a loss value of the classification result may be obtained.
Optionally, based on the three-dimensional bounding box regression loss function corresponding to the first regression branch, a loss value of the three-dimensional bounding box regression result may be obtained.
Optionally, based on the direction classification loss function corresponding to the first regression branch, a loss value of the direction classification result may be obtained.
Optionally, a loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on the three-dimensional bounding box uncertainty regression loss function corresponding to the second regression branch.
Optionally, the loss value corresponding to the target loss function is determined based on the loss value of the classification result, the loss value of the three-dimensional bounding box regression result, the loss value of the direction classification result, and the loss value of the three-dimensional bounding box uncertainty regression result.
In step 407, based on the loss value corresponding to the target loss function, it may be determined whether to end the training.
Alternatively, in a case where the loss function value corresponding to the target loss function is smaller than the first threshold, it may be determined to end the training.
Alternatively, in the case where the number of training times is greater than the second threshold value, it may be determined to end the training.
Therefore, based on the virtual sample, the label of the virtual sample, and the target loss function, the initial three-dimensional target detection model can be trained, and then a pre-trained three-dimensional target detection model can be obtained.
Optionally, when the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Gaussian-based uncertainty regression loss L_un_reg_Gaussian.]
or
when the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
[Formula shown as an image in the original: the Laplace-based uncertainty regression loss L_un_reg_Laplace.]
where t_i represents the target regression value of the i-th dimension, μ_i represents the three-dimensional bounding box regression result of the i-th dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the i-th dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y-axis, and ε is a preset constant.
Specifically, in a case that the second regression branch is a regression model based on a Gaussian distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Gaussian.
Specifically, in a case that the second regression branch is a regression model based on a Laplace distribution, the loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on L_un_reg_Laplace.
Optionally, the target regression value t_i can be obtained by the following formulas:
[Equations shown as images in the original: the definitions of the target regression values t_i in terms of the ground-truth values g_i and the anchor values a_i.]
where g_i represents the ground-truth value of the i-th dimension of the three-dimensional bounding box and a_i represents the corresponding value of the predefined anchor box in the i-th dimension; i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y-axis.
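The encoding formulas themselves are shown only as images in the original. A commonly used anchor-based encoding in voxel-based 3D detectors, given here as an assumed example rather than the patent's exact definition, is:

t_i = \frac{g_i - a_i}{d_a} \;(i \in \{x, y\}), \quad t_z = \frac{g_z - a_z}{a_h}, \quad t_i = \log\frac{g_i}{a_i} \;(i \in \{w, h, l\}), \quad t_\theta = g_\theta - a_\theta, \quad \text{with } d_a = \sqrt{a_w^2 + a_l^2}.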
Therefore, the loss value of the uncertainty regression result of the three-dimensional bounding box can be determined through the loss function corresponding to the second regression branch, and the loss value corresponding to the target loss function can be determined by combining the loss value of the classification result, the loss value of the regression result of the three-dimensional bounding box and the loss value of the direction classification result.
Optionally, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Specifically, a third prediction result may be obtained by predicting the real sample with the pre-trained three-dimensional target detection model. A first target confidence corresponding to the third prediction result may then be determined based on the classification result and the three-dimensional bounding box uncertainty regression result. Based on the first target confidence and a third threshold, the candidate frames corresponding to the third prediction result may be screened to obtain one or more first target candidate frames. All the first target candidate frames may then be fused to obtain fused candidate frames, from which the first prediction result and its corresponding first target confidence may be determined.
Optionally, all the first target candidate frames may be sorted from high to low according to the confidence degrees, and then the first target candidate frames with overlapped bounding frames may be screened out based on a Non-Maximum Suppression (NMS) algorithm, so as to obtain the fused candidate frames, and then the first prediction result and the first target confidence degree corresponding to the first prediction result may be determined based on the fused candidate frames.
Therefore, based on the three-dimensional bounding box uncertainty regression result, a first target confidence coefficient can be determined, based on the first target confidence coefficient, candidate boxes in the prediction result can be screened and fused, and further the first prediction result and the first target confidence coefficient corresponding to the first prediction result can be determined.
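As a minimal sketch of the screening-and-fusion step just described, the snippet below filters candidate boxes by the first target confidence and fuses the overlapping survivors with NMS; the box layout and the nms_3d helper are illustrative assumptions rather than part of the patent.

```python
import numpy as np

def fuse_first_candidates(boxes, confidences, third_threshold, nms_3d, iou_threshold=0.5):
    """Screen candidates by first target confidence, then fuse overlaps with NMS.

    boxes:        (N, 7) array, one row per candidate [x, y, z, w, h, l, theta]
    confidences:  (N,) first target confidences (conf_total of each candidate)
    nms_3d:       assumed helper: callable(boxes, scores, iou_threshold) -> kept indices
    """
    # Keep only candidates whose first target confidence reaches the third threshold.
    keep = confidences >= third_threshold
    boxes, confidences = boxes[keep], confidences[keep]

    # Sort the survivors from high to low confidence.
    order = np.argsort(-confidences)
    boxes, confidences = boxes[order], confidences[order]

    # Drop candidates whose bounding boxes overlap an already-kept box.
    kept = nms_3d(boxes, confidences, iou_threshold)
    return boxes[kept], confidences[kept]
```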
Optionally, the determining, based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result, the first target confidence corresponding to the third prediction result includes:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
Specifically, the classification confidence corresponding to the classification result can be determined from the classification result and the formula conf_cls = sigmoid(output_cls); the positioning uncertainty confidence corresponding to the three-dimensional bounding box uncertainty regression result can be determined from that regression result and the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7; and the first target confidence corresponding to the third prediction result can then be determined based on the classification confidence and the positioning uncertainty confidence.
Therefore, the first target confidence corresponding to the third prediction result can be determined through the classification result and the three-dimensional boundary box uncertainty regression result, the candidate boxes in the prediction result can be screened and fused based on the first target confidence, and the first target confidence corresponding to the first prediction result and the first prediction result can be further determined.
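The confidence computation above maps directly to a few lines of code; the sketch below assumes output_cls is a single raw logit per candidate and sigma holds the seven uncertainty outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_target_confidence(output_cls, sigma):
    """conf_total = conf_cls * conf_loc for one candidate box.

    output_cls: raw classification output of the candidate
    sigma:      the 7 uncertainty regression results for (x, y, z, w, h, l, theta)
    """
    conf_cls = sigmoid(output_cls)                             # classification confidence
    conf_loc = 1.0 - np.sum(sigmoid(np.asarray(sigma))) / 7.0  # localization confidence
    return conf_cls * conf_loc
```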
Optionally, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Specifically, after the first prediction result is obtained, the self-movement trajectory can be obtained based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm; the target transformation matrix can then be obtained based on the target propagation range and the self-movement trajectory; and the first prediction result can then be converted based on the target transformation matrix to obtain the second prediction result.
Alternatively, the point cloud sequence corresponding to the real sample may be written as {P_1, P_2, ..., P_num}, where P_1 represents the point cloud data at time 1, P_2 represents the point cloud data at time 2, and so on, with P_num representing the point cloud data at time num, where num may be an integer greater than or equal to 1.
Optionally, based on the SLAM algorithm, the self-movement trajectory p_t ∈ R^{3×4} can be calculated from the point cloud sequence {P_1, P_2, ..., P_num}, where t may be any value from 1 to num.
Optionally, based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined.
For example, in the case where the target propagation range is k, n ∈ [j-k, j+k]; based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined, and in the case of n = j, T_nj is an identity matrix.
Therefore, by the SLAM algorithm, the trajectory of the self-movement can be acquired, and then based on the trajectory of the self-movement and the target propagation range, the target transformation matrix can be determined, and then based on the target transformation matrix and the first prediction result, the second prediction result can be acquired.
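A minimal sketch of how the target transformation matrix can be assembled from the SLAM ego poses is given below; it assumes each pose p_t maps sensor coordinates at time t into a common odometry frame, so that T_nj = p_j^{-1} · p_n and reduces to the identity when n = j. This convention is an assumption for illustration only.

```python
import numpy as np

def to_homogeneous(pose_3x4):
    """Expand a 3x4 ego pose (rotation | translation) into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :4] = pose_3x4
    return T

def target_transform_matrix(poses, n, j):
    """T_nj: maps frame-n coordinates into frame-j coordinates (identity when n == j).

    poses: mapping frame index -> 3x4 self-movement pose p_t from the SLAM trajectory;
           the sensor-to-odometry convention assumed here is illustrative only.
    """
    p_n = to_homogeneous(poses[n])
    p_j = to_homogeneous(poses[j])
    return np.linalg.inv(p_j) @ p_n
```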
Optionally, converting the first prediction result based on the target conversion matrix to obtain the second prediction result, where the converting includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Alternatively, the prediction result of the nth frame in the first prediction result may be B_n; the prediction result B_n of the nth frame may specifically be {B_1, B_2, ..., B_m}, where B = {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis.
Alternatively, through the bounding-box-to-corner transformation, B_n may be converted into the 8-corner-point form V_n of the three-dimensional bounding box (the first corner form).
Optionally, in the case of a target propagation range of k, n ∈ [j-k, j+k]; based on the self-movement trajectory p_t ∈ R^{3×4} and the target propagation range, the target transformation matrix T_nj from the nth frame to the jth frame can be determined (T_nj is an identity matrix in the case of n = j), and based on the target transformation matrix T_nj, the corner form V_n can be converted to obtain the corner form V_j (the second corner form).
Optionally, based on the corner form V_j (the second corner form), a candidate box B_j corresponding to the second corner form can be obtained.
Optionally, based on the candidate box B_j and the first prediction result, the classification result, three-dimensional bounding box regression result, direction classification result, three-dimensional bounding box uncertainty regression result and first target confidence corresponding to the candidate box B_j can be determined. It will be appreciated that, based on all candidate boxes B_j and the first prediction result, the second prediction result can be determined.
Therefore, after the three-dimensional bounding box regression result in the first prediction result is converted into the first corner form, the first corner form can be converted based on the target conversion matrix, the second corner form can be obtained, and the second prediction result can be obtained based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
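The box-to-corner conversion and the corner transformation can be sketched as follows; the local axis convention (length along x, height along y, width along z, rotation about the y axis) is an assumption for illustration and may not match the patent's exact definition.

```python
import numpy as np

def box_to_corners(box):
    """8-corner form V of a box B = [x, y, z, w, h, l, theta] (first corner form)."""
    x, y, z, w, h, l, theta = box
    # Corner offsets in the box's local frame (assumed: l along x, h along y, w along z).
    xs = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2.0
    ys = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * h / 2.0
    zs = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * w / 2.0
    corners = np.stack([xs, ys, zs])                       # (3, 8)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])                       # rotation about the y axis
    return rot_y @ corners + np.array([[x], [y], [z]])     # (3, 8)

def transform_corners(corners, T_nj):
    """Apply the 4x4 target transformation matrix to a (3, 8) corner form."""
    homogeneous = np.vstack([corners, np.ones((1, corners.shape[1]))])
    return (T_nj @ homogeneous)[:3]                        # second corner form
```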
Optionally, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
Specifically, after the target transformation matrix T_nj from the nth frame to the jth frame is determined, the corner form V_n can be converted through the above formula to obtain the corner form V_j (the second corner form).
Therefore, based on the target conversion matrix, the first corner form can be converted to obtain the second corner form, and further, based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result, the second prediction result can be obtained.
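Since the propagation formula itself is only available as an image, the sketch below shows one plausible reading of it: predictions from frames in the propagation window are carried into frame j through T_nj, and the indicator I(s_n) simply excludes moving targets from propagation. The helper names reuse the earlier sketches and are assumptions, not the patent's implementation.

```python
def propagate_corners_to_frame(corner_sets, motion_states, poses, j, k):
    """Collect corner forms from frames [j - k, j + k] expressed in frame j.

    corner_sets[n]:   list of (3, 8) corner forms V_n predicted in frame n
    motion_states[n]: matching list of flags, True when the target is static (I(s_n) = 1)
    poses:            SLAM self-movement trajectory used by target_transform_matrix()
    """
    propagated = []
    for n in range(j - k, j + k + 1):
        if n not in corner_sets:
            continue
        T_nj = target_transform_matrix(poses, n, j)     # identity when n == j
        for V_n, is_static in zip(corner_sets[n], motion_states[n]):
            if n == j or is_static:                     # moving targets are not propagated
                propagated.append(transform_corners(V_n, T_nj))
    return propagated
```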
Optionally, the obtaining the pseudo tag based on the first target confidence, the first predicted result, and the second predicted result includes:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Specifically, the first prediction result may include a prediction result of the target frame, the second prediction result may include a prediction result of the target frame, and based on the first target confidence, the first prediction result, and the second prediction result, the second target confidence corresponding to the prediction result of the target frame may be obtained, and further based on the second target confidence and the fourth threshold, the candidate frames corresponding to the prediction result of the target frame may be screened to obtain the second target candidate frame, and further, all the second target candidate frames may be fused to obtain the pseudo tag corresponding to the real sample.
Optionally, all the second target candidate frames may be sorted from high to low according to the confidence, and then the second target candidate frames with overlapped bounding frames may be screened out based on the NMS algorithm, so as to obtain fused candidate frames, and then the pseudo label may be determined based on the fused candidate frames.
Therefore, based on the second target confidence corresponding to the prediction result of the target frame, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and further the pseudo label corresponding to the real sample can be obtained.
Optionally, the obtaining a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result, and the second prediction result includes:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, and α is a preset attenuation factor.
Specifically, in the case that the target frame is the jth frame, the attenuation factor α may be preset, and the second target confidence conf'_j corresponding to the prediction result of the jth frame may then be obtained.
Alternatively, the preset attenuation factor α may be 0.7, 0.8, 0.9, or the like, which is not limited thereto.
Optionally, based on the second target confidence conf'_j and a fourth threshold conf_thred, the candidate boxes B_j corresponding to the prediction result of the jth frame can be screened to obtain the second target candidate boxes B'_j of the jth frame; all second target candidate boxes B'_j of the jth frame can then be sorted from high to low by confidence, the second target candidate boxes with overlapping bounding boxes can be removed based on the NMS algorithm, and the fused candidate boxes of the jth frame can thereby be obtained.
Therefore, the second target confidence corresponding to the prediction result of the target frame can be obtained, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and the pseudo label corresponding to the real sample can be obtained.
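The exact formula for conf'_j is likewise only shown as an image; a natural, hedged reading is a decay-weighted combination of the first target confidences over the propagation window, as sketched below (the weighting α^|n-j| and the normalization are assumptions, not the patent's stated formula).

```python
def second_target_confidence(first_confidences, j, k, alpha=0.9):
    """Decay-weighted second target confidence conf'_j for frame j (hedged sketch).

    first_confidences[n]: first target confidence conf_n of the matched prediction in frame n
    alpha:                preset attenuation factor (e.g. 0.7, 0.8 or 0.9)
    """
    weighted, normaliser = 0.0, 0.0
    for n in range(j - k, j + k + 1):
        if n not in first_confidences:
            continue
        weight = alpha ** abs(n - j)          # farther frames contribute less
        weighted += weight * first_confidences[n]
        normaliser += weight
    return weighted / normaliser if normaliser > 0.0 else 0.0
```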
Optionally, the public autonomous-driving data set KITTI Odometry may be used as the real sample and input into the pre-trained three-dimensional target detection model (which may include a SECOND network) to obtain the first prediction result and the first target confidence corresponding to the first prediction result; the first prediction result may then be propagated along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result; the pseudo label may then be obtained based on the first target confidence, the first prediction result and the second prediction result; and the pre-trained three-dimensional target detection model may then be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Alternatively, the three-dimensional target detection model may be tested on the val split of the KITTI 3D Object Detection data set to obtain its three-dimensional target detection results on the KITTI data set, as shown in Table 1. The evaluation indices of the three-dimensional target detection results may include the bird's-eye-view average precision (BEV AP) and the three-dimensional box average precision (3D AP); Easy, Moderate and Hard respectively denote the simple, medium and difficult samples in the KITTI data set; model A is the model obtained in the related art by training the original SECOND network on pseudo labels directly generated by the original SECOND network; and model B is the three-dimensional target detection model (including the SECOND network) provided by the embodiment of the present invention.
TABLE 1 three-dimensional target detection results
(The table is provided as an image in the source; its numerical values are not reproduced here.)
It can be seen from the data in Table 1 that the target detection method provided in the embodiment of the present invention achieves a significant performance improvement over the original model (e.g., the original SECOND network) without using any manually labeled real data.
According to the target detection method provided by the invention, the initial three-dimensional target detection model is trained through the virtual sample, the pre-trained three-dimensional target detection model can be obtained, the real sample is input into the pre-trained three-dimensional target detection model to obtain the first prediction result, the first prediction result is propagated along the time dimension to obtain the second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model, the target point cloud sequence is input into the three-dimensional target detection model, the accurate three-dimensional target detection result can be obtained, the training and obtaining of the three-dimensional target detection model can be realized under the condition of no manual labeling data, and a better detection effect is achieved.
The object detection device provided by the present invention is described below, and the object detection device described below and the object detection method described above may be referred to in correspondence with each other.
Fig. 5 is a schematic structural diagram of the object detection apparatus provided in the present invention, and as shown in fig. 5, the apparatus includes a first obtaining module 501 and a second obtaining module 502, where:
a first obtaining module 501, configured to obtain a target point cloud sequence;
a second obtaining module 502, configured to input the target point cloud sequence into a three-dimensional target detection model, and obtain a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
It can be understood that the target detection apparatus trains the initial three-dimensional target detection model through the virtual sample, so as to obtain a pre-trained three-dimensional target detection model, inputs the real sample into the pre-trained three-dimensional target detection model to obtain a first prediction result, propagates the first prediction result along the time dimension to obtain a second prediction result, determines the pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, and trains the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
The target detection device provided by the invention can obtain a pre-trained three-dimensional target detection model by training an initial three-dimensional target detection model through a virtual sample, can obtain a first prediction result by inputting a real sample into the pre-trained three-dimensional target detection model, can obtain a second prediction result by transmitting the first prediction result along a time dimension, can determine a pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, can train the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model, further inputs a target point cloud sequence into the three-dimensional target detection model, can obtain an accurate three-dimensional target detection result, and can train and obtain the three-dimensional target detection model without manually marked data and achieve a better detection effect.
Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the second obtaining module is specifically configured to:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.
Optionally, the apparatus further comprises a training module configured to:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.
Optionally, the training module is specifically configured to:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
Optionally, in the case that the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
or
When the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
where t_i represents the target regression value of the ith dimension, μ_i represents the three-dimensional bounding box regression result of the ith dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
Optionally, the training module is specifically configured to:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
Optionally, the training module is specifically configured to:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
Optionally, the training module is specifically configured to:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.
Optionally, the training module is specifically configured to:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.
Optionally, the training module is specifically configured to:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
Optionally, the training module is specifically configured to:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
Optionally, the training module is specifically configured to:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, α is a preset attenuation factor, and k is the target propagation range.
The target detection device provided by the invention can obtain a pre-trained three-dimensional target detection model by training an initial three-dimensional target detection model through a virtual sample, can obtain a first prediction result by inputting a real sample into the pre-trained three-dimensional target detection model, can obtain a second prediction result by transmitting the first prediction result along a time dimension, can determine a pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, can train the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model, further inputs a target point cloud sequence into the three-dimensional target detection model, can obtain an accurate three-dimensional target detection result, and can train and obtain the three-dimensional target detection model without manually marked data and achieve a better detection effect.
Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 6, the electronic device may include: a processor (processor) 610, a communication Interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a target detection method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the object detection method provided by the above methods, the method comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to perform the object detection method provided by the above methods, the method including:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of object detection, comprising:
acquiring a target point cloud sequence;
inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample;
the three-dimensional target detection result corresponding to the target point cloud sequence comprises a classification result, a three-dimensional boundary frame regression result, a direction classification result and a three-dimensional boundary frame uncertainty regression result corresponding to the three-dimensional boundary frame regression result, the target point cloud sequence is input into a three-dimensional target detection model, and the three-dimensional target detection result corresponding to the target point cloud sequence is obtained, and the method comprises the following steps:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model;
the three-dimensional target detection model is constructed in the following way:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model;
the target propagation range based on the time dimension propagates the first prediction result along the time dimension to obtain the second prediction result, and the method comprises the following steps:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target transformation matrix based on the target propagation range and the self-movement track;
converting the first prediction result based on the target conversion matrix to obtain a second prediction result;
the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
obtaining a second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result;
the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
2. The method for detecting the target of claim 1, wherein the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model comprises:
training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;
the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.
3. The method according to claim 2, wherein, when the second regression branch is a regression model based on a Gaussian distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
or
When the second regression branch is a regression model based on a Laplace distribution, the loss function corresponding to the second regression branch is specifically:
(the loss function is given as a formula image in the source and is not reproduced here)
where t_i represents the target regression value of the ith dimension, μ_i represents the three-dimensional bounding box regression result of the ith dimension, σ_i represents the three-dimensional bounding box uncertainty regression result of the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, θ represents the rotation angle of the three-dimensional bounding box about the y axis, and ε is a preset constant.
4. The target detection method of claim 1, wherein the inputting the real sample into the pre-trained three-dimensional target detection model to obtain the first prediction result and a first target confidence corresponding to the first prediction result comprises:
predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;
determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;
based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;
and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.
5. The method for detecting the target according to claim 4, wherein the determining the confidence of the first target corresponding to the third predicted result based on the classification result in the third predicted result and the three-dimensional bounding box uncertainty regression result in the third predicted result comprises:
using the formula conf_cls = sigmoid(output_cls), obtaining a classification confidence conf_cls corresponding to the classification result in the third prediction result, where output_cls is the classification result in the third prediction result;
using the formula conf_loc = 1 - (∑_{i∈{x,y,z,w,h,l,θ}} sigmoid(σ_i))/7, obtaining a positioning uncertainty confidence conf_loc corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result, where σ_i represents the uncertainty regression result of the three-dimensional bounding box in the ith dimension, i ∈ {x, y, z, w, h, l, θ}, {x, y, z} represents the center point coordinates of the three-dimensional bounding box, w, h and l respectively represent the three side lengths of the three-dimensional bounding box, and θ represents the rotation angle of the three-dimensional bounding box about the y axis;
using the formula conf_total = conf_cls · conf_loc, obtaining a first target confidence conf_total corresponding to the third prediction result.
6. The object detection method of claim 1, wherein said obtaining the pseudo tag based on the first object confidence, the first predicted result, and the second predicted result comprises:
acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;
based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;
and fusing all the second target candidate frames to obtain the pseudo label.
7. The object detection method of claim 6, wherein obtaining a second object confidence corresponding to the prediction result of the object frame based on the first object confidence, the first prediction result and the second prediction result comprises:
in the case that the target frame is the jth frame, applying the confidence-propagation formulas (given as images in the source and not reproduced here), acquiring the second target confidence conf'_j corresponding to the prediction result of the jth frame;
where conf_n represents the first target confidence corresponding to the prediction result of the nth frame, α is a preset attenuation factor, and k is the target propagation range.
8. An object detection device, comprising:
the first acquisition module is used for acquiring a target point cloud sequence;
the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;
the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample;
the three-dimensional target detection result corresponding to the target point cloud sequence comprises a classification result, a three-dimensional boundary frame regression result, a direction classification result and a three-dimensional boundary frame uncertainty regression result corresponding to the three-dimensional boundary frame regression result, the target point cloud sequence is input into a three-dimensional target detection model, and the three-dimensional target detection result corresponding to the target point cloud sequence is obtained, and the method comprises the following steps:
obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;
obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;
acquiring an uncertainty regression result of the three-dimensional bounding box based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model;
the three-dimensional target detection model is constructed in the following way:
training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;
inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;
propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;
obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;
training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model;
the target propagation range based on the time dimension propagates the first prediction result along the time dimension to obtain the second prediction result, and the method comprises the following steps:
acquiring a self-movement trajectory based on the point cloud sequence corresponding to the real sample and a Simultaneous Localization and Mapping (SLAM) algorithm;
acquiring a target conversion matrix based on the target propagation range and the self-movement track;
converting the first prediction result based on the target conversion matrix to obtain a second prediction result;
the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:
acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;
converting the first corner form based on the target conversion matrix to obtain a second corner form;
acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;
obtaining a second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result;
the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:
applying the corner-form propagation formula (given as an image in the source and not reproduced here), obtaining the second corner form V_j corresponding to the jth frame;
where k is the target propagation range; s_n indicates whether the three-dimensional target in the nth frame is in a motion state or a static state; the value of the indicator function I(·) is 0 when s_n indicates that the three-dimensional target in the nth frame is in a motion state, and 1 when s_n indicates that the three-dimensional target in the nth frame is in a static state; T_nj represents the target transformation matrix from the nth frame to the jth frame; and V_n represents the first corner form corresponding to the nth frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the object detection method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 7.
CN202210122800.5A 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium Active CN114663879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122800.5A CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114663879A CN114663879A (en) 2022-06-24
CN114663879B true CN114663879B (en) 2023-02-21

Family

ID=82026631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122800.5A Active CN114663879B (en) 2022-02-09 2022-02-09 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114663879B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152770B (en) * 2023-04-19 2023-09-22 深圳佑驾创新科技股份有限公司 3D target matching model building method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784333B (en) * 2019-01-22 2021-09-28 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel characteristics
CN111474953B (en) * 2020-03-30 2021-09-17 清华大学 Multi-dynamic-view-angle-coordinated aerial target identification method and system
CN113409410B (en) * 2021-05-19 2024-04-02 杭州电子科技大学 Multi-feature fusion IGV positioning and mapping method based on 3D laser radar
CN113920370A (en) * 2021-10-25 2022-01-11 上海商汤智能科技有限公司 Model training method, target detection method, device, equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154118A (en) * 2017-12-25 2018-06-12 北京航空航天大学 Target detection system and method based on adaptive combined filter with multistage detection
WO2021081808A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Artificial neural network-based object detection system and method
CN110879994A (en) * 2019-12-02 2020-03-13 中国科学院自动化研究所 Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN111462236A (en) * 2020-04-02 2020-07-28 集美大学 Method and system for detecting relative pose between ships
CN111983600A (en) * 2020-08-31 2020-11-24 杭州海康威视数字技术股份有限公司 Target detection method, device and equipment
CN112257605A (en) * 2020-10-23 2021-01-22 中国科学院自动化研究所 Three-dimensional target detection method, system and device based on self-labeling training sample
CN112613378A (en) * 2020-12-17 2021-04-06 上海交通大学 3D target detection method, system, medium and terminal
CN113269098A (en) * 2021-05-27 2021-08-17 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113763430A (en) * 2021-09-13 2021-12-07 智道网联科技(北京)有限公司 Method, apparatus and computer-readable storage medium for detecting moving object

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention; Junbo Yin et al.; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020-12-31; pp. 11492-11501 *
SARPNET: Shape attention regional proposal network for LiDAR-based 3D object detection; Yangyang Ye et al.; Neurocomputing; 2019-10-17; pp. 53-63 *
Automatic re-calibration method for camera and LiDAR based on sensor-fusion odometry; Peng Pai et al.; Journal of Mechanical Engineering; 2021-10-31; Vol. 57, No. 20; pp. 206-214 *
Robust 3D target detection method based on localization uncertainty; Pei Yiyao et al.; Journal of Computer Applications; 2021-10-10; Vol. 41, No. 10; pp. 2979-2984 *
Research on autonomous driving scene recognition based on deep learning; Cheng Zhiwei; China Master's Theses Full-text Database, Engineering Science and Technology II; 2022-01-15; Vol. 2022, No. 01; p. C035-229 *

Also Published As

Publication number Publication date
CN114663879A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN113039563B (en) Learning to generate synthetic data sets for training neural networks
CN109643383B (en) Domain split neural network
CN111079619B (en) Method and apparatus for detecting target object in image
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
KR20190021187A (en) Vehicle license plate classification methods, systems, electronic devices and media based on deep learning
CN110490959B (en) Three-dimensional image processing method and device, virtual image generating method and electronic equipment
WO2019053052A1 (en) A method for (re-)training a machine learning component
CN112347550A (en) Coupling type indoor three-dimensional semantic graph building and modeling method
EP4075382A1 (en) A method for training a neural network to deliver the viewpoints of objects using pairs of images under different viewpoints
CN114758337A (en) Semantic instance reconstruction method, device, equipment and medium
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRF
CN114663879B (en) Target detection method and device, electronic equipment and storage medium
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN113326826A (en) Network model training method and device, electronic equipment and storage medium
CN115147798A (en) Method, model and device for predicting travelable area and vehicle
CN114943870A (en) Training method and device of line feature extraction model and point cloud matching method and device
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
CN114565953A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN115115699A (en) Attitude estimation method and device, related equipment and computer product
CN115331194A (en) Occlusion target detection method and related equipment
CN113537258A (en) Action track prediction method and device, computer readable medium and electronic equipment
CN115468778B (en) Vehicle testing method and device, electronic equipment and storage medium
Anju et al. Faster Training of Edge-attention Aided 6D Pose Estimation Model using Transfer Learning and Small Customized Dataset
CN114842304A (en) Machine learning model training method and device, and semantic segmentation method and device
CN116597402A (en) Scene perception method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant