CN114663879A

CN114663879A - Target detection method and device, electronic equipment and storage medium

Info

Publication number: CN114663879A
Application number: CN202210122800.5A
Authority: CN
Inventors: 张兆翔; 张驰; 陈文博; 裴仪瑶
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2022-02-09
Filing date: 2022-02-09
Publication date: 2022-06-24
Anticipated expiration: 2042-02-09
Also published as: CN114663879B

Abstract

The invention provides a target detection method, a target detection device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a target point cloud sequence; inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence; the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, and the second prediction result is obtained by propagating the first prediction result along the time dimension. According to the embodiment of the invention, the second prediction result can be obtained by transmitting the first prediction result along the time dimension, and the pseudo label can be obtained based on the first prediction result and the second prediction result, so that the three-dimensional target detection model can be trained under the condition of no manual labeling data, and a better detection effect is achieved.

Description

Target detection method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.

Background

Two-dimensional target detection can only provide two-dimensional position information of an object in a picture, and nowadays, along with the rapid development of application fields such as vehicle unmanned driving, intelligent robots, augmented reality and security frontier defense, information of the object in a three-dimensional space is often needed so as to more accurately position and identify the target. The input of the three-dimensional target detection is two-dimensional images or three-dimensional data, and the output of the three-dimensional target detection is the position of an object bounding box in a three-dimensional space and a classification result.

In the prior art, most of three-dimensional related tasks such as unmanned driving and the like are to utilize laser radar point cloud data to obtain more accurate three-dimensional space information. The point cloud data is different from the original image data, has properties such as high dimensionality and disorder, and also causes the problems of large data annotation difficulty and small scale of a data set, and compared with two-dimensional target detection, the method has the advantage that the three-dimensional target detection data annotation is more difficult to obtain.

Disclosure of Invention

The invention provides a target detection method, a target detection device, electronic equipment and a storage medium, which are used for solving the defect of high difficulty in obtaining three-dimensional target detection data labels in the prior art, realizing the training and obtaining of a three-dimensional target detection model under the condition of no manual labeling data and achieving a better detection effect.

In a first aspect, the present invention provides a target detection method, including:

acquiring a target point cloud sequence;

inputting the target point cloud sequence into a three-dimensional target detection model, and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;

the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by propagating the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.

Optionally, according to a target detection method provided by the present invention, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:

obtaining the classification result based on the target point cloud sequence and the classification branch of the three-dimensional target detection model;

obtaining a three-dimensional bounding box regression result and the direction classification result based on the target point cloud sequence and a first regression branch of the three-dimensional target detection model;

and acquiring an uncertainty regression result of the three-dimensional boundary frame based on the target point cloud sequence and a second regression branch of the three-dimensional target detection model.

Optionally, according to the target detection method provided by the present invention, the three-dimensional target detection model is constructed in the following manner:

training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model;

inputting the real sample into the pre-trained three-dimensional target detection model, and acquiring the first prediction result and a first target confidence coefficient corresponding to the first prediction result;

propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;

acquiring the pseudo label based on the first target confidence, the first prediction result and the second prediction result;

and training the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.

Optionally, according to a target detection method provided by the present invention, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:

training the initial three-dimensional target detection model based on the virtual sample, the label of the virtual sample and a target loss function until a loss function value corresponding to the target loss function is smaller than a first threshold or until the training times are larger than a second threshold;

the target loss function is determined based on the loss functions corresponding to the classification branches in the three-dimensional target detection model and the loss functions corresponding to the regression branches in the three-dimensional target detection model.

Optionally, according to the target detection method provided by the present invention, in a case that the second regression branch is a regression model based on gaussian distribution, a loss function corresponding to the second regression branch specifically is:

or

When the second regression branch is a regression model based on laplace distribution, the loss function corresponding to the second regression branch is specifically:

wherein, t_iTarget regression value, μ, representing the ith dimension_iThree-dimensional bounding box regression result, sigma, representing the ith dimension_iAnd representing a three-dimensional boundary frame uncertainty regression result of the ith dimension, wherein i belongs to { x, y, z, w, h, l, theta }, { x, y, z } represents a central point coordinate of the three-dimensional boundary frame, w, h and l respectively represent three side lengths of the three-dimensional boundary frame, theta represents a rotation angle of the three-dimensional boundary frame by taking the y axis as the center, and epsilon is a preset constant.

Optionally, according to a target detection method provided by the present invention, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:

predicting the real sample based on the pre-trained three-dimensional target detection model to obtain a third prediction result;

determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result;

based on the first target confidence and a third threshold, acquiring a first target candidate frame from candidate frames corresponding to the third prediction result, wherein the first target confidence corresponding to the first target candidate frame is greater than or equal to the third threshold;

and fusing all the first target candidate frames to obtain the first prediction result and a first target confidence corresponding to the first prediction result.

Optionally, according to a target detection method provided by the present invention, the determining a first target confidence corresponding to the third prediction result based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result includes:

using the formula conf_cls＝sigmoid(output_cls) Obtaining a classification confidence conf corresponding to the classification result in the third prediction result_cls，output_clsIs thatA classification result in the third prediction result;

using the formula conf_loc＝1-∑_{i∈{x,y,z,w,h,l,θ}}sigmoid(σ_i) And 7, acquiring a positioning uncertainty confidence conf corresponding to the three-dimensional bounding box uncertainty regression result in the third prediction result_locWherein σ is_iRepresenting a three-dimensional boundary frame uncertainty regression result of the ith dimension, wherein i belongs to { x, y, z, w, h, l, theta }, { x, y, z } represents a central point coordinate of the three-dimensional boundary frame, w, h and l respectively represent three side lengths of the three-dimensional boundary frame, and theta represents a rotation angle of the three-dimensional boundary frame with the y axis as the center;

using the formula conf_total＝conf_cls·conf_locObtaining a first target confidence conf corresponding to the third prediction result_total。

Optionally, according to a target detection method provided by the present invention, the propagating the first prediction result along a time dimension based on a target propagation range of the time dimension to obtain the second prediction result includes:

constructing a SLAM algorithm based on point cloud sequence coding and instant positioning and a map corresponding to a real sample, and acquiring a self-moving track;

acquiring a target transformation matrix based on the target propagation range and the self-movement track;

and converting the first prediction result based on the target conversion matrix to obtain the second prediction result.

Optionally, according to a target detection method provided by the present invention, the converting the first prediction result based on the target conversion matrix to obtain the second prediction result includes:

acquiring a first corner form corresponding to a three-dimensional bounding box regression result in the first prediction result based on the first prediction result;

converting the first corner form based on the target conversion matrix to obtain a second corner form;

acquiring a three-dimensional bounding box regression result corresponding to the second corner form based on the second corner form;

and acquiring the second prediction result based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.

Optionally, according to a target detection method provided by the present invention, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:

applications of

Obtaining a second angular point form V corresponding to the j frame_j；

Where k is the target propagation range, t_iThe method is used for representing that the three-dimensional target in the ith frame is in a motion state or a static state; at t_iThe value of the function I (.) is 0 under the condition that the three-dimensional target in the ith frame is represented in a motion state, and the function I (.) is at t_iRepresenting that the value of the function I (.) is 1 under the condition that the three-dimensional target in the ith frame is in a static state; t is_ijRepresenting the target transition matrix from frame i to frame j, V_iIndicating a first corner pattern corresponding to the ith frame.

Optionally, according to a target detection method provided by the present invention, the obtaining the pseudo tag based on the first target confidence, the first prediction result, and the second prediction result includes:

acquiring a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result and the second prediction result;

based on the second target confidence and a fourth threshold, acquiring a second target candidate frame from candidate frames corresponding to the prediction result of the target frame, wherein the second target confidence corresponding to the second target candidate frame is greater than or equal to the fourth threshold;

and fusing all the second target candidate frames to obtain the pseudo label.

Optionally, according to a target detection method provided by the present invention, the obtaining, based on the first target confidence, the first prediction result, and the second prediction result, a second target confidence corresponding to the prediction result of the target frame includes:

applying in the case that the target frame is the jth frame

α^|i-j | obtaining a second target confidence conf corresponding to the prediction result of the j-th frame_j ^′；

Wherein conf_iAnd representing a first target confidence corresponding to the prediction result of the ith frame, wherein alpha is a preset attenuation factor.

In a second aspect, the present invention further provides an object detecting apparatus, including:

the first acquisition module is used for acquiring a target point cloud sequence;

the second acquisition module is used for inputting the target point cloud sequence into the three-dimensional target detection model and acquiring a three-dimensional target detection result corresponding to the target point cloud sequence;

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of any of the above-mentioned object detection methods.

In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.

In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.

The target detection method, the device, the electronic equipment and the storage medium provided by the invention can obtain a pre-trained three-dimensional target detection model by training an initial three-dimensional target detection model through a virtual sample, can obtain a first prediction result by inputting a real sample into the pre-trained three-dimensional target detection model, can obtain a second prediction result by transmitting the first prediction result along a time dimension, can determine a pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, can train the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model, further inputs a target point cloud sequence into the three-dimensional target detection model, can obtain an accurate three-dimensional target detection result, and can realize the training of obtaining the three-dimensional target detection model without manually marked data, and achieves better detection effect.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a target detection method provided by the present invention;

FIG. 2 is a second schematic flow chart of a target detection method provided by the present invention;

FIG. 3 is a third schematic flow chart of a target detection method provided by the present invention;

FIG. 4 is a fourth schematic flowchart of a target detection method provided by the present invention;

FIG. 5 is a schematic structural diagram of an object detecting device provided in the present invention;

fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The object detection method and apparatus of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of an object detection method provided by the present invention, and as shown in fig. 1, an execution subject of the object detection method may be an electronic device, such as a mobile phone or a server. The method comprises the following steps 101 to 102:

step 101, acquiring a target point cloud sequence;

step 102, inputting the target point cloud sequence into a three-dimensional target detection model, and obtaining a three-dimensional target detection result corresponding to the target point cloud sequence;

Specifically, after the target point cloud sequence is obtained, the target point cloud sequence may be input to the three-dimensional target detection model, and then a three-dimensional target detection result corresponding to the target point cloud sequence may be obtained.

For example, a target point cloud sequence a of a certain unmanned vehicle may be obtained, and then the target point cloud sequence a may be input into a three-dimensional target detection model, and then a three-dimensional target detection result a corresponding to the target point cloud sequence a may be obtained, where the three-dimensional target may be a target object on a motion trajectory of the unmanned vehicle, and the three-dimensional target detection result a may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.

For example, a target point cloud sequence B of an intelligent robot may be obtained, and then the target point cloud sequence B may be input into a three-dimensional target detection model, and then a three-dimensional target detection result B corresponding to the target point cloud sequence B may be obtained, where the three-dimensional target may be a target object of a motion trajectory of the intelligent robot, and the three-dimensional target detection result B may include information such as a spatial position of the three-dimensional target, a length, a width, and a height of the three-dimensional target, and a direction of the three-dimensional target.

The above examples are merely illustrative of the embodiments of the present invention, and are not intended to limit the embodiments of the present invention.

It can be understood that the pre-trained three-dimensional target detection model can be obtained by training the initial three-dimensional target detection model through the virtual sample, the real sample is input into the pre-trained three-dimensional target detection model to obtain a first prediction result, the first prediction result is propagated along the time dimension to obtain a second prediction result, the pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, and then the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain the three-dimensional target detection model.

The target detection method provided by the invention trains the initial three-dimensional target detection model through the virtual sample, a pre-trained three-dimensional target detection model can be obtained, a first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along a time dimension, a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, further, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain a three-dimensional target detection model, further, the target point cloud sequence is input into the three-dimensional target detection model, so that an accurate three-dimensional target detection result can be obtained, the method can be used for training and obtaining the three-dimensional target detection model under the condition of no manual labeling data, and achieves a good detection effect.

Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection result corresponding to the target point cloud sequence includes:

Specifically, through each branch model of the three-dimensional target detection model, the target point cloud sequence can be identified, and a classification result, a three-dimensional boundary frame regression result direction classification result, a three-dimensional boundary frame uncertainty regression result and the like are obtained.

Optionally, through a classification branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a classification result corresponding to the target point cloud sequence is obtained.

Optionally, through a first regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box regression result and a direction classification result corresponding to the target point cloud sequence are obtained.

Optionally, through a second regression branch of the three-dimensional target detection model, the target point cloud sequence may be identified, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result may be obtained, where the three-dimensional bounding box uncertainty regression result may be used to represent uncertainty of the three-dimensional bounding box regression result.

Optionally, the three-dimensional object detection model may include a SECOND regression branch and a SECOND SECOND network, wherein the SECOND regression branch may include a classification branch and a first regression branch.

Optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result may be determined based on the screened three-dimensional bounding box regression result.

It is understood that the first prediction result may include a three-dimensional bounding box regression result after the filtering, a classification result corresponding to the three-dimensional bounding box regression result after the filtering, and a direction classification result corresponding to the three-dimensional bounding box regression result after the filtering.

Optionally, the target point cloud sequence may be identified through a three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, and then the three-dimensional bounding box regression result may be screened based on the three-dimensional bounding box uncertainty regression result, and then the three-dimensional target detection result may be determined based on the screened three-dimensional bounding box regression result.

It is understood that the three-dimensional target detection result may include a three-dimensional bounding box regression result after being screened, a classification result corresponding to the three-dimensional bounding box regression result after being screened, and a direction classification result corresponding to the three-dimensional bounding box regression result after being screened.

Alternatively, the second regression branch may be a regression model based on a gaussian distribution.

Alternatively, the second regression branch may be a regression model based on a laplace distribution.

It is understood that the three-dimensional bounding box uncertainty regression results may be used to obtain more accurate first prediction results in the training session. Specifically, in the training process, a real sample is input into a pre-trained three-dimensional target detection model, a three-dimensional bounding box regression result and a three-dimensional bounding box uncertainty regression result can be obtained, then candidate boxes in the three-dimensional bounding box regression result can be screened based on the three-dimensional bounding box uncertainty regression result, and a more accurate first prediction result can be obtained based on the screened three-dimensional bounding box regression result.

It can be understood that the three-dimensional bounding box uncertainty regression results can be used to obtain more accurate pseudo labels in the training session. Specifically, in the training process, a plurality of candidate frames can be determined based on a first prediction result and a second prediction result, the candidate frames can be screened based on a three-dimensional boundary frame uncertainty regression result, and then a more accurate pseudo label can be obtained based on the screened candidate frames.

Therefore, the three-dimensional boundary frame uncertainty regression result can be obtained through the second regression branch, the accurate pseudo label can be obtained based on the three-dimensional boundary frame uncertainty regression result, fine tuning training can be carried out on the pre-trained three-dimensional target detection model based on the accurate pseudo label and the real sample, and the three-dimensional target detection model obtained through training can achieve a good detection effect under the condition that no manual labeling is used at all.

Optionally, the three-dimensional object detection model is constructed by:

obtaining the pseudo label based on the first target confidence, the first prediction result and the second prediction result;

Specifically, the initial three-dimensional target detection model is trained through a virtual sample, a pre-trained three-dimensional target detection model can be obtained, a real sample can be identified based on the pre-trained three-dimensional target detection model, a first prediction result and a first target confidence coefficient are obtained, the first prediction result is propagated along a time dimension, a second prediction result can be obtained, the first prediction result and the second prediction result can be screened and fused based on the first target confidence coefficient, a pseudo tag is obtained, the pre-trained model can be finely adjusted based on the pseudo tag and the real sample, and the three-dimensional target detection model is obtained.

Alternatively, the target propagation range may be a propagation range between adjacent frames, or may be a propagation range between non-adjacent frames.

Optionally, virtual samples may be generated by the cara simulator, which may include depth images and point cloud data acquired by a laser radar and a depth sensor in the cara simulator.

Alternatively, the prediction result of the i-th frame in the first prediction result may be B_iPrediction result B of i-th frame_iSpecifically, { B₁,B₂,...,B_mAnd B, indicating the coordinates of the center point of the three-dimensional boundary frame, w, h, l, θ, and w, h, and l respectively indicate three side lengths of the three-dimensional boundary frame, and θ indicates a rotation angle of the three-dimensional boundary frame around the y axis.

Optionally, fig. 2 is a second schematic flow chart of the object detection method provided by the present invention, fig. 3 is a third schematic flow chart of the object detection method provided by the present invention, and fig. 2 or fig. 3 are an optional example of the present invention, but are not limited to the present invention; as shown in fig. 2, the three-dimensional object detection model is constructed in a manner including the following steps 201 to 205:

step 201, training an initial three-dimensional target detection model based on a virtual sample and a label of the virtual sample to obtain a pre-trained three-dimensional target detection model;

step 202, inputting a continuous point cloud frame of a real sample into a pre-trained three-dimensional target detection model, and acquiring a first prediction result and a first target confidence coefficient;

optionally, the real sample may be identified by a pre-trained three-dimensional target detection model, a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result may be obtained, the three-dimensional bounding box regression result may be further screened based on the three-dimensional bounding box uncertainty regression result, and the first prediction result and the first target confidence may be determined based on the screened three-dimensional bounding box regression result.

It is understood that the real sample may include continuous point cloud frames, and the first prediction result may include a prediction result corresponding to each point cloud frame, for example, as shown in fig. 3, in the case that the real sample includes a j-th point cloud frame, the first prediction result may include a prediction result corresponding to the j-th point cloud frame.

Step 203, propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain a second prediction result;

optionally, the first prediction result may include a prediction result corresponding to each point cloud frame, and then the prediction results corresponding to each point cloud frame in the first prediction result may be propagated along the time dimension, respectively, to obtain the second prediction result.

For example, as shown in fig. 3, the first prediction result may include a prediction result a1 of the (j-1) th frame, a prediction result B1 of the j-1 th frame, and a prediction result C1 of the (j +1) th frame, the prediction result a1 of the (j-1) th frame in the first prediction result may be propagated along the time dimension to a prediction result B2 of the j-th frame, the prediction result C1 of the (j +1) th frame in the first prediction result may be propagated along the time dimension to a prediction result B3 of the j-th frame, and the second prediction result may include a prediction result B2 of the j-th frame and a prediction result B3 of the j-th frame.

It is understood that the first prediction result may include the prediction result B1 of the j-th frame, the second prediction result may include the prediction result B2 of the j-th frame and the prediction result B3 of the j-th frame, and thus, for the j-th frame, the prediction result B1, the prediction result B2 and the prediction result B3 may be obtained.

It will be appreciated that after the second prediction is obtained, the prediction of the real sample may include the first prediction and the second prediction.

Step 204, acquiring a pseudo label based on the first target confidence coefficient, the first prediction result and the second prediction result;

optionally, candidate frames in the first prediction result and the second prediction result may be filtered based on the first target confidence, and a pseudo tag may be determined based on the filtered candidate frames.

For example, as shown in fig. 3, the first prediction result may include a prediction result B1 of a j-th frame, the second prediction result may include a prediction result B2 of the j-th frame, and the prediction result B3 of the j-th frame, and the candidate frame corresponding to the prediction result B1, the candidate frame corresponding to the prediction result B2, and the candidate frame corresponding to the prediction result B3 may be filtered based on the first target confidence, and the filtered candidate frames may be obtained, and then the pseudo label may be determined based on the filtered candidate frames.

Step 205, training (fine tuning) the pre-trained three-dimensional target detection model based on the real sample and the pseudo label, and obtaining the three-dimensional target detection model.

It can be understood that the pre-trained three-dimensional target detection model obtained through virtual sample training has basic target detection capability, detection performance in a real environment may be reduced, the pre-trained three-dimensional target detection model can be trained based on a real sample and a pseudo label, and detection performance of the three-dimensional target detection model in the real environment can be improved.

Therefore, the first prediction result is propagated along the time dimension, the second prediction result can be obtained, an accurate pseudo label can be determined based on the first target confidence coefficient, the first prediction result and the second prediction result, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label, the three-dimensional target detection model is obtained, and a good detection effect is achieved.

Optionally, the training an initial three-dimensional target detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional target detection model includes:

Optionally, based on the target loss function, the virtual sample and the label of the virtual sample may be input to the initial three-dimensional target detection model for training until the loss function value corresponding to the target loss function is smaller than the first threshold.

Optionally, based on the target loss function, the virtual samples and the labels of the virtual samples may be input to the initial three-dimensional target detection model for training until the training times are greater than the second threshold.

Alternatively, the classification branch in the three-dimensional target detection model may correspond to a classification loss function, the first regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box regression loss function and a directional classification loss function, and the second regression branch in the three-dimensional target detection model may correspond to a three-dimensional bounding box uncertainty regression loss function, and then the target loss function may be determined based on the classification loss function, the three-dimensional bounding box regression loss function, the directional classification loss function, and the three-dimensional bounding box uncertainty regression loss function.

Alternatively, the target loss function may be:

L_total＝β₁L_cls+β₂L_reg+β₃L_{un_reg}+β₄L_dir；

wherein L is_clsCan be a classification loss function, L_regCan be a three-dimensional bounding box regression loss function, L_dirCan be a directional classification loss function, L_{un_reg}Can be a three-dimensional bounding box uncertainty regression loss function, beta₁、β₂、β₃And beta₄Are hyper-parameters, which are weight coefficients that balance the four loss functions, respectively.

Alternatively, L_clsIt may be specifically the Focal loss function, L_regSpecifically, the loss function can be Smooth L1 loss function, L_dirSpecifically, the loss function may be a Smooth L1 loss function.

Optionally, fig. 4 is a fourth schematic flowchart of the target detection method provided by the present invention, and fig. 4 is an optional example of the present invention, but not limiting the present invention; as shown in fig. 4, the process of training the three-dimensional target detection model may include the following steps 401 to 407:

step 401, based on the point cloud data in the virtual sample and the point cloud data in the real sample, a point cloud database D can be constructed;

optionally, the point cloud database D may include one or more point cloud data D_iWherein:

wherein D is_iRepresents the ith point cloud data, x in the point cloud database D_i,y_i,z_iRepresenting the three-dimensional position information of the ith point in the cloud database D relative to the laser radar, R_iAnd the reflectivity of the ith point in the laser point cloud is represented, and N is the number of the point clouds in the laser point cloud.

Step 402, carrying out voxelization coding on point cloud data;

optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a virtual sample in the point cloud database D may be coded, and a first voxelization coding corresponding to the virtual sample may be obtained.

Optionally, through a voxelization coding layer in the three-dimensional target detection model, point cloud data corresponding to a real sample in the point cloud database D may be coded, and a second voxelization coding corresponding to the real sample may be obtained.

Step 403, performing voxel characteristic extraction on the voxelized code;

optionally, based on a voxel feature extractor in the three-dimensional object detection model, a spatially sparse first voxel feature corresponding to the first pixelization code may be acquired.

Optionally, based on the voxel feature extractor in the three-dimensional object detection model, a spatially sparse second voxel feature corresponding to the second pixelization code may be obtained.

Step 404, acquiring a sample-level feature map based on voxel features;

optionally, the first voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a first spatial feature map is obtained, the first spatial feature map is projected to a top view, dimension compression in the vertical direction may be performed, and a sample-level first feature map may be obtained.

Optionally, the second voxel feature may be encoded through a sparse convolution layer in the three-dimensional target detection model, a second spatial feature map is obtained, the second spatial feature map is projected to the top view, dimension compression in the vertical direction may be performed, and a sample-level second feature map may be obtained.

Step 405, generating a network through a candidate area, predicting a sample-level feature map, and acquiring a three-dimensional target detection result;

optionally, the candidate region generation network in the three-dimensional object detection model may include a classification branch, a first regression branch, and a second regression branch.

Optionally, the classification branch may predict the feature map of the sample stage to obtain a classification result.

Optionally, the feature map of the sample level may be predicted through the first regression branch, and a three-dimensional bounding box regression result and a direction classification result are obtained.

Alternatively, the three-dimensional bounding box (3D bounding box) regression result may be { μ }_x,μ_y,μ_z,μ_w,μ_h,μ_l,μ_θIn which, { mu_x,μ_y,μ_zDenotes the coordinates of the center point of the three-dimensional bounding box, mu_w、μ_hAnd mu_lRespectively representing three side lengths, mu, of the three-dimensional bounding box_θIndicating the rotation angle of the three-dimensional bounding box about the y-axis.

Optionally, the feature map of the sample level may be predicted through the second regression branch, and a three-dimensional bounding box uncertainty regression result is obtained.

Alternatively, the three-dimensional bounding box uncertainty regression result may be { σ }_x,σ_y,σ_z,σ_w,σ_h,σ_l,σ_θCorresponding to the three-dimensional bounding box regression result [ mu ]_x,μ_y,μ_z,μ_w,μ_h,μ_l,μ_θ}, e.g. σ_xRepresents μ_xUncertainty of (2), e.g. sigma_yRepresents μ_yUncertainty of (2).

In step 406, a loss value is obtained based on the target loss function.

Optionally, based on the classification loss function corresponding to the classification branch, a loss value of the classification result may be obtained.

Optionally, based on the three-dimensional bounding box regression loss function corresponding to the first regression branch, a loss value of the three-dimensional bounding box regression result may be obtained.

Optionally, based on the directional classification loss function corresponding to the first regression branch, a loss value of the directional classification result may be obtained.

Optionally, a loss value of the three-dimensional bounding box uncertainty regression result may be obtained based on the three-dimensional bounding box uncertainty regression loss function corresponding to the second regression branch.

Optionally, the loss value corresponding to the target loss function is determined based on the loss value of the classification result, the loss value of the three-dimensional bounding box regression result, the loss value of the direction classification result, and the loss value of the three-dimensional bounding box uncertainty regression result.

In step 407, based on the loss value corresponding to the target loss function, it may be determined whether to end the training.

Alternatively, in a case where the loss function value corresponding to the target loss function is smaller than the first threshold, it may be determined to end the training.

Alternatively, in the case where the number of times of training is greater than the second threshold value, it may be determined to end the training.

Therefore, based on the virtual sample, the label of the virtual sample and the target loss function, the initial three-dimensional target detection model can be trained, and then a pre-trained three-dimensional target detection model can be obtained.

Optionally, when the second regression branch is a regression model based on gaussian distribution, the loss function corresponding to the second regression branch is specifically:

or

wherein, t_iTarget regression value, μ, representing the ith dimension_iThree-dimensional bounding box regression result, sigma, representing the ith dimension_iRepresenting the uncertainty regression result of the three-dimensional bounding box of the ith dimension, i belongs to { x, y, z, w, h, l, theta }, { x, y, z } represents the coordinates of the center point of the three-dimensional bounding box, w, h and l respectively represent three side lengths of the three-dimensional bounding box,theta represents the rotation angle of the three-dimensional bounding box by taking the y axis as the center, and epsilon is a preset constant.

In particular, in case the second regression branch is a regression model based on gaussian distribution, it may be based on L_{un_reg_Gaussian}And obtaining a loss value of the uncertainty regression result of the three-dimensional bounding box.

In particular, in case the second regression branch is a regression model based on a laplacian distribution, it may be based on L_{un_reg_Laplace}And obtaining a loss value of the uncertainty regression result of the three-dimensional bounding box.

Alternatively, the target regression value t_iCan be obtained by the following formula:

t_θ＝g_θ-a_θ；

wherein, g_iTruth value, a, of a three-dimensional bounding box representing the ith dimension_iA defined anchor box representing a three-dimensional bounding box of the ith dimension; i belongs to { x, y, z, w, h, l, theta }, { x, y, z } represents the center point coordinates of the three-dimensional boundary frame, w, h and l respectively represent three side lengths of the three-dimensional boundary frame, and theta represents the rotation angle of the three-dimensional boundary frame with the y axis as the center.

Therefore, the loss value of the uncertainty regression result of the three-dimensional bounding box can be determined through the loss function corresponding to the second regression branch, and the loss value corresponding to the target loss function can be determined by combining the loss value of the classification result, the loss value of the regression result of the three-dimensional bounding box and the loss value of the direction classification result.

Optionally, the inputting the real sample into the pre-trained three-dimensional target detection model, and obtaining the first prediction result and a first target confidence corresponding to the first prediction result, includes:

Specifically, based on a pre-trained three-dimensional target detection model, a real sample can be predicted to obtain a third prediction result, and further based on a classification result and a three-dimensional bounding box uncertainty regression result, a first target confidence corresponding to the third prediction result can be determined, and further based on the first target confidence and a third threshold, candidate frames corresponding to the third prediction result can be screened to obtain one or more first target candidate frames, and further all the first target candidate frames can be fused to obtain fused candidate frames, and further based on the fused candidate frames, the first prediction result and the first target confidence corresponding to the first prediction result can be determined.

Optionally, all the first target candidate boxes may be sorted from high to low according to the confidence degrees, and then the first target candidate boxes with overlapped bounding boxes may be screened out based on a Non-Maximum Suppression (NMS) algorithm, so as to obtain the fused candidate boxes, and then based on the fused candidate boxes, the first prediction result and the first target confidence degree corresponding to the first prediction result may be determined.

Therefore, based on the three-dimensional bounding box uncertainty regression result, a first target confidence coefficient can be determined, based on the first target confidence coefficient, candidate boxes in the prediction result can be screened and fused, and further the first prediction result and the first target confidence coefficient corresponding to the first prediction result can be determined.

Optionally, the determining, based on the classification result in the third prediction result and the three-dimensional bounding box uncertainty regression result in the third prediction result, a first target confidence corresponding to the third prediction result includes:

using the formula conf_cls＝sigmoid(output_cls) Obtaining a classification confidence conf corresponding to the classification result in the third prediction result_cls，output_clsIs a classification result in the third prediction result;

In particular, by the classification result and the formula conf_cls＝sigmoid(output_cls) The classification confidence corresponding to the classification result can be determined through the three-dimensional bounding box uncertainty regression result and the formula conf_loc＝1-∑_{i∈{x,y,z,w,h,l,θ}}sigmoid(σ_i) And/7, determining a positioning uncertainty confidence corresponding to the three-dimensional bounding box uncertainty regression result, and determining a first target confidence corresponding to the third prediction result based on the classification confidence and the positioning uncertainty confidence.

Therefore, the first target confidence corresponding to the third prediction result can be determined through the classification result and the three-dimensional boundary box uncertainty regression result, the candidate boxes in the prediction result can be screened and fused based on the first target confidence, and the first target confidence corresponding to the first prediction result and the first prediction result can be further determined.

Optionally, the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result includes:

Specifically, after the first prediction result is obtained, based on a point cloud sequence coding and instantaneous positioning and Mapping (SLAM) algorithm corresponding to the real sample, a self-moving trajectory can be obtained, and further based on the target propagation range and the self-moving trajectory, a target transformation matrix can be obtained, and further based on the target transformation matrix, the first prediction result can be transformed, and the second prediction result can be obtained.

Alternatively, the point cloud sequence encoding to which the real sample corresponds may be { P }₁,P₂,...,P_nIn which P is₁Point cloud data representing the 1 st time, P₂Representing point cloud data at time 2, and so on, P_nPoint cloud data representing an nth time instant, where n may be an integer greater than or equal to 1.

Optionally, based on SLAM algorithm, the point cloud sequence may be encoded { P }₁,P₂,...,P_nCalculating to obtain self-moving track p_t∈R^3×4Where t may be any value from 1 to n.

Optionally, a trajectory based on self-motionp_t∈R^3×4And a target propagation range, a target transition matrix T from the ith frame to the jth frame can be determined_ij。

For example, in the case of a target propagation range of k, i ∈ [ j-k, j + k]Trajectory p based on self-movement_t∈R^3×4And a target propagation range, a target transition matrix T from the ith frame to the jth frame can be determined_ijIn the case of i ═ j, T_ijIs an identity matrix.

Therefore, by the SLAM algorithm, the trajectory of the self-movement can be acquired, and then based on the trajectory of the self-movement and the target propagation range, the target transformation matrix can be determined, and then based on the target transformation matrix and the first prediction result, the second prediction result can be acquired.

Optionally, converting the first prediction result based on the target conversion matrix to obtain the second prediction result, where the converting includes:

Alternatively, the transformation from the bounding box to the corner of the bounding box may be performedB_i8-corner point form V converted into three-dimensional bounding box_i(first corner form).

Alternatively, where the target propagation range is k, i ∈ [ j-k, j + k]Trajectory p based on self-movement_t∈R^3×4And a target propagation range, a target transition matrix T from the ith frame to the jth frame can be determined_ijIn the case of i ═ j, T_ijIs an identity matrix, and can be based on a target transformation matrix T_ijDiagonal form V_iConverting to obtain angular point form V_j(second corner form).

Optionally based on the corner form V_j(second corner form), a candidate frame B corresponding to the second corner form can be obtained_j。

Optionally based on the candidate box B_jAnd a first prediction result, a candidate box B can be determined_jThe method comprises the steps of obtaining a corresponding classification result, a three-dimensional boundary box regression result, a direction classification result, a three-dimensional boundary box uncertainty regression result and a first target confidence coefficient. It will be appreciated that all candidate blocks B are based on the first prediction result_jA second prediction may be determined.

Therefore, after the three-dimensional bounding box regression result in the first prediction result is converted into the first corner form, the first corner form can be converted based on the target conversion matrix, the second corner form can be obtained, and the second prediction result can be obtained based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result.

Optionally, the converting the first corner form based on the target conversion matrix to obtain a second corner form includes:

applications of

Obtaining a second angular point form V corresponding to the j frame_j；

Where k is the target propagation range, t_iThe method is used for representing that the three-dimensional target in the ith frame is in a motion state or a static state; at t_iCharacterize the ithIn the case where the three-dimensional object is in motion in a frame, the value of the function I (.) is 0, at t_iRepresenting that the value of the function I (.) is 1 under the condition that the three-dimensional target in the ith frame is in a static state; t is_ijRepresenting the target transition matrix from frame i to frame j, V_iIndicating a first corner pattern corresponding to the ith frame.

Specifically, the target transformation matrix T from the ith frame to the jth frame is determined_ijThen, can pass through

Diagonal point form V_iConverting to obtain angular point form V_j(second corner form).

Therefore, based on the target transformation matrix, the first corner form can be transformed to obtain the second corner form, and further, based on the three-dimensional bounding box regression result corresponding to the second corner form and the first prediction result, the second prediction result can be obtained.

Optionally, the obtaining the pseudo tag based on the first target confidence, the first predicted result, and the second predicted result includes:

and fusing all the second target candidate frames to obtain the pseudo label.

Specifically, the first prediction result may include a prediction result of the target frame, the second prediction result may include a prediction result of the target frame, and based on the first target confidence, the first prediction result, and the second prediction result, the second target confidence corresponding to the prediction result of the target frame may be obtained, and further based on the second target confidence and the fourth threshold, the candidate frames corresponding to the prediction result of the target frame may be screened to obtain the second target candidate frame, and further, all the second target candidate frames may be fused to obtain the pseudo tag corresponding to the real sample.

Optionally, all the second target candidate frames may be sorted from high to low according to the confidence, and then the second target candidate frames with overlapped bounding frames may be screened out based on the NMS algorithm, so as to obtain fused candidate frames, and then the pseudo label may be determined based on the fused candidate frames.

Therefore, based on the second target confidence corresponding to the prediction result of the target frame, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and further the pseudo label corresponding to the real sample can be obtained.

Optionally, the obtaining a second target confidence corresponding to the prediction result of the target frame based on the first target confidence, the first prediction result, and the second prediction result includes:

applying in the case that the target frame is the jth frame

Acquiring a second target confidence conf corresponding to the prediction result of the j frame_j ^′；

Wherein conf_iAnd representing a first target confidence coefficient corresponding to the prediction result of the ith frame, wherein alpha is a preset attenuation factor.

Specifically, when the target frame is the jth frame, the attenuation factor α may be preset, and the second target confidence conf corresponding to the prediction result of the jth frame may be obtained_j ^′。

Alternatively, the preset attenuation factor α may be 0.7, 0.8, 0.9, or the like, which is not limited thereto.

Optionally, conf based on the second target confidence_j ^′And a fourth threshold value conf_thredCandidate frame B corresponding to the prediction result of the j-th frame_jScreening was carried outObtaining a second target candidate frame B of the jth frame_j ^′Further, all the second target candidate boxes B of the jth frame may be processed_j ^′Sorting from high to low according to the confidence degree, screening out a second target candidate frame with overlapped bounding frames based on an NMS algorithm, and further acquiring the candidate frame of the fused jth frame

Therefore, the second target confidence corresponding to the prediction result of the target frame can be obtained, the candidate frames corresponding to the prediction result of the target frame can be screened and fused, and the pseudo label corresponding to the real sample can be obtained.

Optionally, the public unmanned driving data set KITTI Odometry may be used as a real sample, and input to a pre-trained three-dimensional target detection model (which may include a SECOND network), so as to obtain a first prediction result and a first target confidence corresponding to the first prediction result, further, based on a target propagation range of a time dimension, propagate the first prediction result along the time dimension, obtain a SECOND prediction result, and further, based on the first target confidence, the first prediction result, and the SECOND prediction result, obtain a pseudo tag; and then based on the real sample and the pseudo label, training the pre-trained three-dimensional target detection model to obtain the three-dimensional target detection model.

Alternatively, the three-dimensional Object Detection model is tested based on the val data set of the KITTI 3D Object Detection, and the three-dimensional Object Detection result of the three-dimensional Object Detection model in the KITTI data set can be obtained, as shown in table 1, the evaluation index of the three-dimensional Object Detection result may include the average accuracy rate of the bird's eye view (BEV AP), the average accuracy rate of the three-dimensional frame (3DAP), Easy, model and Hard represent the simple, medium and difficult samples in the KITTI data set, respectively, the model a is a model obtained by training based on the original SECOND network and the pseudo tag directly generated by the original SECOND network in the related art, and the model B is the three-dimensional Object Detection model (including the SECOND network) provided by the embodiment of the present invention.

TABLE 1 three-dimensional target detection results

It can be understood from the data in table 1 that the target detection method provided in the embodiment of the present invention can achieve significant performance improvement on an original model (e.g., an original SECOND network) without using any artificially labeled real data.

The object detection device provided by the present invention is described below, and the object detection device described below and the object detection method described above may be referred to in correspondence with each other.

Fig. 5 is a schematic structural diagram of the object detection apparatus provided in the present invention, and as shown in fig. 5, the apparatus includes a first obtaining module 501 and a second obtaining module 502, where:

a first obtaining module 501, configured to obtain a target point cloud sequence;

a second obtaining module 502, configured to input the target point cloud sequence into a three-dimensional target detection model, and obtain a three-dimensional target detection result corresponding to the target point cloud sequence;

It can be understood that the target detection apparatus trains the initial three-dimensional target detection model through the virtual sample, so as to obtain a pre-trained three-dimensional target detection model, inputs the real sample into the pre-trained three-dimensional target detection model to obtain a first prediction result, propagates the first prediction result along the time dimension to obtain a second prediction result, determines the pseudo label corresponding to the real sample based on the first prediction result and the second prediction result, and trains the pre-trained three-dimensional target detection model based on the real sample and the pseudo label to obtain the three-dimensional target detection model.

The target detection device provided by the invention trains the initial three-dimensional target detection model through the virtual sample, a pre-trained three-dimensional target detection model can be obtained, a first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along a time dimension, a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, further, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain a three-dimensional target detection model, further, the target point cloud sequence is input into the three-dimensional target detection model, so that an accurate three-dimensional target detection result can be obtained, the method can be used for training and obtaining the three-dimensional target detection model under the condition of no manual marking data, and achieves a better detection effect.

Optionally, the three-dimensional target detection result corresponding to the target point cloud sequence includes a classification result, a three-dimensional bounding box regression result, a direction classification result, and a three-dimensional bounding box uncertainty regression result corresponding to the three-dimensional bounding box regression result, and the second obtaining module is specifically configured to:

Optionally, the apparatus further comprises a training module configured to:

Optionally, the training module is specifically configured to:

or

Optionally, the training module is specifically configured to:

applications of

Obtaining a second angular point form V corresponding to the j frame_j；

Where k is the target propagation range, t_iThe method is used for representing that the three-dimensional target in the ith frame is in a motion state or a static state; at t_iThe value of the function I (.) is 0 under the condition that the three-dimensional target in the ith frame is represented in a motion state, and the function I (.) is at t_iThe value of the function I (·) is 1 under the condition that the three-dimensional target in the ith frame is represented in a static state; t is a unit of_ijRepresenting the target transition matrix from frame i to frame j, V_iIndicating a first corner pattern corresponding to the ith frame.

Optionally, the training module is specifically configured to:

and fusing all the second target candidate frames to obtain the pseudo label.

Optionally, the training module is specifically configured to:

applying in the case that the target frame is the jth frame

The target detection device provided by the invention trains the initial three-dimensional target detection model through the virtual sample, a pre-trained three-dimensional target detection model can be obtained, a first prediction result can be obtained by inputting a real sample into the pre-trained three-dimensional target detection model, a second prediction result can be obtained by propagating the first prediction result along a time dimension, a pseudo label corresponding to the real sample can be determined based on the first prediction result and the second prediction result, further, the pre-trained three-dimensional target detection model can be trained based on the real sample and the pseudo label to obtain a three-dimensional target detection model, further, the target point cloud sequence is input into the three-dimensional target detection model, so that an accurate three-dimensional target detection result can be obtained, the method can be used for training and obtaining the three-dimensional target detection model under the condition of no manual labeling data, and achieves a good detection effect.

Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 6, the electronic device may include: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a target detection method comprising:

acquiring a target point cloud sequence;

In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the object detection method provided by the above methods, the method comprising:

acquiring a target point cloud sequence;

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing an object detection method provided by the above methods, the method including:

acquiring a target point cloud sequence;

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on the understanding, the above technical solutions substantially or otherwise contributing to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of target detection, comprising:

acquiring a target point cloud sequence;

the three-dimensional target detection model is obtained by training a virtual sample and a real sample; the pseudo label corresponding to the real sample is determined based on a first prediction result and a second prediction result, the first prediction result is obtained by predicting the real sample data through a pre-trained three-dimensional target detection model, the second prediction result is obtained by transmitting the first prediction result along a time dimension, and the pre-trained three-dimensional target detection model is obtained through training of the virtual sample.

2. The target detection method of claim 1, wherein the three-dimensional target detection results corresponding to the target point cloud sequence comprise classification results, three-dimensional bounding box regression results, direction classification results, and three-dimensional bounding box uncertainty regression results corresponding to the three-dimensional bounding box regression results, and the inputting the target point cloud sequence into a three-dimensional target detection model to obtain the three-dimensional target detection results corresponding to the target point cloud sequence comprises:

3. The object detection method according to claim 2, wherein the three-dimensional object detection model is constructed by:

4. The method according to claim 3, wherein the training an initial three-dimensional object detection model based on the virtual sample and the label of the virtual sample to obtain the pre-trained three-dimensional object detection model comprises:

5. The method according to claim 4, wherein, when the second regression branch is a regression model based on Gaussian distribution, the loss function corresponding to the second regression branch is specifically:

or

wherein, t_iTarget regression value, μ, representing the ith dimension_iThree-dimensional bounding box regression results, σ, representing the ith dimension_iRepresenting the uncertainty regression result of the three-dimensional bounding box of the ith dimension, wherein i belongs to { x, y, z, w, h, l, theta }, { x, y, z } represents the coordinates of the center point of the three-dimensional bounding box, w, h and l respectively represent three side lengths of the three-dimensional bounding box, and theta represents that the three-dimensional bounding box takes the y axis as the centerThe rotation angle of the core, ε is a predetermined constant.

6. The target detection method of claim 3, wherein the inputting the real sample into the pre-trained three-dimensional target detection model to obtain the first prediction result and a first target confidence corresponding to the first prediction result comprises:

7. The method for detecting the target according to claim 6, wherein the determining the confidence of the first target corresponding to the third predicted result based on the classification result in the third predicted result and the three-dimensional bounding box uncertainty regression result in the third predicted result comprises:

8. The target detection method of claim 3, wherein the propagating the first prediction result along the time dimension based on the target propagation range of the time dimension to obtain the second prediction result comprises:

9. The object detection method of claim 8, wherein the converting the first prediction result based on the object transformation matrix to obtain the second prediction result comprises:

10. The method according to claim 9, wherein the converting the first corner form based on the target transformation matrix to obtain a second corner form comprises:

applications of

Obtaining a second angular point form V corresponding to the j frame_j；

11. The object detection method of claim 3, wherein said obtaining the pseudo tag based on the first object confidence, the first predicted result, and the second predicted result comprises:

and fusing all the second target candidate frames to obtain the pseudo label.

12. The object detection method of claim 11, wherein obtaining a second object confidence corresponding to the prediction result of the object frame based on the first object confidence, the first prediction result, and the second prediction result comprises:

applying in the case that the target frame is the jth frame

Acquiring a second target confidence conf 'corresponding to the prediction result of the j frame'_j；

13. An object detection device, comprising:

14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the object detection method according to any of claims 1 to 12 are implemented when the processor executes the program.

15. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 12.

16. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the object detection method according to any one of claims 1 to 12 when executed by a processor.