CN114005110B - 3D detection model training method and device, and 3D detection method and device - Google Patents

3D detection model training method and device, and 3D detection method and device

Info

Publication number
CN114005110B
CN114005110B CN202111638346.0A
Authority
CN
China
Prior art keywords
point cloud
target
loss value
detection
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111638346.0A
Other languages
Chinese (zh)
Other versions
CN114005110A (en)
Inventor
康含玉
盛杲
张海强
李成军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhidao Network Technology Beijing Co Ltd
Original Assignee
Zhidao Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhidao Network Technology Beijing Co Ltd filed Critical Zhidao Network Technology Beijing Co Ltd
Priority to CN202111638346.0A priority Critical patent/CN114005110B/en
Publication of CN114005110A publication Critical patent/CN114005110A/en
Application granted granted Critical
Publication of CN114005110B publication Critical patent/CN114005110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses a 3D detection model training method and device, and a 3D detection method and device. The 3D detection model training method comprises the following steps: acquiring a sample point cloud collected by a roadside laser radar and a target mark point cloud corresponding to the sample point cloud; inputting the sample point cloud into a 3D detection model to obtain a target detection result, wherein the target detection result comprises a 3D frame and a target learning point cloud within the 3D frame; obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target mark point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value; and correcting the parameters of the 3D detection model according to the total prediction loss value. The technical scheme of the invention enables a lightweight (small-volume) 3D detection model to achieve higher detection precision.

Description

3D detection model training method and device, and 3D detection method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a 3D detection model training method and device and a 3D detection method and device.
Background
3D (three-dimensional) object detection aims to locate and identify objects in a 3D scene. Existing 3D object detectors typically either operate directly on the raw point cloud or convert the point cloud into a regular 3D representation and then apply 3D convolutions. However, these methods are computationally intensive, suffer from efficiency and accuracy drawbacks, and are complicated to operate. 3D object detectors that instead work on 2D projections of the point cloud remain limited in performance.
For example, recent existing methods often adopt a PointPillars 3D detection model to perform target detection on point cloud data and use a shallow network structure to balance speed and accuracy; the shallow network structure improves processing efficiency but also reduces detection accuracy.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a 3D detection model training method and apparatus, and a 3D detection method and apparatus, which improve the accuracy of a 3D detection model with a smaller volume.
According to a first aspect of the present invention, there is provided a 3D detection model training method, including: acquiring a sample point cloud collected by a roadside laser radar and a target mark point cloud corresponding to the sample point cloud; inputting the sample point cloud into a 3D detection model to obtain a target detection result, wherein the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame; obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value; and correcting the parameters of the 3D detection model according to the total prediction loss value.
According to a second aspect of the present invention, there is provided a 3D detection method, comprising: acquiring a point cloud to be identified, which is acquired by a roadside laser radar; and carrying out target detection on the point cloud to be recognized according to the 3D detection model to obtain a target detection frame, wherein the 3D detection model is obtained by the 3D detection model training method.
According to a third aspect of the present invention, there is provided a 3D detection model training apparatus, comprising: the system comprises a sample data acquisition unit, a target marking unit and a data processing unit, wherein the sample data acquisition unit is used for acquiring a sample point cloud acquired by a roadside laser radar and a target marking point cloud corresponding to the sample point cloud; the sample data processing unit is used for inputting the sample point cloud into a 3D detection model to obtain a target detection result, and the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame; the loss value calculation unit is used for obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value; and the parameter correcting unit is used for correcting the parameters of the 3D detection model according to the total prediction loss value.
According to a fourth aspect of the present invention, there is provided a 3D detection apparatus comprising: the point cloud acquisition unit is used for acquiring point clouds to be identified, which are acquired by the roadside laser radar; and the target detection unit is used for carrying out target detection on the point cloud to be recognized according to the 3D detection model to obtain a target detection frame, wherein the 3D detection model is obtained by the 3D detection model training method.
According to a fifth aspect of the invention, there is provided an electronic device comprising a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the above-described object detection model training method, or to perform the above-described object detection method.
According to a sixth aspect of the present invention, there is provided a computer-readable storage medium storing one or more programs which, when executed by a processor, implement the above-described object detection model training method or implement the above-described object detection method.
The invention adopts at least one technical scheme which can achieve the following beneficial effects:
constructing training data from the sample point cloud and the target mark point cloud; during 3D detection model training, generating a first prediction loss value for the 3D frame that the 3D detection model learns from the sample point cloud, the first prediction loss value measuring the credibility of the learned 3D frame; generating a second prediction loss value from the target learning point cloud learned by the 3D detection model and the target mark point cloud, the second prediction loss value introducing an attention supervision mechanism through which the credibility of the target learning point cloud is measured; and fusing the first prediction loss value and the second prediction loss value into a total prediction loss value, so that correcting the parameters of the 3D detection model on the basis of the total prediction loss value makes its detection results more accurate.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of a 3D detection model training method according to one embodiment of the invention;
FIG. 2 is a schematic structural diagram of a Backbone network and a detection head in an improved PointPillars 3D detection model according to an embodiment of the present invention;
FIG. 3 shows a flow diagram of a 3D detection method according to one embodiment of the invention;
FIG. 4 shows a block diagram of a 3D inspection model training apparatus according to one embodiment of the present invention;
FIG. 5 shows a block diagram of a 3D detection apparatus according to an embodiment of the invention;
fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
To address the problem in the prior art that a 3D detection model with a shallow network structure is not sufficiently accurate, the embodiments of the invention improve the existing 3D detection model by adding an attention supervision mechanism to it, thereby achieving higher-precision 3D detection in roadside scenes.
Based on this, the embodiment of the invention provides a 3D detection model training method.
Fig. 1 shows a flowchart of a 3D detection model training method according to an embodiment of the present invention, and as shown in fig. 1, the method of the present embodiment at least includes steps S110 to S140:
step S110, acquiring a sample point cloud collected by the road side laser radar and a target mark point cloud corresponding to the sample point cloud.
The sample point cloud is data used for training the 3D detection model. In a roadside scene, a sample point cloud may be acquired by a roadside lidar mounted on a roadside vertical rod. For example, a roadside lidar capable of 360-degree panoramic scanning is usually mounted on a traffic light bracket, and such a roadside lidar can be used to obtain the sample point cloud required by this embodiment. In this embodiment, to obtain point cloud data with an effective viewing angle, a 32-line laser radar (for example, the Itante 32-line 5515 type) may be installed on a roadside vertical rod, preferably about 4 meters above the ground.
And the target mark point cloud marks a target object in the sample point cloud and is used as a reference for calculating a loss value in the 3D detection model training process. For example, the target marker point cloud of the sample point cloud a is a point cloud marked with an obstacle, the target marker point cloud of the sample point cloud B is a point cloud marked with a vehicle, the sample point cloud a and the sample point cloud B are learned by using a 3D detection model respectively to obtain an obstacle learning point cloud and a vehicle learning point cloud, the obstacle marker point cloud is used as a reference for the obstacle learning point cloud, and the vehicle marker point cloud is used as a reference for the vehicle learning point cloud, so that an obstacle learning loss value and a vehicle learning loss value can be calculated respectively.
It should be noted that, in order to make the 3D detection model sufficiently robust, the sample point clouds of this embodiment should cover the various targets of a roadside scene as fully as possible, such as vehicles, pedestrians, and large obstacles.
And step S120, inputting the sample point cloud into a 3D detection model to obtain a target detection result, wherein the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame.
The 3D detection model in this embodiment is, for example, a PointPillars 3D detection model, whose detection head detects and outputs a 3D frame together with the point cloud data located within that frame. Here, the 3D frame may be represented by 7 variables (x, y, z, w, l, h, θ), where (x, y, z) are the center point coordinates of the 3D frame, (w, l, h) are its dimensions, i.e., width, length, and height, and θ is the orientation of the 3D frame.
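For illustration only (this container does not appear in the patent), the 7-variable 3D frame described above could be held in a simple structure such as the following sketch; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    """7-variable 3D frame (x, y, z, w, l, h, theta)."""
    x: float      # center point x coordinate
    y: float      # center point y coordinate
    z: float      # center point z coordinate
    w: float      # width of the 3D frame
    l: float      # length of the 3D frame
    h: float      # height of the 3D frame
    theta: float  # orientation (heading angle) of the 3D frame
```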
The target learning point cloud can be understood as the point cloud data located within the 3D frame, i.e., the target point cloud data that the 3D detection model learns from the sample point cloud. Each point of the target point cloud can be represented by 9 variables (x, y, z, r, x_c, y_c, z_c, x_p, y_p), where (x, y, z) are the three-dimensional coordinates of the point, r is the reflection intensity, (x_c, y_c, z_c) represents the geometric center of all points in the pillar (Pillar) containing the point, and (x_p, y_p) represents the position of the point relative to that geometric center.
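Likewise, a hedged sketch of the 9-variable point representation; the field names are assumptions, and the reflection intensity r is included to match the 9-dimensional description given later in this embodiment.

```python
from dataclasses import dataclass

@dataclass
class PillarPoint:
    """9-variable representation of a point inside a pillar."""
    x: float   # raw x coordinate
    y: float   # raw y coordinate
    z: float   # raw z coordinate
    r: float   # reflection intensity
    xc: float  # x of the geometric center of all points in the pillar
    yc: float  # y of the geometric center of all points in the pillar
    zc: float  # z of the geometric center of all points in the pillar
    xp: float  # x of the point's position relative to that geometric center
    yp: float  # y of the point's position relative to that geometric center
```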
And step S130, obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value.
Specifically, after the sample point cloud is input into the 3D detection model, the sample point cloud sequentially passes through the feature coding network and the feature extraction network, a feature map corresponding to the sample point cloud is obtained by the feature extraction network, and the feature map is subjected to 3D frame detection and regression through the detection head to obtain a 3D frame.
The second prediction loss value introduces an attention supervision mechanism: through this mechanism, the deviation of the target learning point cloud from the target mark point cloud is calculated, which reflects the learning precision of the target learning point cloud.
The total prediction loss value is obtained by fusing the first prediction loss value and the second prediction loss value, so it accounts for both the learning precision of the 3D frame and the learning precision of the target learning point cloud, and therefore characterizes the learning effect of the 3D detection model on the sample point cloud more comprehensively. Correcting the parameters of the 3D detection model on the basis of this total prediction loss value makes the model's detection results more accurate.
And step S140, correcting the parameters of the 3D detection model according to the total prediction loss value.
It can be seen that the method shown in FIG. 1 constructs training data from sample point clouds and target mark point clouds; during 3D detection model training, a first prediction loss value is generated for the 3D frame that the 3D detection model learns from the sample point cloud, and the credibility of the learned 3D frame is measured by this first prediction loss value; a second prediction loss value is generated from the target learning point cloud learned by the 3D detection model and the target mark point cloud, introducing an attention supervision mechanism through which the credibility of the target learning point cloud is measured; and the total prediction loss value is obtained by fusing the first prediction loss value and the second prediction loss value, so that the parameters corrected on its basis make the detection results of the 3D detection model more accurate.
In some embodiments, acquiring a sample point cloud collected by a roadside lidar and a target mark point cloud corresponding to the sample point cloud includes:
obtaining a sample point cloud of a roadside target by a roadside laser radar, and obtaining a target mark point cloud according to the sample point cloud and a roadside background point cloud obtained in advance; here, the roadside background point cloud is the point cloud collected by the roadside lidar when there is no roadside target.
Specifically, the distance between point cloud data in the sample point cloud and corresponding point cloud data in the roadside background point cloud is obtained; if the distance is larger than a distance threshold value, determining that the point cloud data in the sample point cloud is a target point cloud, and marking the target point cloud data as a first numerical value; if the distance is not larger than the distance threshold value, determining that the point cloud data in the sample point cloud is background point cloud, and marking the point cloud data as a second numerical value; and determining the marked sample point cloud as a target marked point cloud corresponding to the sample point cloud.
For example, for the same roadside environment (i.e., the same roadside scene) and the same scanning angle, a point cloud collected by the roadside laser radar when no roadside obstacle is present is used as the roadside background point cloud, and a point cloud collected when obstacles are present is used as the sample point cloud. The Euclidean distance between each point in the sample point cloud and the corresponding point in the roadside background point cloud is then calculated; because the two clouds are collected by the same roadside laser radar at different moments, they share the same field of view, so the correspondence between points can be determined from the three-dimensional coordinates of the point cloud data. If the Euclidean distance is greater than the distance threshold, the point is a foreground target point and is marked as 1; otherwise, it is a background point and is marked as 0. The resulting binarized marked point cloud is the target marker point cloud of this embodiment.
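A minimal sketch of this labelling step, assuming the sample and background point clouds are stored as (N, 3) arrays in which row i of both arrays corresponds to the same scan direction; the function name and the default threshold value are assumptions, not values from the patent.

```python
import numpy as np

def make_target_marker_cloud(sample_xyz: np.ndarray,
                             background_xyz: np.ndarray,
                             dist_threshold: float = 0.2) -> np.ndarray:
    """Return a binarized marker cloud: 1 = foreground target point, 0 = background point."""
    # Euclidean distance between each sample point and its corresponding background point.
    dist = np.linalg.norm(sample_xyz - background_xyz, axis=1)
    # Points that are farther than the threshold from the background are foreground targets.
    marks = (dist > dist_threshold).astype(np.int64)
    return marks
```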
It should be noted that, the embodiment exemplarily shows a method for generating a target marker point cloud, in practical application, the target marker point cloud may also be generated by other methods, for example, a background filtering method, a manual calibration method, and the like, and in practical application, a person skilled in the art may flexibly select the method.
Directly processing the 3D point cloud would make the 3D detection model computationally complex and would hurt detection efficiency. The 3D detection model of this embodiment therefore voxelizes the 3D point cloud and reduces it to a pseudo 2D map.
Taking the PointPillars 3D detection model as an example, the model comprises a feature coding network (Pillar Feature Net), a feature extraction network (Backbone), and a detection head (Detection Head). The general process by which the feature coding network converts the sample point cloud into a pseudo 2D map is as follows:
the sample point cloud is divided into (H multiplied by W) grids according to the X axis and the Y axis (not considering the Z axis, on the top view) of the sample point cloud data, and all the point cloud data falling into one grid are regarded as being in one Pillar (Pillar), namely, the point cloud data in the grid form one Pillar. The point cloud data in each pilar may be represented by a vector of D =9 dimensions, with these 9 dimensions representing the true three-dimensional coordinate information of the point cloud data, the reflection intensity, the geometric center of all the points in the pilar where the point cloud data is located, and the relative position of the point cloud data to the geometric center.
Assuming there are P non-empty pillars in the sample point cloud and at most N points per pillar, the sample point cloud can be represented by a tensor of shape (D, P, N); if a pillar contains more than N points, they are randomly sampled down to N, and if it contains fewer than N, the remaining entries are filled with 0.
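The pillarization step could look roughly like the following sketch; the detection range, pillar size, and the maximum numbers of pillars (P) and points per pillar (N) are illustrative defaults rather than values taken from the patent.

```python
import numpy as np

def pillarize(points: np.ndarray,
              x_range=(0.0, 69.12), y_range=(-39.68, 39.68), pillar_size=0.16,
              max_pillars=12000, max_points=32):
    """Group points of shape (M, D) into at most P pillars of at most N points each.

    Returns a (D, P, N) tensor and the (row, col) grid coordinate of each pillar.
    """
    D = points.shape[1]
    cols = ((points[:, 0] - x_range[0]) / pillar_size).astype(int)   # grid index along X
    rows = ((points[:, 1] - y_range[0]) / pillar_size).astype(int)   # grid index along Y
    buckets = {}
    for point, key in zip(points, zip(rows, cols)):
        buckets.setdefault(key, []).append(point)

    pillar_tensor = np.zeros((D, max_pillars, max_points), dtype=np.float32)
    coords = np.zeros((max_pillars, 2), dtype=np.int64)
    for i, ((r, c), pts) in enumerate(list(buckets.items())[:max_pillars]):
        pts = np.stack(pts)
        if len(pts) > max_points:  # more than N points: randomly sample down to N
            pts = pts[np.random.choice(len(pts), max_points, replace=False)]
        pillar_tensor[:, i, :len(pts)] = pts.T  # fewer than N points: remainder stays zero
        coords[i] = (r, c)
    return pillar_tensor, coords
```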
After this tensorization, the tensorized point cloud data are processed and features are extracted using a simplified version of PointNet. Here, the simplified PointNet consists of a Linear layer, a Batch Normalization (BN) layer, a Rectified Linear Unit (ReLU), and the like.
Here, the feature extraction may be understood as processing the dimensions of the point cloud, where the initial sample point cloud dimension is D =9, and the processed dimension is C, so that a tensor (C, P, N) may be obtained.
Max pooling is then applied to the (C, P, N) tensor over the point dimension of each pillar to obtain a feature map of dimension (C, P). To obtain a pseudo 2D map, this embodiment maps P back to (H, W), i.e., P → H × W, whereby a pseudo 2D map of shape (C, H, W) is obtained.
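A sketch of this feature-encoding step under the shapes described above (simplified PointNet, max pooling over the points of each pillar, then scattering the pillar features back onto the H × W grid); the layer sizes and function names are assumptions.

```python
import torch
import torch.nn as nn

class PillarFeatureEncoder(nn.Module):
    """Simplified PointNet: Linear + BatchNorm + ReLU, then max pooling over the N points."""
    def __init__(self, in_dim: int = 9, out_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, pillars: torch.Tensor) -> torch.Tensor:
        # pillars: (D, P, N) -> (P, N, D) so the Linear layer acts on the D point features
        x = pillars.permute(1, 2, 0)
        x = self.linear(x)                                # (P, N, C)
        x = self.bn(x.permute(0, 2, 1)).permute(0, 2, 1)  # BatchNorm over the channel dimension
        x = torch.relu(x)
        return x.max(dim=1).values                        # max over the N points -> (P, C)

def scatter_to_pseudo_2d(pillar_feats: torch.Tensor, coords: torch.Tensor, H: int, W: int):
    """Place each pillar's C-dim feature at its (row, col) grid cell -> (C, H, W) pseudo 2D map."""
    C = pillar_feats.shape[1]
    canvas = torch.zeros(C, H, W, dtype=pillar_feats.dtype)
    canvas[:, coords[:, 0], coords[:, 1]] = pillar_feats.t()  # coords: (P, 2) long tensor
    return canvas
```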
Similarly, the target marker point cloud is input into a feature coding network of the PointPillars 3D detection model, and a pseudo 2D target marker map can be obtained.
In some embodiments, in order to avoid directly calculating the 3D point cloud data when calculating the second predicted loss value, the present embodiment performs size recovery on the serial feature map corresponding to the target learning point cloud, for example, performs upsampling on the serial feature map through the deconvolution operation shown in fig. 2 to recover the size of the serial feature map to a pseudo 2D map size; and obtaining a second prediction loss value according to the serial feature diagram after the size recovery and the pseudo 2D target mark diagram.
The series feature map is obtained by processing the sample point cloud through the feature coding network and the feature extraction network. Since the target learning point cloud is obtained by the detection head performing regression on the series feature map output by the feature extraction network, the series feature map corresponding to the target learning point cloud is exactly the series feature map that the feature extraction network feeds into the detection head.
Because the series feature map is 2D image data produced after voxelization, and the pseudo 2D target mark map of the target marker point cloud is also 2D image data, the loss is computed directly on 2D image data, which reduces computational complexity and improves detection efficiency.
Referring to FIG. 2, after the sample point cloud has been converted into a pseudo 2D map by the above method, the pseudo 2D map is input to the Backbone network, which employs two sub-networks. One sub-network repeatedly reduces the resolution of the feature map by convolution (Conv) operations while increasing its channel dimension, producing three feature maps of different resolutions: (C, H/2, W/2), (2C, H/4, W/4), and (4C, H/8, W/8). The other sub-network deconvolves (Deconv) these three feature maps to the same size and then concatenates (concat) them to obtain the series feature map, whose size is smaller than that of the pseudo 2D map.
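A condensed sketch of such a Backbone (one convolution per downsampling block instead of the several used in practice); channel counts and kernel choices are assumptions, and only the shapes named in the paragraph above are preserved.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Two sub-networks: strided convolutions down to three resolutions, then
    deconvolutions to a common size followed by channel concatenation."""
    def __init__(self, c: int = 64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(c, c, 3, 2, 1), nn.ReLU())          # (C,  H/2, W/2)
        self.down2 = nn.Sequential(nn.Conv2d(c, 2 * c, 3, 2, 1), nn.ReLU())      # (2C, H/4, W/4)
        self.down3 = nn.Sequential(nn.Conv2d(2 * c, 4 * c, 3, 2, 1), nn.ReLU())  # (4C, H/8, W/8)
        # Deconvolutions bring all three maps back to the H/2 x W/2 resolution.
        self.up1 = nn.ConvTranspose2d(c, 2 * c, 1)
        self.up2 = nn.ConvTranspose2d(2 * c, 2 * c, 2, stride=2)
        self.up3 = nn.ConvTranspose2d(4 * c, 2 * c, 4, stride=4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d1 = self.down1(x)
        d2 = self.down2(d1)
        d3 = self.down3(d2)
        # Series (concatenated) feature map at H/2 x W/2, smaller than the pseudo 2D map.
        return torch.cat([self.up1(d1), self.up2(d2), self.up3(d3)], dim=1)
```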
As shown in fig. 2, after the series feature map is obtained, the series feature map is input to the detection head, the series feature map is processed by the detection head, and a 3D frame is regressed to obtain a target learning point cloud in the 3D frame.
The conventional detection head computes loss values according to a loss function, i.e., a function representing the risk or loss of an event. In the prior art, the detection head calculates the first predicted loss value based on a cross-entropy loss function.
As noted above, the 3D frame is represented by 7 variables (x, y, z, w, l, h, θ), and the parameters to be learned in the 3D frame regression task are the offsets of these 7 variables; in practical applications, the regression loss loc shown in FIG. 2 is calculated with, for example, a smooth L1 loss function (SmoothL1). Meanwhile, to avoid direction discrimination errors, the conventional detection head also introduces a classification loss (Softmax) to learn the orientation of the target, denoted as the direction loss cls shown in FIG. 2. The weighted sum of the regression loss loc and the direction loss cls is the first predicted loss value of this embodiment.
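A hedged sketch of this first predicted loss; the loss weights below are illustrative, not values disclosed in the patent.

```python
import torch
import torch.nn.functional as F

def first_prediction_loss(box_preds, box_targets, dir_logits, dir_targets,
                          w_loc: float = 2.0, w_dir: float = 0.2) -> torch.Tensor:
    """Weighted sum of the 7-variable box regression loss (loc) and the direction loss (cls)."""
    loss_loc = F.smooth_l1_loss(box_preds, box_targets)   # offsets of (x, y, z, w, l, h, theta)
    loss_dir = F.cross_entropy(dir_logits, dir_targets)   # softmax-based direction classification
    return w_loc * loss_loc + w_dir * loss_dir
```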
Unlike the conventional PointPillars 3D detection model, this embodiment adds an output branch to the feature coding network, connects the newly added output branch to the detection head, and has the detection head calculate the second prediction loss value based on the series feature map and the pseudo 2D target mark map.
Because the foreground-point and background-point classes in the pseudo 2D target mark map are imbalanced, in some embodiments a second predicted loss value of the size-recovered series feature map relative to the pseudo 2D target mark map may be calculated with a Focal Loss function.
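A sketch of this second predicted loss, assuming the series feature map has been reduced to a single foreground logit channel and using interpolation to stand in for the deconvolution-based size recovery described above; the alpha/gamma values are common defaults, not values from the patent.

```python
import torch
import torch.nn.functional as F

def second_prediction_loss(series_feature_map: torch.Tensor, target_mark_map: torch.Tensor,
                           alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss of the size-recovered series feature map against the binarized pseudo 2D mark map."""
    # Size recovery: upsample the (B, 1, h, w) feature map to the pseudo 2D map size (B, 1, H, W).
    logits = F.interpolate(series_feature_map, size=target_mark_map.shape[-2:],
                           mode='bilinear', align_corners=False)
    target = target_mark_map.float()
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    prob = torch.sigmoid(logits)
    p_t = prob * target + (1 - prob) * (1 - target)        # probability of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```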
It should be noted that feature maps of different resolutions are responsible for detecting targets of different sizes; for example, a high-resolution feature map usually has a small receptive field and is suited to capturing small targets such as pedestrians. For a roadside scene, FIG. 2 of this embodiment exemplarily shows that three feature maps of different resolutions can be acquired, so that targets such as pedestrians, vehicles, and large obstacles can be detected respectively. In practical applications, a person skilled in the art can configure the feature extraction network of the 3D detection model according to the application requirements.
After the first prediction loss value and the second prediction loss value are calculated based on the above embodiment, the first prediction loss value and the second prediction loss value may be weighted and summed to obtain a total prediction loss value, and the parameters of the 3D detection model are corrected based on the total prediction loss value, where the parameters of the feature extraction network are mainly corrected.
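Putting the pieces together, one training step might look like the following sketch, reusing the two loss functions sketched above; the model's output names and the weights w1/w2 are assumptions.

```python
import torch

def training_step(model, optimizer, sample_cloud, box_targets, dir_targets, pseudo_2d_mark_map,
                  w1: float = 1.0, w2: float = 1.0) -> float:
    """One parameter-correction step: total loss = w1 * first loss + w2 * second loss."""
    out = model(sample_cloud)  # assumed to return box preds, direction logits and the series feature map
    loss1 = first_prediction_loss(out['boxes'], box_targets, out['dir_logits'], dir_targets)
    loss2 = second_prediction_loss(out['series_feature_map'], pseudo_2d_mark_map)
    total_loss = w1 * loss1 + w2 * loss2
    optimizer.zero_grad()
    total_loss.backward()      # gradients mainly correct the feature extraction network
    optimizer.step()
    return total_loss.item()
```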
In summary, the model training method of the embodiments of the invention enables a lightweight (small-volume) 3D detection model to achieve higher detection precision. For a small 3D detection model, the model training method provided by the embodiments of the invention can improve detection precision without sacrificing detection efficiency.
The embodiment also provides a 3D detection method.
Fig. 3 shows a flowchart of a 3D detection method according to an embodiment of the invention, and as shown in fig. 3, the method of the embodiment at least includes the following steps S310 to S320:
and S310, acquiring the point cloud to be identified, which is acquired by the roadside laser radar.
And S320, performing target detection on the point cloud to be recognized according to the 3D detection model to obtain a target detection frame.
The 3D detection model is obtained by the above embodiment of the 3D detection model training method.
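As a usage sketch only (the file name and helper function below are hypothetical), detection with the trained model reduces to loading it and feeding in the roadside point cloud:

```python
import torch

model = torch.load('3d_detection_model.pt')    # hypothetical path to the trained 3D detection model
model.eval()
with torch.no_grad():
    cloud = load_roadside_point_cloud()        # hypothetical helper returning the point cloud to be identified
    out = model(cloud)
    boxes = out['boxes'][out['scores'] > 0.5]  # assumed score field; threshold is illustrative
```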
It can be seen that when the method shown in FIG. 3 uses the 3D detection model provided by the embodiments of the invention to detect targets, only the 3D point cloud collected by the roadside laser radar needs to be acquired, and the 3D point cloud does not need to be labeled. A target detection system can complete the whole detection process simply by acquiring the 3D point cloud and feeding it into the 3D detection model, so the method of this embodiment can be applied effectively to existing target detection systems, which facilitates the adoption of the 3D detection method of this embodiment and gives it high practicability.
Belonging to the same technical concept as the 3D detection model training method in the foregoing embodiments, an embodiment of the invention also provides a 3D detection model training apparatus.
Fig. 4 is a block diagram illustrating a 3D detection model training apparatus according to an embodiment of the present invention, and as shown in fig. 4, the 3D detection model training apparatus 400 includes:
the sample data acquisition unit 410 is used for acquiring a sample point cloud acquired by a roadside laser radar and a target mark point cloud corresponding to the sample point cloud;
the sample data processing unit 420 is configured to input the sample point cloud to a 3D detection model to obtain a target detection result, where the target detection result includes a 3D frame and a target learning point cloud within the 3D frame;
a loss value calculation unit 430, configured to obtain a first predicted loss value according to the 3D frame, obtain a second predicted loss value according to the target learning point cloud and the target marker point cloud, and obtain a total predicted loss value according to the first predicted loss value and the second predicted loss value;
a parameter modifying unit 440, configured to modify a parameter of the 3D detection model according to the total prediction loss value.
In some embodiments, the sample data acquiring unit 410 is configured to acquire a sample point cloud of the roadside laser radar to the roadside target; and obtaining the target mark point cloud according to the sample point cloud and a roadside background point cloud obtained in advance, wherein the roadside background point cloud is the point cloud collected by the roadside laser radar when no roadside target exists. Specifically, the distance between point cloud data in the sample point cloud and corresponding point cloud data in the roadside background point cloud is obtained; if the distance is larger than a distance threshold value, determining that the point cloud data in the sample point cloud is a target point cloud, and marking the target point cloud data as a first numerical value; if the distance is not larger than the distance threshold value, determining that the point cloud data in the sample point cloud is background point cloud, and marking the point cloud data as a second numerical value; and determining the marked sample point cloud as the target marked point cloud.
In some embodiments, the 3D detection model includes a feature coding network and a feature extraction network, and the sample data processing unit 420 is configured to input the target marker point cloud to the feature coding network to obtain a pseudo 2D target marker map;
a loss value calculating unit 430, configured to perform size recovery on a series feature map corresponding to a target learning point cloud, where the series feature map is obtained by processing the sample point cloud through the feature coding network and the feature extraction network; and obtaining a second prediction loss value according to the serial feature diagram after the size recovery and the pseudo 2D target mark diagram. Specifically, a second predicted loss value of the serial feature map after size recovery relative to the pseudo 2D target mark map is calculated according to a focus loss function.
In some embodiments, the loss value calculating unit 430 is further configured to sum the first predicted loss value and the second predicted loss value by weighting to obtain the total predicted loss value.
It can be understood that the 3D detection model training apparatus can implement the steps of the 3D detection model training method provided in the foregoing embodiment, and the explanations related to the 3D detection model training method are applicable to the 3D detection model training apparatus, and are not repeated herein.
Belonging to the same technical concept as the 3D detection method in the foregoing embodiment, an embodiment of the present invention also provides a 3D detection apparatus.
Fig. 5 shows a block diagram of a 3D sensing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the 3D sensing apparatus 500 includes:
a point cloud obtaining unit 510, configured to obtain a point cloud to be identified, which is collected by a roadside laser radar;
and an object detection unit 520, configured to perform object detection on the point cloud to be recognized according to the 3D detection model to obtain an object detection frame, where the 3D detection model is obtained by the 3D detection model training method in the foregoing embodiment.
An embodiment of the present invention further provides an electronic device.
FIG. 6 shows a schematic diagram of an electronic device according to one embodiment of the invention. Referring to FIG. 6, at the hardware level the electronic device includes a processor and a memory, and optionally also an internal bus and a network interface. The memory may include volatile memory, such as Random-Access Memory (RAM), and may further include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include the hardware required for other services.
The processor, the network interface, and the memory may be connected to one another via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but this does not mean there is only one bus or one type of bus.
A memory for storing computer executable instructions. The memory provides computer executable instructions to the processor through the internal bus.
A processor executing computer executable instructions stored in the memory and specifically configured to perform the following operations:
acquiring a sample point cloud collected by a roadside laser radar and a target marking point cloud corresponding to the sample point cloud; inputting the sample point cloud into a 3D detection model to obtain a target detection result, wherein the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame; obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value; and correcting the parameters of the 3D detection model according to the total prediction loss value.
Or, for implementing the following operations:
acquiring a point cloud to be identified, which is acquired by a roadside laser radar; and carrying out target detection on the point cloud to be recognized according to the 3D detection model to obtain a target detection frame.
The functions performed by the 3D detection model training method disclosed in the embodiment of fig. 1 or the 3D detection method shown in fig. 3 may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
An embodiment of the present invention further provides a computer-readable storage medium, which stores one or more programs, and when the one or more programs are executed by a processor, the one or more programs implement the aforementioned 3D detection model training method or the aforementioned 3D detection method.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A 3D detection model training method, characterized in that the 3D detection model comprises a feature coding network, a feature extraction network and a detection head, the feature coding network is additionally provided with an output branch, and the newly added output branch is connected to the detection head, the method comprising the following steps:
acquiring a sample point cloud collected by a roadside laser radar and a target mark point cloud corresponding to the sample point cloud;
inputting the sample point cloud into a 3D detection model to obtain a target detection result, wherein the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame;
obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value;
correcting parameters of the 3D detection model according to the total prediction loss value;
obtaining a second predicted loss value according to the target learning point cloud and the target marker point cloud comprises: and calculating a second prediction loss value by the detection head based on a series characteristic diagram and a pseudo 2D target mark diagram, wherein the series characteristic diagram is obtained by processing the sample point cloud through the characteristic coding network and the characteristic extraction network, and the pseudo 2D target mark diagram is obtained by processing the target mark point cloud corresponding to the sample point cloud through the characteristic coding network.
2. The method of claim 1, wherein obtaining a sample point cloud collected by a roadside lidar and a target marker point cloud corresponding to the sample point cloud comprises:
acquiring a sample point cloud of a roadside target by a roadside laser radar;
and obtaining the target mark point cloud according to the sample point cloud and a roadside background point cloud obtained in advance, wherein the roadside background point cloud is the point cloud collected by the roadside laser radar when no roadside target exists.
3. The method of claim 2, wherein obtaining the target marker point cloud from the sample point cloud and a previously obtained roadside background point cloud comprises:
acquiring the distance between the point cloud data in the sample point cloud and the corresponding point cloud data in the roadside background point cloud;
if the distance is larger than a distance threshold value, determining that the point cloud data in the sample point cloud is a target point cloud, and marking the target point cloud data as a first numerical value; if the distance is not larger than the distance threshold value, determining that the point cloud data in the sample point cloud is background point cloud, and marking the point cloud data as a second numerical value;
and determining the marked sample point cloud as the target marked point cloud.
4. The method of claim 1, wherein deriving a second predicted loss value from the target learning point cloud and the target marker point cloud comprises:
performing size recovery on the series characteristic graph corresponding to the target learning point cloud;
and obtaining a second prediction loss value according to the serial feature diagram after the size recovery and the pseudo 2D target mark diagram.
5. The method of claim 4, wherein obtaining a second predicted loss value according to the size-recovered concatenated feature maps and the pseudo 2D target label map comprises:
and calculating a second prediction loss value of the serial feature map after size recovery relative to the pseudo 2D target mark map according to a focus loss function.
6. The method of claim 1, wherein deriving the total predicted loss value from the first predicted loss value and the second predicted loss value comprises:
and weighting and summing the first prediction loss value and the second prediction loss value to obtain the total prediction loss value.
7. A 3D detection method, comprising:
acquiring a point cloud to be identified, which is acquired by a roadside laser radar;
and performing target detection on the point cloud to be recognized according to a 3D detection model to obtain a target detection frame, wherein the 3D detection model is obtained by the 3D detection model training method according to any one of claims 1 to 6.
8. A 3D detection model training device, characterized in that the 3D detection model comprises a feature coding network, a feature extraction network and a detection head, an output branch is newly added to the feature coding network, and the newly added output branch is connected to the detection head, the device comprising:
the system comprises a sample data acquisition unit, a target marking unit and a data processing unit, wherein the sample data acquisition unit is used for acquiring a sample point cloud acquired by a roadside laser radar and a target marking point cloud corresponding to the sample point cloud;
the sample data processing unit is used for inputting the sample point cloud into a 3D detection model to obtain a target detection result, and the target detection result comprises a 3D frame and a target learning point cloud in the 3D frame;
the loss value calculation unit is used for obtaining a first prediction loss value according to the 3D frame, obtaining a second prediction loss value according to the target learning point cloud and the target marking point cloud, and obtaining a total prediction loss value according to the first prediction loss value and the second prediction loss value;
the parameter correcting unit is used for correcting the parameters of the 3D detection model according to the total prediction loss value;
the loss value calculating unit is further used for calculating a second predicted loss value by the detection head based on a serial feature map and a pseudo 2D target mark map, wherein the serial feature map is obtained by processing the sample point cloud through the feature coding network and the feature extraction network, and the pseudo 2D target mark map is obtained by processing the target mark point cloud corresponding to the sample point cloud through the feature coding network.
9. A 3D detection device, comprising:
the point cloud acquisition unit is used for acquiring point clouds to be identified, which are acquired by the roadside laser radar;
a target detection unit, configured to perform target detection on the point cloud to be recognized according to a 3D detection model to obtain a target detection frame, where the 3D detection model is obtained by the 3D detection model training method according to any one of claims 1 to 6.
10. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the 3D detection model training method of any one of claims 1 to 6, or to perform the 3D detection method of claim 7.
CN202111638346.0A 2021-12-30 2021-12-30 3D detection model training method and device, and 3D detection method and device Active CN114005110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111638346.0A CN114005110B (en) 2021-12-30 2021-12-30 3D detection model training method and device, and 3D detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111638346.0A CN114005110B (en) 2021-12-30 2021-12-30 3D detection model training method and device, and 3D detection method and device

Publications (2)

Publication Number Publication Date
CN114005110A CN114005110A (en) 2022-02-01
CN114005110B true CN114005110B (en) 2022-05-17

Family

ID=79932466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111638346.0A Active CN114005110B (en) 2021-12-30 2021-12-30 3D detection model training method and device, and 3D detection method and device

Country Status (1)

Country Link
CN (1) CN114005110B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820465B (en) * 2022-04-06 2024-04-26 合众新能源汽车股份有限公司 Point cloud detection model training method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179247A (en) * 2019-12-27 2020-05-19 上海商汤智能科技有限公司 Three-dimensional target detection method, training method of model thereof, and related device and equipment
WO2020151109A1 (en) * 2019-01-22 2020-07-30 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel feature
WO2021081808A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Artificial neural network-based object detection system and method
CN113553943A (en) * 2021-07-19 2021-10-26 江苏共知自动化科技有限公司 Target real-time detection method and device, storage medium and electronic device
CN113658100A (en) * 2021-07-16 2021-11-16 上海高德威智能交通***有限公司 Three-dimensional target object detection method and device, electronic equipment and storage medium
CN113807350A (en) * 2021-09-13 2021-12-17 上海芯物科技有限公司 Target detection method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274976B (en) * 2020-01-22 2020-09-18 清华大学 Lane detection method and system based on multi-level fusion of vision and laser radar
CN111612059B (en) * 2020-05-19 2022-10-21 上海大学 Construction method of multi-plane coding point cloud feature deep learning model based on pointpilars

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020151109A1 (en) * 2019-01-22 2020-07-30 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel feature
WO2021081808A1 (en) * 2019-10-30 2021-05-06 深圳市大疆创新科技有限公司 Artificial neural network-based object detection system and method
CN111179247A (en) * 2019-12-27 2020-05-19 上海商汤智能科技有限公司 Three-dimensional target detection method, training method of model thereof, and related device and equipment
CN113658100A (en) * 2021-07-16 2021-11-16 上海高德威智能交通***有限公司 Three-dimensional target object detection method and device, electronic equipment and storage medium
CN113553943A (en) * 2021-07-19 2021-10-26 江苏共知自动化科技有限公司 Target real-time detection method and device, storage medium and electronic device
CN113807350A (en) * 2021-09-13 2021-12-17 上海芯物科技有限公司 Target detection method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PointPillars+ 3D object detection based on an attention mechanism; Zhan Weiqin et al.; Journal of Jiangsu University (Natural Science Edition); 2020-05-31; Vol. 41, No. 3; full text *

Also Published As

Publication number Publication date
CN114005110A (en) 2022-02-01

Similar Documents

Publication Publication Date Title
JP7033373B2 (en) Target detection method and device, smart operation method, device and storage medium
CN115082924B (en) Three-dimensional target detection method based on monocular vision and radar pseudo-image fusion
CN109858372B (en) Lane-level precision automatic driving structured data analysis method
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN113421305B (en) Target detection method, device, system, electronic equipment and storage medium
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN112740225B (en) Method and device for determining road surface elements
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
CN114049356B (en) Method, device and system for detecting structure apparent crack
CN113970734B (en) Method, device and equipment for removing snowfall noise points of road side multi-line laser radar
CN114898097B (en) Image recognition method and system
CN114998856B (en) 3D target detection method, device, equipment and medium for multi-camera image
CN116612103B (en) Intelligent detection method and system for building structure cracks based on machine vision
CN111898539A (en) Multi-target detection method, device, system, equipment and readable storage medium
CN113554643A (en) Target detection method and device, electronic equipment and storage medium
CN115205803A (en) Automatic driving environment sensing method, medium and vehicle
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN114005110B (en) 3D detection model training method and device, and 3D detection method and device
CN116503760A (en) Unmanned aerial vehicle cruising detection method based on self-adaptive edge feature semantic segmentation
CN110751040B (en) Three-dimensional object detection method and device, electronic equipment and storage medium
CN116665153A (en) Road scene segmentation method based on improved deep bv3+ network model
CN113269147B (en) Three-dimensional detection method and system based on space and shape, and storage and processing device
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
Burlacu et al. Stereo vision based environment analysis and perception for autonomous driving applications
CN115236643A (en) Sensor calibration method, system, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant