CN115880685B - Three-dimensional target detection method and system based on VoteNet model - Google Patents
Three-dimensional target detection method and system based on VoteNet model
- Publication number
- CN115880685B CN115880685B CN202211577601.XA CN202211577601A CN115880685B CN 115880685 B CN115880685 B CN 115880685B CN 202211577601 A CN202211577601 A CN 202211577601A CN 115880685 B CN115880685 B CN 115880685B
- Authority
- CN
- China
- Prior art keywords
- point
- seed
- model
- target
- dimensional
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
A three-dimensional target detection method based on the VoteNet model comprises the following steps: constructing a VoteNet model; constructing a point cloud data set for training the VoteNet model for a target of interest; constructing a seed point displacement loss function, based on double-layer nested three-dimensional rectangular box space division, for training the VoteNet model; constructing the other loss functions used to train the VoteNet model according to the original VoteNet method, including the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation-angle offset loss function; training the VoteNet model based on the constructed point cloud data set and loss functions; acquiring point cloud data of a scene to be detected with an RGB-D camera; and outputting a three-dimensional detection result for the target of interest through the VoteNet model based on the point cloud data of the scene to be detected. The invention can effectively reduce the false-alarm rate of the three-dimensional target detection result without increasing model inference latency.
Description
Technical Field
The invention relates to the field of indoor three-dimensional target detection, and in particular to a three-dimensional target detection method based on the VoteNet model.
Background
Target detection algorithms are widely applied and are a prominent research topic in computer vision. In recent years, with the development of deep learning, image target detection has made great breakthroughs. Compared with two-dimensional target detection, three-dimensional target detection incorporates depth information and can provide spatial scene information such as the position, orientation and size of a target; it has developed rapidly in fields such as autonomous driving and robotics. Three-dimensional target detection methods based on the VoteNet model achieve a high detection recall rate, but seed points generated by VoteNet on non-target surfaces easily end up, after the displacement step, in the regions between multiple targets. When targets are too close to each other, the neighborhood points aggregated around some of the vote points generated by VoteNet may mix the features of different targets; the aggregated features then carry strong foreground semantics but extremely poor localization accuracy, producing false-alarm targets with high confidence. Existing methods generally reduce the false-alarm rate by refining the detection results, but the additional refinement inevitably brings extra computation and increases inference time. How to suppress the generation of false-alarm targets without additionally increasing inference time is a major challenge for three-dimensional target detection technology.
Disclosure of Invention
The invention aims to provide a three-dimensional target detection method based on the VoteNet model that addresses the defects of the prior art. Compared with other three-dimensional target detection methods based on the VoteNet model, the method does not increase model inference time, and it achieves a low detection false-alarm rate and a high detection recall rate.
The aim of the invention is achieved by the following technical scheme: a three-dimensional target detection method based on the VoteNet model, comprising the following steps:
step one: constructing a VoteNet model;
step two: constructing a point cloud data set for training the VoteNet model for a target of interest;
step three: constructing a seed point displacement loss function, based on double-layer nested three-dimensional rectangular box space division, for training the VoteNet model;
step four: constructing the other loss functions used to train the VoteNet model according to the original VoteNet method, including the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation-angle offset loss function;
step five: training the VoteNet model based on the constructed point cloud data set and loss functions;
step six: acquiring point cloud data of a scene to be detected with an RGB-D camera;
step seven: outputting a three-dimensional target detection result for the target of interest through the VoteNet model based on the point cloud data of the scene to be detected.
Further, in step one, the VoteNet model is built according to the construction method described in the VoteNet paper.
Further, in step two, the point cloud data set refers to a number of point cloud data samples containing the targets of interest, together with the three-dimensional rectangular bounding-box annotation of each target of interest in the samples. The point cloud data may come from a variety of sensors.
Further, in step three, for each training sample, the seed point displacement loss function is computed by the following sub-steps:
(3.1) inputting the point cloud data of the training sample into the VoteNet model to obtain the seed point coordinates extracted by the VoteNet model and the coordinate offset predicted by the VoteNet model for each seed point;
(3.2) Suppose the training sample contains n_t targets in total. For the i-th target, i ∈ [1, n_t], let the true three-dimensional rectangular bounding box of the target be Box_i = (x_i, y_i, z_i, l_i, w_i, h_i, θ_i), where x_i, y_i, z_i are the coordinates of the center point of the bounding box, l_i, w_i, h_i are the length, width and height of the bounding box, and θ_i is the orientation angle of the bounding box. For each target, from the true bounding box Box_i, generate an enlarged three-dimensional rectangular box Box_i^k with the same center point coordinates and the same orientation angle, and with the length, width and height each enlarged k times;
(3.3) Let the total number of seed points be n_s. Traverse all seed points. For the j-th seed point P_j, j ∈ [1, n_s], let the coordinates of P_j in three-dimensional space be (x_j, y_j, z_j). For P_j, traverse the true three-dimensional rectangular bounding boxes Box_i, i ∈ [1, n_t], of all targets. If the coordinates (x_j, y_j, z_j) of P_j lie inside or on the surface of some true bounding box Box_i, generate a corresponding target point T_j for the seed point, whose coordinate values equal the center point coordinates (x_i, y_i, z_i) of that Box_i. If the coordinates of P_j lie inside or on the surface of several true bounding boxes, the coordinates of T_j equal the center point coordinates of the bounding box whose center point is closest to the seed point P_j;
(3.4) Let the number of seed points without a corresponding target point be n'_s. Traverse all seed points without a corresponding target point. For such a seed point P_j, j ∈ [1, n'_s], traverse all the enlarged three-dimensional rectangular boxes Box_i^k. If the coordinates (x_j, y_j, z_j) of P_j lie inside or on the surface of some enlarged box Box_i^k, generate a corresponding target point V_j for the seed point, whose coordinate values equal the seed point's own coordinates (x_j, y_j, z_j);
(3.5) Traverse all seed points. For the j-th seed point P_j, j ∈ [1, n_s], let the coordinate offset predicted by the VoteNet model for the seed point be (Δx_j, Δy_j, Δz_j). If the seed point P_j has a corresponding target point T_j or target point V_j, generate a corresponding predicted point Q_j for the seed point, with coordinate values (x_j + Δx_j, y_j + Δy_j, z_j + Δz_j);
(3.6) Compute the seed point displacement loss L_vote-reg. Let PT_k denote a seed point P_k for which a target point T_k exists, with n_T such seed points in total; let PV_m denote a seed point P_m for which a target point V_m exists, with n_V such seed points in total; let d(A, B) denote the distance between point A and point B in three-dimensional space; and let α be a balance factor. The seed point displacement loss is computed as:
L_vote-reg = (1/n_T) Σ_{k=1}^{n_T} d(Q_k, T_k) + α · (1/n_V) Σ_{m=1}^{n_V} d(Q_m, V_m)
the invention also includes a three-dimensional object detection system based on the votnet model, comprising:
the volntet model building module is used for building a volntet model;
the point cloud data set construction module is used for constructing a point cloud data set for training a volntet model aiming at an interested target;
the seed position displacement loss function construction module is used for constructing a seed position displacement loss function based on double-layer nested three-dimensional rectangular frame space division for training a volntet model;
the other loss function construction module is used for constructing other loss functions for training the volntet model based on the volntet model original method, including foreground and background classification loss functions, center offset loss functions, size offset loss functions and orientation angle offset loss functions;
the volntet model building module is used for training the volntet model based on the built point cloud data set and the loss function;
the system comprises a point cloud data acquisition module of a scene to be detected, a point cloud data acquisition module and a point cloud data acquisition module, wherein the point cloud data acquisition module is used for acquiring the point cloud data of the scene to be detected by using an RGB-D camera;
and the result output module is used for outputting a three-dimensional target detection result of the target of interest through the volntet model based on the point cloud data of the scene to be detected.
The invention also includes a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the three-dimensional target detection method based on the VoteNet model of the invention.
The method addresses the problem of localizing indoor targets in three-dimensional space, and can obtain the position coordinates of a specific target in three-dimensional space within the camera's field of view. The invention constructs a seed point displacement loss calculation method based on double-layer nested three-dimensional rectangular box space division, which prevents seed points in the background region near a target from moving toward the target's center region during the displacement stage of the VoteNet model, and thereby avoids generating, in the vote aggregation stage of VoteNet, candidate boxes that have high confidence but low overlap with the target. Compared with other three-dimensional target detection methods based on the VoteNet model, the method can effectively reduce the false-alarm rate of the three-dimensional detection result without increasing model inference latency.
The beneficial effect of the method is that the probability of a background seed point in the VoteNet model being shifted into the space between multiple targets is reduced, which lowers the false-alarm rate of the detection results. Compared with a seed point partitioning method based on a single-layer three-dimensional rectangular box, the method effectively avoids erroneous supervision signals caused by incomplete sample annotation, and thus further improves the recall rate of the detection results. The invention introduces no extra computation in the inference stage of the VoteNet model, and so solves the high false-alarm rate of current VoteNet-based three-dimensional target detection methods without increasing model inference time.
Drawings
FIG. 1 is a flow chart of a three-dimensional object detection method of an embodiment of the present application shown in one exemplary embodiment;
FIG. 2 is a schematic diagram of a three-dimensional rectangular bounding box of a target shown in one exemplary embodiment;
FIG. 3 is a schematic diagram of converting a depth map into point cloud data, shown in one exemplary embodiment;
FIG. 4 illustrates the implementation of the seed point displacement loss function based on double-layer nested three-dimensional rectangular box space division, shown in one exemplary embodiment;
fig. 5 is a system configuration diagram of the present application shown in one exemplary embodiment.
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic diagram of a three-dimensional target detection method for chair-class targets according to an embodiment of the present application, comprising the following steps:
Step S101: constructing a VoteNet model.
The VoteNet model is the model in the publicly published VoteNet paper; preferably, the VoteNet model can be constructed directly according to the construction method described in the VoteNet paper.
Step S102: constructing a point cloud data set for training the VoteNet model for chair-class targets.
The point cloud data set refers to a number of point cloud data samples containing chair-class targets, together with the three-dimensional rectangular bounding-box annotation of each chair-class target in the samples; preferably, the publicly available SUN RGB-D data set can be used directly.
In one exemplary embodiment, as shown in FIG. 2, a three-dimensional rectangular bounding box should completely enclose a target in three-dimensional space.
Step S103: constructing a seed point displacement loss function, based on double-layer nested three-dimensional rectangular box space division, for training the VoteNet model.
Step S104: constructing the other loss functions used for training the VoteNet model.
The other loss functions used for training the VoteNet model are the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation-angle offset loss function of the VoteNet model; preferably, they can be constructed directly in the manner described in the VoteNet paper.
Step S105: training the VoteNet model based on the constructed point cloud data set and loss functions.
Training refers to obtaining optimized VoteNet model parameters using the back-propagation algorithm; preferably, the update of the VoteNet model parameters can be carried out with the SGD algorithm.
Step S106: acquiring point cloud data of the scene to be detected with an RGB-D camera.
The point cloud data refers to the point cloud generated by projecting the pixels of the depth image produced by the RGB-D camera into three-dimensional space according to the intrinsic parameters of the RGB-D camera. In one exemplary embodiment, as shown in FIG. 3, the depth image generated by the RGB-D camera is projected into three-dimensional space to form point cloud data.
Step S107: outputting a three-dimensional target detection result for the chair-class target through the VoteNet model based on the point cloud data of the scene to be detected.
In one exemplary embodiment, as shown in FIG. 4, for each training sample the seed point displacement loss function based on double-layer nested three-dimensional rectangular box space division is computed as follows:
Step S201: inputting the point cloud data of the training sample into the VoteNet model to obtain the seed point coordinates extracted by the VoteNet model and the coordinate offset predicted by the VoteNet model for each seed point;
step S202: let the training samples contain n in total t Target of the first step ofi∈[1,n t ]Target, true three-dimensional rectangular bounding Box provided with target i =(x i ,y i ,z i ,l i ,w i ,h i ,θ i ) Wherein x is i ,y i ,z i For the center point coordinates of the bounding box, l i ,w i ,h i For the length, width and height of the bounding box, θ i For the orientation angle of the bounding Box, for each target, a true three-dimensional rectangular bounding Box i Generating a three-dimensional rectangular frame with the same coordinates and the same orientation angle as the center point of the surrounding frame and the length, width and height of the three-dimensional rectangular frame being respectively enlarged by k times
In the enlarged three-dimensional rectangular box Box_i^k, k is generally a real number in [1, 2]; preferably, k = 1.5 can be taken.
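The enlargement, and the point-inclusion test it is used with in the following steps, can be sketched as below. The rotation convention (about the vertical axis) is an assumption for illustration; the patent does not fix the axis ordering.

```python
import numpy as np

def enlarge_box(box, k=1.5):
    """Keep the center (x, y, z) and angle theta; scale l, w, h by k."""
    x, y, z, l, w, h, theta = box
    return (x, y, z, k * l, k * w, k * h, theta)

def point_in_box(p, box):
    """Whether point p lies inside or on the surface of an oriented box.

    Rotates p into the box frame by -theta about the vertical axis
    (assumed convention), then does an axis-aligned half-extent test.
    """
    x, y, z, l, w, h, theta = box
    dx, dy, dz = p[0] - x, p[1] - y, p[2] - z
    c, s = np.cos(-theta), np.sin(-theta)
    rx = c * dx - s * dy
    ry = s * dx + c * dy
    return abs(rx) <= l / 2 and abs(ry) <= w / 2 and abs(dz) <= h / 2

box = (0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0)
# (1.2, 0, 0) is outside the true box but inside the 1.5x enlarged box
inside_true = point_in_box((1.2, 0.0, 0.0), box)
inside_big = point_in_box((1.2, 0.0, 0.0), enlarge_box(box))
```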
Step S203: let the total number of seed points be n s Traversing the whole seed points, for the j' th E [1, n ] s ]Seed points P j Set seed point P j The coordinates in the three-dimensional space are (x j ,y j ,z j ) For seed point P j Real three-dimensional rectangular bounding Box for traversing whole targets i ,i∈[1,n t ]When the seed point P j Coordinates (x) j ,y j ,z j ) Box at a certain real three-dimensional rectangular bounding Box i Generates a corresponding target point T for the seed point when it is on the inner or outer surface of the seed j The target point T j Is equal to the coordinate value of the real three-dimensional rectangular bounding Box i Is a center point coordinate value (x) i ,y i ,z i ) If the seed point P j Coordinates (x) j ,y j ,z j ) Inside or outside a plurality of real three-dimensional rectangular bounding boxes, thenPunctuation T j Is equal to the distance from the seed point P in the center point of the true three-dimensional rectangular bounding box j The nearest three-dimensional rectangle encloses the center point coordinates of the frame;
step S204: let the number of seed points without corresponding target points be n' s Traversing seed points which are not corresponding to target points in whole, and for j < 1, n ∈ ′ s ]Seed points P j For seed point P j Traversing the whole expanded three-dimensional rectangular bounding boxWhen the seed point P j Coordinates (x) j ,y j ,z j ) Three-dimensional rectangular bounding box after a certain expansion>Generates a corresponding target point V for the seed point when it is on the inner or outer surface of the seed j The target point V j Is equal to the seed point P j Coordinate value (x) j ,y j ,z j );
Step S205: traversing the whole seed points, for the j' th E [1, n ] s ]Seed points P j Let the coordinate offset predicted by the volntet model as the seed point be (Δx) j ,Δy j ,Δz j ) If the seed point P j There is a corresponding target point T j Or the target point V j Then is the seed point P j Generating a corresponding predicted point Q j Predicted point Q j Coordinate value of (x) j +Δx j ,y j +Δy j ,z j +Δz j );
Step S206: calculating seed point displacement loss L vote-reg Let PT k Indicating the presence of the target point T k Seed point P of (2) k ,PT k The total number of (2) is n T ,PV m Indicating the presence of the target point V m Seed point P of (2) m ,PV m The total number of (2) is n V D (A, B) represents the distance between the point A and the point B in three-dimensional space, alpha is a balance factor, and the seed pointDisplacement loss L vote-reg The specific calculation mode of (a) is as follows:
the balance factor α is used to balance displacement losses of different types of seed points, and preferably α=0.1 may be taken.
The present invention also provides a computer-readable storage medium storing a computer program operable to perform the three-dimensional target detection method based on the VoteNet model provided in FIG. 1 above.
The invention also provides a schematic block diagram of a three-dimensional target detection system based on the VoteNet model, corresponding to the method shown in FIG. 1. The three-dimensional target detection system based on the VoteNet model of the invention comprises:
a VoteNet model construction module, used for constructing a VoteNet model;
a point cloud data set construction module, used for constructing a point cloud data set for training the VoteNet model for a target of interest;
a seed point displacement loss function construction module, used for constructing a seed point displacement loss function, based on double-layer nested three-dimensional rectangular box space division, for training the VoteNet model;
an other-loss-function construction module, used for constructing the other loss functions used to train the VoteNet model according to the original VoteNet method, including the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation-angle offset loss function;
a VoteNet model training module, used for training the VoteNet model based on the constructed point cloud data set and loss functions;
a scene point cloud data acquisition module, used for acquiring point cloud data of the scene to be detected with an RGB-D camera;
and a result output module, used for outputting a three-dimensional target detection result for the target of interest through the VoteNet model based on the point cloud data of the scene to be detected.
As shown in FIG. 5, at the hardware level the three-dimensional target detection system based on the VoteNet model includes a processor, an internal bus, a network interface, memory and non-volatile storage, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it to implement the detection method described above with respect to FIG. 1. Of course, the present invention does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
Improvements to a technology can be clearly distinguished as hardware improvements (e.g., improvements to circuit structures such as diodes, transistors and switches) or software improvements (improvements to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures, since designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD) (e.g., a field programmable gate array, FPGA) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs a digital system to "integrate" it onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code before compilation must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for performing various functions may also be regarded as structures within the hardware component, or even as both software modules implementing the method and structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory, random access memory (RAM) and/or nonvolatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments of the present invention are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple; for relevant parts, reference may be made to the description of the method embodiment.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.
Claims (8)
1. A three-dimensional target detection method based on a VoteNet model, comprising the following steps:
step one: constructing a VoteNet model;
step two: constructing a point cloud data set for training the VoteNet model for a target of interest;
step three: constructing a seed point displacement loss function based on double-layer nested three-dimensional rectangular box space division for training the VoteNet model; for each training sample, the seed point displacement loss function is implemented by the following sub-steps:
(3.1) inputting point cloud data of the training sample into the VoteNet model to obtain the seed point coordinates extracted by the VoteNet model and the coordinate offset of each seed point predicted by the VoteNet model;
(3.2) suppose the training sample contains n_t targets in total; for the i-th target, i ∈ [1, n_t], let the target's true three-dimensional rectangular bounding box be Box_i = (x_i, y_i, z_i, l_i, w_i, h_i, θ_i), where x_i, y_i, z_i are the center point coordinates of the bounding box, l_i, w_i, h_i are the length, width and height of the bounding box, and θ_i is the orientation angle of the bounding box; for each target, from the true three-dimensional rectangular bounding box Box_i, generate an enlarged three-dimensional rectangular box Box′_i with the same center point coordinates and the same orientation angle as the bounding box, and with length, width and height each enlarged by a factor of k;
(3.3) let the total number of seed points be n_s and traverse all seed points; for the j-th seed point P_j, j ∈ [1, n_s], let the coordinates of seed point P_j in three-dimensional space be (x_j, y_j, z_j); for seed point P_j, traverse the true three-dimensional rectangular bounding boxes Box_i, i ∈ [1, n_t], of all targets; when the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surface of some true three-dimensional rectangular bounding box Box_i, generate a corresponding target point T_j for the seed point, the coordinate value of target point T_j being equal to the center point coordinate value (x_i, y_i, z_i) of that true three-dimensional rectangular bounding box Box_i; if the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surfaces of several true three-dimensional rectangular bounding boxes, the coordinate value of target point T_j is equal to the center point coordinates of the bounding box whose center point is nearest to seed point P_j;
(3.4) let the number of seed points without a corresponding target point be n′_s and traverse all seed points without a corresponding target point; for the j-th such seed point P_j, j ∈ [1, n′_s], traverse all enlarged three-dimensional rectangular boxes Box′_i; when the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surface of some enlarged three-dimensional rectangular box Box′_i, generate a corresponding target point V_j for the seed point, the coordinate value of target point V_j being equal to the coordinate value (x_j, y_j, z_j) of seed point P_j itself;
(3.5) traverse all seed points; for the j-th seed point P_j, j ∈ [1, n_s], let the coordinate offset predicted by the VoteNet model for the seed point be (Δx_j, Δy_j, Δz_j); if seed point P_j has a corresponding target point T_j or target point V_j, generate a corresponding predicted point Q_j for seed point P_j, the coordinate value of predicted point Q_j being (x_j + Δx_j, y_j + Δy_j, z_j + Δz_j);
(3.6) compute the seed point displacement loss L_vote-reg: let PT_k denote a seed point P_k for which a target point T_k exists, with n_T such points in total, and let PV_m denote a seed point P_m for which a target point V_m exists, with n_V such points in total; let d(A, B) denote the distance between point A and point B in three-dimensional space, and let α be a balance factor; the seed point displacement loss is computed as

L_vote-reg = (1/n_T) Σ_{k=1}^{n_T} d(Q_k, T_k) + α · (1/n_V) Σ_{m=1}^{n_V} d(Q_m, V_m);
step four: constructing the other loss functions for training the VoteNet model based on the original VoteNet method, including the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation angle offset loss function;
step five: training the VoteNet model based on the constructed point cloud data set and loss functions;
step six: acquiring point cloud data of a scene to be detected by using an RGB-D camera;
step seven: outputting a three-dimensional target detection result for the target of interest through the VoteNet model based on the point cloud data of the scene to be detected.
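The double-layer nested assignment and the displacement loss of steps (3.2)–(3.6) can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation: the function names, the defaults k=2.0 and α=0.5, and the use of Euclidean distance for d(·,·) are illustrative assumptions; boxes rotate about the z-axis by their orientation angle.

```python
import numpy as np

def in_oriented_box(p, box):
    """Test whether point p lies inside (or on the surface of) an oriented
    3D box (cx, cy, cz, l, w, h, theta); theta rotates about the z-axis."""
    cx, cy, cz, l, w, h, theta = box
    d = p - np.array([cx, cy, cz])
    c, s = np.cos(-theta), np.sin(-theta)
    x = c * d[0] - s * d[1]          # rotate the point into the box frame
    y = s * d[0] + c * d[1]
    z = d[2]
    return abs(x) <= l / 2 and abs(y) <= w / 2 and abs(z) <= h / 2

def assign_targets(seeds, boxes, k=2.0):
    """Two-level nested assignment: a seed inside a true box gets that
    box's center as target T (nearest center if several boxes match);
    a remaining seed inside the k-enlarged box gets itself as target V."""
    T, V = {}, {}
    centers = boxes[:, :3]
    big = boxes.copy()
    big[:, 3:6] *= k                              # enlarge l, w, h by k
    for j, p in enumerate(seeds):
        hits = [i for i, b in enumerate(boxes) if in_oriented_box(p, b)]
        if hits:                                   # nearest center wins
            i = min(hits, key=lambda i: np.linalg.norm(p - centers[i]))
            T[j] = centers[i]
        elif any(in_oriented_box(p, b) for b in big):
            V[j] = p                               # target = the seed itself
    return T, V

def vote_loss(seeds, offsets, T, V, alpha=0.5):
    """Seed point displacement loss: mean distance of predicted points
    Q = P + offset to their targets; the V term is weighted by alpha."""
    q = seeds + offsets
    lt = np.mean([np.linalg.norm(q[j] - t) for j, t in T.items()]) if T else 0.0
    lv = np.mean([np.linalg.norm(q[j] - v) for j, v in V.items()]) if V else 0.0
    return lt + alpha * lv
```

A seed inside the enlarged-but-not-true box is thus trained toward zero displacement, which is the point of the double-layer division: boundary seeds near a target are not pulled toward a center they do not belong to.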
2. The three-dimensional target detection method based on a VoteNet model as set forth in claim 1, wherein: in step two, the point cloud data set refers to a number of point cloud data samples containing the targets of interest and the labeling information of the three-dimensional rectangular bounding box of each target of interest in the samples.
3. The three-dimensional target detection method based on a VoteNet model as set forth in claim 2, wherein: the point cloud data come from a variety of sensors.
4. The three-dimensional target detection method based on a VoteNet model as set forth in claim 1, wherein: in step four, the other loss functions used for training the VoteNet model refer to the foreground-background classification loss function, the center offset loss function, the size offset loss function and the orientation angle offset loss function of the VoteNet model.
5. The three-dimensional target detection method based on a VoteNet model as set forth in claim 1, wherein: the training in step five refers to obtaining optimized VoteNet model parameters by using a back-propagation algorithm.
6. The three-dimensional target detection method based on a VoteNet model as set forth in claim 1, wherein: the point cloud data in step six refer to point cloud data generated by projecting the depth image pixels generated by the RGB-D camera into three-dimensional space according to the internal parameters of the RGB-D camera.
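The depth-to-point-cloud projection described in this claim is a standard pinhole back-projection, which can be sketched as follows. This is an illustrative NumPy sketch, not the patent's code: the function name and the intrinsic parameter names fx, fy, cx, cy (focal lengths and principal point in pixels) are assumptions.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into a 3D point cloud using
    pinhole camera intrinsics. Returns an (N, 3) array of points,
    dropping pixels with no depth reading (depth <= 0)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

Each valid pixel (u, v) with depth z thus maps to the camera-frame point ((u − cx)·z/fx, (v − cy)·z/fy, z), which is the input format the detection model consumes.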
7. A three-dimensional target detection system based on a VoteNet model, characterized by comprising:
a VoteNet model construction module, used for constructing a VoteNet model;
a point cloud data set construction module, used for constructing a point cloud data set for training the VoteNet model for a target of interest;
a seed point displacement loss function construction module, used for constructing a seed point displacement loss function based on double-layer nested three-dimensional rectangular box space division for training the VoteNet model; for each training sample, the seed point displacement loss function is implemented by the following sub-steps:
(3.1) inputting point cloud data of the training sample into the VoteNet model to obtain the seed point coordinates extracted by the VoteNet model and the coordinate offset of each seed point predicted by the VoteNet model;
(3.2) suppose the training sample contains n_t targets in total; for the i-th target, i ∈ [1, n_t], let the target's true three-dimensional rectangular bounding box be Box_i = (x_i, y_i, z_i, l_i, w_i, h_i, θ_i), where x_i, y_i, z_i are the center point coordinates of the bounding box, l_i, w_i, h_i are the length, width and height of the bounding box, and θ_i is the orientation angle of the bounding box; for each target, from the true three-dimensional rectangular bounding box Box_i, generate an enlarged three-dimensional rectangular box Box′_i with the same center point coordinates and the same orientation angle as the bounding box, and with length, width and height each enlarged by a factor of k;
(3.3) let the total number of seed points be n_s and traverse all seed points; for the j-th seed point P_j, j ∈ [1, n_s], let the coordinates of seed point P_j in three-dimensional space be (x_j, y_j, z_j); for seed point P_j, traverse the true three-dimensional rectangular bounding boxes Box_i, i ∈ [1, n_t], of all targets; when the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surface of some true three-dimensional rectangular bounding box Box_i, generate a corresponding target point T_j for the seed point, the coordinate value of target point T_j being equal to the center point coordinate value (x_i, y_i, z_i) of that true three-dimensional rectangular bounding box Box_i; if the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surfaces of several true three-dimensional rectangular bounding boxes, the coordinate value of target point T_j is equal to the center point coordinates of the bounding box whose center point is nearest to seed point P_j;
(3.4) let the number of seed points without a corresponding target point be n′_s and traverse all seed points without a corresponding target point; for the j-th such seed point P_j, j ∈ [1, n′_s], traverse all enlarged three-dimensional rectangular boxes Box′_i; when the coordinates (x_j, y_j, z_j) of seed point P_j lie inside or on the surface of some enlarged three-dimensional rectangular box Box′_i, generate a corresponding target point V_j for the seed point, the coordinate value of target point V_j being equal to the coordinate value (x_j, y_j, z_j) of seed point P_j itself;
(3.5) traverse all seed points; for the j-th seed point P_j, j ∈ [1, n_s], let the coordinate offset predicted by the VoteNet model for the seed point be (Δx_j, Δy_j, Δz_j); if seed point P_j has a corresponding target point T_j or target point V_j, generate a corresponding predicted point Q_j for seed point P_j, the coordinate value of predicted point Q_j being (x_j + Δx_j, y_j + Δy_j, z_j + Δz_j);
(3.6) compute the seed point displacement loss L_vote-reg: let PT_k denote a seed point P_k for which a target point T_k exists, with n_T such points in total, and let PV_m denote a seed point P_m for which a target point V_m exists, with n_V such points in total; let d(A, B) denote the distance between point A and point B in three-dimensional space, and let α be a balance factor; the seed point displacement loss is computed as

L_vote-reg = (1/n_T) Σ_{k=1}^{n_T} d(Q_k, T_k) + α · (1/n_V) Σ_{m=1}^{n_V} d(Q_m, V_m);
the other loss function construction module is used for constructing other loss functions for training the volntet model based on the volntet model original method, including foreground and background classification loss functions, center offset loss functions, size offset loss functions and orientation angle offset loss functions;
the volntet model building module is used for training the volntet model based on the built point cloud data set and the loss function;
the system comprises a point cloud data acquisition module of a scene to be detected, a point cloud data acquisition module and a point cloud data acquisition module, wherein the point cloud data acquisition module is used for acquiring the point cloud data of the scene to be detected by using an RGB-D camera;
and the result output module is used for outputting a three-dimensional target detection result of the target of interest through the volntet model based on the point cloud data of the scene to be detected.
8. A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the three-dimensional target detection method based on a VoteNet model as claimed in any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211577601.XA CN115880685B (en) | 2022-12-09 | 2022-12-09 | Three-dimensional target detection method and system based on votenet model
PCT/CN2023/101754 WO2024119776A1 (en) | 2022-12-09 | 2023-06-21 | Three-dimensional target detection method and system based on votenet model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211577601.XA CN115880685B (en) | 2022-12-09 | 2022-12-09 | Three-dimensional target detection method and system based on votenet model
Publications (2)
Publication Number | Publication Date |
---|---|
CN115880685A CN115880685A (en) | 2023-03-31 |
CN115880685B true CN115880685B (en) | 2024-02-13 |
Family
ID=85766720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211577601.XA Active CN115880685B (en) | 2022-12-09 | 2022-12-09 | Three-dimensional target detection method and system based on votenet model
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115880685B (en) |
WO (1) | WO2024119776A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880685B (en) * | 2022-12-09 | 2024-02-13 | Zhejiang Lab | Three-dimensional target detection method and system based on votenet model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020151109A1 (en) * | 2019-01-22 | 2020-07-30 | 中国科学院自动化研究所 | Three-dimensional target detection method and system based on point cloud weighted channel feature |
CN111915746A (en) * | 2020-07-16 | 2020-11-10 | 北京理工大学 | Weak-labeling-based three-dimensional point cloud target detection method and labeling tool |
CN113706689A (en) * | 2021-08-04 | 2021-11-26 | 西安交通大学 | Assembly guidance method and system based on Hololens depth data |
WO2022088676A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium |
CN114863228A (en) * | 2022-03-24 | 2022-08-05 | 南京航空航天大学 | Airport special vehicle obstacle avoidance method based on machine vision |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4115392A4 (en) * | 2020-03-04 | 2023-08-23 | Magic Leap, Inc. | Systems and methods for efficient floorplan generation from 3d scans of indoor scenes |
CN115880685B (en) * | 2022-12-09 | 2024-02-13 | Zhejiang Lab | Three-dimensional target detection method and system based on votenet model |
- 2022-12-09: CN application CN202211577601.XA granted as patent CN115880685B (status: Active)
- 2023-06-21: WO application PCT/CN2023/101754 published as WO2024119776A1 (status: unknown)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020151109A1 (en) * | 2019-01-22 | 2020-07-30 | 中国科学院自动化研究所 | Three-dimensional target detection method and system based on point cloud weighted channel feature |
CN111915746A (en) * | 2020-07-16 | 2020-11-10 | 北京理工大学 | Weak-labeling-based three-dimensional point cloud target detection method and labeling tool |
WO2022088676A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium |
CN113706689A (en) * | 2021-08-04 | 2021-11-26 | 西安交通大学 | Assembly guidance method and system based on Hololens depth data |
CN114863228A (en) * | 2022-03-24 | 2022-08-05 | 南京航空航天大学 | Airport special vehicle obstacle avoidance method based on machine vision |
Non-Patent Citations (1)
Title |
---|
QI C R, LITANY O, HE K, et al. Deep Hough voting for 3D object detection in point clouds. The 2019 International Conference on Computer Vision. Piscataway, NJ: IEEE, 2019: 9276-9285. *
Also Published As
Publication number | Publication date |
---|---|
WO2024119776A1 (en) | 2024-06-13 |
CN115880685A (en) | 2023-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115880685B (en) | Three-dimensional target detection method and system based on votenet model | |
CN111062372B (en) | Method and device for predicting obstacle track | |
CN116740361B (en) | Point cloud segmentation method and device, storage medium and electronic equipment | |
CN111797711A (en) | Model training method and device | |
CN115600157B (en) | Data processing method and device, storage medium and electronic equipment | |
CN116309823A (en) | Pose determining method, pose determining device, pose determining equipment and storage medium | |
CN117197781B (en) | Traffic sign recognition method and device, storage medium and electronic equipment | |
CN116342888B (en) | Method and device for training segmentation model based on sparse labeling | |
CN117036829A (en) | Method and system for achieving label enhancement based on prototype learning for identifying fine granularity of blade | |
CN115830633B (en) | Pedestrian re-recognition method and system based on multi-task learning residual neural network | |
US20220314980A1 (en) | Obstacle tracking method, storage medium and unmanned driving device | |
CN116149362A (en) | Method and system for optimizing obstacle avoidance track of aircraft at any time | |
CN116012483A (en) | Image rendering method and device, storage medium and electronic equipment | |
CN113887351A (en) | Obstacle detection method and obstacle detection device for unmanned driving | |
CN118053153B (en) | Point cloud data identification method and device, storage medium and electronic equipment | |
CN112712561A (en) | Picture construction method and device, storage medium and electronic equipment | |
CN112329547A (en) | Data processing method and device | |
CN116740114B (en) | Object boundary fitting method and device based on convex hull detection | |
CN117974990B (en) | Point cloud target detection method based on attention mechanism and feature enhancement structure | |
CN116740197B (en) | External parameter calibration method and device, storage medium and electronic equipment | |
CN118053153A (en) | Point cloud data identification method and device, storage medium and electronic equipment | |
CN116188633B (en) | Method, device, medium and electronic equipment for generating simulated remote sensing image | |
CN117575886B (en) | Image edge detector, detection method, electronic equipment and medium | |
CN117611726B (en) | Real model sunlight display method and device | |
CN115862668B (en) | Method and system for judging interactive object based on sound source positioning by robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||