CN113223091A - Three-dimensional target detection method, three-dimensional target capture device and electronic equipment - Google Patents


Info

Publication number
CN113223091A
Authority
CN
China
Prior art keywords
point cloud
camera
world
coordinate system
target
Prior art date
Legal status
Granted
Application number
CN202110473106.3A
Other languages
Chinese (zh)
Other versions
CN113223091B (en)
Inventor
刘亦芃
杜国光
赵开勇
Current Assignee
Cloudminds Robotics Co Ltd
Original Assignee
Cloudminds Robotics Co Ltd
Priority date
Filing date
Publication date
Application filed by Cloudminds Robotics Co Ltd filed Critical Cloudminds Robotics Co Ltd
Priority to CN202110473106.3A priority Critical patent/CN113223091B/en
Publication of CN113223091A publication Critical patent/CN113223091A/en
Priority to PCT/CN2021/143443 priority patent/WO2022227678A1/en
Application granted granted Critical
Publication of CN113223091B publication Critical patent/CN113223091B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 Stereo camera calibration
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the technical field of computer vision, and discloses a three-dimensional target detection method, a three-dimensional target grabbing device and electronic equipment. The method comprises the following steps: acquiring a depth image containing a target recognition object; generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, wherein the camera point cloud is a point cloud under a camera coordinate system; converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud under a world coordinate system; performing target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system; and generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system. In this way, the detection quality of the three-dimensional target is improved.

Description

Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a three-dimensional target detection method, a three-dimensional target grabbing device and electronic equipment.
Background
Three-dimensional target detection refers to a technique for detecting the three-dimensional space coordinates of an object. In the field of automatic driving, a vehicle can be controlled on the basis of three-dimensional target detection so as to avoid collisions; in the field of service robots, objects can be accurately grabbed by means of three-dimensional target detection.
Three-dimensional target detection generally outputs the bounding boxes, categories and corresponding confidences of target recognition objects according to the input point cloud data. However, in the related art, detecting a three-dimensional target generally requires obtaining the external parameters of the camera and converting the point cloud data in the camera coordinate system into point cloud data in the world coordinate system according to those external parameters. When the external parameters of the camera cannot be acquired, the accuracy with which the related art detects the three-dimensional target is low.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a three-dimensional target detection method, a three-dimensional target capture device, and an electronic device, so as to solve the problem in the prior art that the three-dimensional target detection accuracy is low.
According to an aspect of an embodiment of the present invention, there is provided a three-dimensional target detection method, including:
acquiring a depth image containing a target recognition object;
generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, wherein the camera point cloud is a point cloud under a camera coordinate system;
converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud under a world coordinate system;
carrying out target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system;
and generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system.
In an alternative approach, the converting the camera point cloud to a world point cloud comprises:
registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from a camera coordinate system to a world coordinate system;
converting the camera point cloud to a world point cloud according to the transformation matrix.
In an alternative manner, the registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from a camera coordinate system to a world coordinate system includes:
respectively calculating the mean values of the camera point clouds in three dimensions;
constructing a homogeneous transformation matrix according to the mean value, and setting the homogeneous transformation matrix as an initial value of an iterative closest point algorithm;
and generating a transformation matrix from a camera coordinate system to a world coordinate system according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
In an alternative approach, the converting the camera point cloud to a world point cloud according to the transformation matrix comprises:
determining a rotation matrix corresponding to the transformation matrix;
if the rotation angle corresponding to the rotation matrix is larger than 90 degrees, generating a world point cloud according to the rotation matrix and the camera point cloud;
and if the rotation angle corresponding to the rotation matrix is not more than 90 degrees, generating the world point cloud according to the supplementary-angle rotation amount corresponding to the rotation matrix and the camera point cloud.
In an optional manner, the method further comprises:
constructing a point cloud data training set, wherein the point cloud data training set comprises a plurality of groups of world point cloud data and label information corresponding to each group of world point cloud data;
and training a preset target recognition algorithm by using the point cloud data training set to generate the target recognition model.
In an alternative approach, the constructing a training set of point cloud data includes:
constructing a three-dimensional model library, wherein the three-dimensional model library comprises three-dimensional models of a plurality of identification objects;
after each identification object is placed in a world coordinate system, calculating an initial value of a bounding box of each identification object;
carrying out simulation arrangement on each identification object, and calculating a bounding box simulation value of each identification object at a simulation position;
randomly generating a camera view angle, and rendering based on the camera view angle to generate camera point cloud data of each identification object;
converting the camera point cloud data of each identification object into corresponding world point cloud data;
and adding label information to the corresponding world point cloud data.
According to another aspect of the embodiments of the present invention, there is provided a three-dimensional object capturing method, which includes the three-dimensional object detection method described above and further includes:
determining the space position of the target recognition object according to the bounding box of the target recognition object under the camera coordinate system;
and generating a grabbing instruction according to the space position, so that the grabber grabs the target recognition object according to the grabbing instruction.
According to another aspect of the embodiments of the present invention, there is provided a three-dimensional object detecting apparatus, the apparatus including:
the acquisition module is used for acquiring a depth image containing a target recognition object;
the first generation module is used for generating a camera point cloud corresponding to the depth image according to the depth image and camera internal parameters, and the camera point cloud is a point cloud under a camera coordinate system;
the conversion module is used for converting the camera point cloud into a world point cloud, and the world point cloud is a point cloud under a world coordinate system;
the second generation module is used for carrying out target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system;
and the third generation module is used for generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system.
According to another aspect of the embodiments of the present invention, there is provided an electronic device including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the three-dimensional object detection method or the three-dimensional object grabbing method.
According to another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium, in which at least one executable instruction is stored, and when the executable instruction is executed on an electronic device, the electronic device executes the operations of the three-dimensional object detection method or the three-dimensional object capture method.
According to the embodiment of the invention, the camera point cloud corresponding to the depth image can be generated from the depth image and the camera internal parameters; after the camera point cloud is converted into the world point cloud, target detection can be performed on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under the world coordinate system; furthermore, the bounding box of the target identification object in the camera coordinate system can be generated according to the bounding box of the target identification object in the world coordinate system, so that the detection of the target identification object is completed. In this way, the embodiment of the invention can generate the bounding box of the target identification object under the camera coordinate system based on the camera point cloud even when the external parameters of the camera are not acquired, and can thus improve the detection accuracy of the target identification object.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention. In order that the technical means of the embodiments of the present invention may be more clearly understood and implemented according to the content of this description, and in order to make the foregoing and other objects, features and advantages of the embodiments of the present invention more clearly understandable, the detailed description of the invention is provided below.
Drawings
The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flow chart illustrating a three-dimensional target detection method according to an embodiment of the present invention;
fig. 2(a) is a schematic diagram illustrating an identifier placement scene and a corresponding simulated camera position according to an embodiment of the present invention;
FIG. 2(b) is a schematic diagram showing rendering effects of the camera in FIG. 2 (a);
fig. 3(a) is a schematic diagram illustrating another identifier placement scene and a corresponding simulated camera position according to an embodiment of the present invention;
FIG. 3(b) is a schematic diagram showing rendering effects of the camera in FIG. 3 (a);
fig. 4 is a schematic flow chart illustrating a three-dimensional object capture method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a three-dimensional object detection apparatus provided in an embodiment of the present invention;
fig. 6 shows a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 shows a flow chart of a three-dimensional object detection method according to an embodiment of the invention, which is performed by an electronic device. The memory of the electronic device is used for storing at least one executable instruction, and the executable instruction enables the processor of the electronic device to execute the operation of the three-dimensional target detection method. The electronic device may be a robot, car, computer or other terminal device. As shown in fig. 1, the method comprises the steps of:
step 110: and acquiring a depth image containing the target recognition object.
The depth image may be an RGBD image, that is, an RGB color image with a corresponding depth channel. The target recognition object in the depth image is the recognition object on which target detection needs to be performed. The target recognition object may be, for example, a cup, a bottle or a piece of fruit. Generally, the depth image containing the target recognition object can be acquired by photographing a scene containing the target recognition object with a depth camera.
Step 120: and generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, wherein the camera point cloud is a point cloud under a camera coordinate system.
A camera point cloud corresponding to the depth image is generated according to the depth image and the camera internal parameters; the camera point cloud is a point cloud under the camera coordinate system. The camera internal parameters are parameters related to the characteristics of the camera that captures the depth image, and generally include the focal length, pixel size and the like of the camera.
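For illustration only, the back-projection from a depth image to a camera point cloud described in this step can be sketched as follows in Python. The function name, the pinhole intrinsics fx, fy, cx, cy and the depth scale are assumptions made for the example; the patent itself does not prescribe a specific implementation.

```python
import numpy as np

def depth_to_camera_point_cloud(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image into a point cloud in the camera coordinate system.

    depth: HxW array of raw depth values (e.g. millimetres for many RGBD cameras).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths and principal point).
    Returns an (N, 3) array of points in the camera coordinate system.
    """
    h, w = depth.shape
    z = depth.astype(np.float64) / depth_scale        # convert raw depth to metres
    u, v = np.meshgrid(np.arange(w), np.arange(h))    # pixel coordinates
    x = (u - cx) * z / fx                             # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                   # drop pixels with no depth
```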
Step 130: and converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud under a world coordinate system.
In an alternative manner, the camera point cloud and the preset plane point cloud may be registered to generate a transformation matrix from the camera coordinate system to the world coordinate system, and the camera point cloud is converted into the world point cloud according to the transformation matrix. In order to obtain a transformation matrix, the mean values of the camera point cloud in three dimensions can be respectively calculated, a homogeneous transformation matrix is constructed according to the mean values, the homogeneous transformation matrix is set as an initial value of an iterative closest point algorithm, and the transformation matrix from a camera coordinate system to a world coordinate system is generated according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
For example, the mean values $\bar{x}$, $\bar{y}$ and $\bar{z}$ of the camera point cloud in each dimension of the three-dimensional space are first calculated, and a homogeneous transformation matrix built from these means, for example the pure translation

$$T_0=\begin{bmatrix}1&0&0&-\bar{x}\\0&1&0&-\bar{y}\\0&0&1&-\bar{z}\\0&0&0&1\end{bmatrix},$$

is used as the initial value of the iterative closest point algorithm. A plane point cloud perpendicular to the gravity axis (z axis) of the world coordinate system is then generated, the transformation matrix from the camera point cloud to the plane point cloud is obtained, and the camera point cloud is converted into the world point cloud through this transformation matrix.
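A minimal sketch of this registration step is given below, assuming the Open3D library is used for the iterative closest point algorithm; the plane extent, point spacing and ICP distance threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np
import open3d as o3d

def camera_to_world_transform(camera_points, plane_size=1.0, step=0.01, icp_threshold=0.05):
    """Register the camera point cloud to a synthetic plane perpendicular to the
    gravity (z) axis and return the estimated camera-to-world transformation."""
    # Initial guess: a pure translation that moves the point-cloud centroid to the origin.
    init = np.eye(4)
    init[:3, 3] = -camera_points.mean(axis=0)

    # Synthetic plane point cloud in the z = 0 plane (perpendicular to the gravity axis).
    xs = np.arange(-plane_size, plane_size, step)
    gx, gy = np.meshgrid(xs, xs)
    plane_points = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)

    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(camera_points))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(plane_points))

    result = o3d.pipelines.registration.registration_icp(
        source, target, icp_threshold, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 transformation from camera to world coordinates
```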
In an optional manner, when the camera point cloud is converted into the world point cloud according to the transformation matrix, the rotation matrix corresponding to the transformation matrix is first determined. If the rotation angle corresponding to the rotation matrix is greater than 90 degrees, the world point cloud is generated according to the rotation matrix and the camera point cloud; if the rotation angle corresponding to the rotation matrix is not greater than 90 degrees, the world point cloud is generated according to the supplementary-angle rotation amount corresponding to the rotation matrix and the camera point cloud. For example, if the rotation angle does not exceed 90 degrees, the difference between 180 degrees and the rotation angle is taken as the rotation angle of the rotation matrix, as sketched below.
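The rotation-angle check just described might be implemented along the following lines. Extracting the angle from the trace of the rotation matrix and applying the supplementary rotation about the same axis are interpretations of the text, not details fixed by the patent.

```python
import numpy as np

def rotation_angle_deg(R):
    """Rotation angle (in degrees) of a 3x3 rotation matrix, from its trace."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def adjust_rotation(R):
    """If the rotation angle does not exceed 90 degrees, replace it with its
    supplement (180 degrees minus the angle) about the same rotation axis."""
    angle = rotation_angle_deg(R)
    if angle > 90.0:
        return R
    # Rotation axis from the skew-symmetric part of R (assumes angle is not close to 0 or 180).
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis = axis / np.linalg.norm(axis)
    theta = np.radians(180.0 - angle)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' formula for the supplementary-angle rotation about the same axis.
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def camera_to_world_points(R, camera_points):
    """Apply the (possibly adjusted) rotation to take camera points to world points."""
    return camera_points @ adjust_rotation(R).T
```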
Step 140: and carrying out target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system.
Target detection can be performed on the world point cloud according to a preset target recognition model to generate the bounding box of the target recognition object under the world coordinate system. A bounding box, i.e. a minimum enclosing box, is the result of an algorithm for solving for the optimal enclosing space of a set of discrete points; the basic idea is to approximately replace a complex geometric object with a slightly larger geometry of simple shape (the bounding box). The bounding box of the target recognition object may be, for example, an axis-aligned bounding box (AABB), a bounding sphere, an oriented bounding box (OBB) or a fixed-direction convex hull (FDH). Before target detection is performed on the world point cloud according to the preset target recognition model, a target recognition algorithm can be trained based on deep learning to generate the target recognition model. The training process of the target recognition algorithm is explained in detail below.
Before the target recognition algorithm is trained, a point cloud data training set needs to be constructed; the point cloud data training set comprises a plurality of groups of world point cloud data and the label information corresponding to each group of world point cloud data. A preset target recognition algorithm is then trained with the point cloud data training set to generate the target recognition model. In one embodiment of the invention, the target recognition model may be, for example, a Vote Net network (a three-dimensional target detection network). The Vote Net network is an end-to-end 3D object detection network based on deep point set networks and Hough voting.
In an alternative way, a point cloud data training set can be constructed by:
constructing a three-dimensional model library that comprises three-dimensional models of a plurality of recognition objects, and placing each recognition object into a world coordinate system (x axis pointing right, y axis pointing forward, z axis pointing up) so that, when the object stands upright, its long axis corresponds to the y axis, its width to the x axis and its height to the z axis. The bounding box of each recognition object can then be calculated using principal component analysis. Further, a recognition object placement scene is constructed for simulation, each recognition object is placed at a simulated position in the placement scene, and the bounding box of each recognition object at its simulated position is calculated. If the placement position contains a plurality of recognition objects, collision detection can be performed to ensure that the recognition objects do not collide with one another. The placement position is the spatial position of each recognition object within a preset spatial range under the world coordinate system; after a recognition object is placed into the world coordinate system, its initial position is determined, and its placement position is then determined through a translation matrix and a rotation matrix, the rotation matrix being a rotation about the z axis. Further, a plurality of camera view angles can be randomly generated, the world point cloud data is rendered based on each camera view angle to generate camera point cloud data of each recognition object under the corresponding camera view angle, and the recognition object category corresponding to the camera point cloud data, together with the centroid, length, width and height of the corresponding bounding box and its rotation angle about the z axis, are stored.
Fig. 2(a) shows a schematic diagram of an object placement scene and a corresponding simulated camera position provided by an embodiment of the present invention, and fig. 2(b) shows a schematic diagram of a rendering effect of the camera in fig. 2 (a); in the object placement scene in fig. 2(a), a camera view angle is randomly generated, and the object point cloud data in the world coordinate system is rendered based on the camera view angle, so that the rendering effect in fig. 2(b) can be obtained. Similarly, fig. 3(a) shows a schematic diagram of another object placement scene and a corresponding simulated camera position provided by the embodiment of the present invention, and fig. 3(b) shows a schematic diagram of a rendering effect of the camera in fig. 3 (a); in the object placement scene in fig. 3(a), a camera view angle is randomly generated, and the object point cloud data in the world coordinate system is rendered based on the camera view angle, so that the rendering effect in fig. 3(b) can be obtained. It should be noted that, for any object placement scene, multiple camera view angles may be randomly generated, and the world point cloud of the identification object is rendered based on each camera view angle, so as to obtain the camera point cloud under the corresponding camera view angle.
The following describes a process of calculating a bounding box of a recognition object by a principal component analysis method using a formula.
Let M be a 3 × n matrix representing the coordinates of the point cloud in three-dimensional space, where n is the number of points. Let mean(M) denote the matrix formed by the mean values of M in the three dimensions, i.e. mean(M) is also a 3 × n matrix in which the elements of each row are equal, and equal to the mean value of M in the corresponding dimension. Define the centred point cloud

$$\widetilde{M}=M-\mathrm{mean}(M)$$

and compute its covariance matrix

$$\mathrm{Corr}=\frac{1}{n}\widetilde{M}\widetilde{M}^{T},$$

and determine the eigenvalues A and eigenvectors V of Corr such that Corr V = VA, where A is the diagonal matrix of eigenvalues. Further, the column vectors of the eigenvector matrix V are rearranged to obtain the eigenvector matrices V' corresponding to the 6 different placement modes of the recognition object.

Further, by calculating M' = V'M, the corrected point clouds M' of the 6 different placement states of the recognition object can be obtained. After M' is translated to the origin, i.e. M' is replaced by M' - mean(M'), the bounding box B of the corrected point cloud M' can be calculated from its extrema along each axis,

$$B=\begin{bmatrix}x_{\min}&x_{\max}\\y_{\min}&y_{\max}\\z_{\min}&z_{\max}\end{bmatrix},$$

where xmin, ymin and zmin are the minimum values of the corrected point cloud M' in the x-axis, y-axis and z-axis directions, respectively, and xmax, ymax and zmax are the corresponding maximum values.

By means of the rotation matrix about the z axis

$$R=\begin{bmatrix}\cos\theta&-\sin\theta&0\\\sin\theta&\cos\theta&0\\0&0&1\end{bmatrix}$$

and the translation matrix $t=[t_x,t_y,t_z]^{T}$, the corrected point cloud M' may be randomly placed and then updated to RM' + t, where θ is the rotation angle of the corrected point cloud M' about the z axis, and $t_x$, $t_y$ and $t_z$ are the translation amounts of the corrected point cloud M' along the x axis, y axis and z axis, respectively.
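As a numerical illustration of the principal component analysis above, a sketch follows. Sorting the principal axes by descending eigenvalue and using the transpose of the eigenvector matrix to realise the M' = V'M step are assumptions for the example; the patent instead speaks of rearranging the columns of V to cover the 6 placement modes.

```python
import numpy as np

def pca_bounding_box(M):
    """Align a 3xN point cloud with its principal axes and return the aligned
    points together with their axis-aligned bounding box extents."""
    M_centered = M - M.mean(axis=1, keepdims=True)        # M - mean(M)
    corr = (M_centered @ M_centered.T) / M.shape[1]       # 3x3 covariance matrix Corr
    eigvals, eigvecs = np.linalg.eigh(corr)               # Corr V = V A
    V = eigvecs[:, np.argsort(eigvals)[::-1]]             # principal axes, largest first
    M_prime = V.T @ M_centered                            # corrected point cloud M'
    mins, maxs = M_prime.min(axis=1), M_prime.max(axis=1) # xmin..zmin and xmax..zmax
    return M_prime, np.stack([mins, maxs], axis=1)        # bounding box B

def place_randomly(M_prime, theta, t):
    """Rotate the corrected cloud about the z axis by theta and translate by t (RM' + t)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R @ M_prime + np.asarray(t, dtype=float).reshape(3, 1)
```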
The following describes a process of randomly generating a camera view angle and rendering a point cloud under a world coordinate system based on the camera view angle through a formula.
A position matrix $C_P=[x_p,y_p,z_p]^{T}$, a front-direction matrix $C_f=[x_f,y_f,z_f]^{T}$ and an up-direction matrix $C_t=[x_t,y_t,z_t]^{T}$ of the virtual camera can be set; the left-direction matrix of the camera is then

$$C_l=[\,y_t z_f-z_t y_f,\; z_t x_f-x_t z_f,\; x_t y_f-y_t x_f\,]^{T}.$$

The camera view angle of the virtual camera at the corresponding position can be determined through the front-direction, up-direction and left-direction matrices. Let $T_C$ be the homogeneous transformation matrix of the camera coordinate system relative to the world coordinate system,

$$T_C=\begin{bmatrix}R_C&C_P\\0^{T}&1\end{bmatrix},$$

where $C_P$ is the translation part of the camera extrinsics and $R_C$ is the transformation matrix for the direction of the camera coordinate system relative to the world coordinate system, which can be obtained by solving the linear equations formed by the direction matrices $C_l$, $C_t$ and $C_f$. Further, by inverting $T_C$, the homogeneous transformation matrix of the world coordinate system relative to the camera coordinate system is obtained,

$$T_C^{-1}=\begin{bmatrix}R_C^{T}&-R_C^{T}C_P\\0^{T}&1\end{bmatrix}.$$

Further, the camera point cloud coordinates $M_C$ of the recognition object are obtained by applying $T_C^{-1}$ to the homogeneous coordinates of the recognition object's world point cloud, i.e.

$$M_C=T_C^{-1}\begin{bmatrix}M_W\\\mathbf{1}^{T}\end{bmatrix},$$

where $M_W$ denotes the world point cloud coordinates of the recognition object (the placed point cloud RM' + t from the previous subsection) and $\mathbf{1}^{T}$ is a row of ones.
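The construction of the virtual camera pose described above can be sketched as follows. Normalising the direction vectors and the column ordering (left, up, front) of the rotation part R_C are assumptions, since the exact convention is not spelled out in the reconstructed text.

```python
import numpy as np

def camera_extrinsics(position, front, up):
    """Build the homogeneous transform T_C of the camera frame relative to the
    world frame from the camera position and its front/up direction vectors."""
    front = np.asarray(front, dtype=float)
    front /= np.linalg.norm(front)
    up = np.asarray(up, dtype=float)
    up /= np.linalg.norm(up)
    left = np.cross(up, front)                 # C_l = C_t x C_f
    R_c = np.stack([left, up, front], axis=1)  # assumed column order of R_C
    T_c = np.eye(4)
    T_c[:3, :3] = R_c
    T_c[:3, 3] = np.asarray(position, dtype=float)
    return T_c

def world_to_camera(points_world, T_c):
    """Transform an (N, 3) world point cloud into camera coordinates, i.e. apply T_C^{-1}."""
    R_c, p = T_c[:3, :3], T_c[:3, 3]
    return (points_world - p) @ R_c            # each row becomes R_c^T (x - p)
```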
The embodiment of the invention selects the Vote Net network for training so as to obtain the target recognition model. Vote Net only predicts rotation about a single axis relatively well, so before the Vote Net network is trained based on deep learning, the camera point cloud of the recognition object needs to be transformed into the world point cloud, that is, the direction of gravity is aligned with the -z axis. Further, the camera point cloud of the recognition object may be converted into the world point cloud of the recognition object based on the iterative closest point algorithm. The following describes the process of converting the camera point cloud of a recognition object into a world point cloud.
In an alternative mode, the mean values $\bar{x}$, $\bar{y}$ and $\bar{z}$ of the recognition object's camera point cloud in each dimension of three-dimensional space are first calculated; then, based on these per-dimension means, a homogeneous transformation matrix, for example the pure translation

$$T_0=\begin{bmatrix}1&0&0&-\bar{x}\\0&1&0&-\bar{y}\\0&0&1&-\bar{z}\\0&0&0&1\end{bmatrix},$$

is constructed as the initial value of the iterative closest point algorithm. Because the background desktop occupies a large part of the scene in which the recognition objects are placed, and the proportion of the point cloud corresponding to the background desktop is therefore large, a plane point cloud perpendicular to the z axis is generated; plane registration can then be performed using the iterative closest point algorithm, and the transformation matrix from the camera point cloud of the recognition object to the plane point cloud is calculated. The transformation matrix comprises a translation matrix and a rotation matrix, and the rotation angle corresponding to the rotation matrix can further be determined.

In addition, since a top-down view is assumed by default when the robot grasps an object, the rotation angle of the $(0,0,1)^{T}$ vector should exceed 90 degrees; if the rotation angle of the $(0,0,1)^{T}$ vector does not exceed 90 degrees, the difference between 180 degrees and that rotation angle is taken as the rotation angle of the rotation matrix. Finally, the camera point cloud is converted into the world point cloud through the rotation matrix, that is, the -z axis is made consistent with the direction of gravity.
And converting the camera point cloud data of the camera view angle corresponding to each placing position into world point cloud data, and adding label information to the world point cloud data to construct a point cloud data training set. The tag information may include, for example, the category of the corresponding identifier, and the centroid, length, width, height, and rotation angle about the z-axis of the bounding box corresponding to the simulated location.
The Vote Net network takes the world point cloud as input and outputs the 3D bounding box, confidence and category of the target recognition object in the actual placement scene. Detecting the three-dimensional target with the Vote Net network requires only the coordinate information of the world point cloud, does not depend heavily on the density of the world point cloud, and generalizes well. Although Vote Net has achieved good results in the task of 3D object detection in indoor scenes, what it handles there is real data of large indoor objects. In this specification, Vote Net is trained with simulation data and used to detect world point clouds obtained from real captured data. Since the geometric characteristics of the simulation data and the real captured data do not differ greatly, the embodiment of the invention remains feasible.
The following describes the training of the Vote Net network based on a point cloud data training set.
When the Vote Net network is trained, 2.5D point clouds in the simulation scene are first constructed with densities similar to those of real data, a virtual camera is then used for shooting, world point cloud data are generated from the camera point cloud data obtained by shooting, and the label information of each group of world point cloud data is obtained automatically, which can improve the training speed of the target recognition model. The world point cloud data containing the label information are input into the Vote Net network for training, and the total number of training rounds is determined according to the amount of point cloud data. After the Vote Net network has been trained, three-dimensional target detection is performed on the world point cloud processed by the iterative closest point algorithm, and the 3D bounding box, confidence and category of the target recognition object corresponding to the camera point cloud data can be obtained.
Step 150: and generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system.
The bounding box of the target identification object in the world coordinate system can be converted into the bounding box of the target identification object in the camera coordinate system according to the rotation matrix. Further, the bounding box matrix of the target identification object under the camera coordinate system can be obtained by right-multiplying the bounding box matrix of the target identification object under the world coordinate system by the rotation matrix.
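A brief sketch of this final conversion is given below. It assumes that the bounding box is represented by its eight corner points stacked as rows and that the rotation matrix is the camera-to-world rotation obtained in step 130; under those assumptions, right-multiplying the corner matrix by that rotation maps the corners back into the camera coordinate system.

```python
import numpy as np

def bbox_world_to_camera(bbox_corners_world, R_cam_to_world):
    """Convert the eight bounding-box corner points from world to camera coordinates.

    bbox_corners_world: (8, 3) corner points in the world coordinate system.
    R_cam_to_world: rotation used earlier to take camera points to world points;
    right-multiplying by it applies its transpose to each corner (world -> camera).
    """
    return bbox_corners_world @ R_cam_to_world
```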
According to the embodiment of the invention, the camera point cloud corresponding to the depth image can be generated from the depth image and the camera internal parameters; after the camera point cloud is converted into the world point cloud, target detection can be performed on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under the world coordinate system; furthermore, the bounding box of the target identification object in the camera coordinate system can be generated according to the bounding box of the target identification object in the world coordinate system, so that the detection of the target identification object is completed. In this way, the embodiment of the invention can generate the bounding box of the target identification object under the camera coordinate system based on the camera point cloud even when the external parameters of the camera are not acquired, and can thus improve the detection accuracy of the target identification object.
Fig. 4 is a flowchart illustrating a three-dimensional object capture method according to another embodiment of the invention, which is performed by an electronic device. The memory of the electronic device is used for storing at least one executable instruction, and the executable instruction enables the processor of the electronic device to execute the operation of the three-dimensional object grabbing method. As shown in fig. 4, the method comprises the steps of:
step 210: and determining the space position of the target recognition object according to the bounding box of the target recognition object under the camera coordinate system.
The space position of the target recognition object can be determined according to the bounding box of the target recognition object under the camera coordinate system. The spatial position of the target recognition object includes the spatial coordinates of the target recognition object and the rotation angle of the target recognition object in the three-dimensional space.
Step 220: and generating a grabbing instruction according to the space position, so that the grabber grabs the target recognition object according to the grabbing instruction.
The grabbing instruction can be generated according to the space position of the target recognition object, and the grabbing instruction is sent to the grabber used for grabbing the target recognition object. The grabber can determine a grabbing path of the target identification object according to the grabbing instruction, and grab the target identification object according to the grabbing path.
According to the embodiment of the invention, the bounding box of the target identification object under the camera coordinate system is generated based on the camera point cloud, the spatial position of the target identification object is determined according to the bounding box of the target identification object under the camera coordinate system, and the grabbing instruction is generated according to the spatial position, so that the grabber can accurately grab the target identification object according to the grabbing instruction.
Fig. 5 shows a schematic structural diagram of a three-dimensional target detection apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus 300 includes: an acquisition module 310, a first generation module 320, a conversion module 330, a second generation module 340, and a third generation module 350.
The acquiring module 310 is configured to acquire a depth image including a target recognition object;
a first generating module 320, configured to generate a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, where the camera point cloud is a point cloud under a camera coordinate system;
a conversion module 330, configured to convert the camera point cloud into a world point cloud, where the world point cloud is a point cloud under a world coordinate system;
the second generating module 340 is configured to perform target detection on the world point cloud according to a preset target identification model to generate a bounding box of the target identifier in a world coordinate system;
a third generating module 350, configured to generate a bounding box of the target identifier in the camera coordinate system according to the bounding box of the target identifier in the world coordinate system.
In an alternative manner, the conversion module 330 is configured to:
registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from a camera coordinate system to a world coordinate system;
converting the camera point cloud to a world point cloud according to the transformation matrix.
In an alternative manner, the conversion module 330 is configured to:
respectively calculating the mean values of the camera point clouds in three dimensions;
constructing a homogeneous transformation matrix according to the mean value, and setting the homogeneous transformation matrix as an initial value of an iterative closest point algorithm;
and generating a transformation matrix from a camera coordinate system to a world coordinate system according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
In an alternative manner, the conversion module 330 is configured to:
determining a rotation matrix corresponding to the transformation matrix;
if the rotation angle corresponding to the rotation matrix is larger than 90 degrees, generating a world point cloud according to the rotation matrix and the camera point cloud;
and if the rotation angle corresponding to the rotation matrix is not more than 90 degrees, generating the world point cloud according to the supplementary-angle rotation amount corresponding to the rotation matrix and the camera point cloud.
In an optional manner, the apparatus 300 further comprises a training module for:
constructing a point cloud data training set, wherein the point cloud data training set comprises a plurality of groups of world point cloud data and label information corresponding to each group of world point cloud data;
and training a preset target recognition algorithm by using the point cloud data training set to generate the target recognition model.
In an alternative, the training module is configured to:
constructing a three-dimensional model library, wherein the three-dimensional model library comprises three-dimensional models of a plurality of identification objects;
after each identification object is placed in a world coordinate system, calculating an initial value of a bounding box of each identification object;
carrying out simulation arrangement on each identification object, and calculating a bounding box simulation value of each identification object at a simulation position;
randomly generating a camera view angle, and rendering based on the camera view angle to generate camera point cloud data of each identification object;
converting the camera point cloud data of each identification object into corresponding world point cloud data;
and adding label information to the corresponding world point cloud data.
According to the embodiment of the invention, the camera point cloud corresponding to the depth image can be generated from the depth image and the camera internal parameters; after the camera point cloud is converted into the world point cloud, target detection can be performed on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under the world coordinate system; furthermore, the bounding box of the target identification object in the camera coordinate system can be generated according to the bounding box of the target identification object in the world coordinate system, so that the detection of the target identification object is completed. In this way, the embodiment of the invention can generate the bounding box of the target identification object under the camera coordinate system based on the camera point cloud even when the external parameters of the camera are not acquired, and can thus improve the detection accuracy of the target identification object.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and a specific implementation of the electronic device is not limited in the specific embodiment of the present invention.
As shown in fig. 6, the electronic device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein: the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the embodiment of the three-dimensional object detection method described above.
In particular, program 410 may include program code comprising computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device comprises one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
And a memory 406 for storing a program 410. The memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be invoked by the processor 402 to cause the electronic device to perform the following operations:
acquiring a depth image containing a target recognition object;
generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, wherein the camera point cloud is a point cloud under a camera coordinate system;
converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud under a world coordinate system;
carrying out target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system;
and generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from a camera coordinate system to a world coordinate system;
converting the camera point cloud to a world point cloud according to the transformation matrix.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
respectively calculating the mean values of the camera point clouds in three dimensions;
constructing a homogeneous transformation matrix according to the mean value, and setting the homogeneous transformation matrix as an initial value of an iterative closest point algorithm;
and generating a transformation matrix from a camera coordinate system to a world coordinate system according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
determining a rotation matrix corresponding to the transformation matrix;
if the rotation angle corresponding to the rotation matrix is larger than 90 degrees, generating a world point cloud according to the rotation matrix and the camera point cloud;
and if the rotation angle corresponding to the rotation matrix is not more than 90 degrees, generating the world point cloud according to the supplementary-angle rotation amount corresponding to the rotation matrix and the camera point cloud.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
constructing a point cloud data training set, wherein the point cloud data training set comprises a plurality of groups of world point cloud data and label information corresponding to each group of world point cloud data;
and training a preset target recognition algorithm by using the point cloud data training set to generate the target recognition model.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
constructing a three-dimensional model library, wherein the three-dimensional model library comprises three-dimensional models of a plurality of identification objects;
after each identification object is placed in a world coordinate system, calculating an initial value of a bounding box of each identification object;
carrying out simulation arrangement on each identification object, and calculating a bounding box simulation value of each identification object at a simulation position;
randomly generating a camera view angle, and rendering based on the camera view angle to generate camera point cloud data of each identification object;
converting the camera point cloud data of each identification object into corresponding world point cloud data;
and adding label information to the corresponding world point cloud data.
In an alternative, the program 410 is invoked by the processor 402 to cause the electronic device to perform the following operations:
determining the space position of the target recognition object according to the bounding box of the target recognition object under the camera coordinate system;
and generating a grabbing instruction according to the space position, so that the grabber grabs the target recognition object according to the grabbing instruction.
According to the embodiment of the invention, the camera point cloud corresponding to the depth image can be generated from the depth image and the camera internal parameters; after the camera point cloud is converted into the world point cloud, target detection can be performed on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under the world coordinate system; furthermore, the bounding box of the target identification object in the camera coordinate system can be generated according to the bounding box of the target identification object in the world coordinate system, so that the detection of the target identification object is completed. In this way, the embodiment of the invention can generate the bounding box of the target identification object under the camera coordinate system based on the camera point cloud even when the external parameters of the camera are not acquired, and can thus improve the detection accuracy of the target identification object.
An embodiment of the present invention provides a computer-readable storage medium, where the storage medium stores at least one executable instruction, and when the executable instruction is executed on an electronic device, the electronic device is enabled to execute a three-dimensional object detection method in any method embodiment described above.
The embodiment of the invention provides a three-dimensional target detection device, which is used for executing the three-dimensional target detection method.
Embodiments of the present invention provide a computer program, which can be called by a processor to enable an electronic device to execute a three-dimensional object detection method in any of the above method embodiments.
Embodiments of the present invention provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when run on a computer, cause the computer to perform the three-dimensional object detection method in any of the above-mentioned method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A method for three-dimensional object detection, the method comprising:
acquiring a depth image containing a target recognition object;
generating a camera point cloud corresponding to the depth image according to the depth image and the camera internal parameters, wherein the camera point cloud is a point cloud under a camera coordinate system;
converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud under a world coordinate system;
carrying out target detection on the world point cloud according to a preset target identification model so as to generate a bounding box of the target identification object under a world coordinate system;
and generating the bounding box of the target identification object in the camera coordinate system according to the bounding box of the target identification object in the world coordinate system.
2. The method of claim 1, wherein the converting the camera point cloud to a world point cloud comprises:
registering the camera point cloud with a preset plane point cloud to generate a transformation matrix from a camera coordinate system to a world coordinate system;
converting the camera point cloud to a world point cloud according to the transformation matrix.
3. The method of claim 2, wherein registering the camera point cloud with a pre-defined planar point cloud to generate a transformation matrix of a camera coordinate system to a world coordinate system comprises:
respectively calculating the mean values of the camera point clouds in three dimensions;
constructing a homogeneous transformation matrix according to the mean value, and setting the homogeneous transformation matrix as an initial value of an iterative closest point algorithm;
and generating a transformation matrix from a camera coordinate system to a world coordinate system according to the iterative closest point algorithm and the plane point cloud perpendicular to the gravity axis.
4. The method of claim 2 or 3, wherein the converting the camera point cloud into the world point cloud according to the transformation matrix comprises:
determining a rotation matrix corresponding to the transformation matrix;
if the rotation angle corresponding to the rotation matrix is larger than 90 degrees, generating the world point cloud according to the rotation matrix and the camera point cloud;
and if the rotation angle corresponding to the rotation matrix is not larger than 90 degrees, generating the world point cloud according to the complementary-angle rotation amount corresponding to the rotation matrix and the camera point cloud.
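The branch above is driven by the rotation angle encoded in the rotation matrix, which can be read off as arccos((trace(R) − 1)/2). The sketch below only extracts that angle and applies a homogeneous transform to a point array; how the complementary-angle rotation amount is constructed is left open by the claim and is deliberately not filled in here.

```python
import numpy as np

def rotation_angle_deg(transform):
    """Rotation angle of the 3x3 rotation block R of a 4x4 homogeneous transform,
    computed as arccos((trace(R) - 1) / 2) and clipped for numerical safety."""
    R = transform[:3, :3]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def transform_points(transform, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    R, t = transform[:3, :3], transform[:3, 3]
    return points @ R.T + t
```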
5. The method of claim 1, further comprising:
constructing a point cloud data training set, wherein the point cloud data training set comprises a plurality of groups of world point cloud data and label information corresponding to each group of world point cloud data;
and training a preset target recognition algorithm by using the point cloud data training set to generate the target recognition model.
6. The method of claim 5, wherein the constructing a training set of point cloud data comprises:
constructing a three-dimensional model library, wherein the three-dimensional model library comprises three-dimensional models of a plurality of recognition objects;
after each recognition object is placed in the world coordinate system, calculating an initial bounding box value of each recognition object;
performing simulated placement of each recognition object, and calculating a simulated bounding box value of each recognition object at its simulated position;
randomly generating a camera viewing angle, and rendering based on the camera viewing angle to generate camera point cloud data of each recognition object;
converting the camera point cloud data of each recognition object into corresponding world point cloud data;
and adding label information to the corresponding world point cloud data.
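As an illustrative sketch of the labelling step, one group of world point cloud data and its label information could be packaged as follows; the axis-aligned (centre, size) box parameterisation and the dictionary layout are assumptions, since the claim does not prescribe a particular label format.

```python
import numpy as np

def make_training_sample(world_points, class_id):
    """Package one group of world point cloud data with its label information.
    The axis-aligned (centre, size) box and the dict layout are illustrative
    assumptions only."""
    lo, hi = world_points.min(axis=0), world_points.max(axis=0)
    return {
        "points": world_points.astype(np.float32),   # world point cloud data
        "bbox_center": (lo + hi) / 2.0,              # bounding box label
        "bbox_size": hi - lo,
        "class_id": class_id,                        # which recognition object this is
    }
```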
7. A three-dimensional object grasping method comprising the three-dimensional object detection method according to any one of claims 1 to 6, the three-dimensional object grasping method further comprising:
determining the spatial position of the target recognition object according to the bounding box of the target recognition object in the camera coordinate system;
and generating a grasping instruction according to the spatial position, so that a gripper grasps the target recognition object according to the grasping instruction.
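A minimal sketch of these two steps, assuming the bounding box is available as an array of its eight corners in the camera coordinate system and that a simple dictionary stands in for the grasping instruction; both assumptions are illustrative.

```python
import numpy as np

def grasp_instruction_from_bbox(bbox_corners_cam):
    """Take the eight bounding box corners in the camera coordinate system,
    use their centroid as the spatial position of the target recognition object,
    and wrap it in a simple grasping instruction (format is an assumption)."""
    position = bbox_corners_cam.mean(axis=0)
    return {"command": "grasp", "position": position.tolist()}
```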
8. A three-dimensional object detection apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a depth image containing a target recognition object;
the first generation module is used for generating a camera point cloud corresponding to the depth image according to the depth image and camera internal parameters, wherein the camera point cloud is a point cloud in the camera coordinate system;
the conversion module is used for converting the camera point cloud into a world point cloud, wherein the world point cloud is a point cloud in the world coordinate system;
the second generation module is used for performing target detection on the world point cloud according to a preset target recognition model to generate a bounding box of the target recognition object in the world coordinate system;
and the third generation module is used for generating the bounding box of the target recognition object in the camera coordinate system according to the bounding box of the target recognition object in the world coordinate system.
9. An electronic device, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the three-dimensional object detection method according to any one of claims 1 to 6 or the three-dimensional object grasping method according to claim 7.
10. A computer-readable storage medium having stored therein at least one executable instruction which, when run on an electronic device, causes the electronic device to perform the operations of the three-dimensional object detection method according to any one of claims 1 to 6 or the three-dimensional object grasping method according to claim 7.
CN202110473106.3A 2021-04-29 2021-04-29 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment Active CN113223091B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110473106.3A CN113223091B (en) 2021-04-29 2021-04-29 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
PCT/CN2021/143443 WO2022227678A1 (en) 2021-04-29 2021-12-30 Three-dimensional target detection method and grabbing method, apparatus, and electronic device

Publications (2)

Publication Number Publication Date
CN113223091A true CN113223091A (en) 2021-08-06
CN113223091B CN113223091B (en) 2023-01-24

Family

ID=77090035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473106.3A Active CN113223091B (en) 2021-04-29 2021-04-29 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment

Country Status (2)

Country Link
CN (1) CN113223091B (en)
WO (1) WO2022227678A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116330306B (en) * 2023-05-31 2023-08-15 之江实验室 Object grabbing method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614889B (en) * 2018-11-23 2020-09-18 华为技术有限公司 Object detection method, related device and computer storage medium
CN111986232B (en) * 2020-08-13 2021-09-14 上海高仙自动化科技发展有限公司 Target object detection method, target object detection device, robot and storage medium
CN113223091B (en) * 2021-04-29 2023-01-24 达闼机器人股份有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157920A1 (en) * 2016-12-01 2018-06-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing obstacle of vehicle
CN108986161A (en) * 2018-06-19 2018-12-11 亮风台(上海)信息科技有限公司 A kind of three dimensional space coordinate estimation method, device, terminal and storage medium
WO2020151109A1 (en) * 2019-01-22 2020-07-30 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel feature
CN110344621A (en) * 2019-06-13 2019-10-18 武汉大学 A kind of wheel points cloud detection method of optic towards intelligent garage
WO2021027710A1 (en) * 2019-08-12 2021-02-18 阿里巴巴集团控股有限公司 Method, device, and equipment for object detection
CN111950426A (en) * 2020-08-06 2020-11-17 东软睿驰汽车技术(沈阳)有限公司 Target detection method and device and delivery vehicle
CN112200851A (en) * 2020-12-09 2021-01-08 北京云测信息技术有限公司 Point cloud-based target detection method and device and electronic equipment thereof

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022227678A1 (en) * 2021-04-29 2022-11-03 达闼机器人股份有限公司 Three-dimensional target detection method and grabbing method, apparatus, and electronic device
CN115222799A (en) * 2021-08-12 2022-10-21 达闼机器人股份有限公司 Method and device for acquiring image gravity direction, electronic equipment and storage medium
CN113689351A (en) * 2021-08-24 2021-11-23 北京石油化工学院 Dangerous chemical storage monitoring method, device and equipment based on depth camera
CN113689351B (en) * 2021-08-24 2023-10-10 北京石油化工学院 Dangerous chemical storage monitoring method, device and equipment based on depth camera
CN114627239A (en) * 2022-03-04 2022-06-14 北京百度网讯科技有限公司 Bounding box generation method, device, equipment and storage medium
CN114627239B (en) * 2022-03-04 2024-04-30 北京百度网讯科技有限公司 Bounding box generation method, device, equipment and storage medium
CN114754779A (en) * 2022-04-27 2022-07-15 镁佳(北京)科技有限公司 Positioning and mapping method and device and electronic equipment
CN114643588A (en) * 2022-05-19 2022-06-21 睿驰(深圳)智能有限公司 Control method, system and medium for autonomous mobile disinfection robot
CN115272791A (en) * 2022-07-22 2022-11-01 仲恺农业工程学院 Multi-target detection positioning method for tea based on YoloV5
CN117689678A (en) * 2024-02-04 2024-03-12 法奥意威(苏州)机器人***有限公司 Workpiece weld joint identification method, device, equipment and storage medium
CN117689678B (en) * 2024-02-04 2024-05-28 法奥意威(苏州)机器人***有限公司 Workpiece weld joint identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2022227678A1 (en) 2022-11-03
CN113223091B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN113223091B (en) Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN107063228B (en) Target attitude calculation method based on binocular vision
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
CN111443359B (en) Positioning method, device and equipment
JP4865557B2 (en) Computer vision system for classification and spatial localization of bounded 3D objects
CN108401461A (en) Three-dimensional mapping method, device and system, cloud platform, electronic equipment and computer program product
CN111738261A (en) Pose estimation and correction-based disordered target grabbing method for single-image robot
JP2016161569A (en) Method and system for obtaining 3d pose of object and 3d location of landmark point of object
CN109887030A (en) Texture-free metal parts image position and posture detection method based on the sparse template of CAD
CN112070782B (en) Method, device, computer readable medium and electronic equipment for identifying scene contour
CN111079565B (en) Construction method and identification method of view two-dimensional attitude template and positioning grabbing system
CN113927597B (en) Robot connecting piece six-degree-of-freedom pose estimation system based on deep learning
CN112509036B (en) Pose estimation network training and positioning method, device, equipment and storage medium
CN111754579A (en) Method and device for determining external parameters of multi-view camera
CN107680125A (en) The system and method that three-dimensional alignment algorithm is automatically selected in vision system
CN112348890B (en) Space positioning method, device and computer readable storage medium
CN114092553A Disordered grabbing attitude estimation method based on an improved FPFH and ICP algorithm
CN111415420A (en) Spatial information determination method and device and electronic equipment
CN116309882A (en) Tray detection and positioning method and system for unmanned forklift application
CN114882085B (en) Three-dimensional point cloud registration method and system based on single cube
CN115284279A (en) Mechanical arm grabbing method and device based on aliasing workpiece and readable medium
CN112668596A (en) Three-dimensional object recognition method and device and recognition model training method and device
CN115063485A (en) Three-dimensional reconstruction method, device and computer-readable storage medium
CN114820899A (en) Attitude estimation method and device based on multi-view rendering
CN114653629A (en) Sorting method based on visual identification, intelligent sorting system and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201111 Building 8, No. 207, Zhongqing Road, Minhang District, Shanghai
Applicant after: Dayu robot Co.,Ltd.
Address before: 200000 second floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai
Applicant before: Dalu Robot Co.,Ltd.

GR01 Patent grant