CN111251295A - Visual mechanical arm grabbing method and device applied to parameterized parts - Google Patents

Visual mechanical arm grabbing method and device applied to parameterized parts

Info

Publication number
CN111251295A
Authority
CN
China
Prior art keywords
point cloud
parameterized
point
target
template
Prior art date
Legal status
Granted
Application number
CN202010048562.9A
Other languages
Chinese (zh)
Other versions
CN111251295B (en)
Inventor
曾龙
林垟钵
董至恺
俞佳熠
赵嘉宇
Current Assignee
Guangzhou Fuwei Intelligent Technology Co ltd
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202010048562.9A
Publication of CN111251295A
Application granted
Publication of CN111251295B
Active legal status (current)
Anticipated expiration

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture
    • B25J9/161 - Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A visual mechanical arm grabbing method and device applied to parameterized parts are disclosed. The method comprises the following steps: S1, obtaining a scene point cloud of the parameterized part and removing background information to obtain a target point cloud; S2, inputting the target point cloud into a parameterized point cloud deep neural network and mapping it to a target point in a feature vector space according to the descriptor mapping function generated by the network; S3, according to manifolds representing different types of part families built in advance in the feature vector space, obtaining the specific parameter values of the target object and the corresponding target template by calculating the feature point on the manifold surface closest to the target point; S4, acquiring the 6D pose information of the parameterized part through an alignment algorithm according to the target template; and S5, transferring the 6D pose information of the parameterized part to a mechanical arm control system to realize grabbing of the parameterized part. The invention is suitable for grabbing parts from different part families, saves a large amount of calculation time, and has strong robustness and universality.

Description

Visual mechanical arm grabbing method and device applied to parameterized parts
Technical Field
The invention relates to visual grabbing, in particular to a method and a device for grabbing a visual mechanical arm applied to parameterized parts.
Background
China's demand for robots is increasing daily, and robots are being widely applied in the field of part grabbing. However, most robots currently in industrial use lack intelligence and are difficult to adapt to the multi-variety, small-batch production modes and unstructured industrial environments found in industry.
Visual grabbing refers to grabbing a target object by guiding a robot through a vision system. CN201511005603 discloses a method for grabbing by a vision robot based on a product information label. CN201810034599 discloses a robot grabbing method based on depth vision, which uses an edge detection operator to obtain a gradient map of a depth image, selects an optimal grabbing point, and calculates a motion trajectory by inverse kinematics to realize the grabbing process. CN201811256263 discloses a robot grabbing method based on decision feedback, which inputs the type, position and three-dimensional posture of an object into a grabbing action decision model to obtain a grabbing strategy, feeds back sensor data to determine whether grabbing succeeded, and adjusts the grabbing strategy until the object is successfully grabbed. CN201910069683 discloses a robot grabbing system and method, in which a binocular camera calculates the three-dimensional coordinates of an object, converts them into control coordinates of the robot arm, reaches the target position through intermediate point coordinates, and judges whether the position error is smaller than a threshold value. CN201910275513 discloses a robot vision grasping method and device, which calculates a mapping matrix and TCP offset from a calibration picture, obtains the central point and angle of the workpiece to be grasped, and finally implements the grasping action.
In a practical industrial environment, part families are typically parameterized, i.e., they share a uniform parameterized template type but differ in parameter values, producing parts of the same class at multiple size scales. Since most common grabbing methods in the prior art grab a single object with a fixed template in a scene, all part templates of the whole part family would need to be constructed in advance, which entails a large workload. Moreover, different part family types have different feature distributions, so traditional visual recognition and grabbing methods are difficult to apply to object models of different part families.
Disclosure of Invention
In order to overcome at least one of the technical defects, the invention provides a visual mechanical arm grabbing method and device applied to parameterized parts.
In order to achieve the purpose, the invention adopts the following technical scheme:
a visual mechanical arm grabbing method applied to parameterized parts comprises the following steps:
s1, obtaining scene point cloud of the parameterized part to be grabbed, and removing background information through preprocessing to obtain target point cloud;
s2, inputting the target point cloud into the trained parameterized point cloud deep neural network, and mapping the target point cloud into a target point in a feature vector space according to a descriptor mapping function D generated by the parameterized point cloud deep neural network;
s3, according to manifold representing different types of part families built in the feature vector space in advance, by calculating the feature point of the manifold surface closest to the target point, specific parameter values of the target object and corresponding target templates are obtained;
s4, acquiring 6D pose information of the parameterized part through an alignment algorithm according to the target template;
and S5, transmitting the 6D pose information of the parameterized part to a robot control system to control the robot to complete the grabbing task of the parameterized part.
Further:
in step S1, after the robot receives the capture instruction for the parameterized part, a scene point cloud of the parameterized part is acquired by the depth camera.
In step S1, the scene point cloud includes a target point cloud of a parameterized part to be captured and an environment background point cloud, and the target point cloud of the parameterized part is obtained by subtracting a pre-stored background point cloud from the scene point cloud.
In step S1, each point in the pre-stored background point cloud is used as a tree node to construct a KD tree, each point in the scene point cloud is used as a search point, a nearest neighbor point with a radius smaller than a set threshold is searched, and after the search is finished, the nearest neighbor point is subtracted from the corresponding tree node to remove the background point cloud.
In step S2, the parameterized deep neural network generates a descriptor mapping function D, so that after the target point cloud passes through the descriptor mapping function D, the same type of parts are closer to each other in a feature vector space, and different types of parts are farther from each other;
wherein the data set generation process of the parameterized deep neural network comprises the following steps:
(1) obtaining a series of part family obj template libraries through three-dimensional modeling;
(2) carrying out uniform sampling of fixed point number and sampling of farthest point on the surface of the obj model of the part family in sequence to obtain a series of point cloud template libraries of the part family, wherein each point cloud file in the point cloud template libraries of the part family belongs to the template point cloud;
(3) adding a single part family obj file to a Blender rendering engine, and performing multi-angle shooting by using a simulation camera to obtain a Blender rendering depth map;
(4) converting the rendered depth map into a point cloud map through internal reference and external reference conversion of the simulation camera, and sampling the farthest points to obtain a simulation multi-view point cloud map;
(5) the point cloud template library and the simulated multi-view point cloud image jointly form a data set of the parameterized deep neural network.
The overall loss function in the training process of the parameterized deep neural network comprises a classification loss function and a contrast loss function, wherein the classification loss function is used for classifying part families, the contrast loss function is used for realizing that parts of the same type are closer to each other and parts of different types are farther from each other in a feature vector space, and the expression of the contrast loss function is as follows:
L_contrast = (1/(2k)) * Σ_{i=1}^{k} [ Y_i * d_i^2 + (1 - Y_i) * max(margin - d_i, 0)^2 ]
where k represents the number of sample pairs, d_i represents the Euclidean distance between the two descriptors of the i-th pair, Y_i indicates whether the labels of the i-th pair of samples match (Y_i = 1 denotes a match, Y_i = 0 denotes a mismatch), and margin denotes a set distance threshold;
and obtaining a descriptor mapping function D after the network training is finished, and constructing the manifolds of different parameterized part families offline through the mapping function in the feature vector space.
In step S3, each manifold represents a part family, and each point on the manifold surface represents a specific parameterized part of the part family, including part parameter information.
In step S4, according to the target point cloud and the target template, an iterative closest point (ICP) algorithm is used for registration, which specifically includes the following processes:
a1) searching for the nearest point in the template point cloud for each point in the initial point cloud, wherein the template point cloud is obtained from a part obj file built through three-dimensional modeling, by uniform sampling and farthest point sampling of the surface of the part obj file;
a2) using a direction-vector threshold to eliminate erroneous points, and calculating a translation vector T_i and a rotation matrix R_i;
a3) finding the translation vector T_i and rotation matrix R_i that minimize the mean square error of the distances between the target point cloud and the corresponding points of the target template;
a4) applying the rotation and translation defined by T_i and R_i to the initial point cloud; if the threshold requirement is met or the iteration count reaches the upper limit, ending, otherwise returning to a1) to continue the iteration.
In step S5, according to the 6D pose information of the parameterized part, the robot control system solves the robot motion trajectory through an inverse kinematics algorithm: the end pose during grasping is converted from Cartesian space into the manipulator joint space, the connection trajectory between the initial pose and the end pose of the manipulator is then calculated in joint space through a motion planning algorithm, the end of the manipulator is controlled to reach the target pose along the connection trajectory, the gripping jaws are closed, and the parameterized part is placed at the assigned position.
A vision mechanical arm gripping device applied to parameterization parts comprises at least one memory and at least one processor;
the memory including at least one executable program stored therein;
the executable program, when executed by the processor, implements the method.
The invention has the following beneficial effects:
the invention provides a visual mechanical arm grabbing method applied to parameterized parts, which combines a deep learning technology with a traditional parameterization technology to realize a parameterized part-oriented visual grabbing task, can quickly obtain part parameter information and a template in a mode of mapping a part family to a feature vector space and calculating the closest point of the space, and grabs through template matching, skillfully solves the problem that parts of unknown templates are difficult to grab, and saves a large amount of calculation time. The invention can utilize the deep neural network to have strong learning ability, can learn the parameter information of any part family, generate various types of template data, and the model can be generalized to different types of part families, and the parameterized part retrieval method can be suitable for different types of part families, so that the method has strong robustness and universality.
Compared with the traditional technology, the invention has the following differences and advantages:
1. the traditional technology mainly aims at the grabbing task of a part with a specific size, and the method is suitable for the grabbing task of various parameterized part families.
2. The traditional technology mainly obtains parameter information through information labels such as RFID (radio frequency identification devices) and the like, or obtains part edge information through a traditional detection operator, and the invention obtains the parameter information of parts by using a parameterized deep neural network.
3. The point cloud template matching method in the traditional technology needs to calculate the distance between all points of the point cloud template and the target point cloud, and the calculation amount is large. The invention searches the best matching template by calculating the distance between the characteristic point in the characteristic vector space and the manifold, thereby saving a great deal of time.
4. The traditional technology needs specific preprocessing on different object models, and the method has generalization capability on different object models.
Drawings
Fig. 1 is a flowchart of a method for grabbing with a vision manipulator applied to a parameterized part according to an embodiment of the present invention;
FIG. 2 is a diagram of a parameterized deep neural network dataset generation process in one embodiment of the invention;
FIG. 3 is a diagram of a parameterized deep neural network training process in one embodiment of the present invention;
FIG. 4 is a diagram illustrating a process of obtaining a target template by a parameterized deep neural network in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
In an automated assembly line, it is required to visually grasp various industrial parts from a material frame and place them in a substantially correct posture on a part posture adjuster. The real-life industrial parts are basically parameterized, namely uniform part templates are provided, but specific dimension values are different. The embodiment of the invention realizes the vision mechanical arm grabbing of the parameterized part based on deep learning.
Fig. 1 is a flowchart of a method for grabbing with a vision mechanical arm applied to a parameterized part according to an embodiment of the present invention. The vision mechanical arm grabbing method provided by the embodiment of the invention can realize the grabbing task of the parameterized part and comprises the following five steps. First, a scene point cloud of the parameterized part is obtained, and background information is removed through preprocessing to obtain a target point cloud. Second, the target point cloud is input into a parameterized point cloud deep neural network and mapped to a target point in a feature vector space according to the generated descriptor mapping function D. Third, the network pre-establishes the manifold of a part family of a certain category in the feature vector space through offline training, where a manifold is a space that locally has the properties of Euclidean space and is commonly used to describe geometric shapes. In a preferred embodiment, each point on the manifold is the mapping in the feature vector space of a specific point cloud template in the part-family point cloud template library. Therefore, in this step, the specific parameter values and the target template of the parameterized part can be obtained by calculating the feature point on the manifold surface closest to the target point. Fourth, the 6D pose of the parameterized part is acquired through an alignment operation according to the target template. Fifth, the upper computer transmits the 6D pose information of the part to the robot control system, and the motion trajectory of the robot is solved through inverse kinematics to complete the grabbing task.
In the invention, part retrieval is carried out through a parameterized deep neural network, the size information and category information of the parameterized part are obtained, and a parameterized target template is determined. A descriptor mapping function is computed with a deep neural network, and the descriptors cause part families of the same category to cluster in the feature space; by calculating distances in the feature vector space, the dimensional parameters of the parameterized part are quickly estimated. A part template is then determined according to the size parameters of the parameterized part for pose estimation. After the parameterized template is determined, template matching is carried out through an alignment algorithm, the pose information of the parameterized part is obtained, and the robot is controlled so that the robot mechanical arm is guided to realize the grabbing task. The parameterized part retrieval method is applicable to different part family types and has strong generalization capability.
Specific embodiments of the invention are further described below.
The specific process of the embodiment of the invention comprises the following steps:
(1) after the mechanical arm receives a grabbing instruction, the system acquires scene point clouds of parameterized parts through a depth camera, background elimination is realized through a preprocessing method, and the part point clouds are filtered and down-sampled to a fixed point number to obtain target point clouds;
(2) inputting the target point cloud into a trained parameterized deep neural network, and mapping the target point cloud into a target point in a feature vector space according to the generated descriptor mapping function D;
(3) and according to the manifold representing the part families of different types established in the feature vector space in advance, the specific parameter values of the target object and the corresponding target template are obtained by calculating the feature points of the manifold surface closest to the target point.
(4) Carrying out alignment operation according to the target template to obtain the 6D position and posture information of the parameterized part;
(5) and the upper computer transmits the position and posture information of the parameterized part to a robot control system, calculates a connecting track of an initial pose and a tail end pose in a joint space through a motion planning algorithm, enables the tail end of the mechanical arm to reach a target pose along the connecting track, closes the clamping jaw and places the parameterized part at a specified position, and finishes a grabbing task.
First, scene point cloud acquisition and preprocessing
Input: background point cloud and scene point cloud
Output: scene point cloud with the background removed
The input of the invention is a scene point cloud and a background point cloud provided by a depth camera in the system, or the scene point cloud and the background point cloud generated by depth map conversion, wherein the background point cloud refers to a point cloud map obtained when no part is placed. The scene point cloud comprises a target point cloud and an environment background point cloud of the parameterized part to be grabbed, and the background point cloud needs to be eliminated. The elimination method may be to subtract a background point cloud stored in advance from the scene point cloud.
And constructing the KD tree by taking each point in the background point cloud as a tree node, and searching the nearest neighbor point with the radius smaller than a threshold value by taking each point in the scene point cloud as a search point. After the search is finished, the nearest neighbor point is subtracted from the corresponding tree node, so that the background removal effect is achieved. And then carrying out point cloud filtering processing to eliminate external environment influences such as noise and the like. The filtering methods include but are not limited to: straight-through filtering, voxel filtering, statistical filtering, radius filtering. The point cloud is then sampled to a fixed point number using sampling methods including, but not limited to: random sampling, uniform sampling, farthest point sampling (FPS sampling).
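A minimal sketch of this preprocessing, assuming point clouds stored as NumPy arrays of shape (N, 3); the radius threshold and the target number of points are illustrative values rather than parameters specified by the patent, and the filtering step is omitted for brevity.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_background(scene_pts, background_pts, radius=0.005):
    """Drop scene points that have a background neighbor within `radius`."""
    tree = cKDTree(background_pts)           # background points as tree nodes
    dist, _ = tree.query(scene_pts, k=1)     # nearest background point per scene point
    return scene_pts[dist > radius]          # keep points with no close background match

def farthest_point_sampling(pts, n_samples=1024):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    selected = [0]
    dist = np.full(len(pts), np.inf)
    for _ in range(n_samples - 1):
        dist = np.minimum(dist, np.linalg.norm(pts - pts[selected[-1]], axis=1))
        selected.append(int(np.argmax(dist)))
    return pts[selected]

# target_pts = farthest_point_sampling(remove_background(scene_pts, background_pts))
```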
Second, descriptor computation using parameterized deep neural network
Input: target point cloud after background removal
Output: for the target point cloud file, a target point in the feature vector space produced by the descriptor mapping function D
In parameter prediction of parameterized parts, a descriptor mapping function D generated by a metric learning network framework is used so that, after passing through D, point clouds of parts of the same type are closer to each other in the feature vector space and those of different types are farther apart. The feature extraction networks employed include, but are not limited to: PointNet, PointNet++, PointSIFT, and the like.
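As an illustration only (the patent does not prescribe a particular architecture), the PyTorch sketch below shows a PointNet-style encoder that maps an (N, 3) point cloud to a fixed-length descriptor; the layer widths and descriptor dimension are assumptions.

```python
import torch
import torch.nn as nn

class PointDescriptorNet(nn.Module):
    """PointNet-style encoder: per-point shared MLP followed by a symmetric max pool."""
    def __init__(self, descriptor_dim=128):
        super().__init__()
        self.point_mlp = nn.Sequential(              # applied independently to every point
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.head = nn.Linear(256, descriptor_dim)

    def forward(self, points):                       # points: (batch, N, 3)
        x = self.point_mlp(points.transpose(1, 2))   # -> (batch, 256, N)
        x = torch.max(x, dim=2).values               # order-invariant pooling over points
        return self.head(x)                          # descriptor in the feature vector space

# descriptors = PointDescriptorNet()(torch.randn(4, 1024, 3))   # shape (4, 128)
```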
The data set generation process of the parameterized deep neural network is shown in fig. 2 and is developed according to the following flow:
(1) firstly, obtaining a series of part family obj template libraries through three-dimensional modeling;
(2) then, carrying out uniform sampling of fixed point number and sampling of farthest point on the surface of the obj model of the part family to obtain a series of point cloud template libraries of the part family, wherein each point cloud file in the point cloud template libraries of the part family belongs to the template point cloud;
(3) adding a single part family obj file to a Blender rendering engine, and performing multi-angle shooting by using a simulation camera to obtain a Blender rendering depth map;
(4) converting the rendered depth map into a point cloud map through the intrinsic and extrinsic parameters of the simulation camera, and applying farthest point sampling to obtain a simulated multi-view point cloud map (see the conversion sketch after this list);
(5) the point cloud template library and the simulated multi-view point cloud image jointly form a data set of the parameterized deep neural network.
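A minimal sketch of the conversion in step (4), assuming a pinhole camera model with intrinsics K and a camera-to-world extrinsic matrix; the function and variable names are illustrative and not taken from the patent or from any Blender API.

```python
import numpy as np

def depth_to_point_cloud(depth, K, cam_to_world=np.eye(4)):
    """Back-project a depth map (H, W), in metres, through pinhole intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0                                  # drop pixels with no depth
    u, v, z = u.reshape(-1)[valid], v.reshape(-1)[valid], z[valid]
    x = (u - K[0, 2]) * z / K[0, 0]                # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * z / K[1, 1]                # Y = (v - cy) * Z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (cam_to_world @ pts_cam.T).T[:, :3]     # apply extrinsics -> world frame
```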
The training process of the parameterized deep neural network is shown in fig. 3, and the overall loss function is composed of two parts, namely a classification loss function and a contrast loss function. Wherein the classification loss function is used for classification of the part family, including but not limited to a cross entropy loss function. The contrast loss function is used for realizing that parts of the same type are relatively close to each other and parts of different types are relatively far from each other in a feature vector space, and the expression of the contrast loss function is as follows:
L_contrast = (1/(2k)) * Σ_{i=1}^{k} [ Y_i * d_i^2 + (1 - Y_i) * max(margin - d_i, 0)^2 ]
where k represents the number of sample pairs, d_i represents the Euclidean distance between the two descriptors of the i-th pair, Y_i indicates whether the labels of the i-th pair of samples match (Y_i = 1 denotes a match, Y_i = 0 denotes a mismatch), and margin denotes a set distance threshold.
In the prior art, a convolutional neural network for point clouds is generally used to classify the whole point cloud, i.e. to determine which kind of article the input point cloud belongs to. In the embodiment of the invention, point cloud features are extracted with a point cloud convolutional neural network for parameterized parts, and the feature extraction is constrained by a classification loss function and a contrast loss function, so that part families of the same type are relatively close to each other and part families of different types are relatively far from each other in the feature vector space, where distances in the feature space are measured by the Euclidean distance. After network training is finished, the descriptor mapping function D is obtained, and the manifolds of different parameterized part families are constructed offline through this mapping function in the feature vector space.
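A short PyTorch sketch of the contrast loss above; how descriptors are paired into (desc_a, desc_b) batches and the margin value are assumptions for illustration.

```python
import torch

def contrastive_loss(desc_a, desc_b, match, margin=1.0):
    """Contrast loss over k descriptor pairs.

    desc_a, desc_b: (k, D) descriptors; match: (k,) with 1 for same-type pairs, 0 otherwise.
    Matching pairs are pulled together, non-matching pairs pushed beyond `margin`.
    """
    d = torch.norm(desc_a - desc_b, dim=1)                      # Euclidean distance d_i
    pos = match * d.pow(2)                                      # Y_i * d_i^2
    neg = (1 - match) * torch.clamp(margin - d, min=0).pow(2)   # (1 - Y_i) * max(margin - d_i, 0)^2
    return (pos + neg).mean() / 2                               # averaged over the k pairs
```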
Third, obtaining a target template
Input: target point and manifolds in the feature vector space
Output: the feature point on the manifold surface nearest to the target point, and the target template
The process of obtaining the target template by the parameterized deep neural network is shown in fig. 4, and in the feature vector space, manifolds of different types of part families are obtained by offline training in advance, wherein one manifold only represents one part family, and each point on the manifold represents a specific parameterized part of the part family and contains detailed part parameter information. And acquiring specific parameter values of the target object and a corresponding target template by calculating the characteristic point of the manifold surface closest to the target point in the characteristic vector space, wherein the distance is measured by Euclidean distance.
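The retrieval step can be sketched as a nearest-neighbour search over precomputed template descriptors, as below; representing each manifold by the discrete set of its template descriptors is an assumption of this sketch, not a statement of the patent's exact data structure.

```python
import numpy as np

def retrieve_template(target_descriptor, manifolds):
    """Find the feature point nearest to the target across all part-family manifolds.

    manifolds: {family_name: (descriptors array of shape (M, D), list of M templates)}
    Returns the family name, the matched template (carrying its parameter values) and the distance.
    """
    best = (None, None, np.inf)
    for family, (descriptors, templates) in manifolds.items():
        d = np.linalg.norm(descriptors - target_descriptor, axis=1)   # Euclidean distances
        i = int(np.argmin(d))
        if d[i] < best[2]:
            best = (family, templates[i], float(d[i]))
    return best
```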
Fourth, parameterized part pose estimation
Input: background-removed target point cloud of the parameterized part, and the target template
Output: position and attitude of the parameterized part within the scene
After obtaining the target point cloud and the target template, registration is performed with an iterative closest point (ICP) algorithm; the specific operation flow is as follows (a minimal code sketch follows these steps):
a1) searching for the nearest point in the template point cloud for each point in the initial point cloud, wherein the template point cloud is based on a part obj file obtained through three-dimensional modeling: the point cloud file obtained after uniform sampling and farthest point sampling of the surface of the part obj file can be used for subsequent part pose estimation, and each point cloud file in the part family point cloud template library belongs to the template point cloud;
a2) using a direction-vector threshold to eliminate erroneous points, and calculating a translation vector T_i and a rotation matrix R_i;
a3) finding the translation vector T_i and rotation matrix R_i that minimize the mean square error of the distances between the target point cloud and the corresponding points of the target template;
a4) applying the rotation and translation defined by T_i and R_i to the initial point cloud; if the threshold requirement is met or the iteration count reaches the upper limit, ending, otherwise returning to a1) to continue the iteration.
Key parameters of the ICP algorithm, such as the maximum correspondence distance and the maximum number of iterations, need to be tuned during operation.
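A compact sketch of the a1)-a4) loop, assuming NumPy/SciPy point clouds of shape (N, 3); the SVD-based rigid transform and the distance gate used for outlier rejection (in place of the direction-vector threshold) are simplifications of this illustration, and the parameter values are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, template, max_corr_dist=0.01, max_iter=50, tol=1e-6):
    """Iteratively align `source` (N, 3) to `template` (M, 3); returns R (3x3) and T (3,)."""
    R, T = np.eye(3), np.zeros(3)
    tree = cKDTree(template)
    prev_err = np.inf
    for _ in range(max_iter):
        moved = source @ R.T + T
        dist, idx = tree.query(moved, k=1)              # a1) nearest template point
        keep = dist < max_corr_dist                     # a2) reject poor correspondences
        if keep.sum() < 3:
            break
        src, dst = moved[keep], template[idx[keep]]
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)       # a3) best rigid transform via SVD
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:                   # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        T_step = dst.mean(0) - R_step @ src.mean(0)
        R, T = R_step @ R, R_step @ T + T_step          # a4) accumulate the update
        err = np.mean(dist[keep] ** 2)
        if abs(prev_err - err) < tol:                   # stop when the MSE no longer improves
            break
        prev_err = err
    return R, T
```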
Fifth, mechanical arm motion planning
Input: pose information of the parameterized part and scene obstacle information
Output: motion trajectory of the mechanical arm
First, an inverse kinematics solution is required: the end pose during grabbing is converted from Cartesian space into the mechanical arm joint space. Then the connection trajectory between the initial pose and the end pose in the joint space is calculated through a motion planning algorithm, including but not limited to: the Rapidly-exploring Random Tree (RRT) algorithm, the Probabilistic Roadmap Method (PRM), and the A* algorithm. A collision detection tool is used during the calculation to ensure that the mechanical arm does not collide during the grabbing process. The end of the mechanical arm reaches the target pose along the connection trajectory, the clamping jaw is closed, and the parameterized part is placed at the designated position, completing the grabbing task.
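As a hedged illustration of the "connection trajectory" idea only (a real system would rely on RRT/PRM planning with collision checking, which is not reproduced here), the sketch below linearly interpolates between the initial and target joint configurations; `ik_solution` in the usage comment is a hypothetical inverse kinematics call, not an API from the patent.

```python
import numpy as np

def joint_space_trajectory(q_start, q_goal, n_steps=50):
    """Linear joint-space interpolation from the initial to the target configuration."""
    q_start, q_goal = np.asarray(q_start, float), np.asarray(q_goal, float)
    return [q_start + s * (q_goal - q_start) for s in np.linspace(0.0, 1.0, n_steps)]

# trajectory = joint_space_trajectory(current_joints, ik_solution(target_pose_6d))
# each waypoint would be collision-checked before being sent to the arm controller
```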
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (10)

1. A visual mechanical arm grabbing method applied to parameterized parts is characterized by comprising the following steps:
s1, obtaining scene point cloud of the parameterized part to be grabbed, and removing background information through preprocessing to obtain target point cloud;
s2, inputting the target point cloud into the trained parameterized point cloud deep neural network, and mapping the target point cloud into a target point in a feature vector space according to a descriptor mapping function D generated by the parameterized point cloud deep neural network;
s3, according to manifold representing different types of part families built in the feature vector space in advance, by calculating the feature point of the manifold surface closest to the target point, specific parameter values of the target object and corresponding target templates are obtained;
s4, acquiring 6D pose information of the parameterized part through an alignment algorithm according to the target template;
and S5, transmitting the 6D pose information of the parameterized part to a robot control system to control the robot to complete the grabbing task of the parameterized part.
2. The method of claim 1, wherein in step S1, after the robot receives the grabbing instruction for the parameterized part, a scene point cloud of the parameterized part is acquired by a depth camera.
3. The method according to any one of claims 1 to 2, wherein in step S1, the scene point cloud comprises a target point cloud of a parameterized part to be grabbed and an environmental background point cloud, and the target point cloud of the parameterized part is obtained by subtracting a pre-stored background point cloud from the scene point cloud.
4. The method according to claim 3, wherein in step S1, each point in the pre-stored background point cloud is used as a tree node to construct a KD tree, each point in the scene point cloud is used as a search point to search for a nearest neighbor point with a radius smaller than a set threshold, and after the search is finished, the nearest neighbor point is subtracted from the corresponding tree node to remove the background point cloud.
5. The method according to any one of claims 1 to 4, wherein in step S2, the parameterized deep neural network generates a descriptor mapping function D such that the target point cloud, after passing through the descriptor mapping function D, has parts of the same type within a feature vector space that are closer together and parts of different types that are further apart;
wherein the data set generation process of the parameterized deep neural network comprises the following steps:
(1) obtaining a series of part family obj template libraries through three-dimensional modeling;
(2) carrying out uniform sampling of fixed point number and sampling of farthest point on the surface of the obj model of the part family in sequence to obtain a series of point cloud template libraries of the part family, wherein each point cloud file in the point cloud template libraries of the part family belongs to the template point cloud;
(3) adding a single part family obj file to a Blender rendering engine, and performing multi-angle shooting by using a simulation camera to obtain a Blender rendering depth map;
(4) converting the rendered depth map into a point cloud map through internal reference and external reference conversion of the simulation camera, and sampling the farthest points to obtain a simulation multi-view point cloud map;
(5) the point cloud template library and the simulated multi-view point cloud image jointly form a data set of the parameterized deep neural network.
6. The method of any one of claims 1 to 5, wherein the overall loss function in the training process of the parameterized deep neural network comprises a classification loss function and a contrast loss function, wherein the classification loss function is used for classification of part families, the contrast loss function is used for realizing that parts of the same type are closer to each other and parts of different types are farther from each other in a feature vector space, and the expression of the contrast loss function is as follows:
L_contrast = (1/(2k)) * Σ_{i=1}^{k} [ Y_i * d_i^2 + (1 - Y_i) * max(margin - d_i, 0)^2 ]
where k represents the number of sample pairs, d_i represents the Euclidean distance between the two descriptors of the i-th pair, Y_i indicates whether the labels of the i-th pair of samples match (Y_i = 1 denotes a match, Y_i = 0 denotes a mismatch), and margin denotes a set distance threshold;
and after the network training is finished, a descriptor mapping function D is obtained, and in a feature vector space, the manifold of different parameterized part families can be constructed offline through the mapping function.
7. A method according to any one of claims 1 to 6, wherein in step S3, each manifold represents a family of parts, each point on the manifold surface representing a particular parameterized part of the family, containing part parameter information.
8. The method according to any one of claims 1 to 7, wherein in step S4, registration is performed by using an iterative closest point algorithm according to the target point cloud and the target template, specifically comprising the following steps:
a1) searching for the nearest point in the template point cloud for each point in the initial point cloud, wherein the template point cloud is obtained from a part obj file built through three-dimensional modeling, by uniform sampling and farthest point sampling of the surface of the part obj file;
a2) using a direction-vector threshold to eliminate erroneous points, and calculating a translation vector T_i and a rotation matrix R_i;
a3) finding the translation vector T_i and rotation matrix R_i that minimize the mean square error of the distances between the target point cloud and the corresponding points of the target template;
a4) applying the rotation and translation defined by T_i and R_i to the initial point cloud; if the threshold requirement is met or the iteration count reaches the upper limit, ending, otherwise returning to a1) to continue the iteration.
9. The method according to any one of claims 1 to 8, wherein in step S5, according to the 6D pose information of the parameterized part, the robot control system solves the robot motion trajectory by an inverse kinematics algorithm, wherein the end pose during grabbing is converted from Cartesian space into the robot arm joint space, the connection trajectory between the initial pose and the end pose is then calculated in the joint space by a motion planning algorithm, the end of the robot arm is controlled to reach the target pose along the connection trajectory, the clamping jaws are closed, and the parameterized part is placed at the specified position.
10. A vision mechanical arm gripping device applied to parameterization parts is characterized by comprising at least one memory and at least one processor;
the memory including at least one executable program stored therein;
the executable program, when executed by the processor, implementing the method of any one of claims 1 to 9.
CN202010048562.9A 2020-01-16 2020-01-16 Visual mechanical arm grabbing method and device applied to parameterized parts Active CN111251295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010048562.9A CN111251295B (en) 2020-01-16 2020-01-16 Visual mechanical arm grabbing method and device applied to parameterized parts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010048562.9A CN111251295B (en) 2020-01-16 2020-01-16 Visual mechanical arm grabbing method and device applied to parameterized parts

Publications (2)

Publication Number Publication Date
CN111251295A true CN111251295A (en) 2020-06-09
CN111251295B CN111251295B (en) 2021-05-14

Family

ID=70943951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010048562.9A Active CN111251295B (en) 2020-01-16 2020-01-16 Visual mechanical arm grabbing method and device applied to parameterized parts

Country Status (1)

Country Link
CN (1) CN111251295B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019052665A1 (en) * 2017-09-18 2019-03-21 Telefonaktiebolaget Lm Ericsson (Publ) Technique for providing reliable control in a cloud robotics system
CN110246127A (en) * 2019-06-17 2019-09-17 南京工程学院 Workpiece identification and localization method and system, sorting system based on depth camera
CN110349260A (en) * 2019-07-11 2019-10-18 武汉中海庭数据技术有限公司 A kind of pavement strip extraction method and device
CN110490915A (en) * 2019-08-19 2019-11-22 重庆大学 A kind of point cloud registration method being limited Boltzmann machine based on convolution

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017226B (en) * 2020-08-26 2023-10-31 清华大学深圳国际研究生院 6D pose estimation method for industrial parts and computer readable storage medium
CN112017226A (en) * 2020-08-26 2020-12-01 清华大学深圳国际研究生院 Industrial part 6D pose estimation method and computer readable storage medium
CN112102397A (en) * 2020-09-10 2020-12-18 敬科(深圳)机器人科技有限公司 Method, equipment and system for positioning multilayer part and readable storage medium
CN112164115A (en) * 2020-09-25 2021-01-01 清华大学深圳国际研究生院 Object pose identification method and device and computer storage medium
CN112164115B (en) * 2020-09-25 2024-04-02 清华大学深圳国际研究生院 Object pose recognition method and device and computer storage medium
CN112862878A (en) * 2021-02-07 2021-05-28 浙江工业大学 Mechanical arm trimming method based on 3D vision
CN112862878B (en) * 2021-02-07 2024-02-13 浙江工业大学 Mechanical arm blank repairing method based on 3D vision
CN113065392A (en) * 2021-02-24 2021-07-02 苏州盈科电子有限公司 Robot tracking method and device
CN113128610A (en) * 2021-04-26 2021-07-16 苏州飞搜科技有限公司 Industrial part pose estimation method and system
CN113345100A (en) * 2021-05-19 2021-09-03 上海非夕机器人科技有限公司 Prediction method, apparatus, device, and medium for target grasp posture of object
CN113345100B (en) * 2021-05-19 2023-04-07 上海非夕机器人科技有限公司 Prediction method, apparatus, device, and medium for target grasp posture of object
CN113808201A (en) * 2021-08-06 2021-12-17 亿嘉和科技股份有限公司 Target object detection method and guided grabbing method
CN114742789A (en) * 2022-04-01 2022-07-12 中国科学院国家空间科学中心 General part picking method and system based on surface structured light and electronic equipment
CN114742789B (en) * 2022-04-01 2023-04-07 桂林电子科技大学 General part picking method and system based on surface structured light and electronic equipment
CN116330306B (en) * 2023-05-31 2023-08-15 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN116330306A (en) * 2023-05-31 2023-06-27 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN118135020A (en) * 2024-05-07 2024-06-04 深圳市信润富联数字科技有限公司 Method and device for placing steering knuckle, storage medium and electronic device

Also Published As

Publication number Publication date
CN111251295B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111251295B (en) Visual mechanical arm grabbing method and device applied to parameterized parts
CN112132894B (en) Mechanical arm real-time tracking method based on binocular vision guidance
Hu et al. 3-D deformable object manipulation using deep neural networks
Arduengo et al. Robust and adaptive door operation with a mobile robot
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
CN110378325B (en) Target pose identification method in robot grabbing process
CN112907735B (en) Flexible cable identification and three-dimensional reconstruction method based on point cloud
CN111695562A (en) Autonomous robot grabbing method based on convolutional neural network
Sui et al. Sum: Sequential scene understanding and manipulation
CN111243017A (en) Intelligent robot grabbing method based on 3D vision
Suzuki et al. Grasping of unknown objects on a planar surface using a single depth image
Tang et al. Learning collaborative pushing and grasping policies in dense clutter
CN110909644A (en) Method and system for adjusting grabbing posture of mechanical arm end effector based on reinforcement learning
CN109508707B (en) Monocular vision-based grabbing point acquisition method for stably grabbing object by robot
Suzuki et al. Online self-supervised learning for object picking: detecting optimum grasping position using a metric learning approach
US20230086122A1 (en) Human-Robot Collaborative Flexible Manufacturing System and Method
Devgon et al. Orienting novel 3D objects using self-supervised learning of rotation transforms
Sun et al. Robotic grasping using semantic segmentation and primitive geometric model based 3D pose estimation
Shin et al. Integration of deep learning-based object recognition and robot manipulator for grasping objects
Khargonkar et al. Neuralgrasps: Learning implicit representations for grasps of multiple robotic hands
CN117340929A (en) Flexible clamping jaw grabbing and disposing device and method based on three-dimensional point cloud data
Xu et al. Learning to reorient objects with stable placements afforded by extrinsic supports
Ren et al. Vision based object grasping of robotic manipulator
Wang et al. Design of a voice control 6DoF grasping robotic arm based on ultrasonic sensor, computer vision and Alexa voice assistance
Tekden et al. Grasp transfer based on self-aligning implicit representations of local surfaces

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230802

Address after: 6th Floor, Building 1, No. 22 Hongyuan Road, Huangpu District, Guangzhou City, Guangdong Province, 510700

Patentee after: Guangzhou Qingzhuang Technology Partnership (L.P.)

Address before: Second floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Tsinghua Shenzhen International Graduate School

TR01 Transfer of patent right

Effective date of registration: 20230823

Address after: 6th Floor, Building 1, No. 22 Hongyuan Road, Huangpu District, Guangzhou City, Guangdong Province, 510000

Patentee after: Guangzhou Fuwei Intelligent Technology Co.,Ltd.

Address before: 6th Floor, Building 1, No. 22 Hongyuan Road, Huangpu District, Guangzhou City, Guangdong Province, 510700

Patentee before: Guangzhou Qingzhuang Technology Partnership (L.P.)